facetwarp

facetwarp is an extension of ggplot2 specifically aimed at arranging faceted plots.

The main function within the facetwarp package is facet_warp, which is a close sibling of ggplot2::facet_wrap, hence the similar name. If you’re not already familiar with how to use ggplot2::facet_wrap, please start there.

Before you go any further, you should already be familiar with ggplot2. It allows you to ‘speak’ a graph from composable elements, instead of being limited to a predefined set of charts.

Part 1: wrap vs warp 🪄

wrap

First let’s recall what facet_wrap gives us using the iris dataset. 👇

library(ggplot2)

ggplot(iris) +
  geom_point(aes(x=Petal.Width, y=Petal.Length))+
  facet_wrap(vars(Species),nrow = 2)

Note that there are 3 facets, one for each species, and they are arranged in alphabetical order. Because we’ve arranged them into 2 rows, and there are only 3 facets, the 4th panel (lower, right) is not occupied.
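As a rough illustration of why the fourth panel stays empty, the grid can be mocked up in base R. This is only a sketch of the default row-by-row fill described above, not ggplot2's internal code, and the variable names (`n_row`, `n_col`, `cells`, `layout`) are made up for the example.

```r
# Facets are taken in factor-level (alphabetical) order.
species <- levels(iris$Species)

# With nrow = 2 and 3 facets, 2 columns are needed to hold them all.
n_row <- 2
n_col <- ceiling(length(species) / n_row)

# Pad with NA for the unused cell, then fill the grid row by row.
cells  <- c(species, rep(NA, n_row * n_col - length(species)))
layout <- matrix(cells, nrow = n_row, ncol = n_col, byrow = TRUE)
layout
```

The `NA` in the lower-right cell corresponds to the unoccupied panel.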

warp

Now, we know that we’ve got other columns in this dataset, specifically Sepal.Width and Sepal.Length. Let’s explore those axes quickly by summarizing their values by Species.

library(dplyr, warn.conflicts = FALSE)

ggplot(iris %>% 
            group_by(Species) %>% 
            summarize(median_Sepal.Width  = median(Sepal.Width),
                      median_Sepal.Length = median(Sepal.Length)))+
    geom_text(aes(x=median_Sepal.Width, y=median_Sepal.Length, label=Species))

In our faceted scatter plot above, instead of arranging the facets alphabetically, maybe we want the layout to mimic the Sepal.Length and Sepal.Width arrangement we just saw.

IT IS TIME TO ✨ WARP THE FACETS 🪄

library(facetwarp)
ggplot(iris)+
    geom_point(aes(x=Petal.Width, y=Petal.Length))+
    facet_warp(vars(Species), macro_x='Sepal.Width', macro_y='Sepal.Length', nrow = 3, ncol = 3)

👆 Notice the layout has changed. facet_warp has repositioned the facets! In fact, they are mimicking the arrangement we saw above:

* virginica at the top due to its high median_Sepal.Length
* versicolor at the left due to its low median_Sepal.Width
* setosa at the lower-right due to its low median_Sepal.Length and high median_Sepal.Width

This was accomplished using our macro axes. When we say macro_x='Sepal.Width', we’re saying, make no change to the “x axis” of the individual facets, but in order to arrange the facets themselves, treat Sepal.Width as the x-dimension.
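To make the macro-axis idea concrete, here is a deliberately naive sketch in base R. It is not facetwarp's actual algorithm, which this README does not spell out. It assumes each facet is summarized by its median on the two macro variables (as in the text plot earlier) and simply ranks those medians to pick a column and a row. The names `macro`, `col`, and `row` are invented for the example.

```r
# One summary point per facet: median of each macro variable by Species.
macro <- aggregate(cbind(Sepal.Width, Sepal.Length) ~ Species,
                   data = iris, FUN = median)

# Naive placement by rank (an assumption, not the package's method):
#   col 1 = leftmost (lowest Sepal.Width)
#   row 1 = top      (highest Sepal.Length)
macro$col <- rank(macro$Sepal.Width,   ties.method = "first")
macro$row <- rank(-macro$Sepal.Length, ties.method = "first")
macro
```

With only three facets on a 3 x 3 grid, ranking is enough to put virginica in the top row, versicolor in the left column, and setosa in the lower-right cell. With many facets and fewer rows or columns than facets, a real implementation has to resolve collisions between facets that want the same cell, which this sketch does not attempt.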

Since we only have 3 facets, a grid of 4 panels is enough, so we can try dropping nrow and ncol to 2 to condense the arrangement:

ggplot(iris)+
    geom_point(aes(x=Petal.Width, y=Petal.Length))+
    facet_warp(vars(Species), macro_x='Sepal.Width', macro_y='Sepal.Length', nrow = 2, ncol = 2)

Part 2: Building on the Warp Idea with Election Data

Let’s get familiar with a bit of US Presidential Election Data.

elections <- read.csv(file='https://gist.githubusercontent.com/mattdzugan/bf5bc48fad1850af59ac83a411f8c0d6/raw/8da67b51df907508f7c859fe29fc4637397513d8/County_Election_Data.csv')
elections <- elections %>% mutate(log_pop_density = log10(pop_density))
head(elections)
#>   county_fips   state state_po year county_name      candidate      party
#> 1        1001 ALABAMA       AL 2000     AUTAUGA        AL GORE   DEMOCRAT
#> 2        1001 ALABAMA       AL 2000     AUTAUGA GEORGE W. BUSH REPUBLICAN
#> 3        1001 ALABAMA       AL 2004     AUTAUGA     JOHN KERRY   DEMOCRAT
#> 4        1001 ALABAMA       AL 2004     AUTAUGA GEORGE W. BUSH REPUBLICAN
#> 5        1001 ALABAMA       AL 2008     AUTAUGA   BARACK OBAMA   DEMOCRAT
#> 6        1001 ALABAMA       AL 2008     AUTAUGA    JOHN MCCAIN REPUBLICAN
#>   candidate_votes total_votes pop_density med_age      lon      lat
#> 1            4942       17208    35.85342    39.2 -86.6429 32.53514
#> 2           11993       17208    35.85342    39.2 -86.6429 32.53514
#> 3            4758       20081    35.85342    39.2 -86.6429 32.53514
#> 4           15196       20081    35.85342    39.2 -86.6429 32.53514
#> 5            6093       23641    35.85342    39.2 -86.6429 32.53514
#> 6           17403       23641    35.85342    39.2 -86.6429 32.53514
#>   unemployment_rate med_hh_income percent_bachelors log_pop_density
#> 1               2.9         66444          28.13147        1.554531
#> 2               2.9         66444          28.13147        1.554531
#> 3               2.9         66444          28.13147        1.554531
#> 4               2.9         66444          28.13147        1.554531
#> 5               2.9         66444          28.13147        1.554531
#> 6               2.9         66444          28.13147        1.554531

In the US, the two primary parties are the DEMOCRAT and REPUBLICAN parties, and we can analyze the margin these parties have over one another in each county. But rather than just viewing the counties alphabetically, let’s arrange the counties by variables that matter.
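For readers who want the margin as an explicit number, here is a small base R sketch using only the six Autauga County rows printed above, re-entered by hand so the snippet runs without downloading anything. The object names (`autauga`, `dem`, `gop`, `margin`) and the `rep_margin` column are made up for illustration. The plots below draw vote shares directly rather than using this table.

```r
# The six rows shown by head(elections), reduced to the columns we need.
autauga <- data.frame(
  year            = c(2000, 2000, 2004, 2004, 2008, 2008),
  party           = rep(c("DEMOCRAT", "REPUBLICAN"), times = 3),
  candidate_votes = c(4942, 11993, 4758, 15196, 6093, 17403),
  total_votes     = c(17208, 17208, 20081, 20081, 23641, 23641)
)

# Share of the total vote won by each candidate.
autauga$share <- autauga$candidate_votes / autauga$total_votes

# Put the two parties side by side, one row per election year.
dem <- autauga[autauga$party == "DEMOCRAT", ]
gop <- autauga[autauga$party == "REPUBLICAN", ]
margin <- merge(dem, gop, by = "year", suffixes = c("_dem", "_gop"))

# Positive values mean the REPUBLICAN candidate led in that year.
margin$rep_margin <- margin$share_gop - margin$share_dem
margin[, c("year", "rep_margin")]
```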

We can try to warp the facets by characteristics that may impact voter tendencies. Specifically, this time:

facet_warp(vars(county_name),
             macro_x = 'log_pop_density',
             macro_y = 'med_age')

Let’s see it in context:

ggplot(elections %>% filter(state_po == 'CA'))+
  labs(title='California Election Results by County', 
       subtitle = 'Older Counties appear Higher, Denser Counties appear further Right',
       y='proportion of votes')+
  theme_minimal()+
  theme(legend.position = 'none',
        panel.spacing = unit(1.2, "lines"),
        axis.text = element_text(size = 6))+
  geom_rect(aes(xmin=year-4, xmax=year, ymin=(1-candidate_votes/total_votes), ymax=candidate_votes/total_votes, fill=party, alpha=candidate_votes/total_votes>.5))+
  geom_step(aes(x=year, y=candidate_votes/total_votes, color=party), direction='vh', linewidth=0.8)+
  scale_alpha_manual(values=c(0,0.3))+
  scale_color_manual(values=c('#5768ac','#e24a41'))+
  scale_fill_manual(values=c('#5768ac','#e24a41'))+
  scale_x_continuous(limits=c(2000,2020), breaks = seq(2000,2020,4))+
  facet_warp(vars(county_name),
             macro_x = 'log_pop_density',
             macro_y = 'med_age')
#> Warning: Removed 116 rows containing missing values (`geom_rect()`).

Leaving Empty Space to Reveal Geometries

We can also take advantage of this mechanism to sort the facets geographically. In fact, if we play with nrow and ncol a bit, we can even get something that starts to resemble the State of California.

ggplot(elections %>% filter(state_po == 'CA'))+
  labs(title='California Election Results by County', 
       y='proportion of votes')+
  theme_minimal()+
  theme(legend.position = 'none',
        panel.spacing = unit(1.2, "lines"),
        axis.text = element_text(size = 6))+
  geom_rect(aes(xmin=year-4, xmax=year, ymin=(1-candidate_votes/total_votes), ymax=candidate_votes/total_votes, fill=party, alpha=candidate_votes/total_votes>.5))+
  geom_step(aes(x=year, y=candidate_votes/total_votes, color=party), direction='vh', linewidth=0.8)+
  scale_alpha_manual(values=c(0,0.3))+
  scale_color_manual(values=c('#5768ac','#e24a41'))+
  scale_fill_manual(values=c('#5768ac','#e24a41'))+
  scale_x_continuous(limits=c(2000,2020), breaks = seq(2000,2020,4))+
  facet_warp(vars(county_name),
             macro_x = 'lon',
             macro_y = 'lat', 
             nrow = 12, ncol = 6)
#> Warning: Removed 116 rows containing missing values (`geom_rect()`).

Note the “blue” counties along the Pacific coast of California!

This happens because our underlying algorithm is attempting to fit the 58 counties in 12*6=72 possible grid spaces.
This leaves 14 unused spaces which means we’ll begin to see the underlying shape of the macro_x and macro_y data.
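The arithmetic is easy to check for other grid shapes. `empty_cells` below is a hypothetical helper written for this README, not a function exported by facetwarp.

```r
# Number of grid cells left empty for a given facet count and grid shape.
# Hypothetical helper for illustration; not part of facetwarp.
empty_cells <- function(n_facets, nrow, ncol) {
  cells <- nrow * ncol
  stopifnot(cells >= n_facets)  # the grid must be able to hold every facet
  cells - n_facets
}

empty_cells(58, nrow = 12, ncol = 6)  # 14 spare cells, as in the plot above
empty_cells(58, nrow = 8,  ncol = 8)  # 6 spare cells
```

Trying a few nrow and ncol combinations this way is a quick guide to how much slack a grid has before rendering the full plot.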

Part 3: Chicago Transit Authority Data

Another fun example is sorting train station data geographically.

ridership <- read.csv('https://gist.githubusercontent.com/mattdzugan/603d4ba67f29457e2f5ddcad27178e8c/raw/efc26e77b39685eece11b051fbabbc74adad2ba0/CTA_Ridership.csv')

ggplot(ridership[complete.cases(ridership), ])+
  theme_minimal()+
  theme(legend.position = 'none',
        panel.spacing = unit(1.2, "lines"))+
  labs(title="Ridership compared to January 2000", y='% Change', x='')+  # labs() takes x/y, not xlab/ylab
  geom_hline(aes(yintercept=0))+
  geom_area(aes(x=month, y=ifelse(avg_weekday_rides>avg_weekday_rides_initial, avg_weekday_rides/avg_weekday_rides_initial-1,0)), fill="#27b376")+
  geom_area(aes(x=month, y=ifelse(avg_weekday_rides<avg_weekday_rides_initial, avg_weekday_rides/avg_weekday_rides_initial-1,0)), fill="#f9a73e")+
  scale_y_continuous(limits=c(-1, 2))+
  facet_warp(vars(stationame), macro_x = 'lon', macro_y = 'lat', nrow=16, ncol=13)
