In ‘Easy Geom Recipes’ the computation we did was within groups, specifying the compute_group parameter within ggproto.

Grouping variables in ggplot2 are character, factor or logical variables.

If we use the compute_group parameter in our ggproto function to define our stat, computation will happen in a group-wise basis if aesthetics-variable pairing happens with such variables.

In other words, our target layer geom_* will divide up our data set by any character, factor, or logical categories, and then do computation before returning a data frame for plotting.

This is not always the desired behavior. Sometimes computation has to happen across groups. The examples that follow are such cases.

First we’ll see a regression model that has an indicator variable as one of its inputs. We want the model to be computed holistically; geom_smooth(), which you may have used, computes groupwise; i.e. one model for each group. We’ll use panel-level computation to display across group model results.

The second example is circle packing. We can use the {circlepack} library to visualize quantities for entities. To wrap this work up into a geom function, we need the computation that positions the circles to happen at the panel level – it shouldn’t do the computation within groups.

Finally, we look at a join of sf geometries as our computation. Here, I think it may be possible to use set compute_group or compute_panel and get similar visual results. However, computing and plotting a panel (we’ll do this with an inner join of an input data set and an sf reference data frame) is much faster than computing group-wise. (just some guesses, need to try with sf w/ fewer geometries than fips.)

class: inverse, middle, center

Step 0: use base ggplot2 to get the job done

library(tidyverse)

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.0     ✔ readr     2.1.4
## ✔ forcats   1.0.0     ✔ stringr   1.5.0
## ✔ ggplot2   3.4.1     ✔ tibble    3.2.0
## ✔ lubridate 1.9.2     ✔ tidyr     1.3.0
## ✔ purrr     1.0.1     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(palmerpenguins)

penguins <- remove_missing(penguins)

## Warning: Removed 11 rows containing missing values.

model <- lm(body_mass_g ~ flipper_length_mm + species, data = penguins)

penguins %>% 
  mutate(fitted = model$fitted.values) ->
penguins_w_fitted

penguins_w_fitted %>% 
  ggplot() + 
  aes(x = flipper_length_mm) + 
  aes(y = body_mass_g) + 
  aes(color = species) +
  geom_point() + 
  geom_line(aes(y = fitted, group = species))

Step 1: computation

define computation that ggplot2 should do for you, before plotting
- here it’s computing a variable with labels for each observation
test that functionality!

compute_panel_ols_ind <- function(data, scales, formula = y ~ x + indicator) {

  model <- lm(formula = formula,
              data = data)

  data.frame(x = data$x,
             y = model$fitted.values,
             indicator = data$indicator,
             xend = data$x, # for residuals geom
             yend = data$y  # for residuals geom
             )

}

penguins %>% 
  rename(x = bill_length_mm, 
         y = bill_depth_mm,
         indicator = species) %>%
  select(x, y, indicator) %>% 
  remove_missing() %>% # ggplot2
  compute_panel_ols_ind() %>% 
  tibble() # for nicer display

## # A tibble: 333 × 5
##        x     y indicator  xend  yend
##    <dbl> <dbl> <fct>     <dbl> <dbl>
##  1  39.1  18.4 Adelie     39.1  18.7
##  2  39.5  18.5 Adelie     39.5  17.4
##  3  40.3  18.6 Adelie     40.3  18  
##  4  36.7  17.9 Adelie     36.7  19.3
##  5  39.3  18.4 Adelie     39.3  20.6
##  6  38.9  18.4 Adelie     38.9  17.8
##  7  39.2  18.4 Adelie     39.2  19.6
##  8  41.1  18.8 Adelie     41.1  17.6
##  9  38.6  18.3 Adelie     38.6  21.2
## 10  34.6  17.5 Adelie     34.6  21.1
## # … with 323 more rows

Step 2: define ggproto

–

what’s the naming convention for the proto object?
which aesthetics are required as inputs?
where does the function from above go?

StatLmindicator <- ggplot2::ggproto("StatLmindicator",
                                      ggplot2::Stat,
                                      compute_panel = compute_panel_ols_ind,
                                      required_aes = c("x", "y", "indicator"),
                                      default_aes = ggplot2::aes(group = ggplot2::after_stat(indicator)
                                    ))

Step 3: define geom_* function

–

define the stat and geom for your layer

geom_lm_indicator <- function(mapping = NULL, data = NULL,
                                position = "identity", na.rm = FALSE,
                                show.legend = NA,
                                inherit.aes = TRUE, ...) {
  ggplot2::layer(
    stat = StatLmindicator, # proto object from Step 2
    geom = ggplot2::GeomLine, # inherit other behavior
    data = data,
    mapping = mapping,
    position = position,
    show.legend = show.legend,
    inherit.aes = inherit.aes,
    params = list(na.rm = na.rm, ...)
  )
}

Step 4: Enjoy! Use your function

ggplot(palmerpenguins::penguins) +
  aes(x = flipper_length_mm ) +
  aes(y = body_mass_g ) +
  geom_point() + 
  aes(color = species) +
  aes(indicator = species) +
  geom_lm_indicator()

## Warning: Removed 2 rows containing non-finite values (`stat_lmindicator()`).

## Warning: Removed 2 rows containing missing values (`geom_point()`).

ggplot(palmerpenguins::penguins) +
  aes(x = flipper_length_mm ) +
  aes(y = body_mass_g ) +
  geom_point() + 
  aes(color = species) +
  aes(indicator = species) +
  geom_lm_indicator(formula = y ~ x * indicator)

## Warning: Removed 2 rows containing non-finite values (`stat_lmindicator()`).
## Removed 2 rows containing missing values (`geom_point()`).

ggplot(palmerpenguins::penguins) +
  aes(x = flipper_length_mm ) +
  aes(y = body_mass_g ) +
  geom_point() + 
  aes(color = species) +
  aes(indicator = species) +
  geom_lm_indicator(formula = y ~ I(x^3) + I(x^2) + x + indicator)

## Warning: Removed 2 rows containing non-finite values (`stat_lmindicator()`).
## Removed 2 rows containing missing values (`geom_point()`).

Explain: What will the in-facet behavior be?

ggplot(palmerpenguins::penguins) +
  aes(x = flipper_length_mm ) +
  aes(y = body_mass_g ) +
  geom_point() + 
  aes(color = species) +
  aes(indicator = species) +
  geom_lm_indicator(formula = y ~ x + indicator) + 
  facet_wrap(~sex)

## Warning: Removed 2 rows containing non-finite values (`stat_lmindicator()`).

## Warning: Removed 2 rows containing missing values (`geom_point()`).

Now you

Create a residuals layer analogue.

Recipe #2: circle pack

Step 0: Use base ggplot2 to get the job done

library(tidyverse)
library(gapminder)
gapminder %>%  
  filter(continent == "Americas") %>%  
  filter(year == 2002) %>%  
  select(country, pop) ->  
prep  

packcircles::circleProgressiveLayout(prep$pop,  
                                         sizetype = 'area') ->  
pack  

cbind(prep, pack) %>%
  mutate(id = row_number()) %>% 
  tibble() # for nicer display

## # A tibble: 25 × 6
##    country                  pop      x      y radius    id
##    <fct>                  <int>  <dbl>  <dbl>  <dbl> <int>
##  1 Argentina           38331121 -3493.     0   3493.     1
##  2 Bolivia              8445134  1640.     0   1640.     2
##  3 Brazil             179914212  2733. -9142.  7568.     3
##  4 Canada              31902268  1151.  4801.  3187.     4
##  5 Chile               15497046  5274.  1302.  2221.     5
##  6 Colombia            41008227 10562. -1161.  3613.     6
##  7 Costa Rica           3834934 -4573. -4469.  1105.     7
##  8 Cuba                11226999 -7453. -3647.  1890.     8
##  9 Dominican Republic   8650322 -8637.  -300.  1659.     9
## 10 Ecuador             12921234 -7908.  3315.  2028.    10
## # … with 15 more rows

pack %>%  
  packcircles::circleLayoutVertices(npoints = 50) ->  
circle_outlines  

circle_outlines %>%  
  ggplot() +  
  aes(x = x, y = y) +  
  geom_polygon(colour = "black", alpha = 0.6) +  
  aes(group = id) +  
  aes(fill = factor(id)) +  
  geom_text(data = cbind(prep, pack),  
            aes(x, y, size = pop, label = country,  
                group = NULL, fill = NULL)) +  
  theme(legend.position = "none") +  
  coord_equal()

Step 1: computation

define computation that ggplot2 should do for you, before plotting
- here it’s computing a variable with labels for each observation
test that functionality!

# you won't use the scales argument, but ggplot will later
compute_panel_circle_pack <- function(data, scales){
 
  data %>%
    mutate(id = row_number()) ->
  data1
 
  if(is.null(data$area)){
    
    data1 %>% 
      mutate(area = 1) ->
    data1
    
  }
  
  data1 %>%  
    pull(area) %>%
    packcircles::circleProgressiveLayout(
                                         sizetype = 'area') %>%
    packcircles::circleLayoutVertices(npoints = 300) %>%
    left_join(data1) #%>%
    # rename(group = id)
   
}

Step 1.b Test computation

# step 1b test the computation function
gapminder::gapminder %>%
  filter(continent == "Americas") %>%  
  filter(year == 2002) %>%  
  # input must have required aesthetic inputs as columns
  rename(area = pop) %>%
  compute_panel_circle_pack() %>%
  head()

## Joining with `by = join_by(id)`

##             x         y id   country continent year lifeExp     area gdpPercap
## 1   0.0000000   0.00000  1 Argentina  Americas 2002   74.34 38331121  8797.641
## 2  -0.7660766  73.15225  1 Argentina  Americas 2002   74.34 38331121  8797.641
## 3  -3.0639703 146.27241  1 Argentina  Americas 2002   74.34 38331121  8797.641
## 4  -6.8926731 219.32842  1 Argentina  Americas 2002   74.34 38331121  8797.641
## 5 -12.2505058 292.28821  1 Argentina  Americas 2002   74.34 38331121  8797.641
## 6 -19.1351181 365.11980  1 Argentina  Americas 2002   74.34 38331121  8797.641

gapminder::gapminder %>%
  filter(continent == "Americas") %>%  
  filter(year == 2002) %>%  
  # input must have required aesthetic inputs as columns
  rename(area = pop) %>%
  compute_panel_circle_pack() %>% 
  str()

## Joining with `by = join_by(id)`

## 'data.frame':    7525 obs. of  9 variables:
##  $ x        : num  0 -0.766 -3.064 -6.893 -12.251 ...
##  $ y        : num  0 73.2 146.3 219.3 292.3 ...
##  $ id       : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ country  : Factor w/ 142 levels "Afghanistan",..: 5 5 5 5 5 5 5 5 5 5 ...
##  $ continent: Factor w/ 5 levels "Africa","Americas",..: 2 2 2 2 2 2 2 2 2 2 ...
##  $ year     : int  2002 2002 2002 2002 2002 2002 2002 2002 2002 2002 ...
##  $ lifeExp  : num  74.3 74.3 74.3 74.3 74.3 ...
##  $ area     : int  38331121 38331121 38331121 38331121 38331121 38331121 38331121 38331121 38331121 38331121 ...
##  $ gdpPercap: num  8798 8798 8798 8798 8798 ...

# step 1b test the computation function
gapminder::gapminder %>%
  filter(continent == "Americas") %>%  
  filter(year == 2002) %>%  
  # input must have required aesthetic inputs as columns
  rename(area = pop) %>%
  compute_panel_circle_pack() %>% 
  ggplot() + 
  aes(x = x, y = y, fill = country) + 
  geom_polygon()

## Joining with `by = join_by(id)`

# my_setup_data <- function(data, params){
#                                     if(data$group[1] == -1){
#                                       nrows <- nrow(data)
#                                       data$group <- seq_len(nrows)
#                                     }
#                                     data
#                                   }

Step 2: define ggproto

what’s the naming convention for the proto object?
which aesthetics are required as inputs?
where does the function from above go?

StatCirclepack <- ggplot2::ggproto(`_class` = "StatCirclepack",
                                  `_inherit` = ggplot2::Stat,
                                  required_aes = c("id"),
                                  compute_panel = compute_panel_circle_pack,
                                  # setup_data = my_setup_data,
                                  default_aes = aes(group = after_stat(id))
                                  )

Step 3: define geom_* function

geom_polygon_circlepack <- function(mapping = NULL, data = NULL,
                           position = "identity", na.rm = FALSE,
                           show.legend = NA,
                           inherit.aes = TRUE, ...) {
  ggplot2::layer(
    stat = StatCirclepack, # proto object from Step 2
    geom = ggplot2::GeomPolygon, # inherit other behavior
    data = data,
    mapping = mapping,
    position = position,
    show.legend = show.legend,
    inherit.aes = inherit.aes,
    params = list(na.rm = na.rm, ...)
  )
}

Step 4: Enjoy! Use your function

gapminder::gapminder %>%
  filter(year == 2002) %>%
  ggplot() +
  aes(id = country) + 
  geom_polygon_circlepack(alpha = .5, size = .002)

## Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
## ℹ Please use `linewidth` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

## Joining with `by = join_by(id)`

last_plot() + 
  aes(fill = continent)

## Joining with `by = join_by(id)`

last_plot() + 
  aes(area = pop)

## Joining with `by = join_by(id)`

last_plot() +
  aes(color = continent) +
  facet_wrap(facets = vars(continent))

## Joining with `by = join_by(id)`
## Joining with `by = join_by(id)`
## Joining with `by = join_by(id)`
## Joining with `by = join_by(id)`
## Joining with `by = join_by(id)`

Now you… Create a layer that labels the packed circles…

Notes:

This might be a good example where compute group and compute panel could be applied? What if we just wanted to show the count of countries within continents as the size of the bubble for example, and the bubble i.d. is continent. We could prepare that ahead of time, but what would it look like if ggplot2 does the count and then the circle pack algorithm for us?

Recipe #3. sf geometries

Step 0: get it done with base ggplot2

Example simple feature to data frame…

library(tidyverse)
library(sf)

## Linking to GEOS 3.10.2, GDAL 3.4.2, PROJ 8.2.1; sf_use_s2() is TRUE

fips_geometries <- readRDS(url("https://wilkelab.org/SDS375/datasets/US_counties.rds")) %>%
  rename(FIPS = GEOID)

US_census <- read_csv("https://wilkelab.org/SDS375/datasets/US_census.csv",
                      col_types = cols(FIPS = "c")
          )

# from Claus Wilke on ggplot2

fips_geometries %>%
  left_join(US_census, by = "FIPS") %>%
  ggplot() +
  geom_sf(aes(fill = mean_work_travel), linewidth = .1) + 
  scale_fill_viridis_c(option = "magma") ->
classic_plot_sf_layer

classic_plot_sf_layer

Step 1: computation

layer_data(classic_plot_sf_layer) %>% 
  select(geometry, xmin, xmax, ymin, ymax) %>% 
  bind_cols(tibble(FIPS = fips_geometries$FIPS)) %>% 
  rename(fips = FIPS) ->
reference

compute_panel_county <- function(data, scales){
 
  data %>% 
    # inner_join(fips_ggplot2_reference) 
    inner_join(reference, multiple = "all") %>% 
    mutate(group = -1)

}

Step 2: pass to ggproto

StatCounty <- ggplot2::ggproto(`_class` = "StatCounty",
                                  `_inherit` = ggplot2::Stat,
                                  # required_aes = c("fips"),
                                  # setup_data = my_setup_data,
                                  compute_panel = compute_panel_county,
                                  default_aes = aes(geometry = after_stat(geometry))
                                  )

Step 3: write geom_* function

geom_sf_county <- function(
  mapping = NULL,
  data = NULL,
  position = "identity",
  na.rm = FALSE,
  show.legend = NA,
  inherit.aes = TRUE, ...) {
  ggplot2::layer(
    stat = StatCounty,  # proto object from step 2
    geom = ggplot2::GeomSf,  # inherit other behavior
    data = data,
    mapping = mapping,
    position = position,
    show.legend = show.legend,
    inherit.aes = inherit.aes,
    params = list(na.rm = na.rm, ...)
  )
}

step 4: test geom_* function, celebrate!

read_csv("https://wilkelab.org/SDS375/datasets/US_census.csv",
                      col_types = cols(FIPS = "c")) %>%  
  ggplot() + 
  aes(fips = FIPS) +
  geom_sf_county(linewidth = .02, 
                      color = "darkgrey") +
  aes(fill = mean_work_travel) + 
  coord_sf() +
  scale_fill_viridis_c(option = "magma")

## Joining with `by = join_by(fips)`

Now you, create geom_sf_state

We give you step 0

library(sf)

my_states_df <- tibble(state.name, row = 1:50)

# https://www.census.gov/geographies/mapping-files/time-series/geo/carto-boundary-file.html
states_geometries <- read_sf("cb_2018_us_state_20m/cb_2018_us_state_20m.shp") %>%   
  filter(NAME != "Puerto Rico",
         NAME != "District of Colombia") %>% 
  select(STUSPS, NAME, geometry)


my_states_df %>% 
  rename(NAME = state.name) %>% 
  left_join(states_geometries) %>% 
ggplot() + 
  geom_sf(aes(geometry = geometry, state = state.name)) + 
  coord_sf()

## Joining with `by = join_by(NAME)`

## Warning in layer_sf(geom = GeomSf, data = data, mapping = mapping, stat = stat,
## : Ignoring unknown aesthetics: state

Step 1.

layer_data(last_plot()) %>% 
  select(state, geometry, xmin, xmax, ymin, ymax) ->
states_reference

compute_panel_states <- function(data, scales){
  
  data %>% 
    inner_join(states_reference, multiple = "all") %>% 
    mutate(group = -1)

  
}


## Step 2. 
StatState <- ggplot2::ggproto(`_class` = "StatState",
                                  `_inherit` = ggplot2::Stat,
                                  required_aes = c("state"),
                                  # setup_data = my_setup_data,
                                  compute_panel = compute_panel_states,
                                  default_aes = aes(geometry = after_stat(geometry))
                                  )

## Step 3.
geom_sf_state <- function(
  mapping = NULL,
  data = NULL,
  position = "identity",
  na.rm = FALSE,
  show.legend = NA,
  inherit.aes = TRUE, ...) {
  ggplot2::layer(
    stat = StatState,  # proto object from step 2
    geom = ggplot2::GeomSf,  # inherit other behavior
    data = data,
    mapping = mapping,
    position = position,
    show.legend = show.legend,
    inherit.aes = inherit.aes,
    params = list(na.rm = na.rm, ...)
  )
}

my_states_df %>% 
  mutate(state.name = as.factor(state.name)) %>% 
  ggplot() +
  aes(state = state.name) %>% 
  geom_sf_state() + 
  aes(fill = row) +
  coord_sf()

## Joining with `by = join_by(state)`

more easy geom recipes: compute panel

Evangeline Reynolds

3/16/2023

Step 0: use base ggplot2 to get the job done

Step 1: computation

Step 2: define ggproto

Step 3: define geom_* function

Step 4: Enjoy! Use your function

Explain: What will the in-facet behavior be?

Now you

Recipe #2: circle pack

Step 0: Use base ggplot2 to get the job done

Step 1: computation

Step 1.b Test computation

Step 2: define ggproto

Step 3: define geom_* function

Step 4: Enjoy! Use your function

Now you… Create a layer that labels the packed circles…

Recipe #3. sf geometries

Step 0: get it done with base ggplot2

Example simple feature to data frame…

Step 1: computation

Step 2: pass to ggproto

Step 3: write geom_* function

step 4: test geom_* function, celebrate!

Now you, create geom_sf_state