Recipe 3: geom_residuals()

Author

Gina Reynolds, Morgan Brown

Published

January 3, 2022

This tutorial is intended for individuals who already have a working knowledge of the grammar of ggplot2, but may like to build a richer vocabulary for themselves via the Stat extension mechanism.

Preview

Our recipes take the form:

  • Step 0. Get the job done with β€˜base’ ggplot2. It’s a good idea to clarify what needs to happen without getting into the extension architecture
  • Step 1. Write a computation function. Wrap the necessary computation into a function that your target geom_*() function will perform. We focus on β€˜compute_group’ computation only in this tutorial.
  • Step 2. Define a ggproto object. ggproto objects allow your extension to work together with base ggplot2 functions! You’ll use the computation function from step 1 to help define it.
  • Step 3. Write your geom function! You’re ready to write your function. You will incorporate the ggproto from step 2 and also define which more primitive geom (point, text, segment etc) you want other behaviors to inherit from.
  • Step 4. Test/Enjoy! Take your new geom for a spin! Check out group-wise computation behavior!

Below, you’ll see a completely worked example (example recipes) and then a invitation to build a related target geom_*().


Example recipe #3: geom_point_lm_fitted()


Step 0: use base ggplot2 to get the job done

library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
βœ” dplyr     1.1.4     βœ” readr     2.1.5
βœ” forcats   1.0.0     βœ” stringr   1.5.1
βœ” ggplot2   3.5.1     βœ” tibble    3.2.1
βœ” lubridate 1.9.3     βœ” tidyr     1.3.1
βœ” purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
βœ– dplyr::filter() masks stats::filter()
βœ– dplyr::lag()    masks stats::lag()
β„Ή Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
penguins <- remove_missing(palmerpenguins::penguins)
Warning: Removed 11 rows containing missing values or values outside the scale
range.
model <- lm(formula = bill_length_mm ~ bill_depth_mm, 
            data = penguins) 

penguins_w_fitted <- penguins %>% 
  mutate(fitted = model$fitted.values)

penguins %>% 
  ggplot() + 
  aes(x = bill_depth_mm, y = bill_length_mm) +
  geom_point() + 
  geom_smooth(method = "lm", se = F) + 
  geom_point(data = penguins_w_fitted,
             aes(y = fitted),
             color = "blue")
`geom_smooth()` using formula = 'y ~ x'

Step 1: computation

compute_group_lm_fitted<- function(data, scales){
  model<-lm(formula= y ~ x, data = data)
  data %>% 
    mutate(y=model$fitted.values)
}

# test out the function
penguins %>%
  # rename to explicitly state the x and y inputs
  rename(x = bill_depth_mm, y = bill_length_mm)%>%
  compute_group_lm_fitted()
# A tibble: 333 Γ— 8
   species island        y     x flipper_length_mm body_mass_g sex     year
   <fct>   <fct>     <dbl> <dbl>             <int>       <int> <fct>  <int>
 1 Adelie  Torgersen  43.0  18.7               181        3750 male    2007
 2 Adelie  Torgersen  43.8  17.4               186        3800 female  2007
 3 Adelie  Torgersen  43.5  18                 195        3250 female  2007
 4 Adelie  Torgersen  42.6  19.3               193        3450 female  2007
 5 Adelie  Torgersen  41.8  20.6               190        3650 male    2007
 6 Adelie  Torgersen  43.6  17.8               181        3625 female  2007
 7 Adelie  Torgersen  42.4  19.6               195        4675 male    2007
 8 Adelie  Torgersen  43.7  17.6               182        3200 female  2007
 9 Adelie  Torgersen  41.4  21.2               191        3800 male    2007
10 Adelie  Torgersen  41.5  21.1               198        4400 male    2007
# β„Ή 323 more rows

Step 2: define ggproto

StatLmFitted<-ggplot2::ggproto(`_class` = "StatLmFitted",
                                  `_inherit` = ggplot2::Stat,
                                  required_aes = c("x", "y"),
                                  compute_group = compute_group_lm_fitted)

# test 
penguins %>% 
  ggplot() + 
  aes(x = bill_depth_mm, y = bill_length_mm) +
  geom_point() + 
  geom_smooth(method = "lm", se = F) + 
  geom_point(stat = StatLmFitted, color = "blue")
`geom_smooth()` using formula = 'y ~ x'

Step 3: define geom_* function

geom_point_lm_fitted <- function(mapping = NULL, data = NULL,
                           position = "identity", na.rm = FALSE,
                           show.legend = NA,
                           inherit.aes = TRUE, ...) {
  ggplot2::layer(
    stat = StatLmFitted, # proto object from step 2
    geom = ggplot2::GeomPoint, # inherit other behavior
    data = data, 
    mapping = mapping,
    position = position, 
    show.legend = show.legend, 
    inherit.aes = inherit.aes,
    params = list(na.rm = na.rm, ...)
  )
}

Step 4: Enjoy! Use your function

penguins %>% 
  ggplot() + 
  aes(x = bill_depth_mm, y = bill_length_mm) +
  geom_point() + 
  geom_smooth(method="lm", se= F)+
  geom_point_lm_fitted(color="blue")
`geom_smooth()` using formula = 'y ~ x'

And check out conditionality

penguins %>% 
  ggplot() + 
  aes(x = bill_depth_mm, y = bill_length_mm) +
  geom_point() + 
  geom_smooth(method="lm", se= F) +
  geom_point_lm_fitted() + 
  facet_wrap(facets = vars(species))
`geom_smooth()` using formula = 'y ~ x'


Task #3 create geom_segment_lm_residuals()

Create the function geom_segment_lm_residuals().

Hint: consider what aesthetics are required for segments. We’ll give you Step 0 this time…

Step 0: use base ggplot2 to get the job done


Not interested in writing your own geoms?

Check out some ready-to-go geoms that might be of interest in the ggxmean package… or other extension packages.

Interested in working a bit more with geoms and making them available to more folks, but not interested in writing your own package?

Join in on the development and validation of the ggxmean package for statistical educators and everyday analysis!