library(tidyverse)
library(palmerpenguins)
<- remove_missing(penguins)
penguins
<- penguins %>%
penguins_medians summarize(bill_length_mm_median = median(bill_length_mm),
bill_depth_mm_median = median(bill_depth_mm))
%>%
penguins ggplot() +
aes(x = bill_depth_mm) +
aes(y = bill_length_mm) +
geom_point() +
geom_point(data = penguins_medians,
color = "red", size = 4,
aes(x = bill_depth_mm_median,
y = bill_length_mm_median))
easy geom recipes
Using ggplot2 has been described as writing ‘graphical poems’. But we may feel at a loss for ‘words’ when functions we’d like to have don’t exist. The ggplot2 extension system allows us to build new ‘vocabulary’ for fluent expression.
An exciting extension mechanism is that of inheriting from existing, more primitive geoms after performing some calculation.
To get your feet wet in this world and give you a taste of patterns for geom extension, we provide three basic examples of the geoms_ that inherit from existing geoms (point, text, segment, etc) along with a practice exercise. With such geoms, calculation is done under the hood by the ggplot2 system.
With these geom, you can write new graphical poems with exciting new graphical ‘words’!
This tutorial is intended for individuals who already have a working knowledge of the grammar of ggplot2, but may like to build a richer vocabulary for themselves.
Preview
Our recipes take the form:
- Step 0. Get the job done with ‘base’ ggplot2. It’s a good idea to clarify what needs to happen without getting into the extension architecture
- Step 1. Write a computation function. Wrap the necessary computation into a function that your target geom_*() function will perform. We focus on ‘compute_group’ computation only in this tutorial.
- Step 2. Define a ggproto object. ggproto objects allow your extension to work together with base ggplot2 functions! You’ll use the computation function from step 1 to help define it.
- Step 3. Write your geom function! You’re ready to write your function. You will incorporate the ggproto from step 2 and also define which more primitive geom (point, text, segment etc) you want other behaviors to inherit from.
- Step 4. Test/Enjoy! Take your new geom for a spin! Check out group-wise computation behavior!
Below, you’ll see a completely worked example (example recipes) and then a invitation to build a related target geom_*().
Example recipe #1: geom_point_xy_medians()
–
- This will be a point at the median of x and y
Step 0: use base ggplot2 to get the job done
Step 1: computation
- define computation that ggplot2 should do for you, before plotting
- here it’s computing a variable with labels for each observation
- test that functionality Step 1.b
# Step 1.a
<- function(data, scales){ # scales is used internally in ggplot2
compute_group_xy_medians %>%
data summarize(x = median(x),
y = median(y))
}
# Step 1.b
%>%
penguins rename(x = bill_depth_mm, # ggplot2 will work with 'aes' column names
y = bill_length_mm) %>% # therefore rename is required to used the compute function
compute_group_xy_medians()
# A tibble: 1 × 2
x y
<dbl> <dbl>
1 17.3 44.5
Step 2: define ggproto
Things to notice
- what’s the naming convention for the proto object?
- which aesthetics are required as inputs?
- where does the function from above go?
<- ggplot2::ggproto(`_class` = "StatXYMedians",
StatXYMedians `_inherit` = ggplot2::Stat,
required_aes = c("x", "y"),
compute_group = compute_group_xy_medians)
Step 3: define geom_* function
Things to notice
- Where does our work up to this point enter in?
- What more primitive geom will we inherit behavior from?
<- function(mapping = NULL, data = NULL,
geom_point_xy_medians position = "identity", na.rm = FALSE,
show.legend = NA,
inherit.aes = TRUE, ...) {
::layer(
ggplot2stat = StatXYMedians, # proto object from step 2
geom = ggplot2::GeomPoint, # inherit other behavior
data = data,
mapping = mapping,
position = position,
show.legend = show.legend,
inherit.aes = inherit.aes,
params = list(na.rm = na.rm, ...)
) }
Step 4: Enjoy! Use your function
%>%
penguins ggplot()+
aes(x = bill_depth_mm, y = bill_length_mm)+
geom_point()+
geom_point_xy_medians(color = "red")
And check out conditionality!
%>%
penguins ggplot()+
aes(x = bill_depth_mm,
y = bill_length_mm,
color = species)+
geom_point()+
geom_point_xy_medians(size = 4)
Task #1: create the function geom_point_xy_means()
Using recipe #1 as a reference, try to create the function geom_point_xy_means()
# step 0: use base ggplot2
# step 1: write your compute_group function (and test)
# step 2: write ggproto with compute_group as an input
# step 3: write your geom_*() function with ggproto as an input
# step 4: enjoy!
Example recipe #2: geom_label_id()
Step 0: use base ggplot2 to get the job done
%>%
cars mutate(id_number = 1:n()) %>%
ggplot() +
aes(x = speed, y = dist) +
geom_point() +
geom_label(aes(label = id_number),
hjust = 1.2)
Step 1: computation
# you won't use the scales argument, but ggplot will later
<- function(data, scales){
compute_group_row_number
%>%
data # add an additional column called label
# the geom we inherit from requires the label aesthetic
mutate(label = 1:n())
}
# step 1b test the computation function
%>%
cars # input must have required aesthetic inputs as columns
rename(x = speed, y = dist) %>%
compute_group_row_number() %>%
head()
x y label
1 4 2 1
2 4 10 2
3 7 4 3
4 7 22 4
5 8 16 5
6 9 10 6
Step 2: define ggproto
<- ggplot2::ggproto(`_class` = "StatRownumber",
StatRownumber `_inherit` = ggplot2::Stat,
required_aes = c("x", "y"),
compute_group = compute_group_row_number)
Step 3: define geom_* function
- define the stat and geom for your layer
<- function(mapping = NULL, data = NULL,
geom_label_row_number position = "identity", na.rm = FALSE,
show.legend = NA,
inherit.aes = TRUE, ...) {
::layer(
ggplot2stat = StatRownumber, # proto object from Step 2
geom = ggplot2::GeomLabel, # inherit other behavior
data = data,
mapping = mapping,
position = position,
show.legend = show.legend,
inherit.aes = inherit.aes,
params = list(na.rm = na.rm, ...)
) }
Step 4: Enjoy! Use your function
%>%
cars ggplot() +
aes(x = speed, y = dist) +
geom_point() +
geom_label_row_number(hjust = 1.2) # function in action
And check out conditionality!
last_plot() +
aes(color = dist > 60) # Computation is within group
Task #2: create geom_text_coordinates()
Using recipe #2 as a reference, can you create the function geom_text_coordinates()
.
–
- geom should label point with its coordinates ‘(x, y)’
- geom should have behavior of geom_text (not geom_label)
Hint:
paste0("(", 1, ", ",3., ")")
[1] "(1, 3)"
# step 0: use base ggplot2
# step 1: write your compute_group function (and test)
# step 2: write ggproto with compute_group as an input
# step 3: write your geom_*() function with ggproto as an input
# step 4: enjoy!
Example recipe #3: geom_point_lm_fitted()
Step 0: use base ggplot2 to get the job done
<- lm(formula = bill_length_mm ~ bill_depth_mm,
model data = penguins)
<- penguins %>%
penguins_w_fitted mutate(fitted = model$fitted.values)
%>%
penguins ggplot() +
aes(x = bill_depth_mm, y = bill_length_mm) +
geom_point() +
geom_smooth(method = "lm", se = F) +
geom_point(data = penguins_w_fitted,
aes(y = fitted),
color = "blue")
Step 1: computation
<- function(data, scales){
compute_group_lm_fitted<-lm(formula= y ~ x, data = data)
model%>%
data mutate(y=model$fitted.values)
}
# test out the function
%>%
penguins # rename to explicitly state the x and y inputs
rename(x = bill_depth_mm, y = bill_length_mm)%>%
compute_group_lm_fitted()
# A tibble: 333 × 8
species island y x flipper_length_mm body_mass_g sex year
<fct> <fct> <dbl> <dbl> <int> <int> <fct> <int>
1 Adelie Torgersen 43.0 18.7 181 3750 male 2007
2 Adelie Torgersen 43.8 17.4 186 3800 female 2007
3 Adelie Torgersen 43.5 18 195 3250 female 2007
4 Adelie Torgersen 42.6 19.3 193 3450 female 2007
5 Adelie Torgersen 41.8 20.6 190 3650 male 2007
6 Adelie Torgersen 43.6 17.8 181 3625 female 2007
7 Adelie Torgersen 42.4 19.6 195 4675 male 2007
8 Adelie Torgersen 43.7 17.6 182 3200 female 2007
9 Adelie Torgersen 41.4 21.2 191 3800 male 2007
10 Adelie Torgersen 41.5 21.1 198 4400 male 2007
# ℹ 323 more rows
Step 2: define ggproto
<-ggplot2::ggproto(`_class` = "StatLmFitted",
StatLmFitted`_inherit` = ggplot2::Stat,
required_aes = c("x", "y"),
compute_group = compute_group_lm_fitted)
Step 3: define geom_* function
<- function(mapping = NULL, data = NULL,
geom_point_lm_fitted position = "identity", na.rm = FALSE,
show.legend = NA,
inherit.aes = TRUE, ...) {
::layer(
ggplot2stat = StatLmFitted, # proto object from step 2
geom = ggplot2::GeomPoint, # inherit other behavior
data = data,
mapping = mapping,
position = position,
show.legend = show.legend,
inherit.aes = inherit.aes,
params = list(na.rm = na.rm, ...)
) }
Step 4: Enjoy! Use your function
%>%
penguins ggplot() +
aes(x = bill_depth_mm, y = bill_length_mm) +
geom_point() +
geom_smooth(method="lm", se= F)+
geom_point_lm_fitted(color="blue")
And check out conditionality
%>%
penguins ggplot() +
aes(x = bill_depth_mm, y = bill_length_mm) +
geom_point() +
geom_smooth(method="lm", se= F) +
geom_point_lm_fitted() +
facet_wrap(facets = vars(species))
Task #3 create geom_segment_lm_residuals()
Create the function geom_segment_lm_residuals()
.
Hint: consider what aesthetics are required for segments. We’ll give you Step 0 this time…
Step 0: use base ggplot2 to get the job done
# step 0: use base ggplot2
<- lm(formula = bill_length_mm ~ bill_depth_mm,
model data = penguins)
<- penguins %>%
penguins_w_fitted mutate(fitted = model$fitted.values)
%>%
penguins ggplot() +
aes(x = bill_depth_mm, y = bill_length_mm) +
geom_point() +
geom_smooth(method = "lm", se = F) +
geom_segment(data = penguins_w_fitted,
aes(yend = fitted, xend = bill_depth_mm),
color = "blue")
# step 1: write your compute_group function (and test)
# step 2: write ggproto with compute_group as an input
# step 3: write your geom_*() function with ggproto as an input
# step 4: enjoy!
Not interested in writing your own geoms?
Check out some ready-to-go geoms that might be of interest in the ggxmean package… or other extension packages.
Interested in working a bit more with geoms and making them available to more folks, but not interested in writing your own package?
Join in on the development and validation of the ggxmean package for statistical educators and everyday analysis!