This ggplot2 Extension Cookbook aims to provide ggplot2 extension strategies in a consistent and accessible way. The target audience is fluent ggplot2 and R users who have not yet entered the extension space. The main tactic is to provide many extension examples for building familiarity and confidence, examples which might also serve as specific references when readers are inspired to build their own extensions.
Throughout this material, I'll try to stick to an enumerated formula to orient you to the ggplot2 extension, so that even if a few details seem confusing, you'll know "where" you are at a higher level.
We group the content by extension type and provide demonstrations of their use. Right now, there is a strong focus on new geom_* and stat_* layer functions. I think this is an important area to illuminate because many of these functions allow us to pass off routine computational tasks to the plotting system. This importance translates to excitement about ggplot2 extension packages: new geom_* layer functions really rule the day when it comes to outside interest. See for example "5 powerful ggplot2 extensions" (Rapp 2024), in which four of the five extensions focus on new geoms made available by packages, and "Favorite ggplot2 extensions" (Scherer 2021), in which almost all of the highlighted extensions are geom_* and stat_* user-facing functions.
Regarding the focus on stat_* versus geom_* functions, I take a geom_*-first approach, because these functions are more commonly used in everyday ggplot builds. I suspect we find geom_* functions to be more concrete descriptions of what the creator envisions for her plot, whereas stat_* function names may feel a bit more "adverbial" and nebulous in their description of rendered output. Consider that ggplot(mtcars, aes(wt, mpg)) + stat_identity() and ggplot(mtcars, aes(wt, mpg)) + geom_point() create identical plots, but the latter seems much more descriptive of the resultant plot. Between these two particular options, the preference for geom_point is evident in the user data: on GitHub, there are 788 R language files containing "stat_identity", whereas a staggering 261 thousand R language files contain "geom_point". Of course, stat_* constructions are quite flexible and expressive, and more seasoned ggplot2 users use them with great fluency, so that topic is covered as well.
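To see this equivalence concretely, here is a quick check (a sketch; layer_data() is ggplot2's helper for inspecting the internal data a plot prepares for a given layer):

```r
library(ggplot2)

# Two spellings of the same specification: stat_identity() defaults to the
# point geom, and geom_point() defaults to the identity stat.
p_stat <- ggplot(mtcars, aes(wt, mpg)) + stat_identity()
p_geom <- ggplot(mtcars, aes(wt, mpg)) + geom_point()

# Compare the internal data ggplot2 prepares for each layer
identical(layer_data(p_stat, 1), layer_data(p_geom, 1))
```

Both builds hand ggplot2 the same stat-geom pairing, just named from different ends.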
Finally, most of the code is at the "R for Data Science" level rather than the "Advanced R" level, which I hope will afford greater reach. While object-oriented programming (OOP) gets top billing in many extension materials, many folks that are expert users of ggplot2 might not know much about OOP. I aim to show what can be accomplished with little emphasis on OOP and ggproto.
Reader, I do think it is important for you to recognize that ggplot2 objects (i.e. p in p <- ggplot(cars, aes(speed, dist)) + geom_point()) are not, of course, the rendered plot, but rather a plot specification (of global data, aesthetic mapping, etc.) that results from the declarations the user has made. But you've probably made this realization very early on in your ggplot2 journey already. You know that the ggplot plot-building syntax allows users to make changes to the overall plot specification incrementally. In other words, the + operator allows us to modify the ggplot2 object. And the ggproto system allows changes to the ggplot2 specification from outside the ggplot2 package too, from extension packages.
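As a small illustration of the specification idea (a sketch; exact component names and classes can vary across ggplot2 versions):

```r
library(ggplot2)

# p is a plot *specification*, not a rendered picture
p <- ggplot(cars, aes(speed, dist)) + geom_point()

class(p)          # a "ggplot" object
length(p$layers)  # one layer declared so far

# '+' returns a modified copy of the specification
p2 <- p + geom_smooth(method = "lm")
length(p2$layers) # now two layers
```

The rendered plot only comes into being when the specification is printed or passed to ggplot_build().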
For those who have dipped their toes into extension, the composition of the extension elements here will look different from what you see in the wild. Specifically, I try to define ggproto objects in as concise and high-level a way as possible, keeping them as close to ignorable as I can for those put off by, or nervous about, OOP methods.
For example, defining the object StatCoordinate looks like this:
StatCoordinate <- ggplot2::ggproto(
`_class` = "StatCoordinate",
`_inherit` = ggplot2::Stat,
required_aes = c("x", "y"),
compute_group = compute_group_coordinates
)
Currently, with the geom_* and stat_* layers, I'm experimenting with a ratio typology that you'll see in the section titles. The idea is to think about how the input data relates to the mark we see on the plot, and in turn how the mark's information is stored in the ggplot2 object. This is a brand new undertaking, and I'm unsure how productive or precise it will be.
Overall, I think the resources in this ggplot2 Extension Cookbook are aligned with the findings in "10 Things Software Developers Should Learn about Learning", especially the observation that new techniques and ideas are often best internalized when first applied to concrete examples; general principles may be more grounded if situated in relevant examples.
In January 2020, I attended Thomas Lin Pedersen's talk "Extending your ability to extend ggplot2", seated on the floor of a packed-out ballroom. The talk had an important central message: "you can be a ggplot2 extender". Ever since, I have wanted to be in that cool-kid extender club. Four years later, I'm at a point where I can start claiming that club membership. I hope that this ggplot2 Extension Cookbook will help you along on your extender journey and, especially if you are fluent in R and ggplot2, that it says to you, "you can be a ggplot2 extender".
I became a regular ggplot2 user in 2017. I loved how, in general, the syntax was just a simple expression of the core Grammar of Graphics conception of a "statistical graphic" (i.e. data visualization): a display of data, mapped to the aesthetics of geometric marks. You can learn so much about data via a simple 3-2-1, data-mapping-mark, ggplot2 utterance. And further modifications can be made bit-by-bit, too, to completely tailor the plot to the creator's personal visual preferences.
All of this closely resembles how you might sketch out a plot on a notepad or blackboard, or describe your data representation decisions to yourself or a colleague. As Thomas Lin Pedersen has said, "ggplot2 lets you speak your plot into existence". And, perhaps a little less eloquently, from Hadley Wickham, the ggplot2 author: "This is what I'm thinking; you're the computer, now go and do it!", a paraphrase of the author describing how he thought data viz should feel as a graduate student statistical consultant, before ggplot2 existed.
But there were admittedly pain points when using "base" ggplot2; for me, these mostly arose when a geom_* function didn't exist for doing some compute in the background, and I needed that compute done over and over. It was a slog to work through the compute for a bunch of subsets of the data upstream of the plotting environment. This pre-computation problem felt manageable in the academic and classroom settings I found myself in early in my data career, but when I moved to a primarily analytic role (West Point, Fall 2020), where the volume of analysis was simply higher and turnaround times faster, I felt the problem much more acutely. (Overnight, I went from a weak preference for geom_col to a strong preference for geom_bar!) Extension seemed to offer a solution, and I was more motivated than ever to figure extension out in my analyst role.
I experienced about a year of failure and frustration when first entering the extension space. If I weren't so convinced of the efficiency gains it would eventually yield, and of the elegance of extension, I'd likely have given up. Looking back and recognizing the substantial hurdles for even long-time R and ggplot2 users, as I was, I think there is space for more ggplot2 extension reference materials like this ggplot2 Extension Cookbook.
I'm grateful for several experiences, and the efforts of others, that have refined my thinking about what will work for newcomers to extension. First, after just getting my own feet wet in extension, I had the chance to work on extension with underclassmen, undergraduate students, in the context of their independent studies. Our focus was the same type of extension that Pedersen demonstrated: a geom_* function that used a Stat to do some computational work and then inherited the rest of its behavior from a more primitive geom.
Working with new-to-R students gave me a chance to reflect on my fledgling workflow and reformulate it. How would we build up skills and ideas in a way that would be accessible to very new R and ggplot2 users? What would these students, veterans of just one or two stats classes that used R and ggplot2, find familiar and accessible? What might we be able to de-emphasize? ggproto and object-oriented programming hadn't been touched in their coursework. Could we still succeed with extension?
The following steps emerged:

Step 0. Build the plot without extension, doing any required compute manually.
Step 1. Write that compute as a function that ggplot2 can use.
Step 2. Pass the compute function to a new Stat (a ggproto object).
Step 3. Write a user-facing function that passes the Stat to ggplot2::layer(), inheriting other behavior from a primitive geom.
Step 4. Try out the new function.
Taking new R users into the extension space was a leap of faith. But I was very impressed with what the students were able to accomplish during a single semester.
And I also wondered how the strategy would perform with experienced R and ggplot2 users. Curious, I created a tutorial (with assistance from independent study student Morgan Brown, who continued to work with me for a second semester) called "Easy Geom Recipes". After refining the tutorial, I formally got feedback on it via focus groups and a survey with a small group of stats educators, whom we might term ggplot2 and R super-users given the frequency and length of their use of these tools.
Among my favorite quotations from the focus groups is one that validated the efforts but also challenged me:
it was … easy. And I felt empowered as a result of that…. But you know, like, my problem isn't gonna be that easy.
To that participant, I'd say "Sometimes it is that easy". But he is right that oftentimes I come to an extension problem and am surprised that the strategy I think is going to work doesn't, or at least not without a little fiddling.
The feedback on the easy geom recipes was collected in March 2023. I presented on the outcomes at the COWY ASA Chapter meeting in "A New Wave of ggplot2 Extenders".
After presenting on the success of "easy geom recipes", I felt I was at a crossroads. I could either focus on packaging my material as educational material, or I could actually write extensions in R packages. The latter felt a little more true to my interests, but I felt torn. Happily, I landed on a solution where I could have it both ways: writing packages that preserve the story and create recipes along the way. This was enabled by a literate programming mindset generally, and specifically by thinly wrapping knitr::knit_code$get() in my own helper package {readme2pkg}; the functions in {readme2pkg} send code chunks to the appropriate directories for packaging, but the chunks live in the README.Rmd as part of the development narrative. (See {litr} for an alternative to {readme2pkg}.) I'm returning to squarely focus on education in creating this ggplot2 extension cookbook. It has been very easy to pull in material from those packages given their adherence to a specific narrative form. In mocking up this book, I'm using code chunk options like child = '../my_gg_experiments/my_extension_ideas.' and code = '../ggwaterfall/R/geom_waterfall'.
It is a great help not to have to pull up files and copy and paste. I'm very grateful to Yihui Xie for his insights and efforts at making this possible.
I'm personally grateful to other ggplot2 extenders and R enthusiasts who have supported this journey.
I'm also grateful to the ggplot2 development team.
I'm also indebted to my Department of Mathematics and Dean Data Cell colleagues at West Point, for sitting through some talks (some extemporaneous and muddled) where I tried to articulate my ggplot2 extension dreams.
Finally, to Winston Chang, who gets top billing in the ggplot2 extension vignette along with your ggproto: I hope you won't mind the general approach here, which experiments with making ggproto as ignorable as possible for OOP noobs. I also hope to meet you someday and hear more about the early days of ggproto, maybe at a ggplot2 extenders meetup as a special guest, perhaps January 2025.
And finally, finally, to Hadley Wickham and Leland Wilkinson, for having incredible insights and acting on them.
For best results, I'd recommend diving in by actually creating some geoms as prompted in the "easy geom recipes" tutorial, using the rendered tutorial or the source .Rmd file. The "easy recipes" contain three fully worked examples and three exercises that extend the lessons in the examples.
Having completed these exercises, you'll have lived geom creation from start to finish and will be well oriented to the consistent patterns I use, to the extent possible, throughout the cookbook.
This section tackles creating new geom_* layers. The strategy is to look at compute that you'd do without extension capabilities (Step 0), then create a Stat for that compute (Steps 1 and 2), and then compose a user-facing function which inherits other behavior from a more primitive geom (Step 3), so that ggplot2 can do the compute for you in the background (Step 4).
The section is called "easy geoms" because these geom functions inherit much behavior from more primitive geoms like col, text, point, segment, rect, etc.
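These primitive geoms are available as exported ggproto objects, which is what makes the inheritance possible. A quick way to peek at what a wrapper would get for free (a sketch):

```r
library(ggplot2)

# The primitive geoms are exported ggproto objects
class(ggplot2::GeomText)

# Behavior our user-facing function inherits by borrowing GeomText:
ggplot2::GeomText$required_aes  # "x" "y" "label"
ggplot2::GeomText$default_aes   # colour, size, angle, hjust, vjust, ...
```

The same inspection works for GeomSegment, GeomRect, GeomPolygon, and the other geoms used later in this section.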
The proposed function geom_text_coordinate() is one where a label for an x and y position is automatically computed. The target user interface will only require x and y to be specified as aesthetics, and will look something like this. Whereas the geom_text() function would require a label aesthetic, geom_text_coordinate() will compute this label variable within the function call.
ggplot(data = cars) +
aes(x = speed, y = dist) +
geom_point() +
geom_text_coordinate()
We'll be computing a "1:1:1" type layer, which means that for each row of the input dataframe, a single mark will be drawn, and the internal dataframe that ggplot2 renders from will use a single row per mark.
We'll always start with a "Step 0": building the plot without extending ggplot2, using groundwork and knowledge that I assume you already have. The computation that you do yourself will serve as a useful reference for Step 1 of the extension process. Ultimately, we would like a ggplot2 function to do the compute in the background for us.
library(tidyverse)
library(ggxmean)
cars |>
mutate(coords =
paste0("(", speed, ",", dist, ")")) |>
ggplot() +
aes(x = speed, y = dist) +
geom_point() +
geom_text(aes(label = coords),
check_overlap = TRUE,
hjust = 0,
vjust = 0)
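One way to see the "1:1:1" relationship in this Step-0 build is ggplot2's layer_data() helper, which returns the internal dataframe behind a given layer (a sketch):

```r
library(tidyverse)

p <- cars |>
  mutate(coords = paste0("(", speed, ",", dist, ")")) |>
  ggplot() +
  aes(x = speed, y = dist) +
  geom_point() +
  geom_text(aes(label = coords), hjust = 0, vjust = 0)

nrow(cars)             # 50 input rows
nrow(layer_data(p, 2)) # 50 rows in the text layer's internal data: one per mark
```

One input row, one internal row, one mark.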
Next, we turn to writing this compute in a way that ggplot2 layer functions can use.
Compute functions capture the compute that our user-facing function will ultimately do for us in a plot build. The arguments required for ggplot2 to use the function in its preparation are data and scales. For now, we don't need to worry more about the scales argument. The data that serves as input can be assumed to contain columns with certain variable names: the required aesthetics that we'll see declared in the next step. For the function we're building, the required aesthetics will be "x" and "y". In the compute_group_coordinates() function, therefore, the mutate step is possible because the data will have variables named x and y. In the mutate step, we create a variable that ggplot2 understands internally, label.
compute_group_coordinates <- function(data, scales) {
# data will contain variables 'x' and 'y', because of required aes
data |>
mutate(label =
paste0("(", x, ", ", y, ")"))
}
Before we move on, it's a good idea to check that our function works on its own. To use the function, remember that we need a dataframe with the expected variables, x and y. We can test the function with the cars dataset, but first we modify the data (which has the variable names speed and dist) with the rename function.
cars |>
rename(x = speed, y = dist) |> # rename allows us to test function
compute_group_coordinates() |>
head()
x | y | label |
---|---|---|
4 | 2 | (4, 2) |
4 | 10 | (4, 10) |
7 | 4 | (7, 4) |
7 | 22 | (7, 22) |
8 | 16 | (8, 16) |
9 | 10 | (9, 10) |
The next step toward our user-facing function is to create a new Stat, which is a ggproto object. Fortunately, this is a subclass of the ggplot2::Stat class, and we will inherit much behavior from that class. This means that the definition of our class, StatCoordinate, is quite straightforward. For our target function, beyond creating the new class and declaring the inheritance, we'll need to 1) specify the required aesthetics and 2) pass our compute function to a compute slot. The slot we're using for our coordinates case is compute_group. Therefore, the compute will be done by group if any discrete (non-numeric) variable is mapped from the data. The consequences of using the compute_group slot (versus other slots) will become more important in future examples. Returning to the topic of required_aes: the coordinates label can always be created from x and y as input, and we know that our compute function uses both variables named "x" and "y" in its computation.
StatCoordinate <- ggplot2::ggproto(
`_class` = "StatCoordinate",
`_inherit` = ggplot2::Stat,
required_aes = c("x", "y"),
compute_group = compute_group_coordinates
)
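Because the compute slot holds a plain function, we can sanity-check the new Stat by calling its method directly, outside of any plot build (a sketch, reusing the rename trick from the Step 1 check):

```r
library(tidyverse)

test_data <- cars |>
  rename(x = speed, y = dist)

# ggproto slots can be retrieved with $ and called like ordinary functions
StatCoordinate$compute_group(data = test_data, scales = NULL) |>
  head(3)
```

This kind of direct call is handy when debugging: you see exactly what the Stat will hand to the geom.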
In Step 3, we're close to our goal of a user-facing function for familiar ggplot2 builds. Under the hood, we'll pass our new Stat, StatCoordinate, to ggplot2's layer() function. ggplot2::layer() may not be familiar, but it can be used directly in ggplot() pipelines. We pass our StatCoordinate ggproto object to the stat argument, handling the computation (adding a column of data containing coordinates, called "label"). Additionally, we pass the ggplot2::GeomText object to the geom argument. The "geometry", or "mark", on the plot will therefore be of the "text" type.
# part 3.0 use ggplot2::layer which requires specifying Geom and Stat
ggplot(data = cars) +
aes(x = speed, y = dist) +
geom_point() +
ggplot2::layer(
stat = StatCoordinate,
geom = ggplot2::GeomText,
position = "identity"
)
You are probably more familiar with geom_*() and stat_*() functions, which wrap the ggplot2::layer() function; these generally have a fixed geom or stat. In creating geom_text_coordinate(), because the use-scope is so narrow, both the stat and geom are "hard-coded" in the layer; i.e. stat and geom are not arguments in the geom_* function. Here's how we specify our geom_text_coordinate():
# part b. create geom_* user-facing function using g
geom_text_coordinate <- function(mapping = NULL,
data = NULL,
position = "identity",
show.legend = NA,
inherit.aes = TRUE,
na.rm = FALSE,
...) {
ggplot2::layer(
stat = StatCoordinate,
geom = ggplot2::GeomText,
position = position,
mapping = mapping,
data = data,
inherit.aes = inherit.aes,
show.legend = show.legend,
params = list(na.rm = na.rm, ...)
)
}
You will see a few more arguments in play here: mapping, data, position, show.legend, etc. We anticipate that the user might want control over the data and aesthetic mapping specific to the layer (rather than deriving them from global declarations), and therefore make the mapping and data arguments available. Furthermore, the position, show.legend, inherit.aes, and na.rm arguments are made available in the geom as shown below. The ellipsis (...) lets you leverage even more functionality. In sum, this makes geom_text_coordinate() work very much like geom_text(); you can use all the same arguments you'd use with geom_text(), except that the label aesthetic is computed under the hood, whereas vanilla geom_text() requires you to specify the label aesthetic. For example, you can use the argument check_overlap in geom_text_coordinate(), as you might do in geom_text().
Good news, we're at Step 4! You created a function for use in a ggplot2 pipeline, and now you can use it! Remember, you can use geom_text_coordinate() in basically the same way as geom_text(), because the geom argument in the layer() function is geom = GeomText, so arguments like check_overlap that are usable in geom_text() will be meaningful in our new function. The difference, of course, is that the label aesthetic is computed for you, so you don't need the aesthetic that would be required for the vanilla geom_text() function.
ggplot(data = cars) +
aes(x = speed, y = dist) +
geom_point() +
geom_text_coordinate()
last_plot() +
aes(color = speed > 15)
last_plot() +
geom_text_coordinate(check_overlap = TRUE,
color = "black")
The next proposed function we'll take on is geom_post(). We can use this function when we are interested in the magnitude of y, not just the relative positions of y values. Given that interest in magnitude, we'd like a geom that extends from the value of y down to y equal to zero, i.e. a "post" geom. You can use geom_segment for this purpose in base ggplot2, as seen in Step 0. However, you'll notice that xend and yend, which are aesthetics that geom_segment requires, could be automatically derived given the requirements of drawing a post. Therefore, to simplify your future plot compositions, you may want to define an extension function, geom_post().
probs_df = data.frame(outcome = 0:1,
prob = c(.7, .3))
probs_df
outcome | prob |
---|---|
0 | 0.7 |
1 | 0.3 |
ggplot(data = probs_df) +
aes(x = outcome, y = prob, yend = 0, xend = outcome) +
geom_point() +
geom_segment()
compute_group_post <- function(data, scales){
data |>
dplyr::mutate(xend = x) |>
dplyr::mutate(yend = 0)
}
probs_df |>
rename(x = outcome, y = prob) |>
compute_group_post()
x | y | xend | yend |
---|---|---|---|
0 | 0.7 | 0 | 0 |
1 | 0.3 | 1 | 0 |
StatPost <- ggplot2::ggproto("StatPost",
ggplot2::Stat,
compute_group = compute_group_post,
required_aes = c("x", "y")
)
geom_post <- function(mapping = NULL,
data = NULL,
position = "identity",
na.rm = FALSE,
show.legend = NA,
inherit.aes = TRUE, ...) {
ggplot2::layer(
stat = StatPost,
geom = ggplot2::GeomSegment,
data = data,
mapping = mapping,
position = position,
show.legend = show.legend,
inherit.aes = inherit.aes,
params = list(na.rm = na.rm, ...)
)
}
ggplot(data = probs_df) +
aes(x = outcome, y = prob) +
geom_post()
geom_lollipop <- function(...){
list(geom_post(...),
geom_point(...))
}
ggplot(probs_df) +
aes(x = outcome, y = prob) +
geom_lollipop(color = "magenta")
many rows from a dataset: will be summarized and visualized by a single mark: the mark will be defined by one row of data
mtcar_xy_means <- mtcars |>
summarize(wt_mean = mean(wt),
mpg_mean = mean(mpg))
ggplot(mtcars) +
aes(x = wt, y = mpg) +
geom_point() +
geom_point(data = mtcar_xy_means,
aes(x = wt_mean, y = mpg_mean),
size = 8)
compute_group_means <- function(data, scales){
data |>
summarise(x = mean(x),
y = mean(y))
}
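Mirroring the check we did for compute_group_coordinates(), it is worth testing this compute function alone before wiring it into a Stat:

```r
library(tidyverse)

# rename supplies the 'x' and 'y' columns the compute function expects
mtcars |>
  rename(x = wt, y = mpg) |>
  compute_group_means()
```

The result is a single-row dataframe of means, the one row that will define the summary mark.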
StatXymean <- ggplot2::ggproto("StatXymean",
ggplot2::Stat,
compute_group = compute_group_means,
required_aes = c("x", "y")
)
geom_xy_means <- function(mapping = NULL,
data = NULL,
position = "identity",
na.rm = FALSE,
show.legend = NA,
inherit.aes = TRUE, ...) {
ggplot2::layer(
stat = StatXymean,
geom = ggplot2::GeomPoint,
data = data,
mapping = mapping,
position = position,
show.legend = show.legend,
inherit.aes = inherit.aes,
params = list(na.rm = na.rm, ...)
)
}
ggplot(mtcars) +
aes(x = wt, y = mpg) +
geom_point() +
geom_xy_means(size = 8)
last_plot() +
aes(color = am == 1)
This example uses the chull function in R, which "computes the subset of points which lie on the convex hull of the set of points specified." In layman's terms: if you had a bunch of nails hammered into a board and put a rubber band around them, the convex hull would be defined by the subset of nails touching the rubber band.
I'm especially excited to include this example, reworked using the Step 0-4 approach, because looking at the ggplot2 extension vignette on stat_chull and geom_chull was ultimately the beginning of layer extension unlocking for me. https://ggplot2.tidyverse.org/articles/extending-ggplot2.html#creating-a-new-stat
library(tidyverse)
chull_row_ids <- chull(mtcars$wt, mtcars$mpg)
chull_row_ids
#> [1] 17 16 15 24 7 29 21 3 28 20 18
mtcars_chull_subset <- mtcars |> slice(chull_row_ids)
ggplot(mtcars) +
aes(x = wt, y = mpg) +
geom_point() +
geom_polygon(data = mtcars_chull_subset,
alpha = .3,
color = "black")
# Step 1
compute_group_c_hull <- function(data, scales){
chull_row_ids <- chull(data$x, data$y)
data |> slice(chull_row_ids)
}
Below, we see that the dataset is reduced to 11 rows which constitute the convex hull perimeter.
mtcars |> # 32 rows
rename(x = wt, y = mpg) |>
compute_group_c_hull() # 11 rows
y | cyl | disp | hp | drat | x | qsec | vs | am | gear | carb | |
---|---|---|---|---|---|---|---|---|---|---|---|
Chrysler Imperial | 14.7 | 8 | 440.0 | 230 | 3.23 | 5.345 | 17.42 | 0 | 0 | 3 | 4 |
Lincoln Continental | 10.4 | 8 | 460.0 | 215 | 3.00 | 5.424 | 17.82 | 0 | 0 | 3 | 4 |
Cadillac Fleetwood | 10.4 | 8 | 472.0 | 205 | 2.93 | 5.250 | 17.98 | 0 | 0 | 3 | 4 |
Camaro Z28 | 13.3 | 8 | 350.0 | 245 | 3.73 | 3.840 | 15.41 | 0 | 0 | 3 | 4 |
Duster 360 | 14.3 | 8 | 360.0 | 245 | 3.21 | 3.570 | 15.84 | 0 | 0 | 3 | 4 |
⎠| ⎠| ⎠| ⎠| ⎠| ⎠| ⎠| ⎠| ⎠| ⎠| ⎠| |
Toyota Corona | 21.5 | 4 | 120.1 | 97 | 3.70 | 2.465 | 20.01 | 1 | 0 | 3 | 1 |
Datsun 710 | 22.8 | 4 | 108.0 | 93 | 3.85 | 2.320 | 18.61 | 1 | 1 | 4 | 1 |
Lotus Europa | 30.4 | 4 | 95.1 | 113 | 3.77 | 1.513 | 16.90 | 1 | 1 | 5 | 2 |
Toyota Corolla | 33.9 | 4 | 71.1 | 65 | 4.22 | 1.835 | 19.90 | 1 | 1 | 4 | 1 |
Fiat 128 | 32.4 | 4 | 78.7 | 66 | 4.08 | 2.200 | 19.47 | 1 | 1 | 4 | 1 |
# Step 2
StatChull <- ggproto(`_class` = "StatChull",
`_inherit` = ggplot2::Stat,
compute_group = compute_group_c_hull,
required_aes = c("x", "y"))
geom_chull <- function(mapping = NULL,
data = NULL,
position = "identity",
na.rm = FALSE,
show.legend = NA,
inherit.aes = TRUE, ...) {
ggplot2::layer(
stat = StatChull,
geom = ggplot2::GeomPolygon,
data = data, mapping = mapping,
position = position,
show.legend = show.legend,
inherit.aes = inherit.aes,
params = list(na.rm = na.rm, ...)
)
}
ggplot(data = mtcars) +
aes(x = wt, y = mpg) +
geom_point() +
geom_chull(alpha = .3)
last_plot() +
aes(color = factor(am),
fill = factor(am))
One-row geom for each row in input dataset; geom interdependence
A waterfall plot displays inflows and outflows that occur as a result of events, as well as the balance across multiple events. It is typically displayed as a series of rectangles. Because the net (cumulative) change is displayed, there is interdependence between the geometries on our plot: where one rectangle ends, the next in the series begins. Therefore we'll be computing by panel and not by group; we do not want ggplot2 to split the data by discrete variables, which our x axis is most likely to be.
For "Step 0", we use base ggplot2 to accomplish this task, and actually pretty closely follow Hadley Wickham's short paper that tackles a waterfall plot with ggplot2. https://vita.had.co.nz/papers/ggplot2-wires.pdf
library(tidyverse)
flow_df <- data.frame(event = c("Sales",
"Refunds",
"Payouts",
"Court Losses",
"Court Wins",
"Contracts",
"Fees"),
change = c(6400, -1100,
-100, -4200, 3800,
1400, -2800)) %>%
mutate(event = factor(event))
balance_df <- flow_df %>% # maybe add factor in order if factor is not defined...
mutate(x_pos = event %>% as.numeric()) %>%
arrange(x_pos) %>%
mutate(balance = cumsum(c(0,
change[-nrow(.)]))) %>%
mutate(flow = factor(sign(change)))
ggplot(balance_df) +
geom_rect(
aes(x = event,
xmin = x_pos - 0.45,
xmax = x_pos + 0.45,
ymin = balance,
ymax = balance + change)) +
aes(fill = sign(change))
#> Ignoring unknown aesthetics: x
Then, we bundle up this computation into a function (Step 1), called compute_panel_waterfall. We want the computation done panel-wise because of the interdependence between the events, which run along the x axis. Group-wise computation (defining the compute_group element) would fail us, as the cross-event interdependence would not be preserved.
compute_panel_waterfall <- function(data, scales, width = .90){
data %>%
arrange(x) %>%
mutate(balance = cumsum(c(0,
change[-nrow(.)]))) %>%
mutate(direction = factor(sign(change))) %>%
mutate(xmin = as.numeric(x) - width/2,
xmax = as.numeric(x) + width/2,
ymin = balance,
ymax = balance + change) %>%
# mutate(x = x_pos) %>%
mutate(y = ymax) %>%
mutate(gain_loss = ifelse(direction == 1, "gain", "loss"))
}
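As in earlier recipes, we can exercise the panel-wise compute directly before building the Stat. Note that the function expects columns named x (numeric) and change, so here we convert the factor ourselves (a sketch):

```r
library(tidyverse)

flow_df |>
  mutate(x = as.numeric(event)) |>
  select(x, change) |>
  compute_panel_waterfall() |>
  head(3)
```

The output gains the balance, xmin/xmax, ymin/ymax, and gain_loss columns that the rect geom and the later default_aes will rely on.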
Now, in Step 2, we'll pass the computation to the compute_panel slot of a new Stat.
StatWaterfall <- ggplot2::ggproto(`_class` = "StatWaterfall",
`_inherit` = ggplot2::Stat,
required_aes = c("change", "x"),
compute_panel = compute_panel_waterfall)
In Step 3, we define geom_waterfall, passing along StatWaterfall to create a ggplot2 layer function. We include a standard set of arguments, and we set the geom to ggplot2::GeomRect.
geom_waterfall <- function(
mapping = NULL,
data = NULL,
position = "identity",
na.rm = FALSE,
show.legend = NA,
inherit.aes = TRUE, ...) {
ggplot2::layer(
stat = StatWaterfall, # proto object from step 2
geom = ggplot2::GeomRect,
data = data,
mapping = mapping,
position = position,
show.legend = show.legend,
inherit.aes = inherit.aes,
params = list(na.rm = na.rm, ...)
)
}
In Step 4, we get to try out the functionality.
flow_df |>
ggplot() +
geom_hline(yintercept = 0) +
aes(change = change,
x = event) + # event in order
geom_waterfall()
Bonus: setting a default aes using delayed aesthetic evaluation (D.A.E.)

The bonus topic is defining a default aesthetic. In general, the direction of the flow is of great import for a waterfall chart, and it is typically depicted with fill color. However, flow direction mapped to fill color might not be absolutely fundamental to how we conceive of a waterfall plot. Therefore, instead of creating a variable "fill" in the compute_panel_waterfall routine, we created gain_loss, which we can reference with delayed aesthetic evaluation, in this case ggplot2::after_stat(). We'll refer to it in the default_aes slot of StatWaterfall, using the ggplot2::after_stat() call on the internally created variable.
StatWaterfall <- ggplot2::ggproto(`_class` = "StatWaterfall",
`_inherit` = ggplot2::Stat,
required_aes = c("change", "x"),
compute_panel = compute_panel_waterfall,
default_aes = ### NEW!
ggplot2::aes(fill =
ggplot2::after_stat(gain_loss)))
Since geom_waterfall calls StatWaterfall and we have added the default fill aesthetic, when we re-execute the code creating our plot, we automatically get the direction of flow mapped to fill color.
flow_df |>
ggplot() +
geom_hline(yintercept = 0) +
aes(change = change,
x = event) + # event in order
geom_waterfall()
However, we are not locked into gain_loss being the variable that defines fill, as seen below:
last_plot() +
aes(fill = event == "Sales")
And we can also turn off the mapping to fill color by setting aes(fill = NULL).
last_plot() +
aes(fill = NULL)
a many-row geom for each row of the input data frame, with interdependence between input observations.
df_to_plot <- gapminder::gapminder %>%
filter(continent == "Americas") %>%
filter(year == 2002) %>%
select(country, pop)
packed_centers <- packcircles::circleProgressiveLayout(
df_to_plot$pop, sizetype = 'area')
circle_outlines <- packed_centers %>%
packcircles::circleLayoutVertices(npoints = 50)
circle_outlines %>%
ggplot() +
aes(x = x, y = y) +
geom_polygon() +
aes(group = id) +
coord_equal()
# Step 1
compute_panel_circlepack <- function(data, scales){
data_w_id <- data |>
mutate(id = row_number())
if(is.null(data$area)){
data_w_id <- data_w_id |>
mutate(area = 1)
}
data_w_id |>
pull(area) |>
packcircles::circleProgressiveLayout(
sizetype = 'area') |>
packcircles::circleLayoutVertices(npoints = 50) |>
left_join(data_w_id) |>
mutate(group = id)
}
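Before wrapping this in a Stat, we can again test the compute function directly. This sketch assumes the packcircles package is installed; the function creates its own id column, so a bare area column is enough input:

```r
library(tidyverse)

# Three circles with areas 1, 2, and 4
data.frame(area = c(1, 2, 4)) |>
  compute_panel_circlepack() |>
  head(3)
```

Each input row expands to the vertices of one packed circle, with group marking which circle each vertex belongs to.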
StatCirclepack <- ggplot2::ggproto(`_class` = "StatCirclepack",
`_inherit` = ggplot2::Stat,
required_aes = c("id"),
compute_panel = compute_panel_circlepack#,
# default_aes = ggplot2::aes(group = after_stat(id))
)
geom_circlepack <- function(mapping = NULL, data = NULL,
position = "identity", na.rm = FALSE,
show.legend = NA,
inherit.aes = TRUE, ...) {
ggplot2::layer(
stat = StatCirclepack, # proto object from Step 2
geom = ggplot2::GeomPolygon, # inherit other behavior
data = data,
mapping = mapping,
position = position,
show.legend = show.legend,
inherit.aes = inherit.aes,
params = list(na.rm = na.rm, ...)
)
}
gapminder::gapminder |>
filter(year == 2002) |>
ggplot() +
aes(id = country, area = pop/1000000) +
geom_circlepack() +
coord_equal()
last_plot() +
aes(fill = continent)
last_plot() +
aes(fill = pop/1000000) +
facet_wrap(facets = vars(continent))
This next example is the case that TLP took on in his talk, but it takes a slightly different approach to be more consistent with the other approaches in this cookbook. Essentially, for each row in our data set with defined center x0, y0 and radius r, we join up 16 rows (vertex ids 0 through 15, the first and last coinciding) which then help us build a circle around the x0, y0 center.
For each row in the input dataframe, a single geometric mark is visualized, but each mark is defined by many rows in an internal dataframe…
"../mytidytuesday/2023-12-27-geom_circle_via_join/geom_circle_via_join.Rmd"
library(tidyverse)
n_vertices <- 15
data.frame(x0 = 0:1, y0 = 0:1, r = 1:2/3) |>
mutate(input_data_row_id = row_number()) |>
crossing(tibble(vertex_id = 0:n_vertices)) |>
mutate(angle = 2*pi*vertex_id/max(vertex_id)) |>
mutate(x = x0 + cos(angle)*r,
y = y0 + sin(angle)*r) |>
ggplot() +
aes(x, y) +
geom_path(aes(group = input_data_row_id)) +
geom_text(aes( label = vertex_id))
compute_panel_circle <- function(data, scales, n_vertices = 15){
data |>
mutate(group = row_number()) |>
crossing(tibble(vertex_id = 0:n_vertices)) |>
mutate(angle_in_circle = 2*pi*vertex_id/max(vertex_id)) |>
mutate(x = x0 + cos(angle_in_circle)*r,
y = y0 + sin(angle_in_circle)*r)
}
tibble(x0 = 1:2, y0 = 1:2, r = 1 ) |>
compute_panel_circle()
x0 | y0 | r | group | vertex_id | angle_in_circle | x | y |
---|---|---|---|---|---|---|---|
1 | 1 | 1 | 1 | 0 | 0.000 | 2.000 | 1.000 |
1 | 1 | 1 | 1 | 1 | 0.419 | 1.914 | 1.407 |
1 | 1 | 1 | 1 | 2 | 0.838 | 1.669 | 1.743 |
1 | 1 | 1 | 1 | 3 | 1.257 | 1.309 | 1.951 |
1 | 1 | 1 | 1 | 4 | 1.676 | 0.895 | 1.995 |
⎠| ⎠| ⎠| ⎠| ⎠| ⎠| ⎠| ⎠|
2 | 2 | 1 | 2 | 11 | 4.608 | 1.895 | 1.005 |
2 | 2 | 1 | 2 | 12 | 5.027 | 2.309 | 1.049 |
2 | 2 | 1 | 2 | 13 | 5.445 | 2.669 | 1.257 |
2 | 2 | 1 | 2 | 14 | 5.864 | 2.914 | 1.593 |
2 | 2 | 1 | 2 | 15 | 6.283 | 3.000 | 2.000 |
StatCircle <- ggproto(
`_class` = "StatCircle",
`_inherit` = ggplot2::Stat,
compute_panel = compute_panel_circle,
required_aes = c("x0", "y0", "r")
)
geom_circle <- function(
mapping = NULL,
data = NULL,
position = "identity",
na.rm = FALSE,
show.legend = NA,
inherit.aes = TRUE, ...) {
ggplot2::layer(
stat = StatCircle, # proto object from Step 2
geom = ggplot2::GeomPolygon, # inherit other behavior
data = data,
mapping = mapping,
position = position,
show.legend = show.legend,
inherit.aes = inherit.aes,
params = list(na.rm = na.rm, ...)
)
}
data.frame(x0 = 0:1,
y0 = 0:1,
r = 1:2/3) |>
ggplot() +
aes(x0 = x0, y0 = y0, r = r) +
geom_circle(color = "red",
linetype = "dashed") +
aes(fill = r)
diamonds |>
slice_sample(n = 80) |>
ggplot() +
aes(x0 = as.numeric(cut),
y0 = carat,
r = as.numeric(clarity)/20) +
geom_circle(alpha = .2, n_vertices = 5) +
aes(fill = after_stat(r)) +
coord_equal()
cars |>
sample_n(12) |>
ggplot() +
aes(x0 = speed, y0 = dist, r = dist/speed) +
geom_circle(color = "black") +
coord_equal()
last_plot() +
aes(alpha = speed > 15) +
aes(linetype = dist > 20) +
aes(fill = speed > 18) +
facet_wrap(~ dist > 40)
#> Using alpha for a discrete variable is not advised.
StatCircle2 <- ggproto(
`_class` = "StatCircle2",
`_inherit` = ggplot2::Stat,
compute_group = compute_panel_circle,
required_aes = c("x0", "y0", "r"))
geom_circle_CG <- function(
mapping = NULL,
data = NULL,
position = "identity",
na.rm = FALSE,
show.legend = NA,
inherit.aes = TRUE, ...) {
ggplot2::layer(
stat = StatCircle2, # proto object from Step 2
geom = ggplot2::GeomPolygon, # inherit other behavior
data = data,
mapping = mapping,
position = position,
show.legend = show.legend,
inherit.aes = inherit.aes,
params = list(na.rm = na.rm, ...)
)
}
cars |>
sample_n(12) |>
ggplot() +
aes(x0 = speed, y0 = dist, r = dist/speed) +
geom_circle_CG(color = "black") +
coord_equal() +
aes(alpha = speed > 15) +
aes(linetype = dist > 20) +
aes(fill = speed > 18) +
facet_wrap(~ dist > 40)
#> Using alpha for a discrete variable is not advised.
data.frame(x0 = 0:1, y0 = 0:1, r = 1:2/3) %>%
mutate(group = row_number()) %>%
tidyr::crossing(vertex_index = 0:15/15) %>%
dplyr::mutate(
y = y0 + r * (
.85 * cos(2*pi*vertex_index)
- .35 * cos(2 * 2*pi*vertex_index)
- .25 * cos(3 * 2*pi*vertex_index)
- .05 * cos(4 * 2*pi*vertex_index)
),
x = x0 + r * (sin(2*pi*vertex_index)^3)) %>%
ggplot() +
aes(x = x, y = y, group = group) +
geom_polygon(alpha = .5, fill = "darkred") +
coord_equal()
states_characteristics <- tibble(state.name) |>
mutate(ind_vowel_states =
str_detect(state.name, "A|E|I|O|U"))
head(states_characteristics)
state.name | ind_vowel_states |
---|---|
Alabama | TRUE |
Alaska | TRUE |
Arizona | TRUE |
Arkansas | TRUE |
California | FALSE |
Colorado | FALSE |
us_states_geo <- ggplot2::map_data("state")
head(us_states_geo)
long | lat | group | order | region | subregion |
---|---|---|---|---|---|
-87.462 | 30.390 | 1 | 1 | alabama | |
-87.485 | 30.372 | 1 | 2 | alabama | |
-87.525 | 30.372 | 1 | 3 | alabama | |
-87.531 | 30.332 | 1 | 4 | alabama | |
-87.571 | 30.327 | 1 | 5 | alabama | |
-87.588 | 30.327 | 1 | 6 | alabama |
states_characteristics |>
left_join(us_states_geo |> mutate(state.name = stringr::str_to_title(region))) |>
ggplot() +
aes(x = long, y = lat, group = group) +
geom_polygon() +
aes(fill = ind_vowel_states) +
coord_map()
ggplot2::map_data("state") |>
rename(state_name = region) |>
mutate(state_name = stringr::str_to_title(state_name)) |>
rename(x = long, y = lat) |>
select(-subregion) |>
rename(state_id_number = group) ->
continental_states_geo_reference
compute_panel_state <- function(data, scales){
data |>
dplyr::left_join(continental_states_geo_reference) |>
dplyr::mutate(group = state_id_number)
}
And let's test out this compute…
states_characteristics |>
rename(state_name = state.name) |>
compute_panel_state()
state_name | ind_vowel_states | x | y | state_id_number | order | group |
---|---|---|---|---|---|---|
Alabama | TRUE | -87.462 | 30.390 | 1 | 1 | 1 |
Alabama | TRUE | -87.485 | 30.372 | 1 | 2 | 1 |
Alabama | TRUE | -87.525 | 30.372 | 1 | 3 | 1 |
Alabama | TRUE | -87.531 | 30.332 | 1 | 4 | 1 |
Alabama | TRUE | -87.571 | 30.327 | 1 | 5 | 1 |
⎠| ⎠| ⎠| ⎠| ⎠| ⎠| ⎠|
Wyoming | FALSE | -106.857 | 41.012 | 63 | 15595 | 63 |
Wyoming | FALSE | -107.309 | 41.018 | 63 | 15596 | 63 |
Wyoming | FALSE | -107.922 | 41.018 | 63 | 15597 | 63 |
Wyoming | FALSE | -109.057 | 40.989 | 63 | 15598 | 63 |
Wyoming | FALSE | -109.051 | 40.995 | 63 | 15599 | 63 |
StatUsstate <- ggplot2::ggproto(`_class` = "StatUsstate",
`_inherit` = ggplot2::Stat,
required_aes = c("state_name"),
compute_panel = compute_panel_state)
geom_state <- function(mapping = NULL, data = NULL,
position = "identity", na.rm = FALSE,
show.legend = NA,
inherit.aes = TRUE, ...) {
ggplot2::layer(
stat = StatUsstate, # proto object from Step 2
geom = ggplot2::GeomPolygon, # inherit other behavior
data = data,
mapping = mapping,
position = position,
show.legend = show.legend,
inherit.aes = inherit.aes,
params = list(na.rm = na.rm, ...)
)
}
ggplot(data = states_characteristics) +
aes(state_name = state.name) +
geom_state() +
coord_map()
last_plot() +
aes(fill = ind_vowel_states)
last_plot() +
aes(fill = state.name == "Iowa")
If you're a long-time user of ggplot2, you'll probably have used geom_smooth. However, geom_smooth estimates group-wise; that is, the modeling is done in the compute_group step of the Stat's specification. So when a categorical variable is mapped (to color or group, for example), multiple models are computed and visualized. When you want to visualize a single model that contains a categorical (discrete) variable, geom_smooth won't be well suited to your problem.
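To see the group-wise behavior concretely, here is a minimal, self-contained illustration: mapping species to color fits three separate lines rather than one parallel-slopes model.

```r
library(ggplot2)

# mapping a categorical variable to color triggers one smooth per group
ggplot(palmerpenguins::penguins) +
  aes(x = bill_depth_mm, y = bill_length_mm, color = species) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE)
```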
To start to think about extension in this space, we create geom_ols_linear_parallel for the simple case of an OLS model with one continuous and one categorical independent variable, with no interaction and no higher-order terms. This simple extension could be widely useful in teaching and industry. Yet this is a very specific use case; in the bonus material we'll discuss how the extension strategy could be made usable for more modeling cases.
penguins_df <- palmerpenguins::penguins |>
ggplot2::remove_missing()
#> Removed 11 rows containing missing values or values outside the scale range.
model_bill_length <- lm(bill_length_mm ~
bill_depth_mm + species,
data = penguins_df)
penguins_df |>
mutate(bill_length_fit =
model_bill_length$fitted.values) |>
ggplot() +
aes(x = bill_depth_mm,
y = bill_length_mm) +
geom_point() +
aes(color = species) +
geom_line(aes(y = bill_length_fit))
compute_panel_lm_parallel <- function(data, scales){
model <- lm(y ~ x + category, data = data)
data |>
mutate(y = model$fitted)
}
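Mirroring the testing pattern used throughout this cookbook, we can check the compute function directly by renaming the penguin columns to the x, y, and category names it expects:

```r
library(dplyr)

# trying the compute directly, before wiring it into a Stat
palmerpenguins::penguins |>
  ggplot2::remove_missing() |>
  select(x = bill_depth_mm,
         y = bill_length_mm,
         category = species) |>
  compute_panel_lm_parallel() |>
  head()
```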
StatParallel <- ggplot2::ggproto(`_class` = "StatParallel",
`_inherit` = ggplot2::Stat,
required_aes = c("x", "y", "category"),
compute_panel = compute_panel_lm_parallel,
default_aes = aes(color = after_stat(category)))
geom_ols_linear_parallel <- function(mapping = NULL, data = NULL,
position = "identity", na.rm = FALSE,
show.legend = NA,
inherit.aes = TRUE, ...) {
ggplot2::layer(
stat = StatParallel, # proto object from Step 2
geom = ggplot2::GeomLine, # inherit other behavior
data = data,
mapping = mapping,
position = position,
show.legend = show.legend,
inherit.aes = inherit.aes,
params = list(na.rm = na.rm, ...)
)
}
ggplot(palmerpenguins::penguins) +
aes(x = bill_depth_mm,
y = bill_length_mm,
color = species,
category = species) +
geom_point() +
geom_ols_linear_parallel()
#> Removed 2 rows containing non-finite outside the scale range (`stat_parallel()`).
#> Removed 2 rows containing missing values or values outside the scale range (`geom_point()`).
a geom defined by an sf geometry column
Similar to our U.S. states example, where the state name is the positional aesthetic…
# data to visualize
# nc_county_info |> head()
# find a dataset with geographic info
nc_geo_reference <- sf::st_read(system.file("shape/nc.shp", package = "sf")) |>
select(NAME, FIPS, FIPSNO, geometry)
#> Reading layer `nc' from data source
#> `/Library/Frameworks/R.framework/Versions/4.4-x86_64/Resources/library/sf/shape/nc.shp'
#> using driver `ESRI Shapefile'
#> Simple feature collection with 100 features and 14 fields
#> Geometry type: MULTIPOLYGON
#> Dimension: XY
#> Bounding box: xmin: -84.32385 ymin: 33.88199 xmax: -75.45698 ymax: 36.58965
#> Geodetic CRS: NAD27
nc_geo_reference |>
rename(fips = FIPS) |>
full_join(ggnorthcarolina::northcarolina_county_flat) |>
ggplot() +
geom_sf() +
aes(fill = BIR79)
Our objective is similar to the geom_state() construction, which uses a reference dataframe containing the latitudes and longitudes of the state perimeters in the compute step; the reference data is joined up via the state_name required aesthetic. Then we inherit geom behavior from GeomPolygon.
The sf layer approach is similar. Instead of adding many rows of data for each locality with longitude and latitude coordinates, however, the geometry list-column will be added in the compute step.
If we inspect the layer data for the choropleth created with base ggplot2, we see the geometry column and fill, which will be familiar to you if you've done any work with geom_sf(). However, you'll also note the xmin, xmax, ymin, and ymax columns. These are also needed for our reference data.
So… this is my question to the ggplot2 and spatial folks. Below, I have a routine for adding these xmin, ymin, etc. columns. Do you have a better one? Perhaps something that uses a ggplot2 sf internal function?
last_plot() |> layer_data() |> head()
fill | geometry | PANEL | group | xmin | xmax | ymin | ymax | linetype | alpha | stroke |
---|---|---|---|---|---|---|---|---|---|---|
#152F48 | MULTIPOLYGON (((-81.47276 3… | 1 | -1 | -84.324 | -75.457 | 33.882 | 36.59 | 1 | 0.5 | |
#132C44 | MULTIPOLYGON (((-81.23989 3… | 1 | -1 | -84.324 | -75.457 | 33.882 | 36.59 | 1 | 0.5 | |
#1A3854 | MULTIPOLYGON (((-80.45634 3… | 1 | -1 | -84.324 | -75.457 | 33.882 | 36.59 | 1 | 0.5 | |
#142D46 | MULTIPOLYGON (((-76.00897 3… | 1 | -1 | -84.324 | -75.457 | 33.882 | 36.59 | 1 | 0.5 | |
#16304A | MULTIPOLYGON (((-77.21767 3… | 1 | -1 | -84.324 | -75.457 | 33.882 | 36.59 | 1 | 0.5 | |
#16314B | MULTIPOLYGON (((-76.74506 3… | 1 | -1 | -84.324 | -75.457 | 33.882 | 36.59 | 1 | 0.5 |
### 1, create sf reference dataframe w xmin, ymin, xmax and ymax using return_st_bbox_df function
return_st_bbox_df <- function(sf_df){
bb <- sf::st_bbox(sf_df)
data.frame(xmin = bb[1], ymin = bb[2],
xmax = bb[3], ymax = bb[4])
}
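A quick sanity check of the helper on a hypothetical unit-square polygon (using return_st_bbox_df() as just defined) confirms it returns the four bounding-box columns as a one-row data frame:

```r
library(sf)

# a hypothetical unit-square polygon
square <- sf::st_sfc(
  sf::st_polygon(list(rbind(c(0, 0), c(1, 0),
                            c(1, 1), c(0, 1), c(0, 0)))))

return_st_bbox_df(square)
```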
northcarolina_county_reference <-
sf::st_read(system.file("shape/nc.shp", package="sf")) |>
dplyr::rename(county_name = NAME,
fips = FIPS) |>
dplyr::select(county_name, fips, geometry) |>
dplyr::mutate(bb =
purrr::map(geometry,
return_st_bbox_df)) |>
tidyr::unnest(bb) |>
data.frame()
#> Reading layer `nc' from data source
#> `/Library/Frameworks/R.framework/Versions/4.4-x86_64/Resources/library/sf/shape/nc.shp'
#> using driver `ESRI Shapefile'
#> Simple feature collection with 100 features and 14 fields
#> Geometry type: MULTIPOLYGON
#> Dimension: XY
#> Bounding box: xmin: -84.32385 ymin: 33.88199 xmax: -75.45698 ymax: 36.58965
#> Geodetic CRS: NAD27
compute_panel_county <- function(data, scales){
data |>
dplyr::inner_join(northcarolina_county_reference)
}
StatNcfips <- ggplot2::ggproto(`_class` = "StatNcfips",
`_inherit` = ggplot2::Stat,
required_aes = "fips|county_name",
compute_panel = compute_panel_county)
geom_county <- function(
mapping = NULL,
data = NULL,
position = "identity",
na.rm = FALSE,
show.legend = NA,
inherit.aes = TRUE,
crs = "NAD27", # "NAD27", 5070, "WGS84", "NAD83", 4326 , 3857
...) {
c(ggplot2::layer_sf(
stat = StatNcfips, # proto object from step 2
geom = ggplot2::GeomSf, # inherit other behavior
data = data,
mapping = mapping,
position = position,
show.legend = show.legend,
inherit.aes = inherit.aes,
params = rlang::list2(na.rm = na.rm, ...)
),
coord_sf(crs = crs,
default_crs = sf::st_crs(crs),
datum = crs,
default = TRUE)
)
}
ggnorthcarolina::northcarolina_county_flat |>
ggplot() +
aes(fips = fips) +
geom_county(crs = "NAD83")
p <- last_plot()
p$coordinates$crs
#> [1] "NAD83"
last_plot() +
aes(fill = SID74/BIR74)
nc_geo_reference |>
rename(fips = FIPS) |>
full_join(ggnorthcarolina::northcarolina_county_flat) |>
ggplot() +
# geom_sf() +
aes(fill = BIR79) +
geom_sf_text(aes(label = NAME),
check_overlap = T,
color = "gray80")
#> st_point_on_surface may not give correct results for longitude/latitude data
Prestep.
northcarolina_county_reference |>
dplyr::pull(geometry) |>
sf::st_zm() |>
sf::st_point_on_surface() ->
points_sf
#> st_point_on_surface may not give correct results for longitude/latitude data
#https://github.com/r-spatial/sf/issues/231
the_coords <- do.call(rbind, sf::st_geometry(points_sf)) %>%
tibble::as_tibble() %>% setNames(c("x","y"))
nc_geo_reference_w_center_coords <- cbind(northcarolina_county_reference, the_coords)
compute_panel_county_label <- function(data, scales){
data |>
dplyr::inner_join(nc_geo_reference_w_center_coords)
}
StatNcfipslabel <- ggplot2::ggproto(`_class` = "StatNcfipslabel",
`_inherit` = ggplot2::Stat,
required_aes = "fips|county_name",
compute_panel = compute_panel_county_label,
default_aes =
aes(label = after_stat(county_name)))
geom_county_label <- function(
mapping = NULL,
data = NULL,
position = "identity",
na.rm = FALSE,
show.legend = NA,
inherit.aes = TRUE,
crs = "NAD27", # "NAD27", 5070, "WGS84", "NAD83", 4326 , 3857
...) {
c(ggplot2::layer_sf(
stat = StatNcfipslabel, # proto object from step 2
geom = ggplot2::GeomText, # inherit other behavior
data = data,
mapping = mapping,
position = position,
show.legend = show.legend,
inherit.aes = inherit.aes,
params = rlang::list2(na.rm = na.rm, ...)
),
coord_sf(crs = crs,
default_crs = sf::st_crs(crs),
datum = crs,
default = TRUE)
)
}
nc_county_info |>
ggplot() +
aes(fips = fips) +
geom_county_label(size = 3)
last_plot() +
aes(label = BIR79) +
geom_county(alpha = .2)
Use a common STAT to create geom_estado and geom_estado_label
geo_ref_data_raw <- rnaturalearth::ne_countries(
scale = "medium", returnclass = "sf") %>%
select(name, continent, geometry) %>%
rename(country_name = name)
Use sf data to create another set of useful layers for making choropleths
Rather than defining geom_* functions, you might instead write stat_* functions, which can be used with a variety of geoms. Let's contrast geom_chull and stat_chull below.
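Both functions below rely on a StatChull ggproto object. In case you are running this section standalone, here is a minimal version, following the official "Extending ggplot2" vignette, which subsets each group's data to its convex-hull points:

```r
library(ggplot2)

# minimal convex-hull Stat, as in the "Extending ggplot2" vignette
StatChull <- ggproto("StatChull", Stat,
  compute_group = function(data, scales) {
    # chull() returns the row indices of the convex hull, in order
    data[chull(data$x, data$y), , drop = FALSE]
  },
  required_aes = c("x", "y")
)
```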
geom_chull <- function(mapping = NULL,
data = NULL,
position = "identity",
na.rm = FALSE,
show.legend = NA,
inherit.aes = TRUE, ...) {
ggplot2::layer(
stat = StatChull,
geom = ggplot2::GeomPolygon,
data = data, mapping = mapping,
position = position,
show.legend = show.legend,
inherit.aes = inherit.aes,
params = list(na.rm = na.rm, ...)
)
}
stat_chull <- function(mapping = NULL,
geom = ggplot2::GeomPolygon,
data = NULL,
position = "identity",
na.rm = FALSE,
show.legend = NA,
inherit.aes = TRUE, ...) {
ggplot2::layer(
stat = StatChull,
geom = geom,
data = data,
mapping = mapping,
position = position,
show.legend = show.legend,
inherit.aes = inherit.aes,
params = list(na.rm = na.rm, ...)
)
}
The construction is almost identical. However, in the stat version, the geom is flexible because it can be user defined, instead of being hard-coded in the function. Its use allows you to go in different visual directions, but might have a higher cognitive load.
p <- ggplot(data = mtcars) +
aes(x = wt, y = mpg) +
geom_point()
p +
stat_chull(alpha = .3)
p +
stat_chull(geom = "point",
color = "red",
size = 4)
p +
stat_chull(geom = "text",
label = "c-hull point",
hjust = 0)
# shows stat does not well-serve "path" geom
p +
stat_chull(geom = "path",
label = "c-hull point",
hjust = 0)
#> Ignoring unknown parameters: `label` and `hjust`
Now, we also return to the waterfall question. Let's see how we can prepare the same stat to serve with both GeomRect and GeomText when writing user-facing functions. In brief, we'll create a stat_* user-facing function which doesn't hard-code the geom but defaults to GeomRect. We'll alias stat_waterfall to geom_waterfall, and also create geom_waterfall_label for labeling the rectangle-based layer.
StatWaterfall <- ggplot2::ggproto(`_class` = "StatWaterfall",
`_inherit` = ggplot2::Stat,
required_aes = c("change", "x"),
compute_panel = compute_panel_waterfall,
default_aes = ggplot2::aes(label = ggplot2::after_stat(change),
fill = ggplot2::after_stat(gain_loss),
vjust = ggplot2::after_stat((direction == -1) %>%
as.numeric)))
stat_waterfall <- function(geom = ggplot2::GeomRect,
mapping = NULL,
data = NULL,
position = "identity",
na.rm = FALSE,
show.legend = NA,
inherit.aes = TRUE, ...) {
ggplot2::layer(
stat = StatWaterfall, # proto object from step 2
geom = geom, # inherit other behavior
data = data,
mapping = mapping,
position = position,
show.legend = show.legend,
inherit.aes = inherit.aes,
params = list(na.rm = na.rm, ...)
)
}
geom_waterfall <- stat_waterfall
geom_waterfall_label <- function(..., lineheight = .8){
stat_waterfall(geom = "text",
lineheight = lineheight, ...)}
flow_df |>
ggplot() +
geom_hline(yintercept = 0) +
aes(change = change,
x = event) + # event in order
geom_waterfall() +
geom_waterfall_label()
last_plot() +
aes(x = fct_reorder(event, change))
last_plot() +
aes(x = fct_reorder(event, abs(change)))
If you've still got some stamina, let's talk about another great use of delayed aesthetic evaluation (DEA) in Stat definitions: default label definitions. Below, we overwrite the StatWaterfall default_aes yet again, keeping the default fill aes but adding the label and vjust aesthetics, which are relevant to labeling.
Then we define a separate user-facing function geom_waterfall_label, based on the same stat.
StatWaterfall$default_aes <- ggplot2::aes(fill = ggplot2::after_stat(gain_loss),
label = ggplot2::after_stat(change),
vjust = ggplot2::after_stat((direction == -1) %>%
as.numeric))
geom_waterfall_label <- function(
mapping = NULL,
data = NULL,
position = "identity",
na.rm = FALSE,
show.legend = NA,
inherit.aes = TRUE, ...) {
ggplot2::layer(
stat = StatWaterfall, # proto object from step 2
geom = ggplot2::GeomText,
data = data,
mapping = mapping,
position = position,
show.legend = show.legend,
inherit.aes = inherit.aes,
params = list(na.rm = na.rm, ...)
)
}
flow_df |>
ggplot() +
geom_hline(yintercept = 0) +
aes(change = change,
x = event) + # event in order
geom_waterfall() +
geom_waterfall_label()
The final plot shows that, while there are some convenience defaults for label and fill, these can be overridden.
last_plot() +
aes(label = ifelse(change > 0, "gain", "loss")) +
aes(fill = NULL)
Note: n = 80 (the number of points at which the smooth is evaluated) is the geom_smooth default.
ggplot(data = mtcars) +
aes(x = wt, y = mpg) +
geom_point() +
geom_smooth() +
stat_smooth(xseq = mtcars$wt,
geom = "point",
color = "blue")
compute_group_smooth_fit <- function(data, scales, method = NULL, formula = NULL,
se = TRUE, n = 80, span = 0.75, fullrange = FALSE,
level = 0.95, method.args = list(),
na.rm = FALSE, flipped_aes = NA){
out <- ggplot2::StatSmooth$compute_group(data = data, scales = scales,
method = method, formula = formula,
se = FALSE, n= n, span = span, fullrange = fullrange,
xseq = data$x,
level = .95, method.args = method.args,
na.rm = na.rm, flipped_aes = flipped_aes)
out$x_obs <- data$x
out$y_obs <- data$y
out$xend <- out$x_obs
out$yend <- out$y_obs
out
}
theme_chalkboard <- function(board_color = "darkseagreen4",
chalk_color = "lightyellow", ...){
ggplot2::theme_gray(...) %+replace% ##<< we'll piggy back on an existing theme
ggplot2::theme(
rect = ggplot2::element_rect(fill = board_color,
color = board_color),
text = ggplot2::element_text(color = chalk_color,
face = "italic",
size = 18),
panel.background = ggplot2::element_rect(fill = board_color,
color = board_color),
axis.text = ggplot2::element_text(color = chalk_color),
axis.ticks = ggplot2::element_line(color = chalk_color),
panel.grid = ggplot2::element_blank(),
complete = TRUE ##<< important, see 20.1.2 Complete themes in ggplot2 book
)
}
theme_chalkboard_slate <- function(){
theme_chalkboard("lightskyblue4", "honeydew")
}
ggplot(data = cars) +
aes(x = speed, dist) +
geom_point() +
theme_chalkboard()
last_plot() +
theme_chalkboard_slate()
See ggchalkboard for geoms_chalk_on() and geoms_chalk_off().
One easily created coord function is coord_page(). Here we simply wrap coord_trans(), setting y to be reversed. Our coordinate system is then set up more like a note page, where we count lines from top to bottom, instead of a Cartesian coordinate system, which counts from bottom to top.
coord_page <- function(...){
coord_trans(y = "reverse", ...)
}
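A quick check, using coord_page() as defined above, on hypothetical line-numbered data; y now increases downward, like lines on a page:

```r
library(ggplot2)

# hypothetical data: five lines on a page, each with a width
data.frame(line = 1:5, width = c(3, 5, 4, 2, 6)) |>
  ggplot() +
  aes(x = width, y = line) +
  geom_point() +
  coord_page()
```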
Similar to the properties of coord_page(), our aim in creating coord_poster() is to have vertical positioning go from top to bottom, but also to have an aspect ratio of 1 (one horizontal unit equals one vertical unit). It's a cross between coord_equal() and coord_page().
coord_equal
#> function (ratio = 1, xlim = NULL, ylim = NULL, expand = TRUE,
#> clip = "on")
#> {
#> check_coord_limits(xlim)
#> check_coord_limits(ylim)
#> ggproto(NULL, CoordFixed, limits = list(x = xlim, y = ylim),
#> ratio = ratio, expand = expand, clip = clip)
#> }
#> <bytecode: 0x7f8f7ad6d808>
#> <environment: namespace:ggplot2>
CoordCartesian$transform
#> <ggproto method>
#> <Wrapper function>
#> function (...)
#> transform(...)
#>
#> <Inner function (f)>
#> function (data, panel_params)
#> {
#> data <- transform_position(data, panel_params$x$rescale,
#> panel_params$y$rescale)
#> transform_position(data, squish_infinite, squish_infinite)
#> }
CoordCartesian$aspect
#> <ggproto method>
#> <Wrapper function>
#> function (...)
#> aspect(...)
#>
#> <Inner function (f)>
#> function (ranges)
#> NULL
CoordCartesian$range
#> <ggproto method>
#> <Wrapper function>
#> function (...)
#> range(...)
#>
#> <Inner function (f)>
#> function (panel_params)
#> {
#> list(x = panel_params$x$dimension(), y = panel_params$y$dimension())
#> }
CoordFixed$is_free
#> <ggproto method>
#> <Wrapper function>
#> function (...)
#> is_free(...)
#>
#> <Inner function (f)>
#> function ()
#> FALSE
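After this exploration, one possible coord_poster() sketch (an assumption on my part, not a vetted ggproto implementation) sidesteps subclassing entirely: ggplot2 happily adds a list of components with `+`, so we can pair coord_equal() with scale_y_reverse(). Note this claims the plot's y scale slot, so it's a convenience rather than a true Coord:

```r
library(ggplot2)

# sketch: fixed aspect ratio plus top-to-bottom y, via a component list
coord_poster <- function(...){
  list(ggplot2::coord_equal(...),
       ggplot2::scale_y_reverse())
}

data.frame(x = 1:5, y = 1:5) |>
  ggplot() +
  aes(x, y) +
  geom_point() +
  coord_poster()
```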
readlines_wo_roxygen("../ggverbatim/R/ggverbatim.R")
#> [1] "ggverbatim <- function(data, cat_cols = 1, row_var_name = NULL, cols_var_name = \"x\", value_var_name = NULL){"
#> [2] ""
#> [3] " message(\"Variables that represented visually are ; e.g. aesthetic mappying are 'x', and \" |> paste(row_var_name))"
#> [4] ""
#> [5] " row_var_name <- names(data)[1]"
#> [6] " names(data)[1] <- \"row_var\""
#> [7] ""
#> [8] " col_headers <- names(data)"
#> [9] " col_headers <- col_headers[2:length(col_headers)]"
#> [10] ""
#> [11] " data %>%"
#> [12] " mutate(row_var = fct_inorder(row_var)) %>%"
#> [13] " pivot_longer(cols = -cat_cols) %>%"
#> [14] " mutate(name = factor(name, levels = col_headers)) %>%"
#> [15] " rename(x = name) ->"
#> [16] " pivoted"
#> [17] ""
#> [18] " pivoted %>%"
#> [19] " ggplot() +"
#> [20] " aes(x = x) +"
#> [21] " labs(x = cols_var_name) +"
#> [22] " aes(y = row_var) +"
#> [23] " labs(y = row_var_name) +"
#> [24] " aes(label = value) +"
#> [25] " aes(fill = value) +"
#> [26] " scale_x_discrete(position = \"top\") +"
#> [27] " scale_y_discrete(limits=rev)"
#> [28] ""
#> [29] "}"
# get into ggplot2 plot space from edge list data frame
ggedgelist <- function(edgelist, nodelist = NULL, ...){
# message("'name' a variable created in the 'nodes' dataframe")
if(is.null(nodelist)){
edgelist %>%
tidygraph::as_tbl_graph() %>%
ggraph::ggraph(...)
}else{ # join on nodes attributes if they are available
names(nodelist)[1] <- "name"
edgelist %>%
tidygraph::as_tbl_graph() %>%
dplyr::full_join(nodelist) %>%
ggraph::ggraph(...)
}
}
# get a fill viz w edgelist dataframe only
ggedgelist_quick <- function(edgelist, nodelist = NULL, include_names = F, ...){
p <- ggedgelist(edgelist = edgelist,
nodelist = nodelist, ...) +
ggraph::geom_edge_link(color = "orange") +
ggraph::geom_node_point(size = 9,
color = "steelblue",
alpha = .8)
if(include_names){p + ggraph::geom_node_label(aes(label = name))}else{p}
}
geom_node_label_auto <- function(...){
ggraph::geom_node_label(aes(label = name), ...)
}
geom_node_text_auto <- function(...){
ggraph::geom_node_text(aes(label = name), ...)
}
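A hypothetical usage sketch, assuming the functions above are defined and tidygraph and ggraph are installed:

```r
library(tidyverse)

# a tiny hypothetical edge list
edges <- data.frame(from = c("a", "a", "b"),
                    to   = c("b", "c", "c"))

ggedgelist_quick(edges, include_names = TRUE)
```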
https://github.com/EvaMaeRey/ggstamp
https://github.com/EvaMaeRey/more_theme_easing_ideas
This is a placeholder for the ggtedious workshop, yet to be completed.
https://github.com/EvaMaeRey/ggtedious
# library(ggtedious)
Source: ../ggplot2-extension-cookbook/README.Rmd