TL;DR

ggplot2’s chart building uses a lot of defaults (e.g. coordinate and theme specifications). These defaults are friendlier to certain chart types (scatter plots) than others (bars, pies). If we bundle up new collections of defaults we can provide alternative start points for ggplot2 builds. Good idea? Also, what’d be the least bad interface:

# idea 1
ggplot() + 
  defaults_pie()
  
# idea 2
ggplot() + 
  plot_pie()

# idea 3
ggpie()

# idea 4
ggplot(defaults = "pie")

ggplot the function

Have you ever built up plots bit by bit with newcomers to ggplot2? And maybe you just executed the code ‘ggplot(data = mtcars)’?

library(ggplot2)
ggplot(data = mtcars)

When you do, it’s fun to ask, “So what do we get?” A typical response from folks I’ve been teaching has been: “Nothing!” And this is a good moment to geek out about ggplot2, countering, “No, no. Not nothing. This is a blank canvas, full of promise and possibility (and the mtcars data)!”

And, ‘blank canvas’ is probably not a bad way to understand ggplot() as a beginner. However, initiating a ggplot is far from a ‘blank canvas’.

We can see this if we inspect the object p, created below, instead of rendering it…

p <- ggplot(data = mtcars)

Inspecting length(p), we see that there are 9 list elements. (So as not to overwhelm, we don’t print out str(p) here. The exact count may differ across ggplot2 versions.)

# str(p)
length(p)
## [1] 9

And with names(p) you’ll see elements of the plot that may look familiar.

names(p)
## [1] "data"        "layers"      "scales"      "mapping"     "theme"      
## [6] "coordinates" "facet"       "plot_env"    "labels"

You may recognize that many of these elements are the orthogonal components laid out in Wilkinson’s ‘The Grammar of Graphics’, the components that define a data visualization.

ggplot2 is user friendly because you do not need to specify all 9 elements to get a plot back. You only need to specify three things:

  1. data
  2. aesthetic mapping, and
  3. a geom or stat layer

But, hey, what about all the other elements: the theme, scales, coordinate system, etc.? For these, there are defaults and algorithms that pick defaults (especially based on declared aesthetic mappings). For example, coordinates are Cartesian, rather than polar, when we embark on a ggplot2 build, just by virtue of initiating with ggplot().
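To make this concrete, here is a small sketch that pokes at a freshly initiated plot. It assumes that `$` access to plot components keeps working, which is true of the ggplot2 versions I know of but is closer to an internal detail than a documented interface.

```r
library(ggplot2)

p <- ggplot(data = mtcars)

# Defaults that ggplot() fills in before we add anything ourselves
inherits(p$coordinates, "CoordCartesian")  # TRUE: cartesian coordinates
inherits(p$facet, "FacetNull")             # TRUE: a single, unfaceted panel
length(p$layers)                           # 0: no geom or stat layers yet
```

So the ‘blank canvas’ already carries a coordinate system and a faceting specification; only the layers are truly empty.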

What’s the implication of the ggplot2 defaults doing so much work? They are generally a boon to analyst productivity. Analysts don’t have to fiddle around with plot details to start exploring their data. Having ggplot() allows analysts to do a ‘123 of data+aes+layer’ and be off to the races inspecting their data. It thereby generally serves the tidyverse ‘pit of success’ goal. Defaults are enormously beneficial for productivity and analyst success.

However, a success pit looks different according to one’s objectives. The ggplot function, because of its defaults, really points toward one particular pit of success, while your vision of success, at least sometimes, might be different.

To be more specific, a ggplot2 1-2-3 scatterplot (data - aes - geom) is practically publication ready, while other plot types have a mismatch between the defaults and a publication-ready final form (I’m thinking bar charts; details coming soon). Still other chart types cannot be produced at all using the simple data-aes-geom method in base ggplot2 (e.g. pie charts).

We may feel the absence of the effortless ‘data-aes-geom’ workflow for plot types that don’t have it available natively.
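For reference, here is a sketch of the usual base-ggplot2 route to a pie chart: a single stacked bar redrawn in polar coordinates. The choice of `x = 1` and `theme_void()` is mine for illustration; other recipes exist.

```r
library(ggplot2)

# A pie chart as a stacked bar in polar coordinates
pie <- ggplot(mtcars) +
  aes(x = 1, fill = factor(cyl)) +  # one bar, split into slices by cyl
  geom_bar(width = 1) +             # counts stacked in a single column
  coord_polar(theta = "y") +        # map stacked height to angle
  theme_void()                      # drop axes, grid, and background

pie
```

The coordinate and theme changes are exactly the kind of non-default choices that data-aes-geom alone doesn’t cover.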

Meandering toward ggplot() redefaulting

Below, I show an example of a labeled bar chart. The chart requires changes to the theme, scale, and coord specifications to get an appropriate output. From a technical standpoint, I’d argue that these alternate choices would have been grammatically unproblematic as the ggplot start point. Next we’ll look at collecting these alternatives and offering a quick redefaulting function, defaults_barlabs, that has the labeled bar chart in mind as the end goal.

ggplot(mtcars) +
  aes(x = factor(cyl)) +
  geom_bar(position = "dodge") +
  ggbarlabs::geom_barlab_count() +
  theme_classic() +
  theme(axis.line.y = element_blank(),
        axis.text.y.right = element_blank(),
        axis.ticks.y = element_blank(),
        axis.title.y = element_blank(),
        axis.text.y = element_blank(),
        panel.grid.major.y = element_line(color = alpha("gray35", .1)),
        panel.grid.major.x = element_blank(),
        panel.grid.minor.x = element_blank(),
        axis.ticks = element_blank(),
        axis.line.x = element_line(colour = "gray35"),
        legend.position = "top",
        legend.justification = 0) +
  scale_y_continuous(expand = expansion(mult = c(0, .1)))

Okay. Let’s see this. It is common to collect up decisions that are made with the theme() function into a function like theme_barlabs. It is less common to bundle thematic decisions further with, say, scales. So this ‘big bundle’ is the ‘extension space’ that I’m exploring.

theme_barlabs <- function(base_size = 25, ...){
  
  theme_classic(base_size = base_size, ...)  %+replace%
  theme(axis.line.y = element_blank(),
        axis.text.y.right = element_blank(),
        axis.ticks.y = element_blank(),
        axis.title.y = element_blank(),
        axis.text.y = element_blank(),
        panel.grid.major.y = element_line(color = alpha("gray35", .1)),
        panel.grid.major.x = element_blank(),
        panel.grid.minor.x = element_blank(),
        axis.ticks = element_blank(),
        axis.line.x = element_line(colour = "gray35"),
        legend.position = "top",
        legend.justification = 0, 
        complete = TRUE
        )
}

Here are some of the default scales that will be bundled.

scale_y_barlabs <- function(...){
  ggplot2::scale_y_continuous(expand = ggplot2::expansion(mult = c(0, .15)), ...)
}


scale_fill_barlabs <- function(...){
  ggplot2::scale_fill_viridis_d(...)
}

And the default function itself:

defaults_barlabs <- function(){
  
  list(theme_barlabs(),
       scale_y_barlabs(), 
       scale_fill_barlabs()
       )
}



ggbarlabs <- function(data = NULL, ...){
  ggplot(data = data , ... ) +
  defaults_barlabs()
}
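The same pattern could, in principle, back the pie ideas from the TL;DR. The sketch below is hypothetical: `defaults_pie()` and `ggpie()` are not ggplot2 functions, and the particular defaults (polar coordinates with `theta = "y"`, plus `theme_void()`) are my assumptions about what a pie-friendly start point might bundle.

```r
library(ggplot2)

# Hypothetical bundle of pie-friendly defaults (not part of ggplot2)
defaults_pie <- function(){
  list(coord_polar(theta = "y"),
       theme_void())
}

# Hypothetical ggplot() replacement, mirroring ggbarlabs() above
ggpie <- function(data = NULL, ...){
  ggplot(data = data, ...) +
    defaults_pie()
}

# With the defaults swapped, data-aes-geom gets us to a pie
ggpie(mtcars) +
  aes(x = 1, fill = factor(cyl)) +
  geom_bar(width = 1)
```

Adding a list to a plot adds each element in turn, which is what lets a single function carry coord, theme, and scale choices together.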

Others doing this?

The place I’ve definitely seen this before is ggraph::ggraph(). So it is not unprecedented. I’m not aware of other examples (except in my own extension experiments). Please let me know of any others. ggraph not only changes defaults but also preprocesses data.

Questions to readers:

I’m interested in feedback especially on the following questions.

  1. Are bundles of new defaults valid?
  2. Do you think I’m fundamentally misunderstanding ggplot2 initiation?
  3. Is the redefaulting strategy technically problematic?
  4. Are there pitfalls I’m not seeing?
  5. If valid, and pitfalls can be avoided, what should the interface look like? See the top, under TL;DR, for some ideas.
  6. How does one go about settling on the right defaults?

Applying the defaults

Here are a couple of examples of possible interfaces using the functions created above.

ggbarlabs(mtcars) +               # a function replacing ggplot()
  aes(x = factor(cyl)) +
  geom_bar(position = "dodge") +
  ggbarlabs::geom_barlab_count()

ggplot(mtcars) +                  
  defaults_barlabs() +            # a function containing defaults
  aes(x = factor(cyl)) +
  geom_bar(position = "dodge") +
  ggbarlabs::geom_barlab_count()
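One candidate pitfall worth checking with either interface is what happens when a user later adjusts something the bundle already set. The sketch below uses a stand-in copy of scale_y_barlabs and plain geom_bar() so that it runs without the ggbarlabs package; the limits of 0 to 20 are arbitrary.

```r
library(ggplot2)

# Stand-in for the bundled y scale defined earlier
scale_y_barlabs <- function(...){
  scale_y_continuous(expand = expansion(mult = c(0, .15)), ...)
}

p <- ggplot(mtcars) +
  aes(x = factor(cyl)) +
  geom_bar() +
  scale_y_barlabs()

# Adding another y scale replaces the bundled one (ggplot2 prints a message),
# so the bundled expansion setting is lost
p2 <- p + scale_y_continuous(limits = c(0, 20))

# Passing the change through the bundled scale function keeps the expansion
p3 <- ggplot(mtcars) +
  aes(x = factor(cyl)) +
  geom_bar() +
  scale_y_barlabs(limits = c(0, 20))
```

Scales are replaced wholesale rather than merged, unlike theme() tweaks, so a redefaulting interface may want to expose its scale functions for exactly this reason.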