gg express follow-up

TL&DR
gg express? why an abbreviated syntax for new layers where computation is performed under the hood…
Campitelli: stat_rasa
ggtemp: define_layer_temp and concision troubles
On the fly Stat mutation?
- pipe note from Teun vd Brand
Closing remarks, Other Relevant Work, Caveats

TL&DR

Abbreviated syntax for layers that employ new computations (via Stat), is probably useful. Mostly, I’ve looked at Elio C.’s work in a 2018 blog post. Some pretty plots were made - step-by-step of detrending!

gg express? why an abbreviated syntax for new layers where computation is performed under the hood…

ggtemp/ggexpress project exists because syntax to define a new layer with new compute seems a little more verbose and not proportionate to what you are trying to accomplish. It would be nice to have some friendly within-script routine that doesn’t consume 15 lines of code. New layer functions, e.g. geom_xmean, can make plotting code more succinct and readable; but right now this might be at the price of new Stat definition code.

I’ve learned other people think would be nice to have short-cut syntax to layer creation, and have worked on this problem, including Elio Campitelli who writes about his similar motivation (June Choe referred me):

But still, creating a geom from scratch is an involved process that doesn’t lend itself to simple transformations. – Elio Campitelli

The Spanish is slightly different, and maybe a little more to the point and from the gut:

Pero aún así, hacer un stat específico cada vez que uno quiere hacer un plot en particular es demasiado complicado.*

I worked with Elio Campitelli’s proposed stat_rasa() below - it is very nice.

It seems very similar to the ggtemp goals, and several decisions are the same like default geom is point, default position is identity. I think ggtemp might be coming from something that looks more like the full, standard extension syntax, and then maybe you’d be more positioned to migrate to that. Maybe. stat_rasa’s approach feels safer, because you aren’t creating new proto object on the fly within a function. Switching out which function to call within a ggproto object is something that is done in a lot of other contexts.

Elio’s StatRasa serves the case where computation happens group-wise. With define_layer_temp(), I’d started with compute_panel definition (just because of the motivating computation) and expanded back to compute_group. A general solution (w/ compute_layer too) or split would be good; like there could be a stat_rasa_group(), stat_rasa_panel(), stat_rasa_layer() for example.

I also heard about https://github.com/mjskay/ggbuilder from June Choe and think that is about even stat definitions within the plot build? I have given this less attention. So I’m not sure.

I also include some code Teun Van Der Brand offered on piping, but to be honest, practical implications and connection to this problem are not yet clear to me. 😭

Progessing with ggtemp::define_layer_temp(), I tried to make some changes to make the internal code more compact… but I think (hope) my ggplot2::Stat knowledge is failing me, and hope there is a solution. I want to make all the relevant ggproto slots accesible, setting NULL as default, and then filling them in with Stat$compute_group and the like if the user doesn’t define the slot. The StatRasa implementation blog example has the function working for compute_group only, whereas I think there would be a lot of use cases for the function that works in compute_panel arena.

I worked on a project with ggtemp and it was fun! I think more fun than it would have been w/ traditional syntax. https://github.com/EvaMaeRey/ggcircleof5ths We feed in a data frame contain a variable ‘chord’, which contains the progression F, A, Bb and Bbm, and see the harmonic moves being made! Which are discussed in https://switchedonpop.com/episodes/olivia-rodrigo-guts-vampire-bad-idea-right. Here, we’re looking at the out-of-key use of A chord instead of Am (in the key of F), and chromaticism and creepyness is introduced. Its followed by the Bb to Bbm change - more chromaticism - and to sad minor key!

tibble::tribble(~ lyric, ~chord, 
"Hate to give ... ", "F",
"...now, How's ...", "A",
"-bout? Just   ...", "Bb",
"...you, cool guy,", "Bbm") |>
  ggplot() + 
    aes(x0 = 0, 
        y0 = 0, 
        chord = chord) + 
    geom_chord_cof(r = .45, key = "F") + 
    coord_equal() +
  facet_wrap(~fct_inorder(lyric %>% str_wrap(25)), nrow = 1) +
    labs(title = "Vampire (& Creep) chord progression")

knitr::include_graphics("https://github.com/EvaMaeRey/ggcircleof5ths/blob/main/README_files/figure-gfm/unnamed-chunk-6-1.png?raw=true")

Campitelli: stat_rasa

https://eliocamp.github.io/codigo-r/en/2018/05/how-to-make-a-generic-stat-in-ggplot2/

Backend StatRasa, stat_rasa()

library(tidyverse)

StatRasa <- ggplot2::ggproto("StatRasa", ggplot2::Stat,
  compute_group = function(data, scales, fun, fun.args) {
     # Change default arguments of the function to the 
     # values in fun.args
     args <- formals(fun)
     for (i in seq_along(fun.args)) {
        if (names(fun.args[i]) %in% names(fun.args)) {
           args[[names(fun.args[i])]] <- fun.args[[i]]
        } 
     }
     formals(fun) <- args
     
     # Apply function to data
     fun(data)
})

# stat function used in ggplot
stat_rasa <- function(mapping = NULL, data = NULL,
                      geom = "point", 
                      position = "identity",
                      fun = NULL,
                      ...,
                      show.legend = NA,
                      inherit.aes = TRUE) {
   # Check arguments 
   if (!is.function(fun)) stop("fun must be a function")
   
   # Pass dotted arguments to a list
   fun.args <- match.call(expand.dots = FALSE)$`...`
   
   ggplot2::layer(
      data = data,
      mapping = mapping,
      stat = StatRasa,
      geom = geom,
      position = position,
      show.legend = show.legend,
      inherit.aes = inherit.aes,
      check.aes = FALSE,
      check.param = FALSE,
      params = list(
         fun = fun, 
         fun.args = fun.args,
         na.rm = FALSE,
         ...
      )
   )
}

UI, stat_rasa(fun = detrend)

# using .75 span to match ggplot2 geom_smooth
Detrend <- function(data, method = "lm", span = 0.75) {
   if (method == "lm") {
      data$y <- resid(lm(y ~ x, data = data))
   } else {
      data$y <- resid(loess(y ~ x, span = span, data = data))
   }
   as.data.frame(data)
   }
   
   
library(ggplot2)
set.seed(42)
x <- seq(-1, 3, length.out = 30)
y <- x^2 + rnorm(30)*0.5
df <- data.frame(x = x, y = y)

ggplot(df, aes(x, y)) +
  geom_line(aes(color = "raw data")) +
  geom_smooth(aes(color = "loess smoothing"),
              alpha = .3) + 
  stat_smooth(geom = "point", 
              aes(color = "loess smoothing"),
              xseq = df$x) +
  stat_rasa(geom = "line", 
             fun = Detrend, 
             method = "smooth",
             aes(color = "detrended")) +
  geom_hline(yintercept = 0, 
             linetype = "dashed") + 
  scale_color_discrete(breaks = 
                         fct_inorder(c("raw data",
                                       "loess smoothing",
                                       "detrended"))) +
  labs(title = "detrending with loess smoothing")

## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'

UI, stat_rasa -> stat_detrend

stat_detrend <- function(...) {
   stat_rasa(fun = Detrend, geom = "line", ...)
}

ggplot(df, aes(x, y)) + 
  geom_line(aes(color = "raw data")) +
  geom_smooth(method = "lm", 
              aes(color = "linear model")) +
  stat_smooth(method = "lm", geom = "segment",
              xend = df$x, yend = df$y,
              xseq = df$x, alpha = .2
              ) +
  stat_detrend(method = "lm", 
                aes(color = "detrended")) +
  geom_hline(yintercept = 0, 
             linetype = "dashed") + 
  scale_color_discrete(breaks = 
                         fct_inorder(c("raw data",
                                       "linear model",
                                       "detrended"))) + 
  labs(title = "Linear detrending",
       color = NULL)

## `geom_smooth()` using formula = 'y ~ x'
## `geom_smooth()` using formula = 'y ~ x'

stat_rasa(fun = summarize_xy)

summarize_xy <- function(data, sum_fun = mean){
  
  data |>
    summarize(
      x = sum_fun(x),
      y = sum_fun(y)
    )
  
}


mtcars |>
  ggplot() + 
  aes(x = wt,
      y = mpg) + 
  geom_point() + 
  stat_rasa(fun = summarize_xy, 
            size = 8,
            aes(color = "means")) + 
  stat_rasa(fun = summarize_xy, 
            sum_fun = median, 
            size = 8, 
            aes(color = "medians")) + 
  labs(color = "centroid")

ggtemp: define_layer_temp and concision troubles

Backend ggtemp:::define_layer_temp()

ggtemp:::define_layer_temp

## function(
##   required_aes,
##   compute_panel = NULL, 
##   compute_group = NULL,
##   geom = ggplot2::GeomPoint, 
##   mapping = NULL,
##   data = NULL,
##   position = "identity",
##   na.rm = FALSE,
##   show.legend = NA,
##   inherit.aes = TRUE, 
##   ...) {
## 
##   if(!is.null(compute_panel)){
## StatTemp <- ggproto(
##   `_class` = "StatTemp",
##   `_inherit` = ggplot2::Stat,
##   compute_panel = compute_panel,
##   required_aes = required_aes)
##   }
##   
##   if(!is.null(compute_group)){
## StatTemp <- ggproto(
##   `_class` = "StatTemp",
##   `_inherit` = ggplot2::Stat,
##   compute_group = compute_group,
##   required_aes = required_aes)
##   }  
## 
##   ggplot2::layer(
##     stat = StatTemp,  # proto object from Step 2
##     geom = geom,  # inherit other behavior
##     data = data,
##     mapping = mapping,
##     position = position,
##     show.legend = show.legend,
##     inherit.aes = inherit.aes,
##     params = list(na.rm = na.rm, ...)
##   )
## }
## <bytecode: 0x7fa5bf4807e8>
## <environment: namespace:ggtemp>

Backend ggtemp:::create_layer_temp()

ggtemp:::create_layer_temp

## function(fun_name ="geom_circle", 
##                                     compute_panel = NULL,
##                                     compute_group = NULL,
##                                     required_aes = c("x0", "y0", "r"),
##                                     geom = "point"){
## 
##   assign(x = fun_name, 
##          value = function(...){
##            
##   
##   define_layer_temp(
##     required_aes = required_aes,
##     compute_panel = compute_panel,
##     compute_group = compute_group,
##     geom = geom,
##     ...)  },
##   pos = 1
##   )
##   
## }
## <bytecode: 0x7fa5bf607460>
## <environment: namespace:ggtemp>

UI: define_layer_temp()

compute_panel_circle <- function(data, scales, n_vertices = 15){
  
  data |> 
    mutate(group = row_number()) |> 
    crossing(tibble(z = 0:n_vertices)) |>
    mutate(around = 2*pi*z/max(z)) |> 
    mutate(x = x0 + cos(around)*r,
           y = y0 + sin(around)*r) 
  
}

geom_circle <- function(...){
  
  ggtemp:::define_layer_temp(
  compute_panel = compute_panel_circle,
  required_aes = c("x0", "y0"),
    ...)
  
}

library(ggplot2)
data.frame(x0 = 0:1, y0 = 0:1, r = 1:2/3) |>
  ggplot() +
  aes(x0 = x0, y0 = y0, r = r) +
  geom_circle(n_vertices = 5) +
  aes(fill = r)

UI: ggtemp:::create_layer_temp()

compute_group_xy_summary <- function(data, scales){
  
  data |> 
    summarize(x = mean(x),
              y = mean(y))
  
}

ggtemp:::create_layer_temp("geom_xy_summary",
    compute_group = compute_group_xy_summary,
    required_aes = c("x", "y"))
  

mtcars |>
  ggplot() +
  aes(x = wt, y = mpg) +
  geom_point() + 
  geom_xy_summary(size = 8)

last_plot() + 
  aes(color = factor(gear))

Backend rewrite attempt..

This is giving me more problems; I thought I would be able to start with a lot of null elements, and use generic slot fillers like ggplot2::Stat$compute_panel… But strategy doesn’t seem to be working.

define_layer_temp <- function(
  compute_group = NULL,
  compute_panel = NULL, 
  required_aes = NULL,
  geom = ggplot2::GeomPoint, 
  mapping = NULL,
  data = NULL,
  position = "identity",
  na.rm = FALSE,
  show.legend = NA,
  inherit.aes = TRUE, 
  ...) {

  StatEmpty <- ggproto(
  `_class` = "StatEmpty",
  `_inherit` = ggplot2::Stat)
  
  if(is.null(compute_group)){compute_group <- ggplot2::StatEmpty$compute_group}
  if(is.null(compute_panel)){compute_panel <- ggplot2::StatEmpty$compute_panel}
  if(is.null(required_aes)) {required_aes  <- ggplot2::StatEmpty$required_aes}
     
     
StatTemp <- ggproto(
  `_class` = "StatTemp",
  `_inherit` = ggplot2::Stat,
  compute_group = compute_group,
  compute_panel = compute_panel,
  required_aes  = required_aes
  )
  
  ggplot2::layer(
    stat = StatTemp,  # proto object from Step 2
    geom = geom,  # inherit other behavior
    data = data,
    mapping = mapping,
    position = position,
    show.legend = show.legend,
    inherit.aes = inherit.aes,
    params = list(na.rm = na.rm, ...)
  )
}


compute_group_xy_summary <- function(data, scales){
  
  data |> 
    summarize(x = mean(x),
              y = mean(y))
  
}

geom_xy_means <- function(...){define_layer_temp(compute_group = compute_group_xy_summary, ...)}

ggplot(mtcars) + 
  aes(x = wt, y = mpg) + 
  geom_point() + 
  geom_xy_means()

On the fly Stat mutation?

ggbuilder https://github.com/mjskay/ggbuilder and pipe note might have something to do with more dynamic, on-the-fly internal data manipulation and stat creation…

df |>
  ggplot(aes(x = condition, y = response, color = condition)) +
  geom_boxplot() +
  plot_data() |>
    dplyr::filter(condition %in% c("B", "C")) |>
    dplyr::group_by(condition) |>
    dplyr::slice_max(response) |>
    geom_("label", aes(label = response))

pipe note from Teun vd Brand

library(tidyverse)
piped <- . %>% summarise(x = mean(x)) %>%
  mutate(x = x + 1)

piped

## Functional sequence with the following components:
## 
##  1. summarise(., x = mean(x))
##  2. mutate(., x = x + 1)
## 
## Use 'functions' to extract the individual functions.

magrittr::functions(piped)

## [[1]]
## function (.) 
## summarise(., x = mean(x))
## 
## [[2]]
## function (.) 
## mutate(., x = x + 1)