Intro Thoughts

ggplot2 users sometimes forego a color or fill legend, instead coloring the text of categories in a title or annotation as they make mention of them using ggtext.

But the process for doing this, I believer, can be somewhat tedious and error prone, because you have to type your own html color tags. The color tags also might make source composed title text less readable.

I admire these plots, but maybe have made them once because they require some stamina and focus to choreograph the colored text coordination.

Below we ‘crack open ggplot2 internals’ to try to streamline this process. Our objective is that text coloring automatically matches whatever ggplot2 is doing with color and fill scales.

Status Quo

Here’s what people are doing, I believe. As you can see, it’s easy to have a mismatch between text color and scale schema.

library(tidyverse)

iris  |>
 ggplot(aes(Sepal.Length, Sepal.Width,
             fill = Species)) +
  geom_point(show.legend = FALSE, size = 7, alpha = .8, shape = 21, color = "white") +
  scale_fill_viridis_d(end = .8) +
  # title color are manually added
  labs(title = "<span style = 'color: red;'>Virginica irises</span> have the largest average sepal width") +
  theme(plot.title = ggtext::element_markdown())

Experiment: let’s get ‘cracking’

Now let’s ‘crack into ggplot2 internals’ to see if we can get to a match between categories and colors programmatically.

First, we’ll look at the colors actually rendered in the layer data of the plot using layer_data. These are often (always?) stored as hex colors which will definitely work in html/markdown context, whereas you have to be a little careful with some R named colors not working at all in html.

First, we’ll use layer_data to grab fill and group.

plot <- last_plot()

fill_values_df <- layer_data(plot) %>%  .[,c("fill", "group")] |> distinct()

Then we’ll grab the name of the variable that’s mapped to fill. This seems a little weird, but seems to works. Open to different approaches that might be more robust!

fill_var_name <- plot$mapping$fill |> capture.output() %>% .[2] %>% str_extract("\\^.+") %>% str_remove("\\^")

Then we can grab the actual vector of data that’s being represented by fill color - the plot$data slot. We put this in a dataframe/tibble, and then used distinct to get a one-to-one category-group table.

fill_var <- plot$data %>% .[,fill_var_name] 
fill_var_df <- tibble(fill_var, group = as.numeric(fill_var)) %>% distinct()

Then we join our colors and categories by group, and have a color-category one-to-one table. And then we can prepare an html statement that will make the category colorful when rendered by ggtext functionality.

left_join(fill_values_df, fill_var_df, by = join_by(group)) %>% 
  mutate(html_statements = paste0("<span style = 'color: ", fill, 
                         "'>", fill_var, "</span>") )
##        fill group   fill_var                                    html_statements
## 1 #440154FF     1     setosa     <span style = 'color: #440154FF'>setosa</span>
## 2 #2A788EFF     2 versicolor <span style = 'color: #2A788EFF'>versicolor</span>
## 3 #7AD151FF     3  virginica  <span style = 'color: #7AD151FF'>virginica</span>

into Functions.

Let’s put this in a function. The function will let us go straight from a ggplot2 plot object to a dataframe that has the fill information.

grab_fill_info <- function(plot){
  
fill_values_df <- layer_data(plot) %>%  .[,c("fill", "group")] |> distinct()
fill_var_name <- plot$mapping$fill |> capture.output() %>% .[2] %>% str_extract("\\^.+") %>% str_remove("\\^")
fill_var <- plot$data %>% .[,fill_var_name] 
fill_var_df <- tibble(fill_var, group = as.numeric(fill_var)) %>% distinct()


left_join(fill_values_df, fill_var_df, by = join_by(group)) %>% 
  mutate(html_statements = paste0("<span style = 'color: ", fill, 
                         "'>", fill_var, "</span></strong>") )
  
}

We can test this out with the plot we saved before:

grab_fill_info(plot = plot)
##        fill group   fill_var
## 1 #440154FF     1     setosa
## 2 #2A788EFF     2 versicolor
## 3 #7AD151FF     3  virginica
##                                               html_statements
## 1     <span style = 'color: #440154FF'>setosa</span></strong>
## 2 <span style = 'color: #2A788EFF'>versicolor</span></strong>
## 3  <span style = 'color: #7AD151FF'>virginica</span></strong>

And then we’ll use the data frame output, to make replacements in a string, adding the html tags. We’ll just get it done with a for loop. One danger, that I’m leaving for later, is that you might have a categories like ‘anana’ (this means pineapple in Portuguese and maybe some other languages) and ‘banana’ (this means banana in Portuguese and maybe some other languages). In this case, you’ll have a bad result given the current implementation. (Anana has an acent on the final syllable in Portuguese you might actually be saved!) male/female is the same problem - but not as nice of a tangent.

fill_df <- grab_fill_info(plot)

for(i in 1:nrow(fill_df)){

  start <- "virginica have the largest average sapel width"
  
  start <- start |> str_replace(fill_df$fill_var[i] %>% as.character(), fill_df$html_statements[i])

start %>% print()
  
}
## [1] "virginica have the largest average sapel width"
## [1] "virginica have the largest average sapel width"
## [1] "<span style = 'color: #7AD151FF'>virginica</span></strong> have the largest average sapel width"

Let’s put the for loop in a function:

auto_color_html <- function(x, fill_df ){
  
 for(i in 1:nrow(fill_df)){

  x <- x |> str_replace(fill_df$fill_var[i] %>% as.character, fill_df$html_statements[i])
  
 }
  
  x
  
}

Test it out…

auto_color_html("The setosa iris is cool", grab_fill_info(plot))
## [1] "The <span style = 'color: #440154FF'>setosa</span></strong> iris is cool"

The big test of everything

Now let’s use our functions with a fresh plot, q.

iris |>
 ggplot(aes(Sepal.Length, Sepal.Width,
             fill = Species)) +
  geom_point(show.legend = FALSE, shape = 21, color = "white", size = 8) +
  scale_fill_viridis_d(end = .8, begin = .2) +
  labs(title = "The **setosa** iris has the smallest average sapel width<br>and the **virginica** irises have largest average sapel width<br>
       while **versicolor** is in-between") +
  theme(plot.title = ggtext::element_markdown())

q <- last_plot()

q_fill_df <- grab_fill_info(q)

colorful_title <- "The **setosa** iris has the *smallest* average sapel width<br>and the **virginica** irises have *largest* average sapel width<br> while **versicolor** is in-between" |> 
        auto_color_html(q_fill_df)

q + 
  labs(title = colorful_title)  #overwriting title

Further bundling

Looks good. What if we wrap everything, and just replace the title with an html color-tagged version.

use_fill_scale_in_title_words <- function(plot){
  
  out <- plot
  plot_fill_df <- grab_fill_info(plot)
  
  out$labels$title <- out$labels$title |> 
        auto_color_html(plot_fill_df)

  return(out)
  
}

Try it out starting fresh.

my_title <- "The **setosa** irises has the largest average sapel widths <br>and then comes **virginica** irises while<br>**versicolor** has the shortest sapel width"

iris |>
 ggplot() +
  aes(x = Sepal.Length, 
      y = Sepal.Width,
      fill = Species) +
  geom_point(shape = 21, 
             color = "white", size = 8, alpha = .9) +
  scale_fill_viridis_d(end = .8, begin = .2) +
  labs(title = my_title) +
  theme(plot.title = ggtext::element_markdown())

use_fill_scale_in_title_words(plot = last_plot()) + 
  guides(fill = "none")

Closing remarks, Other Relevant Work, Caveats

  • fill mapping should be globally declared
  • data should be globally declared
  • should make function have an index for layer from which to get the fill information
  • naive string replacement issue