class: center, middle, inverse, title-slide .title[ # ASA-COWY, Spring 2025 ] .subtitle[ ## Everyday ggplot2 extension: New approaches to light-weight ‘geom’ (layer) extension ] .author[ ### Evangeline ‘Gina’ Reynolds ] --- ## Overview: ### 1. Creating new layers (geom_* functions) in ggplot2 extension has been described as an 'involved process'. -- ### 2. While new educational material has lowered the conceptual barrier to entry, the amount of scaffolding code required for classical extension can make using extension regularly feel prohibitive. -- ### 3. I'll discuss strategies and tools to make experimentation with new layer functionality, especially for pedagogy, lighter-weight and more fun. --- <!-- Tell you about: --> <!-- [Everyday ggplot2 extension](https://evamaerey.github.io/everyday_ggplot2_extension/) initiatives --> <!-- -- --> <!-- 1. Learning resources --> <!-- a. [easy recipes] --> <!-- b. [cookbook](https://evamaerey.github.io/ggplot2-extension-cookbook/) --> <!-- -- --> <!-- 2. Community - [Extenders club](https://github.com/teunbrand/ggplot-extension-club) --> <!-- a. meetup --> <!-- b. club --> <!-- c. workshop? --> <!-- -- --> <!-- 3. Light-weight approach to ggplot2 extension... --> --- ## Intro Thoughts ## ggbump --- count: false .panel1-bump-auto[ ``` r *library(tidyverse) ``` ] .panel2-bump-auto[ ] --- count: false .panel1-bump-auto[ ``` r library(tidyverse) *library(ggbump) ``` ] .panel2-bump-auto[ ] --- count: false .panel1-bump-auto[ ``` r library(tidyverse) library(ggbump) *us_cohort_life_exp_rank_2020 ``` ] .panel2-bump-auto[ ``` ## # A tibble: 60 × 7 ## country continent year lifeExp pop gdpPercap life_exp_rank ## <fct> <fct> <int> <dbl> <int> <dbl> <dbl> ## 1 Canada Americas 1952 68.8 14785584 11367. 2 ## 2 Canada Americas 1957 70.0 17010154 12490. 2 ## 3 Canada Americas 1962 71.3 18985849 13462. 1 ## 4 Canada Americas 1967 72.1 20819767 16077. 1 ## 5 Canada Americas 1972 72.9 22284500 18971. 1 ## 6 Canada Americas 1977 74.2 23796400 22091. 1 ## 7 Canada Americas 1982 75.8 25201900 22899. 1 ## 8 Canada Americas 1987 76.9 26549700 26627. 1 ## 9 Canada Americas 1992 78.0 28523502 26343. 1 ## 10 Canada Americas 1997 78.6 30305843 28955. 2 ## # ℹ 50 more rows ``` ] --- count: false .panel1-bump-auto[ ``` r library(tidyverse) library(ggbump) us_cohort_life_exp_rank_2020 %>% * ggplot() ``` ] .panel2-bump-auto[ <!-- --> ] --- count: false .panel1-bump-auto[ ``` r library(tidyverse) library(ggbump) us_cohort_life_exp_rank_2020 %>% ggplot() + * aes(x = year, * y = life_exp_rank, * color = country) ``` ] .panel2-bump-auto[ <!-- --> ] --- count: false .panel1-bump-auto[ ``` r library(tidyverse) library(ggbump) us_cohort_life_exp_rank_2020 %>% ggplot() + aes(x = year, y = life_exp_rank, color = country) + * geom_point() ``` ] .panel2-bump-auto[ <!-- --> ] --- count: false .panel1-bump-auto[ ``` r library(tidyverse) library(ggbump) us_cohort_life_exp_rank_2020 %>% ggplot() + aes(x = year, y = life_exp_rank, color = country) + geom_point() + * geom_bump() ``` ] .panel2-bump-auto[ <!-- --> ] <style> .panel1-bump-auto { color: black; width: 38.6060606060606%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel2-bump-auto { color: black; width: 59.3939393939394%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel3-bump-auto { color: black; width: NA%; hight: 33%; float: left; padding-left: 1%; font-size: 80% } </style> --- # Graphical Poem!!! ``` r life_exp_rank_2020 %>% ggplot() + aes(x = year, y = life_exp_rank, color = country) + geom_point() + * geom_bump() ``` -- ## "The article concludes by discussing some perceptual issues, and thinking about how we can build on the grammar to learn how to create graphical 'poems.'" - Hadley Wickham --- # Without geom_bump(), drawing a bumpchart would be a non-starter in most settings (business, government, academic). -- # Layer extension delivers value. -- ## "... creating new Stats[-based layers] is one of the most useful ways to extend the capabilities of ggplot2." – ggplot2: Elegant Graphics for Data Analysis --- class: inverse, center, middle # What if you have a much longer graphical poem in mind? --- <!-- --> --- count: false ``` ## # A tibble: 161 × 3 ## Default Choice decision ## <chr> <chr> <fct> ## 1 opt-in donor donor (1) ## 2 opt-in donor donor (1) ## 3 opt-in donor donor (1) ## 4 opt-in donor donor (1) ## 5 opt-in donor donor (1) ## 6 opt-in donor donor (1) ## 7 opt-in donor donor (1) ## 8 opt-in donor donor (1) ## 9 opt-in donor donor (1) ## 10 opt-in donor donor (1) ## # ℹ 151 more rows ``` --- count: false <!-- --> --- count: false <!-- --> --- count: false <!-- --> --- count: false <!-- --> --- count: false <!-- --> --- count: false <!-- --> --- count: false <!-- --> --- count: false <!-- --> --- count: false <!-- --> --- count: false <!-- --> --- count: false <!-- --> <style> .panel1-wp_prop_poem_small-auto { color: black; width: 99%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel2-wp_prop_poem_small-auto { color: black; width: NA%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel3-wp_prop_poem_small-auto { color: black; width: NA%; hight: 33%; float: left; padding-left: 1%; font-size: 80% } </style> --- ## *Long form statistical graphical poems.*... are being written on the board *all the time* by stat's instructors and students. ## And isn't what's draw the *actual* graphical poem?  --- ### ggplot2 *sometimes* allows us to record those poems in concise, logical, consistent ways... -- ## The code is an expression of the *actual*, *drawn* poem. -- ## What is a a song? (is it the performed work or written down?) --- # - 'I have two kids, it's so hard to be so deep in ggplot, it takes at least two days just to get all the information in your head again. So I'm struggling with maintaining [ggbump, ggaluvial, ggsankey]' -- # Delivering new layer extensions can be costly. --- # - 'I have 200 kids...' Univeristy Professor --- # - 'I have 1000 kids...' Academy Curriculum Director --- # - 'I have a thesis to finish' Graduate Student --- ## Recognizing demands on peoples time and attention. -- ## Talk central question: Can we lower the costs to delivering layer extension? -- ## Especially in academic setting... -- ## ... (These people generally have staggeringly good extension ideas, and academic setting has potential to have ripple effect since educators have a *lot* of trainees...) :-) --- # 'How to extend ggplot2 while drowning'...? -- ## managing 'care tasks' ...  --- ## 1. extension is be an analytic 'care task' - can help you in the long run -- ## 2. but in the short run, barriers to entry and maintanance present a burden --- # 'How to extend ggplot2 without drowning'...? ### new, low code extension. (experimental) << <!-- -- --> <!-- Don't go deep. Stay intentionally light-weight, and superficial. --> <!-- -- --> <!-- Don't go too far. Keep things local. --> <!-- -- --> <!-- Don't swim alone! --> <!-- --- --> <!-- --- --> <!-- # Specifically... --> <!-- ### Check out new educational resources - super accessible recipes and cookbook. --> <!-- -- --> <!-- ### 'early exits' (no packaging, in script extension) --> <!-- -- --> <!-- ### minimal packaging (no CRAN, minimal messaging testing) --> <!-- -- --> -- --- --- # Status quo layer function creation. --- ### `geom_support()` layer creation via a Stat w/ 'new-classic' method What are the logical limits for observed proportions? ``` ## # A tibble: 6 × 3 ## Default Choice decision ## <chr> <chr> <fct> ## 1 opt-in donor donor (1) ## 2 opt-in donor donor (1) ## 3 opt-in donor donor (1) ## 4 opt-in donor donor (1) ## 5 opt-in donor donor (1) ## 6 opt-in donor donor (1) ``` <!-- --> --- count: false ### Step 0. use base ggplot2, compute before plotting .panel1-step0-auto[ ``` r *donor ``` ] .panel2-step0-auto[ ``` ## # A tibble: 161 × 3 ## Default Choice decision ## <chr> <chr> <fct> ## 1 opt-in donor donor (1) ## 2 opt-in donor donor (1) ## 3 opt-in donor donor (1) ## 4 opt-in donor donor (1) ## 5 opt-in donor donor (1) ## 6 opt-in donor donor (1) ## 7 opt-in donor donor (1) ## 8 opt-in donor donor (1) ## 9 opt-in donor donor (1) ## 10 opt-in donor donor (1) ## # ℹ 151 more rows ``` ] --- count: false ### Step 0. use base ggplot2, compute before plotting .panel1-step0-auto[ ``` r donor |> * summarise(x = min(as.numeric(decision)), * xend = max(as.numeric(decision)), * y = 0, * yend = 0) ``` ] .panel2-step0-auto[ ``` ## # A tibble: 1 × 4 ## x xend y yend ## <dbl> <dbl> <dbl> <dbl> ## 1 1 2 0 0 ``` ] --- count: false ### Step 0. use base ggplot2, compute before plotting .panel1-step0-auto[ ``` r donor |> summarise(x = min(as.numeric(decision)), xend = max(as.numeric(decision)), y = 0, yend = 0) -> *donor_min_max ``` ] .panel2-step0-auto[ ] --- count: false ### Step 0. use base ggplot2, compute before plotting .panel1-step0-auto[ ``` r donor |> summarise(x = min(as.numeric(decision)), xend = max(as.numeric(decision)), y = 0, yend = 0) -> donor_min_max *donor ``` ] .panel2-step0-auto[ ``` ## # A tibble: 161 × 3 ## Default Choice decision ## <chr> <chr> <fct> ## 1 opt-in donor donor (1) ## 2 opt-in donor donor (1) ## 3 opt-in donor donor (1) ## 4 opt-in donor donor (1) ## 5 opt-in donor donor (1) ## 6 opt-in donor donor (1) ## 7 opt-in donor donor (1) ## 8 opt-in donor donor (1) ## 9 opt-in donor donor (1) ## 10 opt-in donor donor (1) ## # ℹ 151 more rows ``` ] --- count: false ### Step 0. use base ggplot2, compute before plotting .panel1-step0-auto[ ``` r donor |> summarise(x = min(as.numeric(decision)), xend = max(as.numeric(decision)), y = 0, yend = 0) -> donor_min_max donor |> *ggplot() ``` ] .panel2-step0-auto[ <!-- --> ] --- count: false ### Step 0. use base ggplot2, compute before plotting .panel1-step0-auto[ ``` r donor |> summarise(x = min(as.numeric(decision)), xend = max(as.numeric(decision)), y = 0, yend = 0) -> donor_min_max donor |> ggplot() + * aes(x = decision) ``` ] .panel2-step0-auto[ <!-- --> ] --- count: false ### Step 0. use base ggplot2, compute before plotting .panel1-step0-auto[ ``` r donor |> summarise(x = min(as.numeric(decision)), xend = max(as.numeric(decision)), y = 0, yend = 0) -> donor_min_max donor |> ggplot() + aes(x = decision) + * geom_bar(width = .2) ``` ] .panel2-step0-auto[ <!-- --> ] --- count: false ### Step 0. use base ggplot2, compute before plotting .panel1-step0-auto[ ``` r donor |> summarise(x = min(as.numeric(decision)), xend = max(as.numeric(decision)), y = 0, yend = 0) -> donor_min_max donor |> ggplot() + aes(x = decision) + geom_bar(width = .2) + * geom_segment(data = donor_min_max, * aes(x = x, * xend = xend, * y = 0, * yend = 0)) ``` ] .panel2-step0-auto[ <!-- --> ] <style> .panel1-step0-auto { color: black; width: 38.6060606060606%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel2-step0-auto { color: black; width: 59.3939393939394%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel3-step0-auto { color: black; width: NA%; hight: 33%; float: left; padding-left: 1%; font-size: 80% } </style> ``` r donor |> summarise(x = min(as.numeric(decision)), xend = max(as.numeric(decision)), y = 0, yend = 0) -> donor_min_max donor |> ggplot() + aes(x = decision) + geom_bar(width = .2) + geom_segment(data = donor_min_max, aes(x = x, xend = xend, y = 0, yend = 0)) ``` <!-- --> --- # Step 1. Define Compute ``` r compute_panel_support <- function(data, scales){ data |> summarize(xend = max(x), x = min(x), yend = 0, y = 0) } ``` --- # Step 2. Define Stat ``` r StatSupport <- ggproto(`_class` = "StatSupport", `_inherit` = ggplot2::Stat, compute_panel = compute_panel_support, required_aes = "x") ``` --- # Step 3. Define user-facing function ``` r geom_support <- function (mapping = NULL, data = NULL, geom = "segment", position = "identity", ..., show.legend = NA, inherit.aes = TRUE) { layer(data = data, mapping = mapping, stat = StatSupport, geom = geom, position = position, show.legend = show.legend, inherit.aes = inherit.aes, params = rlang::list2(na.rm = FALSE, ...)) } ``` --- # Yippee!! Such a nice poem!! ``` r donor |> ggplot() + aes(x = decision) + geom_bar(width = .2) + * geom_support() ``` <!-- --> -- ## But we are just getting started!! 😬😭 --- # At what cost? How much code? <!-- --> --- # At what cost? How much code? <!-- --> ``` ## fill label group PANEL count cum_n xmax xmin r r0 ## 1 #F8766D 18.9%\n(94) 1 1 94 94 0.1891348 0.0000000 12.57776 0 ## 2 #00BFC4 81.1%\n(403) 2 1 403 497 1.0000000 0.1891348 12.57776 0 ## ymin ymax y prop percent r_prop r_nudge x y_text ## 1 0 12.57776 0 0.1891348 18.9% 1 0 0.0945674 12.57776 ## 2 0 12.57776 0 0.8108652 81.1% 1 0 0.5945674 12.57776 ## angle_wedge colour linewidth linetype alpha ## 1 55.95573 NA 0.5 1 NA ## 2 -124.04427 NA 0.5 1 NA ``` --- # Scaffolding code adding up... ``` r 9 * # number of layers in ma206 lesson 1 poem (stat_chars + layer_chars) # scaffolding code per layer ``` ``` ## [1] 3627 ``` -- ## Making logic harder to follow -- ## and code less readable --- # New approach, w {statexpress} ``` r library(statexpress) ``` -- ### Use `statexpress::qstat` and `statexpress::qlayer` to build up poem. -- ### Combines step 2 and 3. Requires much less code. -- ### Delivers same user experience: computational work under the hood. --- # Step 1. Compute ``` r compute_panel_support <- function(data, scales){ data |> summarize(xend = max(x), x = min(x), yend = 0, y = 0) } ``` --- # Step 1.5 Sketch ``` r donor |> ggplot() + aes(x = decision) + geom_bar(width = .2) + # sketching geom_support qlayer(geom = GeomSegment, stat = qstat_panel(compute_panel_support)) ``` <!-- --> statexpress::qstat_panel - define panel-wise computation for a temp stat statexpress::qlayer - use a more filled-in version of layer --- # Step 2. User-facing function ``` r geom_support <- function(...){ qlayer(stat = qstat_panel(compute_panel_support), geom = GeomSegment, ...) } ``` --- ### Yippee - Graphical Poem!! ``` r donor |> ggplot() + aes(x = decision) + geom_bar(width = .2) + * geom_support() ``` <!-- --> --- <img src="asa-cowy-fall-2024_files/figure-html/unnamed-chunk-12-1.png" width="70%" /> --- ``` r knitrExtra::chunk_code_get("StatSupport") ``` ``` ## [1] "StatSupport <- ggproto(`_class` = \"StatSupport\"," ## [2] " `_inherit` = ggplot2::Stat," ## [3] " compute_panel = compute_panel_support," ## [4] " required_aes = \"x\")" ``` --- # Part 0. Sketch epic poem on chalkboard or paper, and the sketch the ggplot2 extension syntax that would support that. --- count: false .panel1-wp_prop_poem_small-auto[ ``` ## # A tibble: 161 × 3 ## Default Choice decision ## <chr> <chr> <fct> ## 1 opt-in donor donor (1) ## 2 opt-in donor donor (1) ## 3 opt-in donor donor (1) ## 4 opt-in donor donor (1) ## 5 opt-in donor donor (1) ## 6 opt-in donor donor (1) ## 7 opt-in donor donor (1) ## 8 opt-in donor donor (1) ## 9 opt-in donor donor (1) ## 10 opt-in donor donor (1) ## # ℹ 151 more rows ``` ] .panel2-wp_prop_poem_small-auto[ ``` r *donor ``` ] --- count: false .panel1-wp_prop_poem_small-auto[ <!-- --> ] .panel2-wp_prop_poem_small-auto[ ``` r donor |> * ggplot() + theme_chalkboard(base_size = 18) ``` ] --- count: false .panel1-wp_prop_poem_small-auto[ <!-- --> ] .panel2-wp_prop_poem_small-auto[ ``` r donor |> ggplot() + theme_chalkboard(base_size = 18) + * aes(x = decision) ``` ] --- count: false .panel1-wp_prop_poem_small-auto[ <!-- --> ] .panel2-wp_prop_poem_small-auto[ ``` r donor |> ggplot() + theme_chalkboard(base_size = 18) + aes(x = decision) + * geom_stack() ``` ] --- count: false .panel1-wp_prop_poem_small-auto[ <!-- --> ] .panel2-wp_prop_poem_small-auto[ ``` r donor |> ggplot() + theme_chalkboard(base_size = 18) + aes(x = decision) + geom_stack() + * geom_stack_label() ``` ] --- count: false .panel1-wp_prop_poem_small-auto[ <!-- --> ] .panel2-wp_prop_poem_small-auto[ ``` r donor |> ggplot() + theme_chalkboard(base_size = 18) + aes(x = decision) + geom_stack() + geom_stack_label() + * geom_support() ``` ] --- count: false .panel1-wp_prop_poem_small-auto[ <!-- --> ] .panel2-wp_prop_poem_small-auto[ ``` r donor |> ggplot() + theme_chalkboard(base_size = 18) + aes(x = decision) + geom_stack() + geom_stack_label() + geom_support() + * geom_prop() ``` ] --- count: false .panel1-wp_prop_poem_small-auto[ <!-- --> ] .panel2-wp_prop_poem_small-auto[ ``` r donor |> ggplot() + theme_chalkboard(base_size = 18) + aes(x = decision) + geom_stack() + geom_stack_label() + geom_support() + geom_prop() + * geom_prop_label() ``` ] --- count: false .panel1-wp_prop_poem_small-auto[ <!-- --> ] .panel2-wp_prop_poem_small-auto[ ``` r donor |> ggplot() + theme_chalkboard(base_size = 18) + aes(x = decision) + geom_stack() + geom_stack_label() + geom_support() + geom_prop() + geom_prop_label() + * stamp_prop() ``` ] --- count: false .panel1-wp_prop_poem_small-auto[ <!-- --> ] .panel2-wp_prop_poem_small-auto[ ``` r donor |> ggplot() + theme_chalkboard(base_size = 18) + aes(x = decision) + geom_stack() + geom_stack_label() + geom_support() + geom_prop() + geom_prop_label() + stamp_prop() + * stamp_prop_label() ``` ] --- count: false .panel1-wp_prop_poem_small-auto[ <!-- --> ] .panel2-wp_prop_poem_small-auto[ ``` r donor |> ggplot() + theme_chalkboard(base_size = 18) + aes(x = decision) + geom_stack() + geom_stack_label() + geom_support() + geom_prop() + geom_prop_label() + stamp_prop() + stamp_prop_label() + * geom_normal_prop_null() ``` ] --- count: false .panel1-wp_prop_poem_small-auto[ <!-- --> ] .panel2-wp_prop_poem_small-auto[ ``` r donor |> ggplot() + theme_chalkboard(base_size = 18) + aes(x = decision) + geom_stack() + geom_stack_label() + geom_support() + geom_prop() + geom_prop_label() + stamp_prop() + stamp_prop_label() + geom_normal_prop_null() + * geom_normal_prop_null_sds() ``` ] <style> .panel1-wp_prop_poem_small-auto { color: black; width: 38.6060606060606%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel2-wp_prop_poem_small-auto { color: black; width: 59.3939393939394%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel3-wp_prop_poem_small-auto { color: black; width: NA%; hight: 33%; float: left; padding-left: 1%; font-size: 80% } </style> --- # Part 1. Sketch w qlayer and qstat ``` r donor |> ggplot() + aes(x = decision) + # 1. geom_stack() qlayer(geom = GeomTile, stat = qstat(compute_group_bricks)) + # 2. geom_stack_label() qlayer(geom = GeomText, stat = qstat(compute_group_count)) + # 3. geom_support() qlayer(geom = GeomSegment, stat = qstat_panel(compute_scale)) + # 4. geom_prop() qlayer(geom = GeomText, stat = qstat_panel(compute_xmean_at_y0)) + # 5. geom_prop_label() qlayer(geom = GeomLabel, stat = qstat_panel(compute_xmean_at_y0_label)) + # 6. stamp_prop() qlayer(geom = GeomText, stat = qstat_panel(compute_panel_prop_asserted)) + # 7. stamp_prop_label() qlayer(geom = GeomLabel, stat = qstat_panel(compute_panel_prop_asserted_label)) + # 8. geom_norm() qlayer(geom = GeomArea, stat = qstat_panel(compute_dnorm_prop)) + # 9. geom_prop_norm() qlayer(geom = GeomSegment, stat = qstat_panel(compute_dnorm_prop_sds)) ``` --- ### Aesthetically, room for improvement... <!-- --> --- ### Part 2. Spruce up *Geoms* w/ statexpress' `qproto_update_defaults()`, e.g. justification, linetype, color, transparency ``` r donor |> ggplot() + aes(x = decision) + # 1 geom_stack qlayer(geom = qproto_update(GeomTile, aes(color = "white")), stat = qstat(compute_group_bricks)) + # 2 geom_stack_label() qlayer(geom = qproto_update(GeomText, aes(vjust = 0)), stat = qstat(compute_group_count)) + # 3 geom_xrange, show scale, range at y is zero qlayer(geom = GeomSegment, stat = qstat_panel(compute_scale)) + ### --- ### --- ### --- # 8. geom_norm on prop plot qlayer(geom = qproto_update(GeomArea, aes(alpha = .2)), stat = qstat_panel(compute_dnorm_prop)) + # 9. geom_prop_norm w/ sd marks qlayer(geom = qproto_update(GeomSegment, aes(linetype = "dotted")), stat = qstat_panel(compute_dnorm_prop_sds)) ``` --- <!-- --> --- # Part 3. Write user facing functions... ``` r geom_stack <- function(...){ qlayer(geom = qproto_update(GeomTile, aes(color = "white")), stat = qstat(compute_group_bricks), ...) } geom_stack_label <- function(...){ qlayer(geom = qproto_update(GeomText, aes(vjust = 0)), stat = qstat(compute_group_count), ...) } geom_support <- function(...){ qlayer(geom = GeomSegment, stat = qstat_panel(compute_scale), ...) } geom_prop <- function(...){ qlayer(geom = qproto_update(GeomText, aes(size = 6, vjust = 1)), stat = qstat_panel(compute_xmean_at_y0), ...) } # etc ... ``` --- count: false .panel1-wp_prop_poem_small2-auto[ ``` r *donor ``` ] .panel2-wp_prop_poem_small2-auto[ ``` ## # A tibble: 161 × 3 ## Default Choice decision ## <chr> <chr> <fct> ## 1 opt-in donor donor (1) ## 2 opt-in donor donor (1) ## 3 opt-in donor donor (1) ## 4 opt-in donor donor (1) ## 5 opt-in donor donor (1) ## 6 opt-in donor donor (1) ## 7 opt-in donor donor (1) ## 8 opt-in donor donor (1) ## 9 opt-in donor donor (1) ## 10 opt-in donor donor (1) ## # ℹ 151 more rows ``` ] --- count: false .panel1-wp_prop_poem_small2-auto[ ``` r donor |> * ggplot() + theme_chalkboard(base_size = 18) ``` ] .panel2-wp_prop_poem_small2-auto[ <!-- --> ] --- count: false .panel1-wp_prop_poem_small2-auto[ ``` r donor |> ggplot() + theme_chalkboard(base_size = 18) + * aes(x = decision) ``` ] .panel2-wp_prop_poem_small2-auto[ <!-- --> ] --- count: false .panel1-wp_prop_poem_small2-auto[ ``` r donor |> ggplot() + theme_chalkboard(base_size = 18) + aes(x = decision) + * geom_stack() ``` ] .panel2-wp_prop_poem_small2-auto[ <!-- --> ] --- count: false .panel1-wp_prop_poem_small2-auto[ ``` r donor |> ggplot() + theme_chalkboard(base_size = 18) + aes(x = decision) + geom_stack() + * geom_stack_label() ``` ] .panel2-wp_prop_poem_small2-auto[ <!-- --> ] --- count: false .panel1-wp_prop_poem_small2-auto[ ``` r donor |> ggplot() + theme_chalkboard(base_size = 18) + aes(x = decision) + geom_stack() + geom_stack_label() + * geom_support() ``` ] .panel2-wp_prop_poem_small2-auto[ <!-- --> ] --- count: false .panel1-wp_prop_poem_small2-auto[ ``` r donor |> ggplot() + theme_chalkboard(base_size = 18) + aes(x = decision) + geom_stack() + geom_stack_label() + geom_support() + * geom_prop() ``` ] .panel2-wp_prop_poem_small2-auto[ <!-- --> ] --- count: false .panel1-wp_prop_poem_small2-auto[ ``` r donor |> ggplot() + theme_chalkboard(base_size = 18) + aes(x = decision) + geom_stack() + geom_stack_label() + geom_support() + geom_prop() + * geom_prop_label() ``` ] .panel2-wp_prop_poem_small2-auto[ <!-- --> ] --- count: false .panel1-wp_prop_poem_small2-auto[ ``` r donor |> ggplot() + theme_chalkboard(base_size = 18) + aes(x = decision) + geom_stack() + geom_stack_label() + geom_support() + geom_prop() + geom_prop_label() + * stamp_prop() ``` ] .panel2-wp_prop_poem_small2-auto[ <!-- --> ] --- count: false .panel1-wp_prop_poem_small2-auto[ ``` r donor |> ggplot() + theme_chalkboard(base_size = 18) + aes(x = decision) + geom_stack() + geom_stack_label() + geom_support() + geom_prop() + geom_prop_label() + stamp_prop() + * stamp_prop_label() ``` ] .panel2-wp_prop_poem_small2-auto[ <!-- --> ] --- count: false .panel1-wp_prop_poem_small2-auto[ ``` r donor |> ggplot() + theme_chalkboard(base_size = 18) + aes(x = decision) + geom_stack() + geom_stack_label() + geom_support() + geom_prop() + geom_prop_label() + stamp_prop() + stamp_prop_label() + * geom_normal_prop_null() ``` ] .panel2-wp_prop_poem_small2-auto[ <!-- --> ] --- count: false .panel1-wp_prop_poem_small2-auto[ ``` r donor |> ggplot() + theme_chalkboard(base_size = 18) + aes(x = decision) + geom_stack() + geom_stack_label() + geom_support() + geom_prop() + geom_prop_label() + stamp_prop() + stamp_prop_label() + geom_normal_prop_null() + * geom_normal_prop_null_sds() ``` ] .panel2-wp_prop_poem_small2-auto[ <!-- --> ] <style> .panel1-wp_prop_poem_small2-auto { color: black; width: 38.6060606060606%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel2-wp_prop_poem_small2-auto { color: black; width: 59.3939393939394%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel3-wp_prop_poem_small2-auto { color: black; width: NA%; hight: 33%; float: left; padding-left: 1%; font-size: 80% } </style> --- # Benefit to {statexpress} method -- ## lets you see how these layers are fitting together in a story *as* you craft them. --- # Part 4. Circulate to Collegues; make adjustments ## 'Wow! That is great. What I like about it most is that it provides the visualization or the comparison of the statistics to the null distribution right there. I find that is the most challenging concept for *cadets* and it is cumbersome to either go to the applet or to draw it on the board to visualize this relationship. It disrupts the flow and their train of thought. Doing it all in one is very slick! Very easy code too!' Kyle -- ## 'I'm not a statistician, but showing a normal distribution for a binary variable for e.g. the prop test seems sketchy to me. Shouldn't it be some weird beta distribution that is bounded in [0, 1]?' Teun --- https://ggplot2-extenders.github.io/ggplot-extension-club/ --- # Appendix. Details about {statexpress} https://github.com/EvaMaeRey/statexpress --- # What are these statexpress functions? ## qstat() - locally defining Stat behavior based on compute, don't need to create and ggproto object -- ## qlayer() - pre-populated, generic layer function, already more user-facing -- qproto_update() - allows aes updating rather than dig, copy and paste --- ``` r statexpress::qstat_panel ``` ``` ## function (compute_panel, ...) ## { ## ggplot2::ggproto("StatTemp", Stat, compute_panel = compute_panel, ## ...) ## } ## <bytecode: 0x7fe19c853698> ## <environment: namespace:statexpress> ``` --- ``` r statexpress::qlayer ``` ``` ## function (mapping = NULL, data = NULL, geom = GeomPoint, stat = StatIdentity, ## position = position_identity(), ..., na.rm = FALSE, show.legend = NA, ## inherit.aes = TRUE) ## { ## ggplot2::layer(data = data, mapping = mapping, geom = geom, ## stat = stat, position = position, show.legend = show.legend, ## inherit.aes = inherit.aes, params = rlang::list2(na.rm = na.rm, ## ...)) ## } ## <bytecode: 0x7fe19afc85b0> ## <environment: namespace:statexpress> ``` --- ``` r statexpress::qproto_update ``` ``` ## function (`_inherit`, default_aes_update = NULL, ...) ## { ## proto_update("protoTemp", `_inherit`, default_aes_update = default_aes_update, ## ...) ## } ## <bytecode: 0x7fe19afcb1c8> ## <environment: namespace:statexpress> ``` ``` r statexpress::proto_update ``` ``` ## function (`_class`, `_inherit`, default_aes_update = NULL, ...) ## { ## if (!is.null(default_aes_update)) { ## default_aes <- aes(!!!modifyList(`_inherit`$default_aes, ## default_aes_update)) ## } ## ggplot2::ggproto(`_class` = `_class`, `_inherit` = `_inherit`, ## default_aes = default_aes, ...) ## } ## <bytecode: 0x7fe19afce540> ## <environment: namespace:statexpress> ``` --- # What about the compute? -- ### 'There is a bunch of scaffolding code and I won't bore you with that'. Thomas Lin Pederson 2020 -- ### 'There is a bunch of compute and I won't bore you with that'. This talk --- You are writing the compute anyway... -- But, yes, it might be a little trickier in ggplot2 context... https://github.com/teunbrand/ggplot-extension-club/discussions/57#discussioncomment-11170862 --- ``` r knitrExtra::chunk_code_get("compute_for_prop_story") ``` ``` ## [1] "library(statexpress)" ## [2] "library(tidyverse)" ## [3] "" ## [4] "# 1. layer stack of bricks" ## [5] "compute_group_bricks <- function(data, scales, width = .2){" ## [6] " " ## [7] " data %>% " ## [8] " mutate(row = row_number()) %>% " ## [9] " mutate(y = row - .5) %>% " ## [10] " mutate(width = width)" ## [11] " " ## [12] "}" ## [13] "" ## [14] "# 2. layer label stack with count" ## [15] "compute_group_count <- function(data, scales){" ## [16] " " ## [17] " data %>% " ## [18] " count(x) %>% " ## [19] " mutate(y = n," ## [20] " label = n)" ## [21] " " ## [22] "}" ## [23] "" ## [24] "" ## [25] "# 3. layer add x span" ## [26] "compute_scale <- function(data, scales){" ## [27] " " ## [28] " data %>% " ## [29] " summarise(min_x = min(x)," ## [30] " xend = max(x)," ## [31] " y = 0," ## [32] " yend = 0) %>% " ## [33] " rename(x = min_x)" ## [34] " " ## [35] "}" ## [36] "" ## [37] "" ## [38] "# 4. layer add balancing point " ## [39] "compute_xmean_at_y0 <- function(data, scales){" ## [40] " " ## [41] " data %>% " ## [42] " summarise(x = mean(x)," ## [43] " y = 0, " ## [44] " label = \"^\") " ## [45] " " ## [46] "}" ## [47] "" ## [48] "# 5. layer add balancing point value label" ## [49] "compute_xmean_at_y0_label <- function(data, scales){" ## [50] " " ## [51] " data %>% " ## [52] " summarise(x = mean(x)," ## [53] " y = 0, " ## [54] " label = after_stat(round(x - 1, 2))) " ## [55] " " ## [56] "}" ## [57] "" ## [58] "" ## [59] "" ## [60] "# 6. " ## [61] "compute_panel_prop_asserted <- function(data, scales, null = .5){" ## [62] " " ## [63] " # stamp type layer - so ignor input data" ## [64] " data.frame(y = 0, " ## [65] " x = null + 1," ## [66] " label = \"^\"" ## [67] " )" ## [68] " " ## [69] "}" ## [70] "" ## [71] "compute_panel_prop_asserted_label <- function(data, scales, null = .5){" ## [72] " " ## [73] " # stamp type layer - so ignor input data" ## [74] " data.frame(y = 0, " ## [75] " x = null + 1," ## [76] " label = round(null, 2)" ## [77] " )" ## [78] " " ## [79] "}" ## [80] "" ## [81] "# Proposed layer composition" ## [82] "compute_dnorm_prop <- function(data, scales, null = .5, dist_sds = seq(-3.5, 3.5, by = .1)" ## [83] "){" ## [84] " " ## [85] " n <- nrow(data)" ## [86] " " ## [87] " sd = sqrt(null * (1 - null)/n) # sd of the null distribution" ## [88] " " ## [89] " q <- dist_sds * sd + null" ## [90] " " ## [91] " data.frame(x = q + 1) %>%" ## [92] " mutate(height = dnorm(q, sd = sd, mean = null)) %>%" ## [93] " mutate(height_max = dnorm(0, sd = sd, mean = 0)) %>%" ## [94] " mutate(y = .35*n*height/height_max) %>% # This is a bit fragile..." ## [95] " mutate(xend = x," ## [96] " yend = 0) %>% " ## [97] " # @teunbrand GeomArea$setup_data() requires a group column. Your panel computation does not preserve groups, but it should." ## [98] " mutate(group = 1) " ## [99] " " ## [100] "} " ## [101] "" ## [102] "" ## [103] "# Proposed layer composition" ## [104] "compute_dnorm_prop_sds <- function(data, scales, null = .5," ## [105] " dist_sds = -4:4){" ## [106] " " ## [107] " n <- nrow(data)" ## [108] " " ## [109] " sd = sqrt(null * (1 - null)/n) # sd of the null distribution" ## [110] " " ## [111] " q <- dist_sds * sd + null" ## [112] " " ## [113] " data.frame(x = q + 1) %>%" ## [114] " mutate(height = dnorm(q, sd = sd, mean = null)) %>%" ## [115] " mutate(height_max = dnorm(0, sd = sd, mean = 0)) %>%" ## [116] " mutate(y = .35*n*height/height_max) %>% # This is a bit fragile..." ## [117] " mutate(xend = x," ## [118] " yend = 0)" ## [119] "" ## [120] "} " ## [121] "" ## [122] "" ## [123] "" ## [124] "" ``` ``` r knitrExtra::chunk_to_dir("compute_for_prop_story", dir = ".") ``` --- *Actual* graphical-poem-first approach... 9 new layers in my statistical poem. 400 scaffolding characters per plot element? --- 'like everyone has like other life going on so figuring out how you can help those people like save time ... like maintaining the R package is like at the top of most people's priority list or writing the best possible R code --- Acknowledgements - ggplot2 extenders club, esp. June Choe & Teun van den Brand -- - MA206 people -- - stat_manual() group-wise computation -- - stat_rasa() Elio C. 'Demasiado complicado'... --- "compute_for_prop_story" --- ``` r knitr::knit_exit() ```