class: inverse, bottom
background-image: url(https://images.unsplash.com/photo-1559827260-dc66d52bef19?ixlib=rb-4.0.3&ixid=M3wxMjA3fDB8MHxwaG90by1wYWdlfHx8fGVufDB8fHx8fA%3D%3D&auto=format&fit=crop&w=1470&q=80)
background-size: cover

# .Large[Supporting a new wave of ggplot2 extenders]

## .small[New points of entry for ggplot2 super-users]

## .small[https://bit.ly/ggextend-cowy]

#### .tiny[Dr. Evangeline Reynolds | 2023-10-13 | ASA, COWY, Image credit: Silas Baisch, Unsplash]

???

Title slide

---
class: inverse, bottom
background-image: url(https://images.unsplash.com/photo-1603300216540-6b77f84ee101?ixlib=rb-1.2.1&raw_url=true&q=80&fm=jpg&crop=entropy&cs=tinysrgb&ixid=MnwxMjA3fDB8MHxwaG90by1wYWdlfHx8fGVufDB8fHx8&auto=format&fit=crop&w=387)
background-size: cover

# Unlocking ggplot2 as a computational engine
## by extending ggplot2 statistical geometries
### Dr. Evangeline 'Gina' Reynolds
### Thursday May 19, 2022, 12:15AM
#### Photo Credit: Mike Lewis HeadSmart Media

---
class: inverse, bottom
background-image: url(https://images.unsplash.com/photo-1542178432-52211bc52073?crop=entropy&cs=tinysrgb&fm=jpg&ixlib=rb-1.2.1&q=80&raw_url=true&ixid=MnwxMjA3fDB8MHxwaG90by1wYWdlfHx8fGVufDB8fHx8&auto=format&fit=crop&w=1374)
background-size: cover

# Creating new geom vocabulary for richer statistical visual stories
## by extending ggplot2 statistical geometries
### Dr. Evangeline 'Gina' Reynolds
### Thursday May 19, 2022, 12:15AM
#### Photo Credit: Klim Sergeev

---

### 1. Why ggplot2
### 2. Limitations of base ggplot2
### 3. Promise of extension
### 4. ggplot2 super-users not leveraging extension
### 5. New super-user-tailored educational material
### 6. Challenges on the horizon

---
class: middle, inverse, center

# Part 1. So ggplot2...

--

# ... what's the fuss?

<!-- --- -->
<!-- # gg in ggplot2 -->
<!-- -- -->
<!-- # Grammar of Graphics -->

---
class: middle, inverse, center

# It lets you *'speak your plot into existence'*.
(Thomas Lin Pedersen)

---

# Hans Rosling & BBC in 2010

<iframe width="767" height="431" src="https://www.youtube.com/embed/jbkSRLYSojo?list=PL6F8D7054D12E7C5A" frameborder="0" allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>

https://www.youtube.com/embed/jbkSRLYSojo?list=PL6F8D7054D12E7C5A

---

## ... I know having the data is not enough. I have to show it in ways people both enjoy and understand

<img src="images/hans_argument.png" width="65%" style="display: block; margin: auto;" />

---

# 'Here we go. Life expectancy on the y-axis'

<img src="images/hans_y_axis.png" width="80%" />

---

# 'On the x-axis, wealth'

<img src="images/hans_x_axis.png" width="80%" />

---

# 'Colors represent the different continents'

<img src="images/hans_colors.png" width="80%" />

---

# 'Size represents population'

<img src="images/hans_size.png" width="80%" />

---
class: inverse, center, middle

# Response to speaking a plot into existence?

--

# 10 million views...

--

(also does animation at the end... which I don't show)

---
class: inverse, center, middle

# And ggplot2 also affords this build-up-your-plot-bit-by-bit experience.

--

# i.e. 'speaking your plot into existence.'
--- count: false ## Now, we all have Rosling capabilities with ggplot2 .panel1-scatter-auto[ ```r *ggplot(data = gapminder_2002) ``` ] .panel2-scatter-auto[ <img src="cowy-slides_files/figure-html/scatter_auto_01_output-1.png" width="432" /> ] --- count: false ## Now, we all have Rosling capabilities with ggplot2 .panel1-scatter-auto[ ```r ggplot(data = gapminder_2002) + * aes(y = lifeExp) ``` ] .panel2-scatter-auto[ <img src="cowy-slides_files/figure-html/scatter_auto_02_output-1.png" width="432" /> ] --- count: false ## Now, we all have Rosling capabilities with ggplot2 .panel1-scatter-auto[ ```r ggplot(data = gapminder_2002) + aes(y = lifeExp) + * aes(x = gdpPercap) ``` ] .panel2-scatter-auto[ <img src="cowy-slides_files/figure-html/scatter_auto_03_output-1.png" width="432" /> ] --- count: false ## Now, we all have Rosling capabilities with ggplot2 .panel1-scatter-auto[ ```r ggplot(data = gapminder_2002) + aes(y = lifeExp) + aes(x = gdpPercap) + * geom_point() ``` ] .panel2-scatter-auto[ <img src="cowy-slides_files/figure-html/scatter_auto_04_output-1.png" width="432" /> ] --- count: false ## Now, we all have Rosling capabilities with ggplot2 .panel1-scatter-auto[ ```r ggplot(data = gapminder_2002) + aes(y = lifeExp) + aes(x = gdpPercap) + geom_point() + * aes(size = pop/1000000000) ``` ] .panel2-scatter-auto[ <img src="cowy-slides_files/figure-html/scatter_auto_05_output-1.png" width="432" /> ] --- count: false ## Now, we all have Rosling capabilities with ggplot2 .panel1-scatter-auto[ ```r ggplot(data = gapminder_2002) + aes(y = lifeExp) + aes(x = gdpPercap) + geom_point() + aes(size = pop/1000000000) + * aes(color = continent) ``` ] .panel2-scatter-auto[ <img src="cowy-slides_files/figure-html/scatter_auto_06_output-1.png" width="432" /> ] <style> .panel1-scatter-auto { color: black; width: 38.6060606060606%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel2-scatter-auto { color: black; width: 59.3939393939394%; hight: 32%; float: left; 
padding-left: 1%; font-size: 80% }
.panel3-scatter-auto { color: black; width: NA%; hight: 33%; float: left; padding-left: 1%; font-size: 80% }
</style>

---

## We get the 'Hans Rosling' speak-it experience ...

--

## ... because ggplot2 is based on the grammar of graphics, (gg = grammar of graphics) ...

--

## ... which breaks up the plot elements into *orthogonal* parts ...

--

## ... and allows the user to manipulate those parts bit by bit.

---
class: inverse, center, middle

> # "the Grammar of Graphics makes [building plots] easy because you've just got all these, like, little nice decomposable components" -- Hadley Wickham

---

## Leland Wilkinson identified orthogonal elements in 'The Grammar of Graphics'.

Wilkinson on the Policy Viz Podcast [here](https://policyviz.com/podcast/episode-201-leland-wilkinson/).

<img src="https://miro.medium.com/max/1400/1*MMZuYgeC_YjXNC1r4D4sog.png" width="75%" />

---

# Skip at COWY: discussing origins of ggplot2

--

Hadley Wickham, ggplot2 author, on its motivation:

> ### And, you know, I'd get a dataset. And, *in my head I could very clearly kind of picture*, I want to put this on the x-axis. Let's put this on the y-axis, draw a line, put some points here, break it up by this variable.

--

> ### And then, like, getting that vision out of my head, and into reality, it's just really, really hard. Just, like, felt harder than it should be. Like, there's a lot of custom programming involved,

---

> ### where I just felt, like, to me, I just wanted to say, like, you know, *this is what I'm thinking, this is how I'm picturing this plot. Like you're the computer 'Go and do it'.*

--

> ### ... and I'd also been reading about the Grammar of Graphics by Leland Wilkinson, I got to meet him a couple of times and ... I was, like, this book has been, like, written for me.

https://www.trifacta.com/podcast/tidy-data-with-hadley-wickham/

---
count: false

# And ggplot2 gives us even more than Rosling...
.panel1-scatter2-1[
```r
ggplot(data = gapminder_2002) +
  aes(y = lifeExp) +
  aes(x = gdpPercap) +
  scale_x_log10() +
  geom_point() +
  aes(size = pop/1000000000) +
  aes(color = continent) +
  labs(x = "GDP per capita (US$)") +
* geom_smooth(method = lm)
```
]

.panel2-scatter2-1[
<img src="cowy-slides_files/figure-html/scatter2_1_01_output-1.png" width="432" />
]

<style>
.panel1-scatter2-1 { color: black; width: 38.6060606060606%; hight: 32%; float: left; padding-left: 1%; font-size: 80% }
.panel2-scatter2-1 { color: black; width: 59.3939393939394%; hight: 32%; float: left; padding-left: 1%; font-size: 80% }
.panel3-scatter2-1 { color: black; width: NA%; hight: 33%; float: left; padding-left: 1%; font-size: 80% }
</style>

---
class: inverse, middle, center

# ggplot2 lets us "... create graphical 'poems'."

- Hadley Wickham (2010) in 'A Layered Grammar of Graphics', *Journal of Computational and Graphical Statistics*

--

# But... sometimes there are limits to what we can express

---
class: inverse, center, middle

# Part 2. The problem

--

## Sometimes we don't have the vocabulary that's needed to 'speak' fluently.

---

## Example: What's the graphical poem here?

<img src="cowy-slides_files/figure-html/unnamed-chunk-10-1.png" width="432" />

???

Consider, for example, the seemingly simple enterprise of adding a vertical line at the mean of x, perhaps atop a histogram or density plot.
--- count: false ### What's the *base* ggplot2 experience .panel1-basic-auto[ ```r *airquality ``` ] .panel2-basic-auto[ ``` Ozone Solar.R Wind Temp Month Day 1 41 190 7.4 67 5 1 2 36 118 8.0 72 5 2 3 12 149 12.6 74 5 3 4 18 313 11.5 62 5 4 5 NA NA 14.3 56 5 5 6 28 NA 14.9 66 5 6 7 23 299 8.6 65 5 7 8 19 99 13.8 59 5 8 9 8 19 20.1 61 5 9 10 NA 194 8.6 69 5 10 11 7 NA 6.9 74 5 11 12 16 256 9.7 69 5 12 13 11 290 9.2 66 5 13 14 14 274 10.9 68 5 14 15 18 65 13.2 58 5 15 16 14 334 11.5 64 5 16 17 34 307 12.0 66 5 17 18 6 78 18.4 57 5 18 19 30 322 11.5 68 5 19 20 11 44 9.7 62 5 20 21 1 8 9.7 59 5 21 22 11 320 16.6 73 5 22 23 4 25 9.7 61 5 23 24 32 92 12.0 61 5 24 25 NA 66 16.6 57 5 25 26 NA 266 14.9 58 5 26 27 NA NA 8.0 57 5 27 28 23 13 12.0 67 5 28 29 45 252 14.9 81 5 29 30 115 223 5.7 79 5 30 31 37 279 7.4 76 5 31 32 NA 286 8.6 78 6 1 33 NA 287 9.7 74 6 2 34 NA 242 16.1 67 6 3 35 NA 186 9.2 84 6 4 36 NA 220 8.6 85 6 5 37 NA 264 14.3 79 6 6 38 29 127 9.7 82 6 7 39 NA 273 6.9 87 6 8 40 71 291 13.8 90 6 9 41 39 323 11.5 87 6 10 42 NA 259 10.9 93 6 11 43 NA 250 9.2 92 6 12 44 23 148 8.0 82 6 13 45 NA 332 13.8 80 6 14 46 NA 322 11.5 79 6 15 47 21 191 14.9 77 6 16 48 37 284 20.7 72 6 17 49 20 37 9.2 65 6 18 50 12 120 11.5 73 6 19 51 13 137 10.3 76 6 20 52 NA 150 6.3 77 6 21 53 NA 59 1.7 76 6 22 54 NA 91 4.6 76 6 23 55 NA 250 6.3 76 6 24 56 NA 135 8.0 75 6 25 57 NA 127 8.0 78 6 26 58 NA 47 10.3 73 6 27 59 NA 98 11.5 80 6 28 60 NA 31 14.9 77 6 29 61 NA 138 8.0 83 6 30 62 135 269 4.1 84 7 1 63 49 248 9.2 85 7 2 64 32 236 9.2 81 7 3 65 NA 101 10.9 84 7 4 66 64 175 4.6 83 7 5 67 40 314 10.9 83 7 6 68 77 276 5.1 88 7 7 69 97 267 6.3 92 7 8 70 97 272 5.7 92 7 9 71 85 175 7.4 89 7 10 72 NA 139 8.6 82 7 11 73 10 264 14.3 73 7 12 74 27 175 14.9 81 7 13 75 NA 291 14.9 91 7 14 76 7 48 14.3 80 7 15 77 48 260 6.9 81 7 16 78 35 274 10.3 82 7 17 79 61 285 6.3 84 7 18 80 79 187 5.1 87 7 19 81 63 220 11.5 85 7 20 82 16 7 6.9 74 7 21 83 NA 258 9.7 81 7 22 84 NA 295 11.5 82 7 23 85 80 294 8.6 
86 7 24 86 108 223 8.0 85 7 25 87 20 81 8.6 82 7 26 88 52 82 12.0 86 7 27 89 82 213 7.4 88 7 28 90 50 275 7.4 86 7 29 91 64 253 7.4 83 7 30 92 59 254 9.2 81 7 31 93 39 83 6.9 81 8 1 94 9 24 13.8 81 8 2 95 16 77 7.4 82 8 3 96 78 NA 6.9 86 8 4 97 35 NA 7.4 85 8 5 98 66 NA 4.6 87 8 6 99 122 255 4.0 89 8 7 100 89 229 10.3 90 8 8 101 110 207 8.0 90 8 9 102 NA 222 8.6 92 8 10 103 NA 137 11.5 86 8 11 104 44 192 11.5 86 8 12 105 28 273 11.5 82 8 13 106 65 157 9.7 80 8 14 107 NA 64 11.5 79 8 15 108 22 71 10.3 77 8 16 109 59 51 6.3 79 8 17 110 23 115 7.4 76 8 18 111 31 244 10.9 78 8 19 112 44 190 10.3 78 8 20 113 21 259 15.5 77 8 21 114 9 36 14.3 72 8 22 115 NA 255 12.6 75 8 23 116 45 212 9.7 79 8 24 117 168 238 3.4 81 8 25 118 73 215 8.0 86 8 26 119 NA 153 5.7 88 8 27 120 76 203 9.7 97 8 28 121 118 225 2.3 94 8 29 122 84 237 6.3 96 8 30 123 85 188 6.3 94 8 31 124 96 167 6.9 91 9 1 125 78 197 5.1 92 9 2 126 73 183 2.8 93 9 3 127 91 189 4.6 93 9 4 128 47 95 7.4 87 9 5 129 32 92 15.5 84 9 6 130 20 252 10.9 80 9 7 131 23 220 10.3 78 9 8 132 21 230 10.9 75 9 9 133 24 259 9.7 73 9 10 134 44 236 14.9 81 9 11 135 21 259 15.5 76 9 12 136 28 238 6.3 77 9 13 137 9 24 10.9 71 9 14 138 13 112 11.5 71 9 15 139 46 237 6.9 78 9 16 140 18 224 13.8 67 9 17 141 13 27 10.3 76 9 18 142 24 238 10.3 68 9 19 143 16 201 8.0 82 9 20 144 13 238 12.6 64 9 21 145 23 14 9.2 71 9 22 146 36 139 10.3 81 9 23 147 7 49 10.3 69 9 24 148 14 20 16.6 63 9 25 149 30 193 6.9 70 9 26 150 NA 145 13.2 77 9 27 151 14 191 14.3 75 9 28 152 18 131 8.0 76 9 29 153 20 223 11.5 68 9 30 ``` ] --- count: false ### What's the *base* ggplot2 experience .panel1-basic-auto[ ```r airquality %>% * ggplot(data = .) ``` ] .panel2-basic-auto[ <img src="cowy-slides_files/figure-html/basic_auto_02_output-1.png" width="432" /> ] --- count: false ### What's the *base* ggplot2 experience .panel1-basic-auto[ ```r airquality %>% ggplot(data = .) 
+ * aes(x = Ozone) ``` ] .panel2-basic-auto[ <img src="cowy-slides_files/figure-html/basic_auto_03_output-1.png" width="432" /> ] --- count: false ### What's the *base* ggplot2 experience .panel1-basic-auto[ ```r airquality %>% ggplot(data = .) + aes(x = Ozone) + * geom_rug() ``` ] .panel2-basic-auto[ <img src="cowy-slides_files/figure-html/basic_auto_04_output-1.png" width="432" /> ] --- count: false ### What's the *base* ggplot2 experience .panel1-basic-auto[ ```r airquality %>% ggplot(data = .) + aes(x = Ozone) + geom_rug() + * geom_histogram() ``` ] .panel2-basic-auto[ <img src="cowy-slides_files/figure-html/basic_auto_05_output-1.png" width="432" /> ] --- count: false ### What's the *base* ggplot2 experience .panel1-basic-auto[ ```r airquality %>% ggplot(data = .) + aes(x = Ozone) + geom_rug() + geom_histogram() + * geom_vline( * xintercept = * mean(airquality$Ozone, * na.rm = T) * ) ``` ] .panel2-basic-auto[ <img src="cowy-slides_files/figure-html/basic_auto_06_output-1.png" width="432" /> ] --- count: false ### What's the *base* ggplot2 experience .panel1-basic-auto[ ```r airquality %>% ggplot(data = .) + aes(x = Ozone) + geom_rug() + geom_histogram() + geom_vline( xintercept = mean(airquality$Ozone, na.rm = T) ) -> *g ``` ] .panel2-basic-auto[ ] <style> .panel1-basic-auto { color: black; width: 38.6060606060606%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel2-basic-auto { color: black; width: 59.3939393939394%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel3-basic-auto { color: black; width: NA%; hight: 33%; float: left; padding-left: 1%; font-size: 80% } </style> ??? Creating this plot requires greater focus on ggplot2 *syntax*, likely detracting from discussion of *the mean* that statistical instructors desire. It may require a discussion about dollar sign syntax and how geom_vline is actually a special geom -- an annotation -- rather than being mapped to the data. 
None of this is relevant to the point you as an instructor aim to make: maybe that the mean is the balancing point of the data, or maybe a comment about skewness.

---

### Adding the conditional means?

<img src="cowy-slides_files/figure-html/cond_means_hard-1.png" width="432" />

???

Further, for the case of adding a vertical line at the mean for different subsets of the data, a different approach is required. This enterprise may take the instructor/analyst/student on an even larger detour -- possibly googling, and maybe landing on the following Stack Overflow page where 11,000 analytics souls (some repeats to be sure) have landed:

---
count: false

#### Conditional means (may require a trip to stackoverflow!)

.panel1-cond_means_hard-auto[
```r
*airquality
```
]

.panel2-cond_means_hard-auto[
```
Ozone Solar.R Wind Temp Month Day 1 41 190 7.4 67 5 1 2 36 118 8.0 72 5 2 3 12 149 12.6 74 5 3 4 18 313 11.5 62 5 4 5 NA NA 14.3 56 5 5 6 28 NA 14.9 66 5 6 7 23 299 8.6 65 5 7 8 19 99 13.8 59 5 8 9 8 19 20.1 61 5 9 10 NA 194 8.6 69 5 10 11 7 NA 6.9 74 5 11 12 16 256 9.7 69 5 12 13 11 290 9.2 66 5 13 14 14 274 10.9 68 5 14 15 18 65 13.2 58 5 15 16 14 334 11.5 64 5 16 17 34 307 12.0 66 5 17 18 6 78 18.4 57 5 18 19 30 322 11.5 68 5 19 20 11 44 9.7 62 5 20 21 1 8 9.7 59 5 21 22 11 320 16.6 73 5 22 23 4 25 9.7 61 5 23 24 32 92 12.0 61 5 24 25 NA 66 16.6 57 5 25 26 NA 266 14.9 58 5 26 27 NA NA 8.0 57 5 27 28 23 13 12.0 67 5 28 29 45 252 14.9 81 5 29 30 115 223 5.7 79 5 30 31 37 279 7.4 76 5 31 32 NA 286 8.6 78 6 1 33 NA 287 9.7 74 6 2 34 NA 242 16.1 67 6 3 35 NA 186 9.2 84 6 4 36 NA 220 8.6 85 6 5 37 NA 264 14.3 79 6 6 38 29 127 9.7 82 6 7 39 NA 273 6.9 87 6 8 40 71 291 13.8 90 6 9 41 39 323 11.5 87 6 10 42 NA 259 10.9 93 6 11 43 NA 250 9.2 92 6 12 44 23 148 8.0 82 6 13 45 NA 332 13.8 80 6 14 46 NA 322 11.5 79 6 15 47 21 191 14.9 77 6 16 48 37 284 20.7 72 6 17 49 20 37 9.2 65 6 18 50 12 120 11.5 73 6 19 51 13 137 10.3 76 6 20 52 NA 150 6.3 77 6 21 53 NA 59 1.7 76 6 22 54 NA 91 4.6 76
6 23 55 NA 250 6.3 76 6 24 56 NA 135 8.0 75 6 25 57 NA 127 8.0 78 6 26 58 NA 47 10.3 73 6 27 59 NA 98 11.5 80 6 28 60 NA 31 14.9 77 6 29 61 NA 138 8.0 83 6 30 62 135 269 4.1 84 7 1 63 49 248 9.2 85 7 2 64 32 236 9.2 81 7 3 65 NA 101 10.9 84 7 4 66 64 175 4.6 83 7 5 67 40 314 10.9 83 7 6 68 77 276 5.1 88 7 7 69 97 267 6.3 92 7 8 70 97 272 5.7 92 7 9 71 85 175 7.4 89 7 10 72 NA 139 8.6 82 7 11 73 10 264 14.3 73 7 12 74 27 175 14.9 81 7 13 75 NA 291 14.9 91 7 14 76 7 48 14.3 80 7 15 77 48 260 6.9 81 7 16 78 35 274 10.3 82 7 17 79 61 285 6.3 84 7 18 80 79 187 5.1 87 7 19 81 63 220 11.5 85 7 20 82 16 7 6.9 74 7 21 83 NA 258 9.7 81 7 22 84 NA 295 11.5 82 7 23 85 80 294 8.6 86 7 24 86 108 223 8.0 85 7 25 87 20 81 8.6 82 7 26 88 52 82 12.0 86 7 27 89 82 213 7.4 88 7 28 90 50 275 7.4 86 7 29 91 64 253 7.4 83 7 30 92 59 254 9.2 81 7 31 93 39 83 6.9 81 8 1 94 9 24 13.8 81 8 2 95 16 77 7.4 82 8 3 96 78 NA 6.9 86 8 4 97 35 NA 7.4 85 8 5 98 66 NA 4.6 87 8 6 99 122 255 4.0 89 8 7 100 89 229 10.3 90 8 8 101 110 207 8.0 90 8 9 102 NA 222 8.6 92 8 10 103 NA 137 11.5 86 8 11 104 44 192 11.5 86 8 12 105 28 273 11.5 82 8 13 106 65 157 9.7 80 8 14 107 NA 64 11.5 79 8 15 108 22 71 10.3 77 8 16 109 59 51 6.3 79 8 17 110 23 115 7.4 76 8 18 111 31 244 10.9 78 8 19 112 44 190 10.3 78 8 20 113 21 259 15.5 77 8 21 114 9 36 14.3 72 8 22 115 NA 255 12.6 75 8 23 116 45 212 9.7 79 8 24 117 168 238 3.4 81 8 25 118 73 215 8.0 86 8 26 119 NA 153 5.7 88 8 27 120 76 203 9.7 97 8 28 121 118 225 2.3 94 8 29 122 84 237 6.3 96 8 30 123 85 188 6.3 94 8 31 124 96 167 6.9 91 9 1 125 78 197 5.1 92 9 2 126 73 183 2.8 93 9 3 127 91 189 4.6 93 9 4 128 47 95 7.4 87 9 5 129 32 92 15.5 84 9 6 130 20 252 10.9 80 9 7 131 23 220 10.3 78 9 8 132 21 230 10.9 75 9 9 133 24 259 9.7 73 9 10 134 44 236 14.9 81 9 11 135 21 259 15.5 76 9 12 136 28 238 6.3 77 9 13 137 9 24 10.9 71 9 14 138 13 112 11.5 71 9 15 139 46 237 6.9 78 9 16 140 18 224 13.8 67 9 17 141 13 27 10.3 76 9 18 142 24 238 10.3 68 9 19 143 16 201 8.0 82 9 20 144 
13 238 12.6 64 9 21 145 23 14 9.2 71 9 22 146 36 139 10.3 81 9 23 147 7 49 10.3 69 9 24 148 14 20 16.6 63 9 25 149 30 193 6.9 70 9 26 150 NA 145 13.2 77 9 27 151 14 191 14.3 75 9 28 152 18 131 8.0 76 9 29 153 20 223 11.5 68 9 30 ``` ] --- count: false #### Conditional means (may require a trip to stackoverflow!) .panel1-cond_means_hard-auto[ ```r airquality %>% * group_by(Month) ``` ] .panel2-cond_means_hard-auto[ ``` # A tibble: 153 × 6 # Groups: Month [5] Ozone Solar.R Wind Temp Month Day <int> <int> <dbl> <int> <int> <int> 1 41 190 7.4 67 5 1 2 36 118 8 72 5 2 3 12 149 12.6 74 5 3 4 18 313 11.5 62 5 4 5 NA NA 14.3 56 5 5 6 28 NA 14.9 66 5 6 7 23 299 8.6 65 5 7 8 19 99 13.8 59 5 8 9 8 19 20.1 61 5 9 10 NA 194 8.6 69 5 10 # … with 143 more rows ``` ] --- count: false #### Conditional means (may require a trip to stackoverflow!) .panel1-cond_means_hard-auto[ ```r airquality %>% group_by(Month) %>% * summarise( * Ozone_mean = * mean(Ozone, na.rm = T) * ) ``` ] .panel2-cond_means_hard-auto[ ``` # A tibble: 5 × 2 Month Ozone_mean <int> <dbl> 1 5 23.6 2 6 29.4 3 7 59.1 4 8 60.0 5 9 31.4 ``` ] --- count: false #### Conditional means (may require a trip to stackoverflow!) .panel1-cond_means_hard-auto[ ```r airquality %>% group_by(Month) %>% summarise( Ozone_mean = mean(Ozone, na.rm = T) ) -> *airquality_by_month ``` ] .panel2-cond_means_hard-auto[ ] --- count: false #### Conditional means (may require a trip to stackoverflow!) .panel1-cond_means_hard-auto[ ```r airquality %>% group_by(Month) %>% summarise( Ozone_mean = mean(Ozone, na.rm = T) ) -> airquality_by_month *ggplot(airquality) ``` ] .panel2-cond_means_hard-auto[ <img src="cowy-slides_files/figure-html/cond_means_hard_auto_05_output-1.png" width="432" /> ] --- count: false #### Conditional means (may require a trip to stackoverflow!) 
.panel1-cond_means_hard-auto[ ```r airquality %>% group_by(Month) %>% summarise( Ozone_mean = mean(Ozone, na.rm = T) ) -> airquality_by_month ggplot(airquality) + * aes(x = Ozone) ``` ] .panel2-cond_means_hard-auto[ <img src="cowy-slides_files/figure-html/cond_means_hard_auto_06_output-1.png" width="432" /> ] --- count: false #### Conditional means (may require a trip to stackoverflow!) .panel1-cond_means_hard-auto[ ```r airquality %>% group_by(Month) %>% summarise( Ozone_mean = mean(Ozone, na.rm = T) ) -> airquality_by_month ggplot(airquality) + aes(x = Ozone) + * geom_histogram() ``` ] .panel2-cond_means_hard-auto[ <img src="cowy-slides_files/figure-html/cond_means_hard_auto_07_output-1.png" width="432" /> ] --- count: false #### Conditional means (may require a trip to stackoverflow!) .panel1-cond_means_hard-auto[ ```r airquality %>% group_by(Month) %>% summarise( Ozone_mean = mean(Ozone, na.rm = T) ) -> airquality_by_month ggplot(airquality) + aes(x = Ozone) + geom_histogram() + * facet_grid(rows = vars(Month)) ``` ] .panel2-cond_means_hard-auto[ <img src="cowy-slides_files/figure-html/cond_means_hard_auto_08_output-1.png" width="432" /> ] --- count: false #### Conditional means (may require a trip to stackoverflow!) 
.panel1-cond_means_hard-auto[
```r
airquality %>%
  group_by(Month) %>%
  summarise(
    Ozone_mean =
      mean(Ozone, na.rm = T)
    ) ->
airquality_by_month

ggplot(airquality) +
  aes(x = Ozone) +
  geom_histogram() +
  facet_grid(rows = vars(Month)) +
* geom_vline(data = airquality_by_month,
*            aes(xintercept =
*                  Ozone_mean))
```
]

.panel2-cond_means_hard-auto[
<img src="cowy-slides_files/figure-html/cond_means_hard_auto_09_output-1.png" width="432" />
]

<style>
.panel1-cond_means_hard-auto { color: black; width: 38.6060606060606%; hight: 32%; float: left; padding-left: 1%; font-size: 80% }
.panel2-cond_means_hard-auto { color: black; width: 59.3939393939394%; hight: 32%; float: left; padding-left: 1%; font-size: 80% }
.panel3-cond_means_hard-auto { color: black; width: NA%; hight: 33%; float: left; padding-left: 1%; font-size: 80% }
</style>

---

# Why is geom_smooth() so easy, and my mean-of-x line so hard?

--

### geom_smooth() does a lot of computation in the background

--

### then uses geometric primitives to show the computed values

- line
- ribbon geometry

--

### geom_x_mean() doesn't exist; we have to do the computation ourselves

---

## ggplot2 is intentionally lean for maintenance purposes.

--

## ... but geom_x_mean() would be so nice!

---
class: middle, center, inverse

## Part 3. ggplot2 extensions can be used to help restore that 'speak it into existence' feel, when base ggplot2 'words' fail

---

## Examples from ggxmean (https://github.com/EvaMaeRey/ggxmean), which gives us geom_x_mean() and ~25 more geoms...
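---

### Sketch: how a geom like geom_x_mean() could be built

The missing 'word' can be sketched in a few steps. This is a minimal illustration under stated assumptions, not ggxmean's actual source: `compute_group_x_mean`, `StatXMean`, and `geom_x_mean_sketch` are hypothetical names chosen here, and `dropped_aes` is a Stat field that recent ggplot2 versions consult to avoid a warning about the summarised-away `x`.

```r
library(ggplot2)

# Step 1: the computation, as an ordinary function that can be
# checked on a plain data frame before ggplot2 is involved.
compute_group_x_mean <- function(data, scales) {
  data.frame(xintercept = mean(data$x, na.rm = TRUE))
}

# Step 2: a Stat (hypothetical name) that applies the function per group.
StatXMean <- ggproto(
  "StatXMean", Stat,
  compute_group = compute_group_x_mean,
  required_aes = "x",
  dropped_aes = "x"  # x is summarised into xintercept on purpose
)

# Step 3: a user-facing layer function (hypothetical name, chosen to
# avoid clashing with ggxmean::geom_x_mean) that draws with GeomVline.
geom_x_mean_sketch <- function(mapping = NULL, data = NULL,
                               position = "identity", na.rm = FALSE,
                               show.legend = NA, inherit.aes = TRUE, ...) {
  layer(
    stat = StatXMean, geom = GeomVline,
    data = data, mapping = mapping, position = position,
    show.legend = show.legend, inherit.aes = inherit.aes,
    params = list(na.rm = na.rm, ...)
  )
}

# The conditional-means plot, now without a separate summary table.
conditional_means <- ggplot(airquality) +
  aes(x = Ozone) +
  geom_histogram() +
  geom_x_mean_sketch(na.rm = TRUE) +
  facet_grid(rows = vars(Month))

conditional_means
```

Because the Stat computes within each facet, the faceted plot gets one mean line per month with no pre-computed summary data.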
--- count: false .panel1-ols-auto[ ```r *library(tidyverse) ``` ] .panel2-ols-auto[ ] --- count: false .panel1-ols-auto[ ```r library(tidyverse) *library(ggxmean) ``` ] .panel2-ols-auto[ ] --- count: false .panel1-ols-auto[ ```r library(tidyverse) library(ggxmean) *#library(transformr) might help w/ animate #library(transformr) might help w/ animate ``` ] .panel2-ols-auto[ ] --- count: false .panel1-ols-auto[ ```r library(tidyverse) library(ggxmean) #library(transformr) might help w/ animate #library(transformr) might help w/ animate ## basic example code *cars ``` ] .panel2-ols-auto[ ``` speed dist 1 4 2 2 4 10 3 7 4 4 7 22 5 8 16 6 9 10 7 10 18 8 10 26 9 10 34 10 11 17 11 11 28 12 12 14 13 12 20 14 12 24 15 12 28 16 13 26 17 13 34 18 13 34 19 13 46 20 14 26 21 14 36 22 14 60 23 14 80 24 15 20 25 15 26 26 15 54 27 16 32 28 16 40 29 17 32 30 17 40 31 17 50 32 18 42 33 18 56 34 18 76 35 18 84 36 19 36 37 19 46 38 19 68 39 20 32 40 20 48 41 20 52 42 20 56 43 20 64 44 22 66 45 23 54 46 24 70 47 24 92 48 24 93 49 24 120 50 25 85 ``` ] --- count: false .panel1-ols-auto[ ```r library(tidyverse) library(ggxmean) #library(transformr) might help w/ animate #library(transformr) might help w/ animate ## basic example code cars %>% * ggplot() ``` ] .panel2-ols-auto[ <img src="cowy-slides_files/figure-html/ols_auto_05_output-1.png" width="432" /> ] --- count: false .panel1-ols-auto[ ```r library(tidyverse) library(ggxmean) #library(transformr) might help w/ animate #library(transformr) might help w/ animate ## basic example code cars %>% ggplot() + * aes(x = speed, * y = dist) ``` ] .panel2-ols-auto[ <img src="cowy-slides_files/figure-html/ols_auto_06_output-1.png" width="432" /> ] --- count: false .panel1-ols-auto[ ```r library(tidyverse) library(ggxmean) #library(transformr) might help w/ animate #library(transformr) might help w/ animate ## basic example code cars %>% ggplot() + aes(x = speed, y = dist) + * geom_point() ``` ] .panel2-ols-auto[ <img 
src="cowy-slides_files/figure-html/ols_auto_07_output-1.png" width="432" /> ] --- count: false .panel1-ols-auto[ ```r library(tidyverse) library(ggxmean) #library(transformr) might help w/ animate #library(transformr) might help w/ animate ## basic example code cars %>% ggplot() + aes(x = speed, y = dist) + geom_point() + * ggxmean::geom_lm() ``` ] .panel2-ols-auto[ <img src="cowy-slides_files/figure-html/ols_auto_08_output-1.png" width="432" /> ] --- count: false .panel1-ols-auto[ ```r library(tidyverse) library(ggxmean) #library(transformr) might help w/ animate #library(transformr) might help w/ animate ## basic example code cars %>% ggplot() + aes(x = speed, y = dist) + geom_point() + ggxmean::geom_lm() + * ggxmean::geom_lm_residuals(linetype = "dashed") ``` ] .panel2-ols-auto[ <img src="cowy-slides_files/figure-html/ols_auto_09_output-1.png" width="432" /> ] --- count: false .panel1-ols-auto[ ```r library(tidyverse) library(ggxmean) #library(transformr) might help w/ animate #library(transformr) might help w/ animate ## basic example code cars %>% ggplot() + aes(x = speed, y = dist) + geom_point() + ggxmean::geom_lm() + ggxmean::geom_lm_residuals(linetype = "dashed") + * ggxmean::geom_lm_fitted( * color = "goldenrod3", * size = 3) ``` ] .panel2-ols-auto[ <img src="cowy-slides_files/figure-html/ols_auto_10_output-1.png" width="432" /> ] --- count: false .panel1-ols-auto[ ```r library(tidyverse) library(ggxmean) #library(transformr) might help w/ animate #library(transformr) might help w/ animate ## basic example code cars %>% ggplot() + aes(x = speed, y = dist) + geom_point() + ggxmean::geom_lm() + ggxmean::geom_lm_residuals(linetype = "dashed") + ggxmean::geom_lm_fitted( color = "goldenrod3", size = 3) + * ggxmean::geom_lm_conf_int() ``` ] .panel2-ols-auto[ <img src="cowy-slides_files/figure-html/ols_auto_11_output-1.png" width="432" /> ] --- count: false .panel1-ols-auto[ ```r library(tidyverse) library(ggxmean) #library(transformr) might help w/ animate 
#library(transformr) might help w/ animate ## basic example code cars %>% ggplot() + aes(x = speed, y = dist) + geom_point() + ggxmean::geom_lm() + ggxmean::geom_lm_residuals(linetype = "dashed") + ggxmean::geom_lm_fitted( color = "goldenrod3", size = 3) + ggxmean::geom_lm_conf_int() + * ggxmean::geom_lm_pred_int() ``` ] .panel2-ols-auto[ <img src="cowy-slides_files/figure-html/ols_auto_12_output-1.png" width="432" /> ] --- count: false .panel1-ols-auto[ ```r library(tidyverse) library(ggxmean) #library(transformr) might help w/ animate #library(transformr) might help w/ animate ## basic example code cars %>% ggplot() + aes(x = speed, y = dist) + geom_point() + ggxmean::geom_lm() + ggxmean::geom_lm_residuals(linetype = "dashed") + ggxmean::geom_lm_fitted( color = "goldenrod3", size = 3) + ggxmean::geom_lm_conf_int() + ggxmean::geom_lm_pred_int() + * ggxmean::geom_lm_formula() ``` ] .panel2-ols-auto[ <img src="cowy-slides_files/figure-html/ols_auto_13_output-1.png" width="432" /> ] --- count: false .panel1-ols-auto[ ```r library(tidyverse) library(ggxmean) #library(transformr) might help w/ animate #library(transformr) might help w/ animate ## basic example code cars %>% ggplot() + aes(x = speed, y = dist) + geom_point() + ggxmean::geom_lm() + ggxmean::geom_lm_residuals(linetype = "dashed") + ggxmean::geom_lm_fitted( color = "goldenrod3", size = 3) + ggxmean::geom_lm_conf_int() + ggxmean::geom_lm_pred_int() + ggxmean::geom_lm_formula() + * ggxmean::geom_lm_intercept(color = "red", * size = 5) ``` ] .panel2-ols-auto[ <img src="cowy-slides_files/figure-html/ols_auto_14_output-1.png" width="432" /> ] --- count: false .panel1-ols-auto[ ```r library(tidyverse) library(ggxmean) #library(transformr) might help w/ animate #library(transformr) might help w/ animate ## basic example code cars %>% ggplot() + aes(x = speed, y = dist) + geom_point() + ggxmean::geom_lm() + ggxmean::geom_lm_residuals(linetype = "dashed") + ggxmean::geom_lm_fitted( color = "goldenrod3", size = 3) + 
ggxmean::geom_lm_conf_int() + ggxmean::geom_lm_pred_int() + ggxmean::geom_lm_formula() + ggxmean::geom_lm_intercept(color = "red", size = 5) + * ggxmean::geom_lm_intercept_label( * size = 3, * hjust = 0) ``` ] .panel2-ols-auto[ <img src="cowy-slides_files/figure-html/ols_auto_15_output-1.png" width="432" /> ] <style> .panel1-ols-auto { color: black; width: 38.6060606060606%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel2-ols-auto { color: black; width: 59.3939393939394%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel3-ols-auto { color: black; width: NA%; hight: 33%; float: left; padding-left: 1%; font-size: 80% } </style> --- # Thesis of the talk: -- # Extension system is really cool. -- # Really powerful. -- # Staggeringly under-utilized, -- because existing points of entry are challenging. --- class: inverse, center, middle # Part 4. Underutilized extension mechanism. --- ## Under-utilized by whom? -- Who could find it extremely useful and could flourish in this extension space? -- ## ggplot2 super users... ## ## - know the grammar. ## - know how to write functions. ## - know R/ggplot2 extremely well. --- # Evidence from a micro survey and focus group -- # Nine statistics and data analytics educators. April 2023. -- # ~ 15 question survey and 2 focus groups. --- class: inverse, middle, center # profile of survey/focus group: -- # 100% R/ggplot stars! 
---

<img src="https://github.com/EvaMaeRey/easy-geom-recipes/raw/main/survey_results_summary_files/figure-html/q05r_length_user-1.png" width="90%" />

---

<img src="https://github.com/EvaMaeRey/easy-geom-recipes/raw/main/survey_results_summary_files/figure-html/q06r_frequency-1.png" width="90%" />

---

<img src="https://github.com/EvaMaeRey/easy-geom-recipes/raw/main/survey_results_summary_files/figure-html/q07ggplot2_frequency-1.png" width="90%" />

---

<img src="https://github.com/EvaMaeRey/easy-geom-recipes/raw/main/survey_results_summary_files/figure-html/unnamed-chunk-7-1.png" width="90%" />

---

<img src="https://github.com/EvaMaeRey/easy-geom-recipes/raw/main/survey_results_summary_files/figure-html/q08r_frequency-1.png" width="90%" />

---
class: inverse, middle, center

# And yet,

--

# *minimal* experience with extension.

---

<img src="https://github.com/EvaMaeRey/easy-geom-recipes/raw/main/survey_results_summary_files/figure-html/q10_previous_ext_experience-1.png" width="90%" />

---

<img src="https://github.com/EvaMaeRey/easy-geom-recipes/raw/main/survey_results_summary_files/figure-html/q11_previous_ext_attempt-1.png" width="90%" />

---

# What's the barrier?

---

## Why super-users aren't extending (quoting ggtrace author*)...

--

### 0) 'users aren't developers'

--

### 1) 'scary ggproto methods' (less experience with OOP)

--

### 2) hard to reason about the big chunk of code that defines a new ggproto object

--

<!-- ### 3) even if we master all of this, then we have to become package developers? -->

--

* *June Choe is a ggplot2 enthusiast and winner of ASA Statistical Computing and Graphics student paper award: https://yjunechoe.github.io/static/papers/Choe_2022_SublayerGG.pdf

--

* June's thesis: users can use ggplot2 more powerfully and flexibly even staying on the *user side* with greater knowledge of delayed aesthetic evaluation.
ggtrace serves to 'crack open ggplot2 internals', helping close the current knowledge gap about how computation that's already taking place can be accessed and used in visualizations.

---

<img src="https://github.com/EvaMaeRey/easy-geom-recipes/raw/main/survey_results_summary_files/figure-html/q13_oop_experience-1.png" width="80%" />

---

### 0) Users are leaving too much on the table not to get into ggplot2 extensions - forgoing super-fluid ggplot2 experiences that extension affords

--

### 1) deemphasize OOP ggproto methods

--

### 2) pull computation out of the ggproto method, to facilitate reasoning and debugging

--

### 3) use practice, rather than theory, as the vehicle to get you there

---
class: inverse, middle, center

# Part 5. New points of entry tailor-made for ggplot2 super-users who are not yet in extension

---

## - *Easy geom recipes: practice with `compute_group`*

### Link to html: https://evamaerey.github.io/mytidytuesday/2022-01-03-easy-geom-recipes/easy_geom_recipes.html

### Link to rmarkdown file: https://raw.githubusercontent.com/evamaerey/mytidytuesday/master/2022-01-03-easy-geom-recipes/easy_geom_recipes.Rmd

### Link to slideshow overview: https://evamaerey.github.io/mytidytuesday/2022-01-03-easy-geom-recipes/easy_geom_recipes_flipbook.html

## - *More easy geom recipes: practice with `compute_panel`* (Under construction)

---

# Easy geom recipes: practice with `compute_group`

--

### 3 fully worked examples creating geom_*()

### 3 prompts to create new geom_*()

--

## In total 6 practices/examples of simple geom_*() function builds

---

# Pedagogical considerations incorporated... (Skip at COWY)

## Using 'step 0' to establish baseline knowledge and keep it fresh

## Multiple examples using the same strategy to let patterns sink in (practice and patterns as vehicle)

## Enumeration and high-level description to reinforce the pattern; keep track of which parts are clear and/or need clarification

## ...
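---

### Sketch: the shape of an 'easy geom recipe'

A minimal sketch of the three-step pattern that the `compute_group` practice points at: write the computation as a plain function, hand it to a ggproto `Stat`, then wrap it in a user-facing function. The names `compute_group_xmean`, `StatXmean`, and `geom_xmean` are illustrative assumptions, not taken from the tutorial, and the built-in `cars` data stands in for any dataset.

```r
library(ggplot2)

# Step 1: the computation lives in an ordinary function (hypothetical name),
# so it can be reasoned about and debugged outside of ggproto.
compute_group_xmean <- function(data, scales) {
  data.frame(xintercept = mean(data$x))
}

# Step 2: pass the function to a new Stat; this is the only ggproto step.
StatXmean <- ggproto(
  "StatXmean", Stat,
  compute_group = compute_group_xmean,
  required_aes = "x"
)

# Step 3: a user-facing layer function (hypothetical name) that draws the
# computed value with the existing vertical-line geom.
geom_xmean <- function(mapping = NULL, data = NULL, position = "identity",
                       na.rm = FALSE, show.legend = NA, inherit.aes = TRUE,
                       ...) {
  layer(
    stat = StatXmean, geom = GeomVline, data = data, mapping = mapping,
    position = position, show.legend = show.legend,
    inherit.aes = inherit.aes, params = list(na.rm = na.rm, ...)
  )
}

# Build a plot that uses the new layer alongside a standard one.
p <- ggplot(cars, aes(x = speed, y = dist)) +
  geom_point() +
  geom_xmean(linetype = "dashed")
```

Printing `p` should draw the scatterplot with a dashed vertical line at the mean of `speed`. Because step 1 is a plain function, it can be tried on a small data frame before any ggproto code is written.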
---
class: inverse, center, middle

# *Easy geom recipes* tutorial assessment

---

# Straightforward

<img src="https://github.com/EvaMaeRey/easy-geom-recipes/raw/main/survey_results_summary_files/figure-html/q16tutorial_time_taken-1.png" width="65%" />

---

# High completion rate

<img src="https://github.com/EvaMaeRey/easy-geom-recipes/raw/main/survey_results_summary_files/figure-html/q14which_completed-1.png" width="65%" />

---

# Appropriate length

<img src="https://github.com/EvaMaeRey/easy-geom-recipes/raw/main/survey_results_summary_files/figure-html/q17tutorial_length-1.png" width="65%" />

---

# Accessible examples

<img src="https://github.com/EvaMaeRey/easy-geom-recipes/raw/main/survey_results_summary_files/figure-html/q18example_clarity-1.png" width="65%" />

---

# Engaging

<img src="https://github.com/EvaMaeRey/easy-geom-recipes/raw/main/survey_results_summary_files/figure-html/q19examples_engaging-1.png" width="65%" />

---

# Emotionally positive experience

<img src="https://github.com/EvaMaeRey/easy-geom-recipes/raw/main/survey_results_summary_files/figure-html/unnamed-chunk-4-1.png" width="65%" />

---

# 7/9 could see themselves extending

<img src="https://github.com/EvaMaeRey/easy-geom-recipes/raw/main/survey_results_summary_files/figure-html/unnamed-chunk-5-1.png" width="65%" />

---

# Feedback from the focus group...

--

# '[in] it was that easy. And I felt empowered as a result of that....'

--

# 'But you know, like, my problem isn't gonna be that easy.'

---

## Sometimes it might be that easy

--

## Sometimes it won't...

---
class: inverse, center, middle

## Part 6: More support for future challenges

---

### ggplot2 extenders virtual micro meeting. Support group. https://github.com/teunbrand/ggplot-extension-club

---

## Another problem: Inexperience with package development

--

### Once you get your extension working,

--

### you may want to share with the world.

--

## And then you'll really be in 'scary developer territory' (Choe).
---

> ## And so I have, like this [...] code that I would like to wrap up into a package.

> ## But there’s this extra layer ... like my functions work [by themselves], but when I actually put them in the package they always break.

> ## So like there’s some other layer of getting into the ggplot extension world that I’m missing.

---

## Best practices for packaging ggplot2 extensions: ggtedious

## https://github.com/EvaMaeRey/ggtedious (not yet scheduled)

--

> ## Testing your code can be painful and tedious, but it greatly increases the quality of your code. (testthat package)

---

## Experiments in literate programming in package development. Rmd as home of package narrative while developing

### - https://github.com/EvaMaeRey/ggsmoothfit

### - https://github.com/EvaMaeRey/ggcirclepack

### - https://github.com/EvaMaeRey/ggbarlabs

---
class: center, middle, inverse

# Thank you! Questions?