“Let Data Tell the Story” - Tintle et al. Key idea box #1. Idea #1.


Hans Rosling Master Storyteller

https://www.youtube.com/embed/jbkSRLYSojo?list=PL6F8D7054D12E7C5A

Not just animation (over time)

animation of graph set up!!


storytelling advice that resonates.

Also, see Garr Reynolds’ ‘Presentation Zen’: https://policyviz.com/podcast/episode-62-garr-reynolds/

‘Making the simple complicated is commonplace; making the complicated simple, awesomely simple, that’s creativity.’ - Charles Mingus

‘Cognitive scientists have discovered three important features of the human information processing system that are particularly relevant for PowerPoint users: dual-channels, that is, people have separate information processing channels for visual material and verbal material; limited capacity, that is, people can pay attention to only a few pieces of information in each channel at a time; and active processing, that is, people understand the presented material when they pay attention to the relevant material, organize it into a coherent mental structure, and integrate it with their prior knowledge.’ - Rich Mayer, in an interview with Sociable Media, Inc.


ggplot2 is all about building up plots and layouts!!


Baller::Reynolds MA206 2021

Let’s install the gapminder package… library(gapminder)

😣😣😣 But don’t throw one more thing at us! 😣

ma206data

install.packages("remotes")
remotes::install_github("EvaMaeRey/ma206data")


Figure 2.2

library(tidyverse)
library(ma206data)

prelim_NationalAnthemTimes %>% 
  ggplot() + 
  aes(x = time) + 
  geom_rug() + 
  geom_dotplot()
## Bin width defaults to 1/30 of the range of the data. Pick better value with `binwidth`.
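The message above asks for an explicit `binwidth`. As a minimal sketch, assuming the built-in `faithful` data as a stand-in (the anthem times are not reproduced here) and an illustrative width of 0.1 minutes, the same dot plot pattern with the bin width set by hand might look like this:

```r
library(ggplot2)

# Stand-in data: eruption durations (minutes) from the built-in `faithful` set.
# The binwidth of 0.1 is an illustrative choice, not a recommendation.
p_dots <- ggplot(faithful) +
  aes(x = eruptions) +
  geom_rug() +
  geom_dotplot(binwidth = 0.1)

p_dots
```

A narrower width shows more detail but stacks fewer dots per bin, so it is worth trying a few values against the spread of the data.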


Figure 2.3-6

prelim_NationalAnthemTimes %>% 
  names()
## [1] "year"  "genre" "sex"   "time"
prelim_NationalAnthemTimes %>% 
  ggplot() + 
  aes(x = time) + 
  geom_rug() + 
  geom_histogram() + 
  ggxmean::geom_x_mean() + 
  ggxmean::geom_x_mean_label() + 
  ggxmean:::geom_x1sd(lty = "dashed") + 
  # each facet_*() call replaces the previous one,
  # so only facet_grid() remains in the final plot
  facet_wrap(facets = vars(sex), ncol = 1) + 
  facet_wrap(facets = vars(genre), ncol = 1) + 
  facet_grid(rows = vars(sex), cols = vars(genre)) + 
  aes(color = sex)
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## Warning: Removed 2 rows containing missing values (geom_segment).


‘The best solution to visual display of multivariate data is using small multiples’ - Edward Tufte paraphrase

Small multiples: Tufte encourages small multiples as a way to compare several series quickly. A chart that crowds many series onto a single pair of axes is often easier to read when each series gets its own pair of axes, placed next to one another. He suggests this is particularly helpful when the series are measured on quite different vertical (y-axis) scales but share the same range on the horizontal x-axis (usually time).
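To make the small-multiples idea concrete, here is a hedged sketch. It assumes the `economics_long` data set that ships with ggplot2 (it is not part of these slides): several series over the same dates, measured on very different scales. Each series gets its own panel and its own y-axis.

```r
library(ggplot2)

# `economics_long` ships with ggplot2: several US economic series in long
# format, all over the same dates but on very different value scales.
p_multiples <- ggplot(economics_long) +
  aes(x = date, y = value) +
  geom_line() +
  # one panel per series; scales = "free_y" gives each panel its own y-axis
  facet_wrap(facets = vars(variable), ncol = 1, scales = "free_y")

p_multiples
```

Stacking the panels in one column keeps the shared time axis aligned, so the eye can move straight up and down to compare series.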


remotes::install_github("EvaMaeRey/ma206data")


Figure 2.8

prelim_NationalAnthemTimes %>% 
  names()
## [1] "year"  "genre" "sex"   "time"
prelim_NationalAnthemTimes %>% 
  ggplot() + 
  aes(x = year) + 
  aes(y = time) + 
  geom_rug() + 
  geom_point() + 
  aes(color = sex) + 
  aes(shape = sex) + 
  facet_wrap(facets = vars(sex), ncol = 1) +
  geom_smooth() # just for fun; loess has too few points per panel here, hence the warnings
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
## parametric, : span too small. fewer data values than degrees of freedom.


Fluid storytelling w/ ggplot2

Hadley Wickham, ggplot2 author, on its motivation:

> And, you know, I’d get a dataset. And, in my head I could very clearly kind of picture, I want to put this on the x-axis. Let’s put this on the y-axis, draw a line, put some points here, break it up by this variable.
>
> And then, like, getting that vision out of my head, and into reality, it’s just really, really hard. Just, like, felt harder than it should be. Like, there’s a lot of custom programming involved, where I just felt, like, to me, I just wanted to say, like, you know, this is what I’m thinking, this is how I’m picturing this plot. Like you’re the computer ‘Go and do it’.
>
> … and I’d also been reading about the Grammar of Graphics by Leland Wilkinson, I got to meet him a couple of times and … I was, like, this book has been, like, written for me.

https://www.trifacta.com/podcast/tidy-data-with-hadley-wickham/


Promise of ggplot2?

## Getting the plot form you picture in your head …
## … into reality…

… by describing it.


# ggplot2 is called a ‘declarative’ graphing system.

# It lets you ‘speak your plot into existence’. (Thomas Lin Pedersen?)

### This is fast because most of us, like Hadley Wickham, can picture the form of the plot we’re going after rather clearly: what the horizontal position should represent, what color should represent, and what ‘marks’ (points, lines) will appear on the plot.

‘Getting’ our plot made becomes a matter of describing what we’re already picturing!
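As a rough illustration of describing a plot piece by piece, here is a sketch assuming the `mpg` data set bundled with ggplot2 (a stand-in, not data from these slides). Each line states one decision about the picture: what x means, what y means, which marks to draw, what color means.

```r
library(ggplot2)

p_described <- ggplot(data = mpg) +  # start from a data set
  aes(x = displ) +        # horizontal position represents engine displacement
  aes(y = hwy) +          # vertical position represents highway mileage
  geom_point() +          # the marks are points
  aes(color = class)      # color represents vehicle class

p_described
```

Because each decision sits on its own line, the description can be read aloud, reordered, or built up one step at a time in front of an audience.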

