class: center, middle, inverse, title-slide # long to wide or wide to long ### Evangeline Reynolds ### 2/16/2019 --- ```r library(gapminder) library(tidyverse) ``` ``` ## ── Attaching packages ───────────────────────────────────────────────────────────────────────────────────── tidyverse 1.3.0 ── ``` ``` ## ✓ ggplot2 3.3.0.9000 ✓ purrr 0.3.3 ## ✓ tibble 2.1.3 ✓ dplyr 0.8.4 ## ✓ tidyr 1.0.2 ✓ stringr 1.4.0 ## ✓ readr 1.3.1 ✓ forcats 0.4.0 ``` ``` ## ── Conflicts ──────────────────────────────────────────────────────────────────────────────────────── tidyverse_conflicts() ── ## x dplyr::filter() masks stats::filter() ## x dplyr::lag() masks stats::lag() ``` ```r library(flipbookr) knitr::opts_chunk$set(cache = F, comment = "") ``` --- name: longtowideandback ## Exercise: *spread data, gather data* Sometimes, we need to transform the shape of our data. The wide form of data is an efficient way to store data - especially time-series data. However, we often need data in a long for for data analysis, so shape transformations are important. --- ## Exercise: *long to wide* --- class: split-40 count: false .column[.content[ ```r *gapminder ``` ]] .column[.content[ ``` # A tibble: 1,704 x 6 country continent year lifeExp pop gdpPercap <fct> <fct> <int> <dbl> <int> <dbl> 1 Afghanistan Asia 1952 28.8 8425333 779. 2 Afghanistan Asia 1957 30.3 9240934 821. 3 Afghanistan Asia 1962 32.0 10267083 853. 4 Afghanistan Asia 1967 34.0 11537966 836. 5 Afghanistan Asia 1972 36.1 13079460 740. 6 Afghanistan Asia 1977 38.4 14880372 786. 7 Afghanistan Asia 1982 39.9 12881816 978. 8 Afghanistan Asia 1987 40.8 13867957 852. 9 Afghanistan Asia 1992 41.7 16317921 649. 10 Afghanistan Asia 1997 41.8 22227415 635. # … with 1,694 more rows ``` ]] --- class: split-40 count: false .column[.content[ ```r gapminder %>% * select(country, continent, * lifeExp, year) ``` ]] .column[.content[ ``` # A tibble: 1,704 x 4 country continent lifeExp year <fct> <fct> <dbl> <int> 1 Afghanistan Asia 28.8 1952 2 Afghanistan Asia 30.3 1957 3 Afghanistan Asia 32.0 1962 4 Afghanistan Asia 34.0 1967 5 Afghanistan Asia 36.1 1972 6 Afghanistan Asia 38.4 1977 7 Afghanistan Asia 39.9 1982 8 Afghanistan Asia 40.8 1987 9 Afghanistan Asia 41.7 1992 10 Afghanistan Asia 41.8 1997 # … with 1,694 more rows ``` ]] --- class: split-40 count: false .column[.content[ ```r gapminder %>% select(country, continent, lifeExp, year) %>% * spread(key = year, * value = lifeExp) ``` ]] .column[.content[ ``` # A tibble: 142 x 14 country continent `1952` `1957` `1962` `1967` `1972` `1977` `1982` `1987` <fct> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> 1 Afghan… Asia 28.8 30.3 32.0 34.0 36.1 38.4 39.9 40.8 2 Albania Europe 55.2 59.3 64.8 66.2 67.7 68.9 70.4 72 3 Algeria Africa 43.1 45.7 48.3 51.4 54.5 58.0 61.4 65.8 4 Angola Africa 30.0 32.0 34 36.0 37.9 39.5 39.9 39.9 5 Argent… Americas 62.5 64.4 65.1 65.6 67.1 68.5 69.9 70.8 6 Austra… Oceania 69.1 70.3 70.9 71.1 71.9 73.5 74.7 76.3 7 Austria Europe 66.8 67.5 69.5 70.1 70.6 72.2 73.2 74.9 8 Bahrain Asia 50.9 53.8 56.9 59.9 63.3 65.6 69.1 70.8 9 Bangla… Asia 37.5 39.3 41.2 43.5 45.3 46.9 50.0 52.8 10 Belgium Europe 68 69.2 70.2 70.9 71.4 72.8 73.9 75.4 # … with 132 more rows, and 4 more variables: `1992` <dbl>, `1997` <dbl>, # `2002` <dbl>, `2007` <dbl> ``` ]] --- class: split-40 count: false .column[.content[ ```r gapminder %>% select(country, continent, lifeExp, year) %>% spread(key = year, value = lifeExp) -> *gapminder_life_exp_wide ``` ]] .column[.content[ ]] --- ## Exercise: *wide to long* --- class: split-40 count: false .column[.content[ ```r *gapminder_life_exp_wide ``` ]] .column[.content[ ``` # A tibble: 142 x 14 country continent `1952` `1957` `1962` `1967` `1972` `1977` `1982` `1987` <fct> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> 1 Afghan… Asia 28.8 30.3 32.0 34.0 36.1 38.4 39.9 40.8 2 Albania Europe 55.2 59.3 64.8 66.2 67.7 68.9 70.4 72 3 Algeria Africa 43.1 45.7 48.3 51.4 54.5 58.0 61.4 65.8 4 Angola Africa 30.0 32.0 34 36.0 37.9 39.5 39.9 39.9 5 Argent… Americas 62.5 64.4 65.1 65.6 67.1 68.5 69.9 70.8 6 Austra… Oceania 69.1 70.3 70.9 71.1 71.9 73.5 74.7 76.3 7 Austria Europe 66.8 67.5 69.5 70.1 70.6 72.2 73.2 74.9 8 Bahrain Asia 50.9 53.8 56.9 59.9 63.3 65.6 69.1 70.8 9 Bangla… Asia 37.5 39.3 41.2 43.5 45.3 46.9 50.0 52.8 10 Belgium Europe 68 69.2 70.2 70.9 71.4 72.8 73.9 75.4 # … with 132 more rows, and 4 more variables: `1992` <dbl>, `1997` <dbl>, # `2002` <dbl>, `2007` <dbl> ``` ]] --- class: split-40 count: false .column[.content[ ```r gapminder_life_exp_wide %>% * gather(key = "year", * value = "lifeExp", * -country, -continent) ``` ]] .column[.content[ ``` # A tibble: 1,704 x 4 country continent year lifeExp <fct> <fct> <chr> <dbl> 1 Afghanistan Asia 1952 28.8 2 Albania Europe 1952 55.2 3 Algeria Africa 1952 43.1 4 Angola Africa 1952 30.0 5 Argentina Americas 1952 62.5 6 Australia Oceania 1952 69.1 7 Austria Europe 1952 66.8 8 Bahrain Asia 1952 50.9 9 Bangladesh Asia 1952 37.5 10 Belgium Europe 1952 68 # … with 1,694 more rows ``` ]] --- class: split-40 count: false .column[.content[ ```r gapminder_life_exp_wide %>% gather(key = "year", value = "lifeExp", -country, -continent) %>% * mutate(year = as.numeric(year)) ``` ]] .column[.content[ ``` # A tibble: 1,704 x 4 country continent year lifeExp <fct> <fct> <dbl> <dbl> 1 Afghanistan Asia 1952 28.8 2 Albania Europe 1952 55.2 3 Algeria Africa 1952 43.1 4 Angola Africa 1952 30.0 5 Argentina Americas 1952 62.5 6 Australia Oceania 1952 69.1 7 Austria Europe 1952 66.8 8 Bahrain Asia 1952 50.9 9 Bangladesh Asia 1952 37.5 10 Belgium Europe 1952 68 # … with 1,694 more rows ``` ]] --- class: split-40 count: false .column[.content[ ```r gapminder_life_exp_wide %>% gather(key = "year", value = "lifeExp", -country, -continent) %>% mutate(year = as.numeric(year)) -> * gapminder_life_exp_long ``` ]] .column[.content[ ]] ---