class: left, top, inverse
background-image: url(https://images.unsplash.com/photo-1530018607912-eff2daa1bac4?ixlib=rb-1.2.1&ixid=eyJhcHBfaWQiOjEyMDd9&auto=format&fit=crop&w=1502&q=80)
background-size: cover

# .large[data.table]
### Walk-throughs <br>with {flipbookr}<br>and {xaringan}

<br> <br> <br> <br> <br> <br> <br> <br> <br> <br> <br> <br> <br> <br> <br> <br> <br> <br> <br>

#### .right[Gina Reynolds<br>Photo Credit: Abel Y Costa]

---

# Set up

Okay. Let's load the `reveal for xaringan` functions for "flipbooking" and the `data.table` package.

```r
source(file = "https://raw.githubusercontent.com/EvaMaeRey/little_flipbooks_library/master/xaringan_reveal_parentheses_balanced.R")
```

# Where we are headed

![](data.table_files/figure-html/pipe_data.table-1.png)<!-- -->

---

# The pipeline

---
class: split-40
count: false
.column[.content[
```r
*library(data.table)
```
]]
.column[.content[

]]
---
class: split-40
count: false
.column[.content[
```r
library(data.table)
*library(gapminder)
```
]]
.column[.content[

]]
---
class: split-40
count: false
.column[.content[
```r
library(data.table)
library(gapminder)
*library(magrittr)
```
]]
.column[.content[

]]
---
class: split-40
count: false
.column[.content[
```r
library(data.table)
library(gapminder)
library(magrittr)
*library(ggplot2)
```
]]
.column[.content[

]]
---
class: split-40
count: false
.column[.content[
```r
library(data.table)
library(gapminder)
library(magrittr)
library(ggplot2)
*gapminder
```
]]
.column[.content[
```
# A tibble: 1,704 x 6
   country     continent  year lifeExp      pop gdpPercap
   <fct>       <fct>     <int>   <dbl>    <int>     <dbl>
 1 Afghanistan Asia       1952    28.8  8425333      779.
 2 Afghanistan Asia       1957    30.3  9240934      821.
 3 Afghanistan Asia       1962    32.0 10267083      853.
 4 Afghanistan Asia       1967    34.0 11537966      836.
 5 Afghanistan Asia       1972    36.1 13079460      740.
 6 Afghanistan Asia       1977    38.4 14880372      786.
 7 Afghanistan Asia       1982    39.9 12881816      978.
 8 Afghanistan Asia       1987    40.8 13867957      852.
 9 Afghanistan Asia       1992    41.7 16317921      649.
10 Afghanistan Asia       1997    41.8 22227415      635.
# … with 1,694 more rows
```
]]
---
class: split-40
count: false
.column[.content[
```r
library(data.table)
library(gapminder)
library(magrittr)
library(ggplot2)
gapminder %>%
* data.table()
```
]]
.column[.content[
```
          country continent year lifeExp      pop gdpPercap
   1: Afghanistan      Asia 1952  28.801  8425333  779.4453
   2: Afghanistan      Asia 1957  30.332  9240934  820.8530
   3: Afghanistan      Asia 1962  31.997 10267083  853.1007
   4: Afghanistan      Asia 1967  34.020 11537966  836.1971
   5: Afghanistan      Asia 1972  36.088 13079460  739.9811
  ---
1700:    Zimbabwe    Africa 1987  62.351  9216418  706.1573
1701:    Zimbabwe    Africa 1992  60.377 10704340  693.4208
1702:    Zimbabwe    Africa 1997  46.809 11404948  792.4500
1703:    Zimbabwe    Africa 2002  39.989 11926563  672.0386
1704:    Zimbabwe    Africa 2007  43.487 12311143  469.7093
```
]]
---
class: split-40
count: false
.column[.content[
```r
library(data.table)
library(gapminder)
library(magrittr)
library(ggplot2)
gapminder %>%
  data.table() %>%
* .[year > 1980]
```
]]
.column[.content[
```
         country continent year lifeExp      pop gdpPercap
  1: Afghanistan      Asia 1982  39.854 12881816  978.0114
  2: Afghanistan      Asia 1987  40.822 13867957  852.3959
  3: Afghanistan      Asia 1992  41.674 16317921  649.3414
  4: Afghanistan      Asia 1997  41.763 22227415  635.3414
  5: Afghanistan      Asia 2002  42.129 25268405  726.7341
 ---
848:    Zimbabwe    Africa 1987  62.351  9216418  706.1573
849:    Zimbabwe    Africa 1992  60.377 10704340  693.4208
850:    Zimbabwe    Africa 1997  46.809 11404948  792.4500
851:    Zimbabwe    Africa 2002  39.989 11926563  672.0386
852:    Zimbabwe    Africa 2007  43.487 12311143  469.7093
```
]]
---
class: split-40
count: false
.column[.content[
```r
library(data.table)
library(gapminder)
library(magrittr)
library(ggplot2)
gapminder %>%
  data.table() %>%
  .[year > 1980] %>%
* .[ ,
*   mean(gdpPercap) ,
*   by = .(continent, year) ]
```
]]
.column[.content[
```
    continent year        V1
 1:      Asia 1982  7434.135
 2:      Asia 1987  7608.227
 3:      Asia 1992  8639.690
 4:      Asia 1997  9834.093
 5:      Asia 2002 10174.090
 6:      Asia 2007 12473.027
 7:    Europe 1982 15617.897
 8:    Europe 1987 17214.311
 9:    Europe 1992 17061.568
10:    Europe 1997 19076.782
11:    Europe 2002 21711.732
12:    Europe 2007 25054.482
13:    Africa 1982  2481.593
14:    Africa 1987  2282.669
15:    Africa 1992  2281.810
16:    Africa 1997  2378.760
17:    Africa 2002  2599.385
18:    Africa 2007  3089.033
19:  Americas 1982  7506.737
20:  Americas 1987  7793.400
21:  Americas 1992  8044.934
22:  Americas 1997  8889.301
23:  Americas 2002  9287.677
24:  Americas 2007 11003.032
25:   Oceania 1982 18554.710
26:   Oceania 1987 20448.040
27:   Oceania 1992 20894.046
28:   Oceania 1997 24024.175
29:   Oceania 2002 26938.778
30:   Oceania 2007 29810.188
    continent year        V1
```
]]
---
class: split-40
count: false
.column[.content[
```r
library(data.table)
library(gapminder)
library(magrittr)
library(ggplot2)
gapminder %>%
  data.table() %>%
  .[year > 1980] %>%
  .[ ,
    mean(gdpPercap) ,
    by = .(continent, year) ] %>%
* ggplot()
```
]]
.column[.content[
![](data.table_files/figure-html/output_pipe_data.table_11-1.png)<!-- -->
]]
---
class: split-40
count: false
.column[.content[
```r
library(data.table)
library(gapminder)
library(magrittr)
library(ggplot2)
gapminder %>%
  data.table() %>%
  .[year > 1980] %>%
  .[ ,
    mean(gdpPercap) ,
    by = .(continent, year) ] %>%
  ggplot() +
* aes(x = year)
```
]]
.column[.content[
![](data.table_files/figure-html/output_pipe_data.table_12-1.png)<!-- -->
]]
---
class: split-40
count: false
.column[.content[
```r
library(data.table)
library(gapminder)
library(magrittr)
library(ggplot2)
gapminder %>%
  data.table() %>%
  .[year > 1980] %>%
  .[ ,
    mean(gdpPercap) ,
    by = .(continent, year) ] %>%
  ggplot() +
  aes(x = year) +
* aes(y = V1)
```
]]
.column[.content[
![](data.table_files/figure-html/output_pipe_data.table_13-1.png)<!-- -->
]]
---
class: split-40
count: false
.column[.content[
```r
library(data.table)
library(gapminder)
library(magrittr)
library(ggplot2)
gapminder %>%
  data.table() %>%
  .[year > 1980] %>%
  .[ ,
    mean(gdpPercap) ,
    by = .(continent, year) ] %>%
  ggplot() +
  aes(x = year) +
  aes(y = V1) +
* labs(y = "Average GDP per cap")
```
]]
.column[.content[
![](data.table_files/figure-html/output_pipe_data.table_14-1.png)<!-- -->
]]
---
class: split-40
count: false
.column[.content[
```r
library(data.table)
library(gapminder)
library(magrittr)
library(ggplot2)
gapminder %>%
  data.table() %>%
  .[year > 1980] %>%
  .[ ,
    mean(gdpPercap) ,
    by = .(continent, year) ] %>%
  ggplot() +
  aes(x = year) +
  aes(y = V1) +
  labs(y = "Average GDP per cap") +
* geom_point()
```
]]
.column[.content[
![](data.table_files/figure-html/output_pipe_data.table_15-1.png)<!-- -->
]]
---
class: split-40
count: false
.column[.content[
```r
library(data.table)
library(gapminder)
library(magrittr)
library(ggplot2)
gapminder %>%
  data.table() %>%
  .[year > 1980] %>%
  .[ ,
    mean(gdpPercap) ,
    by = .(continent, year) ] %>%
  ggplot() +
  aes(x = year) +
  aes(y = V1) +
  labs(y = "Average GDP per cap") +
  geom_point() +
* geom_line()
```
]]
.column[.content[
![](data.table_files/figure-html/output_pipe_data.table_16-1.png)<!-- -->
]]
---
class: split-40
count: false
.column[.content[
```r
library(data.table)
library(gapminder)
library(magrittr)
library(ggplot2)
gapminder %>%
  data.table() %>%
  .[year > 1980] %>%
  .[ ,
    mean(gdpPercap) ,
    by = .(continent, year) ] %>%
  ggplot() +
  aes(x = year) +
  aes(y = V1) +
  labs(y = "Average GDP per cap") +
  geom_point() +
  geom_line() +
* aes(color = continent)
```
]]
.column[.content[
![](data.table_files/figure-html/output_pipe_data.table_17-1.png)<!-- -->
]]
---
class: split-40
count: false
.column[.content[
```r
*gapminder
```
]]
.column[.content[
```
# A tibble: 1,704 x 6
   country     continent  year lifeExp      pop gdpPercap
   <fct>       <fct>     <int>   <dbl>    <int>     <dbl>
 1 Afghanistan Asia       1952    28.8  8425333      779.
 2 Afghanistan Asia       1957    30.3  9240934      821.
 3 Afghanistan Asia       1962    32.0 10267083      853.
 4 Afghanistan Asia       1967    34.0 11537966      836.
 5 Afghanistan Asia       1972    36.1 13079460      740.
 6 Afghanistan Asia       1977    38.4 14880372      786.
 7 Afghanistan Asia       1982    39.9 12881816      978.
 8 Afghanistan Asia       1987    40.8 13867957      852.
 9 Afghanistan Asia       1992    41.7 16317921      649.
10 Afghanistan Asia       1997    41.8 22227415      635.
# … with 1,694 more rows
```
]]
---
class: split-40
count: false
.column[.content[
```r
gapminder %>%
* data.table()
```
]]
.column[.content[
```
          country continent year lifeExp      pop gdpPercap
   1: Afghanistan      Asia 1952  28.801  8425333  779.4453
   2: Afghanistan      Asia 1957  30.332  9240934  820.8530
   3: Afghanistan      Asia 1962  31.997 10267083  853.1007
   4: Afghanistan      Asia 1967  34.020 11537966  836.1971
   5: Afghanistan      Asia 1972  36.088 13079460  739.9811
  ---
1700:    Zimbabwe    Africa 1987  62.351  9216418  706.1573
1701:    Zimbabwe    Africa 1992  60.377 10704340  693.4208
1702:    Zimbabwe    Africa 1997  46.809 11404948  792.4500
1703:    Zimbabwe    Africa 2002  39.989 11926563  672.0386
1704:    Zimbabwe    Africa 2007  43.487 12311143  469.7093
```
]]
---
class: split-40
count: false
.column[.content[
```r
gapminder %>%
  data.table() %>%
* .[continent == "Asia", ]
```
]]
.column[.content[
```
         country continent year lifeExp      pop gdpPercap
  1: Afghanistan      Asia 1952  28.801  8425333  779.4453
  2: Afghanistan      Asia 1957  30.332  9240934  820.8530
  3: Afghanistan      Asia 1962  31.997 10267083  853.1007
  4: Afghanistan      Asia 1967  34.020 11537966  836.1971
  5: Afghanistan      Asia 1972  36.088 13079460  739.9811
 ---
392: Yemen, Rep.      Asia 1987  52.922 11219340 1971.7415
393: Yemen, Rep.      Asia 1992  55.599 13367997 1879.4967
394: Yemen, Rep.      Asia 1997  58.020 15826497 2117.4845
395: Yemen, Rep.      Asia 2002  60.308 18701257 2234.8208
396: Yemen, Rep.      Asia 2007  62.698 22211743 2280.7699
```
]]
---
class: split-40
count: false
.column[.content[
```r
gapminder %>%
  data.table() %>%
  .[continent == "Asia", ] %>%
* .[1:30,]
```
]]
.column[.content[
```
        country continent year lifeExp      pop  gdpPercap
 1: Afghanistan      Asia 1952  28.801  8425333   779.4453
 2: Afghanistan      Asia 1957  30.332  9240934   820.8530
 3: Afghanistan      Asia 1962  31.997 10267083   853.1007
 4: Afghanistan      Asia 1967  34.020 11537966   836.1971
 5: Afghanistan      Asia 1972  36.088 13079460   739.9811
 6: Afghanistan      Asia 1977  38.438 14880372   786.1134
 7: Afghanistan      Asia 1982  39.854 12881816   978.0114
 8: Afghanistan      Asia 1987  40.822 13867957   852.3959
 9: Afghanistan      Asia 1992  41.674 16317921   649.3414
10: Afghanistan      Asia 1997  41.763 22227415   635.3414
11: Afghanistan      Asia 2002  42.129 25268405   726.7341
12: Afghanistan      Asia 2007  43.828 31889923   974.5803
13:     Bahrain      Asia 1952  50.939   120447  9867.0848
14:     Bahrain      Asia 1957  53.832   138655 11635.7995
15:     Bahrain      Asia 1962  56.923   171863 12753.2751
16:     Bahrain      Asia 1967  59.923   202182 14804.6727
17:     Bahrain      Asia 1972  63.300   230800 18268.6584
18:     Bahrain      Asia 1977  65.593   297410 19340.1020
19:     Bahrain      Asia 1982  69.052   377967 19211.1473
20:     Bahrain      Asia 1987  70.750   454612 18524.0241
21:     Bahrain      Asia 1992  72.601   529491 19035.5792
22:     Bahrain      Asia 1997  73.925   598561 20292.0168
23:     Bahrain      Asia 2002  74.795   656397 23403.5593
24:     Bahrain      Asia 2007  75.635   708573 29796.0483
25:  Bangladesh      Asia 1952  37.484 46886859   684.2442
26:  Bangladesh      Asia 1957  39.348 51365468   661.6375
27:  Bangladesh      Asia 1962  41.216 56839289   686.3416
28:  Bangladesh      Asia 1967  43.453 62821884   721.1861
29:  Bangladesh      Asia 1972  45.252 70759295   630.2336
30:  Bangladesh      Asia 1977  46.923 80428306   659.8772
        country continent year lifeExp      pop  gdpPercap
```
]]
---
class: split-40
count: false
.column[.content[
```r
gapminder %>%
  data.table() %>%
  .[continent == "Asia", ] %>%
  .[1:30,] %>%
* .[, c(1,3:6)]
```
]]
.column[.content[
```
        country year lifeExp      pop  gdpPercap
 1: Afghanistan 1952  28.801  8425333   779.4453
 2: Afghanistan 1957  30.332  9240934   820.8530
 3: Afghanistan 1962  31.997 10267083   853.1007
 4: Afghanistan 1967  34.020 11537966   836.1971
 5: Afghanistan 1972  36.088 13079460   739.9811
 6: Afghanistan 1977  38.438 14880372   786.1134
 7: Afghanistan 1982  39.854 12881816   978.0114
 8: Afghanistan 1987  40.822 13867957   852.3959
 9: Afghanistan 1992  41.674 16317921   649.3414
10: Afghanistan 1997  41.763 22227415   635.3414
11: Afghanistan 2002  42.129 25268405   726.7341
12: Afghanistan 2007  43.828 31889923   974.5803
13:     Bahrain 1952  50.939   120447  9867.0848
14:     Bahrain 1957  53.832   138655 11635.7995
15:     Bahrain 1962  56.923   171863 12753.2751
16:     Bahrain 1967  59.923   202182 14804.6727
17:     Bahrain 1972  63.300   230800 18268.6584
18:     Bahrain 1977  65.593   297410 19340.1020
19:     Bahrain 1982  69.052   377967 19211.1473
20:     Bahrain 1987  70.750   454612 18524.0241
21:     Bahrain 1992  72.601   529491 19035.5792
22:     Bahrain 1997  73.925   598561 20292.0168
23:     Bahrain 2002  74.795   656397 23403.5593
24:     Bahrain 2007  75.635   708573 29796.0483
25:  Bangladesh 1952  37.484 46886859   684.2442
26:  Bangladesh 1957  39.348 51365468   661.6375
27:  Bangladesh 1962  41.216 56839289   686.3416
28:  Bangladesh 1967  43.453 62821884   721.1861
29:  Bangladesh 1972  45.252 70759295   630.2336
30:  Bangladesh 1977  46.923 80428306   659.8772
        country year lifeExp      pop  gdpPercap
```
]]
---
class: split-40
count: false
.column[.content[
```r
gapminder %>%
  data.table() %>%
  .[continent == "Asia", ] %>%
  .[1:30,] %>%
  .[, c(1,3:6)] %>%
* .[, .(pop, year, gdpPercap)]
```
]]
.column[.content[
```
         pop year  gdpPercap
 1:  8425333 1952   779.4453
 2:  9240934 1957   820.8530
 3: 10267083 1962   853.1007
 4: 11537966 1967   836.1971
 5: 13079460 1972   739.9811
 6: 14880372 1977   786.1134
 7: 12881816 1982   978.0114
 8: 13867957 1987   852.3959
 9: 16317921 1992   649.3414
10: 22227415 1997   635.3414
11: 25268405 2002   726.7341
12: 31889923 2007   974.5803
13:   120447 1952  9867.0848
14:   138655 1957 11635.7995
15:   171863 1962 12753.2751
16:   202182 1967 14804.6727
17:   230800 1972 18268.6584
18:   297410 1977 19340.1020
19:   377967 1982 19211.1473
20:   454612 1987 18524.0241
21:   529491 1992 19035.5792
22:   598561 1997 20292.0168
23:   656397 2002 23403.5593
24:   708573 2007 29796.0483
25: 46886859 1952   684.2442
26: 51365468 1957   661.6375
27: 56839289 1962   686.3416
28: 62821884 1967   721.1861
29: 70759295 1972   630.2336
30: 80428306 1977   659.8772
         pop year  gdpPercap
```
]]
---
class: split-40
count: false
.column[.content[
```r
gapminder %>%
  data.table() %>%
  .[continent == "Asia", ] %>%
  .[1:30,] %>%
  .[, c(1,3:6)] %>%
  .[, .(pop, year, gdpPercap)] %>%
* # .[, .(world_pop = sum(pop)), by = year] %>%
  # .[, .(world_pop = sum(pop)), by = year] %>%
* .[, gdp := gdpPercap * pop] %>%
* .[year %in% 1952:1967, my_var := 2] %>%
* .[]
```
]]
.column[.content[
```
         pop year  gdpPercap         gdp my_var
 1:  8425333 1952   779.4453  6567086330      2
 2:  9240934 1957   820.8530  7585448670      2
 3: 10267083 1962   853.1007  8758855797      2
 4: 11537966 1967   836.1971  9648014150      2
 5: 13079460 1972   739.9811  9678553274     NA
 6: 14880372 1977   786.1134 11697659231     NA
 7: 12881816 1982   978.0114 12598563401     NA
 8: 13867957 1987   852.3959 11820990309     NA
 9: 16317921 1992   649.3414 10595901589     NA
10: 22227415 1997   635.3414 14121995875     NA
11: 25268405 2002   726.7341 18363410424     NA
12: 31889923 2007   974.5803 31079291949     NA
13:   120447 1952  9867.0848  1188460759      2
14:   138655 1957 11635.7995  1613361773      2
15:   171863 1962 12753.2751  2191816125      2
16:   202182 1967 14804.6727  2993238336      2
17:   230800 1972 18268.6584  4216406356     NA
18:   297410 1977 19340.1020  5751939724     NA
19:   377967 1982 19211.1473  7261179715     NA
20:   454612 1987 18524.0241  8421243626     NA
21:   529491 1992 19035.5792 10079167850     NA
22:   598561 1997 20292.0168 12146009862     NA
23:   656397 2002 23403.5593 15362026094     NA
24:   708573 2007 29796.0483 21112675360     NA
25: 46886859 1952   684.2442 32082059995      2
26: 51365468 1957   661.6375 33985317661      2
27: 56839289 1962   686.3416 39011165929      2
28: 62821884 1967   721.1861 45306268650      2
29: 70759295 1972   630.2336 44594887096     NA
30: 80428306 1977   659.8772 53072807954     NA
         pop year  gdpPercap         gdp my_var
```
]]
---
class: split-40
count: false
.column[.content[
```r
gapminder %>%
  data.table() %>%
  .[continent == "Asia", ] %>%
  .[1:30,] %>%
  .[, c(1,3:6)] %>%
  .[, .(pop, year, gdpPercap)] %>%
  # .[, .(world_pop = sum(pop)), by = year] %>%
  # .[, .(world_pop = sum(pop)), by = year] %>%
  .[, gdp := gdpPercap * pop] %>%
  .[year %in% 1952:1967, my_var := 2] %>%
  .[] %>%
* .[, year := NULL] %>%
* .[]
```
]]
.column[.content[
```
         pop  gdpPercap         gdp my_var
 1:  8425333   779.4453  6567086330      2
 2:  9240934   820.8530  7585448670      2
 3: 10267083   853.1007  8758855797      2
 4: 11537966   836.1971  9648014150      2
 5: 13079460   739.9811  9678553274     NA
 6: 14880372   786.1134 11697659231     NA
 7: 12881816   978.0114 12598563401     NA
 8: 13867957   852.3959 11820990309     NA
 9: 16317921   649.3414 10595901589     NA
10: 22227415   635.3414 14121995875     NA
11: 25268405   726.7341 18363410424     NA
12: 31889923   974.5803 31079291949     NA
13:   120447  9867.0848  1188460759      2
14:   138655 11635.7995  1613361773      2
15:   171863 12753.2751  2191816125      2
16:   202182 14804.6727  2993238336      2
17:   230800 18268.6584  4216406356     NA
18:   297410 19340.1020  5751939724     NA
19:   377967 19211.1473  7261179715     NA
20:   454612 18524.0241  8421243626     NA
21:   529491 19035.5792 10079167850     NA
22:   598561 20292.0168 12146009862     NA
23:   656397 23403.5593 15362026094     NA
24:   708573 29796.0483 21112675360     NA
25: 46886859   684.2442 32082059995      2
26: 51365468   661.6375 33985317661      2
27: 56839289   686.3416 39011165929      2
28: 62821884   721.1861 45306268650      2
29: 70759295   630.2336 44594887096     NA
30: 80428306   659.8772 53072807954     NA
         pop  gdpPercap         gdp my_var
```
]]

---

# Create a data.table

```r
data.table(a = c(1, 2), b = c("a", "b"))
```

```
   a b
1: 1 a
2: 2 b
```

---

```r
gapminder %>% #REVEAL
  data.table() -> gapminder_dt
```

---

# Subset rows using i

```r
gapminder_dt %>%
  .[1:25, ] %>%
  .[year > 1980, ]
```

```
        country continent year lifeExp      pop gdpPercap
 1: Afghanistan      Asia 1982  39.854 12881816  978.0114
 2: Afghanistan      Asia 1987  40.822 13867957  852.3959
 3: Afghanistan      Asia 1992  41.674 16317921  649.3414
 4: Afghanistan      Asia 1997  41.763 22227415  635.3414
 5: Afghanistan      Asia 2002  42.129 25268405  726.7341
 6: Afghanistan      Asia 2007  43.828 31889923  974.5803
 7:     Albania    Europe 1982  70.420  2780097 3630.8807
 8:     Albania    Europe 1987  72.000  3075321 3738.9327
 9:     Albania    Europe 1992  71.581  3326498 2497.4379
10:     Albania    Europe 1997  72.950  3428038 3193.0546
11:     Albania    Europe 2002  75.651  3508512 4604.2117
12:     Albania    Europe 2007  76.423  3600523 5937.0295
```

---

# Manipulate columns with j

```r
gapminder_dt %>%
  .[, c(1,3:6)] %>% #REVEAL
  .[, .(pop, year, gdpPercap)]
```

```
           pop year gdpPercap
   1:  8425333 1952  779.4453
   2:  9240934 1957  820.8530
   3: 10267083 1962  853.1007
   4: 11537966 1967  836.1971
   5: 13079460 1972  739.9811
  ---
1700:  9216418 1987  706.1573
1701: 10704340 1992  693.4208
1702: 11404948 1997  792.4500
1703: 11926563 2002  672.0386
1704: 12311143 2007  469.7093
```

---

# Summarize

```r
gapminder_dt %>%
  .[, .(x = sum(pop))]
```

```
             x
1: 50440465801
```

---

# Compute columns

```r
knitr::opts_chunk$set(eval = FALSE)
```

```r
# Templates (not evaluated): add or update columns by reference with :=
gapminder_dt[, c := 1 + 2]
# `a` is a placeholder column name; gapminder_dt has no column `a`
gapminder_dt[a == 1, c := 1 + 2]
gapminder_dt[, `:=`(c = 1 , d = 2)]
```

```r
data.table(a = c(1, 2, 5, 11),
           b = c("a", "b", "d", "x"),
           c = 1:4) -> dt
```

```r
dt %>%
  .[1:2, ] %>%
  .[a > 1, ]
```

```r
dt %>%
  # keep columns 2 and 3 (b, c) so both exist for the next step
  .[, c(2, 3)] %>%
  .[, .(b, c)]
```

<style type="text/css">
.remark-code{line-height: 1.5; font-size: 80%}
</style>
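
---

# Compute columns, evaluated

The `:=` chunks in the last section of this deck are shown unevaluated. As a minimal sketch of what they do, the block below runs the same patterns on the small four-row `dt` defined there. The new column names `d`, `flag`, `e`, and `f` are illustrative assumptions, not part of the original deck.

```r
library(data.table)

# Small illustrative table, same values as `dt` in the deck
dt <- data.table(a = c(1, 2, 5, 11),
                 b = c("a", "b", "d", "x"),
                 c = 1:4)

# := adds (or overwrites) a column by reference; no reassignment is needed
dt[, d := a * 2]

# Combined with a filter in i, := updates only the matching rows;
# non-matching rows of a newly created column are filled with NA
dt[a > 1, flag := TRUE]

# The functional form `:=`() assigns several columns in one call
dt[, `:=`(e = 1, f = "z")]

# Assigning NULL drops a column
dt[, c := NULL]

# := returns its result invisibly, so a trailing [] is used to print
dt[]
```

This mirrors the `my_var := 2` step in the walk-through, where rows outside `1952:1967` end up as `NA`.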