Problem: positioning with SF feels spooky and data prep can feel effortful

W/ ggplot2, we are able to say ‘where’ with x and y. These are positional aesthetics. With SF the ‘where’ is implied with the geometry column. This feels non-declarative and spooky to me.

Also, it often requires pre-processing to work with data you’d like to represent.

When we ask the question ‘where’ in a geographical context, a human being will respond with the name of a region; not coordinates and not a set of coordinates that define a region. Is it possible to create an interface where you define ‘where’ with geographical names or other ids? For example: aes(country = my_countries_variable) could be used with a geom/stat like geom_polygon_countries().

See step-by-step:

https://evamaerey.github.io/ggplot2_grammar_guide/geom_sf.html#1

library(tidyverse)
 
nato_names <- c("Albania", "Belgium", "Bulgaria", "Canada", "Croatia", "Czech Republic", "Denmark", "Estonia", "France", "Germany", "Greece", "Hungary",  
                "Iceland", "Italy", "Latvia", "Lithuania", "Luxembourg", "Montenegro", "Netherlands", "Norway", "Poland", "Portugal", "Romania", "Slovakia", "Slovenia", "Spain", "Turkey", "United Kingdom", "United States")  
library(gapminder)  
gapminder %>%  
  filter(year == 2002) %>%  
  rename(name = country) ->  
gapminder_2002_prepped  

rnaturalearth::ne_countries(scale = "medium", returnclass = "sf") %>%  
  select(name, pop_est, gdp_md_est,  
         continent, geometry) %>%  
  filter(name %in% nato_names) %>%  
  left_join(gapminder_2002_prepped, by = "name") ->  
  nato_countries  

ggplot(data = nato_countries) +  
  # This part feels spooky to me 
  ggplot2::geom_sf(data = nato_countries) +  
    aes(fill = lifeExp) +  
    scale_fill_viridis_c(option = "magma", direction = -1) +  
  ggplot2::coord_sf(xlim = c(-175, 47.5),  
           ylim = c(23, 85),  
           expand = FALSE) +  
  labs(title = "Life Expectancy, 2002",  
         subtitle = "NATO Member States",  
         fill = "Share of GDP (%)") +  
  theme_bw() +  
  theme(plot.title =  
            element_text(hjust = 0.5)) +  
  theme(plot.subtitle =
            element_text(hjust = 0.5))

A different API?

Here is what I think might be an easy to use API.

ggworld(data = gapminder_2002, projection = 'a good one') + 
  aes(country = country, # positioning aesthetic
      fill = lifeExp) + 
  geom_polygon_country()

Note: country names are really hard. Other codes could be used. One could try to use the wonderful countrycodes as well when

Proof of concept? (without SF, and with easy statenames case)

Because of my limited knowledge of SF, let’s start w/ the x and y inputs approach to mapping with ggplot2 and see if we can cook something up.

Step 00: See how someone gets it done

I was here: https://remiller1450.github.io/s230s19/Intro_maps.html

Step 0: get it done w/ ggplot2

Look at mapping w/ ggplot2, no sf, we use geom_polygon and the x, y and group aesthetics). It is not very efficient (lots of redundant info in the ), but we won’t worry about that for this demonstration. A little rewrite to get inspired.

library(ggplot2) 
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(maps)
MainStates <- map_data("state")
StatePopulation <- read.csv("https://raw.githubusercontent.com/ds4stats/r-tutorials/master/intro-maps/data/StatePopulation.csv", as.is = TRUE)

MergedStates <- inner_join(MainStates, StatePopulation, by = "region")

StatePopulation %>% # becomes ggus()
  right_join(MainStates) %>% # becomes ggus()
  ggplot() + # becomes ggus()
  geom_polygon() + # becomes geom_polygon state
  aes(x = long, # becomes state = region
      y = lat, # becomes state = region
      group = group, # becomes state = region
      fill = population/1000000) + 
  coord_map(projection = "orthographic", 
            orientation = c(41, -74, 0))
## Joining with `by = join_by(region)`
## Warning in right_join(., MainStates): Each row in `x` is expected to match at most 1 row in `y`.
## ℹ Row 1 of `x` matches multiple rows.
## ℹ If multiple matches are expected, set `multiple = "all"` to silence this
##   warning.

Step 1: compute function and test

library(maps)
MainStates <- map_data("state")
# AllCounty <- map_data("county")
# MainCities <- filter(us.cities, long>=-130)

MainStates %>% 
  rename(state = region) %>% 
  rename(which_polygon = group) ->
continuous_states

continuous_states %>% 
  count(state, which_polygon) %>% 
  sample_n(10)
##           state which_polygon    n
## 1  rhode island            46   66
## 2      new york            35  290
## 3    new mexico            33   78
## 4          ohio            42  238
## 5         texas            50 1088
## 6          utah            51   59
## 7     wisconsin            62  388
## 8      delaware             7   94
## 9    new jersey            32  205
## 10     oklahoma            43  284
StatePopulation <- read.csv("https://raw.githubusercontent.com/ds4stats/r-tutorials/master/intro-maps/data/StatePopulation.csv", as.is = TRUE)

compute_state <- function(data, scales){
 
  data %>% 
    inner_join(continuous_states, multiple = "all") %>% 
    mutate(x = long, y = lat) %>% 
    arrange(order) %>%
    # mutate(group = paste(group, ".", which_polygon)) %>%
    # group_by(which_polygon) %>%
    data.frame()

}

StatePopulation %>% 
  rename(state = region) %>% 
  mutate(group = row_number()) %>% 
  compute_state() ->
computed
## Joining with `by = join_by(state)`
computed %>% sample_n(30)
##             state population elect_votes group       long      lat
## 1        new york   19746227          29    32  -76.19193 44.02608
## 2      california   38802500          55     5 -120.40134 34.46341
## 3           texas   26956958          38    43 -100.13583 28.14942
## 4        illinois   12880580          20    13  -88.45322 37.13340
## 5           texas   26956958          38    43  -97.07624 28.20671
## 6         florida   19893297          29     9  -86.37339 30.43552
## 7           texas   26956958          38    43  -96.41161 28.74529
## 8        new york   19746227          29    32  -76.97688 43.25258
## 9   west virginia    1850326           5    48  -80.62089 40.50239
## 10      minnesota    5457173          10    23  -92.68738 48.47796
## 11     new jersey    8938175          14    30  -74.11782 39.94662
## 12      tennessee    6549352          11    42  -83.42838 35.62078
## 13  west virginia    1850326           5    48  -80.55787 40.63990
## 14     new jersey    8938175          14    30  -75.39552 39.36220
## 15   south dakota     853175           3    41  -97.59190 42.84005
## 16           ohio   11594163          18    35  -81.86994 38.90384
## 17          texas   26956958          38    43  -97.59190 27.24987
## 18       illinois   12880580          20    13  -89.38715 37.08183
## 19        florida   19893297          29     9  -82.80959 29.16928
## 20       illinois   12880580          20    13  -89.42152 37.37977
## 21      wisconsin    5757564          10    49  -87.50784 44.19223
## 22        georgia   10097343          16    10  -83.27369 34.61811
## 23       maryland    5976407          10    20  -77.06283 38.99551
## 24   pennsylvania   12787209          20    38  -75.73356 39.78046
## 25 north carolina    9943964          15    33  -76.41537 35.97602
## 26   south dakota     853175           3    41  -96.83559 45.59025
## 27        wyoming     584153           3    50 -104.06059 41.39620
## 28          idaho    1634464           4    12 -111.43456 44.72509
## 29          texas   26956958          38    43  -97.76379 27.41603
## 30  west virginia    1850326           5    48  -77.75610 39.27626
##    which_polygon order subregion          x        y
## 1             35  9173      main  -76.19193 44.02608
## 2              4   839      <NA> -120.40134 34.46341
## 3             50 12822      <NA> -100.13583 28.14942
## 4             12  3075      <NA>  -88.45322 37.13340
## 5             50 12546      <NA>  -97.07624 28.20671
## 6              9  2195      <NA>  -86.37339 30.43552
## 7             50 12463      <NA>  -96.41161 28.74529
## 8             35  9132      main  -76.97688 43.25258
## 9             61 15125      <NA>  -80.62089 40.50239
## 10            25  7209      <NA>  -92.68738 48.47796
## 11            32  8823      <NA>  -74.11782 39.94662
## 12            49 12074      <NA>  -83.42838 35.62078
## 13            61 15130      <NA>  -80.55787 40.63990
## 14            32  8887      <NA>  -75.39552 39.36220
## 15            48 11786      <NA>  -97.59190 42.84005
## 16            42 10502      <NA>  -81.86994 38.90384
## 17            50 12636      <NA>  -97.59190 27.24987
## 18            12  3105      <NA>  -89.38715 37.08183
## 19             9  1980      <NA>  -82.80959 29.16928
## 20            12  3116      <NA>  -89.42152 37.37977
## 21            62 15486      <NA>  -87.50784 44.19223
## 22            10  2615      <NA>  -83.27369 34.61811
## 23            19  5900      <NA>  -77.06283 38.99551
## 24            45 11320      <NA>  -75.73356 39.78046
## 25            39  9911      main  -76.41537 35.97602
## 26            48 11853      <NA>  -96.83559 45.59025
## 27            63 15588      <NA> -104.06059 41.39620
## 28            11  2922      <NA> -111.43456 44.72509
## 29            50 12623      <NA>  -97.76379 27.41603
## 30            61 14854      <NA>  -77.75610 39.27626
computed %>% 
  ggplot() + 
  aes(x = x, y = y) + 
  geom_point() + 
  geom_polygon(color = "white") +
  aes(group = which_polygon)

my_setup_data <- function(data, params){
                                    if(data$group[1] == -1){
                                      nrows <- nrow(data)
                                      data$group <- seq_len(nrows)
                                    }
                                    data
}

Step 2: pass to ggproto

StatPolygonstate <- ggplot2::ggproto(`_class` = "StatPolygonstate",
                                  `_inherit` = ggplot2::Stat,
                                  required_aes = c("state"),
                                  # setup_data = my_setup_data,
                                  compute_panel = compute_state,
                                  default_aes = aes(group = after_stat(which_polygon))
                                  )

Step 3: write geom_* function

geom_polygon_state <- function(
  mapping = NULL,
  data = NULL,
  position = "identity",
  na.rm = FALSE,
  show.legend = NA,
  inherit.aes = TRUE, ...) {
  ggplot2::layer(
    stat = StatPolygonstate,  # proto object from step 2
    geom = ggplot2::GeomPolygon,  # inherit other behavior
    data = data,
    mapping = mapping,
    position = position,
    show.legend = show.legend,
    inherit.aes = inherit.aes,
    params = list(na.rm = na.rm, ...)
  )
}

step 4: test geom_* function, celebrate!

StatePopulation %>% #count(region)
  mutate(region = region %>% stringr::str_trim() %>% as.factor()) %>% 
  ggplot(data = .) + 
  aes(state = region, # state indicates position instead of x and y 
      fill = population) +
  geom_polygon_state() + 
  coord_map(projection = "orthographic", 
            orientation = c(41, -74, 0)) + 
  geom_polygon_state(data = . %>% 
                       filter(region == "new york"),
                     color = "red" ) + 
  geom_polygon_state(data = . %>% 
                       filter(region == "colorado"),
                     color = "green" ) + 
  scale_fill_viridis_c(option = "magma", end = .9) + 
  ggstamp::stamp_point(y = 30.2, x = -97.7, 
                       color = "goldenrod1",
                       size = 2)
## Joining with `by = join_by(state)`
## Joining with `by = join_by(state)`
## Joining with `by = join_by(state)`

Step 5: What’s not working and taking a break.

  1. Solved! “Not sure what’s going on w/ merge so that we only get the 4 states,” now much better! there was a grouping variable in the polygons data set named ‘group’ - this really messed with things because ggplot2 uses that var internally. We renamed!
  2. state = var, var must be factor (not character). I’ve seen this before and I’d like that not to be the case, but fine for now.
  3. There are subregions. You can see the tell tail-signs. Seems like setup_data should help w/ group management
  4. Take a break!

0 County - ggfips?

AllCounty <- map_data("county")

ggplot() + geom_polygon(data=AllCounty, aes(x=long, y=lat, group=group),
                color="darkblue", fill="lightblue", size = .1 ) +
  
          geom_polygon( data=MainStates, aes(x=long, y=lat, group=group),
                color="black", fill="lightblue",  size = .15, alpha = .3)
## Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
## ℹ Please use `linewidth` instead.

Circle pack seems similar…

https://evamaerey.github.io/mytidytuesday/2022-05-28-circle-pack/circle_pack_stipple.html