W/ ggplot2, we are able to say ‘where’ with x and y. These are positional aesthetics. With SF the ‘where’ is implied with the geometry column. This feels non-declarative and spooky to me.
Also, it often requires pre-processing to work with data you’d like to represent.
When we ask the question ‘where’ in a geographical context, a human being will respond with the name of a region; not coordinates and not a set of coordinates that define a region. Is it possible to create an interface where you define ‘where’ with geographical names or other ids? For example: aes(country = my_countries_variable)
could be used with a geom/stat like geom_polygon_countries().
See step-by-step:
https://evamaerey.github.io/ggplot2_grammar_guide/geom_sf.html#1
library(tidyverse)
nato_names <- c("Albania", "Belgium", "Bulgaria", "Canada", "Croatia", "Czech Republic", "Denmark", "Estonia", "France", "Germany", "Greece", "Hungary",
"Iceland", "Italy", "Latvia", "Lithuania", "Luxembourg", "Montenegro", "Netherlands", "Norway", "Poland", "Portugal", "Romania", "Slovakia", "Slovenia", "Spain", "Turkey", "United Kingdom", "United States")
library(gapminder)
gapminder %>%
filter(year == 2002) %>%
rename(name = country) ->
gapminder_2002_prepped
rnaturalearth::ne_countries(scale = "medium", returnclass = "sf") %>%
select(name, pop_est, gdp_md_est,
continent, geometry) %>%
filter(name %in% nato_names) %>%
left_join(gapminder_2002_prepped, by = "name") ->
nato_countries
ggplot(data = nato_countries) +
# This part feels spooky to me
ggplot2::geom_sf(data = nato_countries) +
aes(fill = lifeExp) +
scale_fill_viridis_c(option = "magma", direction = -1) +
ggplot2::coord_sf(xlim = c(-175, 47.5),
ylim = c(23, 85),
expand = FALSE) +
labs(title = "Life Expectancy, 2002",
subtitle = "NATO Member States",
fill = "Share of GDP (%)") +
theme_bw() +
theme(plot.title =
element_text(hjust = 0.5)) +
theme(plot.subtitle =
element_text(hjust = 0.5))
Here is what I think might be an easy to use API.
ggworld(data = gapminder_2002, projection = 'a good one') +
aes(country = country, # positioning aesthetic
fill = lifeExp) +
geom_polygon_country()
Note: country names are really hard. Other codes could be used. One could try to use the wonderful countrycodes as well when
Because of my limited knowledge of SF, let’s start w/ the x and y inputs approach to mapping with ggplot2 and see if we can cook something up.
I was here: https://remiller1450.github.io/s230s19/Intro_maps.html
Look at mapping w/ ggplot2, no sf, we use geom_polygon and the x, y and group aesthetics). It is not very efficient (lots of redundant info in the ), but we won’t worry about that for this demonstration. A little rewrite to get inspired.
library(ggplot2)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(maps)
MainStates <- map_data("state")
StatePopulation <- read.csv("https://raw.githubusercontent.com/ds4stats/r-tutorials/master/intro-maps/data/StatePopulation.csv", as.is = TRUE)
MergedStates <- inner_join(MainStates, StatePopulation, by = "region")
StatePopulation %>% # becomes ggus()
right_join(MainStates) %>% # becomes ggus()
ggplot() + # becomes ggus()
geom_polygon() + # becomes geom_polygon state
aes(x = long, # becomes state = region
y = lat, # becomes state = region
group = group, # becomes state = region
fill = population/1000000) +
coord_map(projection = "orthographic",
orientation = c(41, -74, 0))
## Joining with `by = join_by(region)`
## Warning in right_join(., MainStates): Each row in `x` is expected to match at most 1 row in `y`.
## ℹ Row 1 of `x` matches multiple rows.
## ℹ If multiple matches are expected, set `multiple = "all"` to silence this
## warning.
library(maps)
MainStates <- map_data("state")
# AllCounty <- map_data("county")
# MainCities <- filter(us.cities, long>=-130)
MainStates %>%
rename(state = region) %>%
rename(which_polygon = group) ->
continuous_states
continuous_states %>%
count(state, which_polygon) %>%
sample_n(10)
## state which_polygon n
## 1 rhode island 46 66
## 2 new york 35 290
## 3 new mexico 33 78
## 4 ohio 42 238
## 5 texas 50 1088
## 6 utah 51 59
## 7 wisconsin 62 388
## 8 delaware 7 94
## 9 new jersey 32 205
## 10 oklahoma 43 284
StatePopulation <- read.csv("https://raw.githubusercontent.com/ds4stats/r-tutorials/master/intro-maps/data/StatePopulation.csv", as.is = TRUE)
compute_state <- function(data, scales){
data %>%
inner_join(continuous_states, multiple = "all") %>%
mutate(x = long, y = lat) %>%
arrange(order) %>%
# mutate(group = paste(group, ".", which_polygon)) %>%
# group_by(which_polygon) %>%
data.frame()
}
StatePopulation %>%
rename(state = region) %>%
mutate(group = row_number()) %>%
compute_state() ->
computed
## Joining with `by = join_by(state)`
computed %>% sample_n(30)
## state population elect_votes group long lat
## 1 new york 19746227 29 32 -76.19193 44.02608
## 2 california 38802500 55 5 -120.40134 34.46341
## 3 texas 26956958 38 43 -100.13583 28.14942
## 4 illinois 12880580 20 13 -88.45322 37.13340
## 5 texas 26956958 38 43 -97.07624 28.20671
## 6 florida 19893297 29 9 -86.37339 30.43552
## 7 texas 26956958 38 43 -96.41161 28.74529
## 8 new york 19746227 29 32 -76.97688 43.25258
## 9 west virginia 1850326 5 48 -80.62089 40.50239
## 10 minnesota 5457173 10 23 -92.68738 48.47796
## 11 new jersey 8938175 14 30 -74.11782 39.94662
## 12 tennessee 6549352 11 42 -83.42838 35.62078
## 13 west virginia 1850326 5 48 -80.55787 40.63990
## 14 new jersey 8938175 14 30 -75.39552 39.36220
## 15 south dakota 853175 3 41 -97.59190 42.84005
## 16 ohio 11594163 18 35 -81.86994 38.90384
## 17 texas 26956958 38 43 -97.59190 27.24987
## 18 illinois 12880580 20 13 -89.38715 37.08183
## 19 florida 19893297 29 9 -82.80959 29.16928
## 20 illinois 12880580 20 13 -89.42152 37.37977
## 21 wisconsin 5757564 10 49 -87.50784 44.19223
## 22 georgia 10097343 16 10 -83.27369 34.61811
## 23 maryland 5976407 10 20 -77.06283 38.99551
## 24 pennsylvania 12787209 20 38 -75.73356 39.78046
## 25 north carolina 9943964 15 33 -76.41537 35.97602
## 26 south dakota 853175 3 41 -96.83559 45.59025
## 27 wyoming 584153 3 50 -104.06059 41.39620
## 28 idaho 1634464 4 12 -111.43456 44.72509
## 29 texas 26956958 38 43 -97.76379 27.41603
## 30 west virginia 1850326 5 48 -77.75610 39.27626
## which_polygon order subregion x y
## 1 35 9173 main -76.19193 44.02608
## 2 4 839 <NA> -120.40134 34.46341
## 3 50 12822 <NA> -100.13583 28.14942
## 4 12 3075 <NA> -88.45322 37.13340
## 5 50 12546 <NA> -97.07624 28.20671
## 6 9 2195 <NA> -86.37339 30.43552
## 7 50 12463 <NA> -96.41161 28.74529
## 8 35 9132 main -76.97688 43.25258
## 9 61 15125 <NA> -80.62089 40.50239
## 10 25 7209 <NA> -92.68738 48.47796
## 11 32 8823 <NA> -74.11782 39.94662
## 12 49 12074 <NA> -83.42838 35.62078
## 13 61 15130 <NA> -80.55787 40.63990
## 14 32 8887 <NA> -75.39552 39.36220
## 15 48 11786 <NA> -97.59190 42.84005
## 16 42 10502 <NA> -81.86994 38.90384
## 17 50 12636 <NA> -97.59190 27.24987
## 18 12 3105 <NA> -89.38715 37.08183
## 19 9 1980 <NA> -82.80959 29.16928
## 20 12 3116 <NA> -89.42152 37.37977
## 21 62 15486 <NA> -87.50784 44.19223
## 22 10 2615 <NA> -83.27369 34.61811
## 23 19 5900 <NA> -77.06283 38.99551
## 24 45 11320 <NA> -75.73356 39.78046
## 25 39 9911 main -76.41537 35.97602
## 26 48 11853 <NA> -96.83559 45.59025
## 27 63 15588 <NA> -104.06059 41.39620
## 28 11 2922 <NA> -111.43456 44.72509
## 29 50 12623 <NA> -97.76379 27.41603
## 30 61 14854 <NA> -77.75610 39.27626
computed %>%
ggplot() +
aes(x = x, y = y) +
geom_point() +
geom_polygon(color = "white") +
aes(group = which_polygon)
my_setup_data <- function(data, params){
if(data$group[1] == -1){
nrows <- nrow(data)
data$group <- seq_len(nrows)
}
data
}
StatPolygonstate <- ggplot2::ggproto(`_class` = "StatPolygonstate",
`_inherit` = ggplot2::Stat,
required_aes = c("state"),
# setup_data = my_setup_data,
compute_panel = compute_state,
default_aes = aes(group = after_stat(which_polygon))
)
geom_polygon_state <- function(
mapping = NULL,
data = NULL,
position = "identity",
na.rm = FALSE,
show.legend = NA,
inherit.aes = TRUE, ...) {
ggplot2::layer(
stat = StatPolygonstate, # proto object from step 2
geom = ggplot2::GeomPolygon, # inherit other behavior
data = data,
mapping = mapping,
position = position,
show.legend = show.legend,
inherit.aes = inherit.aes,
params = list(na.rm = na.rm, ...)
)
}
StatePopulation %>% #count(region)
mutate(region = region %>% stringr::str_trim() %>% as.factor()) %>%
ggplot(data = .) +
aes(state = region, # state indicates position instead of x and y
fill = population) +
geom_polygon_state() +
coord_map(projection = "orthographic",
orientation = c(41, -74, 0)) +
geom_polygon_state(data = . %>%
filter(region == "new york"),
color = "red" ) +
geom_polygon_state(data = . %>%
filter(region == "colorado"),
color = "green" ) +
scale_fill_viridis_c(option = "magma", end = .9) +
ggstamp::stamp_point(y = 30.2, x = -97.7,
color = "goldenrod1",
size = 2)
## Joining with `by = join_by(state)`
## Joining with `by = join_by(state)`
## Joining with `by = join_by(state)`
AllCounty <- map_data("county")
ggplot() + geom_polygon(data=AllCounty, aes(x=long, y=lat, group=group),
color="darkblue", fill="lightblue", size = .1 ) +
geom_polygon( data=MainStates, aes(x=long, y=lat, group=group),
color="black", fill="lightblue", size = .15, alpha = .3)
## Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
## ℹ Please use `linewidth` instead.
https://evamaerey.github.io/mytidytuesday/2022-05-28-circle-pack/circle_pack_stipple.html