class: center, middle, inverse, title-slide # ols ## made with flipbookr and xaringan ### Gina Reynolds, Jan 2020 --- --- class: split-40 count: false .column[.content[ ```r cars %>% ggplot() + labs(title = "Fitting an OLS model to data in ggplot2") + aes(x = speed) + aes(y = dist) + geom_point(color = "steelblue", size = 2) + geom_smooth(method = lm, se = F) ``` ]] .column[.content[ ``` `geom_smooth()` using formula 'y ~ x' ``` ![](ols_files/figure-html/basic_1_1_output-1.png)<!-- --> ]] --- OLS in base R --- class: split-40 count: false .column[.content[ ```r # fitting a model *lm(formula = dist ~ speed, * data = cars) ``` ]] .column[.content[ ``` Call: lm(formula = dist ~ speed, data = cars) Coefficients: (Intercept) speed -17.579 3.932 ``` ]] --- class: split-40 count: false .column[.content[ ```r # fitting a model lm(formula = dist ~ speed, data = cars) -> *cars_model ``` ]] .column[.content[ ]] --- class: split-40 count: false .column[.content[ ```r # fitting a model lm(formula = dist ~ speed, data = cars) -> cars_model *# using summary() to # using summary() to ``` ]] .column[.content[ ]] --- class: split-40 count: false .column[.content[ ```r # fitting a model lm(formula = dist ~ speed, data = cars) -> cars_model # using summary() to # using summary() to # to pull out more info *cars_model ``` ]] .column[.content[ ``` Call: lm(formula = dist ~ speed, data = cars) Coefficients: (Intercept) speed -17.579 3.932 ``` ]] --- class: split-40 count: false .column[.content[ ```r # fitting a model lm(formula = dist ~ speed, data = cars) -> cars_model # using summary() to # using summary() to # to pull out more info cars_model %>% * summary() ``` ]] .column[.content[ ``` Call: lm(formula = dist ~ speed, data = cars) Residuals: Min 1Q Median 3Q Max -29.069 -9.525 -2.272 9.215 43.201 Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) -17.5791 6.7584 -2.601 0.0123 * speed 3.9324 0.4155 9.464 1.49e-12 *** --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 Residual standard error: 15.38 on 48 degrees of freedom Multiple R-squared: 0.6511, Adjusted R-squared: 0.6438 F-statistic: 89.57 on 1 and 48 DF, p-value: 1.49e-12 ``` ]] --- class: split-40 count: false .column[.content[ ```r # fitting a model lm(formula = dist ~ speed, data = cars) -> cars_model # using summary() to # using summary() to # to pull out more info cars_model %>% summary() %>% * .$residuals ``` ]] .column[.content[ ``` 1 2 3 4 5 6 7 3.849460 11.849460 -5.947766 12.052234 2.119825 -7.812584 -3.744993 8 9 10 11 12 13 14 4.255007 12.255007 -8.677401 2.322599 -15.609810 -9.609810 -5.609810 15 16 17 18 19 20 21 -1.609810 -7.542219 0.457781 0.457781 12.457781 -11.474628 -1.474628 22 23 24 25 26 27 28 22.525372 42.525372 -21.407036 -15.407036 12.592964 -13.339445 -5.339445 29 30 31 32 33 34 35 -17.271854 -9.271854 0.728146 -11.204263 2.795737 22.795737 30.795737 36 37 38 39 40 41 42 -21.136672 -11.136672 10.863328 -29.069080 -13.069080 -9.069080 -5.069080 43 44 45 46 47 48 49 2.930920 -2.933898 -18.866307 -6.798715 15.201285 16.201285 43.201285 50 4.268876 ``` ]] --- # But {broom} is cool ... 
-- ...anticipating that we might want lots of this info pulled out systematically as data frames -- Broom returns dataframes at 3 levels - broom::glance(), overall model statistics - broom::tidy(), covariate level model statistics - broom::augment(), observation level statistics --- # *Model* level stats ```r cars_model %>% broom::glance() ``` ``` # A tibble: 1 x 11 r.squared adj.r.squared sigma statistic p.value df logLik AIC BIC <dbl> <dbl> <dbl> <dbl> <dbl> <int> <dbl> <dbl> <dbl> 1 0.651 0.644 15.4 89.6 1.49e-12 2 -207. 419. 425. # … with 2 more variables: deviance <dbl>, df.residual <int> ``` --- # *Covariate* level stats ```r cars_model %>% broom::tidy() ``` ``` # A tibble: 2 x 5 term estimate std.error statistic p.value <chr> <dbl> <dbl> <dbl> <dbl> 1 (Intercept) -17.6 6.76 -2.60 1.23e- 2 2 speed 3.93 0.416 9.46 1.49e-12 ``` --- # *Observation* level ```r cars_model %>% broom::augment() ``` ``` # A tibble: 50 x 9 dist speed .fitted .se.fit .resid .hat .sigma .cooksd .std.resid <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> 1 2 4 -1.85 5.21 3.85 0.115 15.5 0.00459 0.266 2 10 4 -1.85 5.21 11.8 0.115 15.4 0.0435 0.819 3 4 7 9.95 4.11 -5.95 0.0715 15.5 0.00620 -0.401 4 22 7 9.95 4.11 12.1 0.0715 15.4 0.0255 0.813 5 16 8 13.9 3.77 2.12 0.0600 15.5 0.000645 0.142 6 10 9 17.8 3.44 -7.81 0.0499 15.5 0.00713 -0.521 7 18 10 21.7 3.12 -3.74 0.0413 15.5 0.00133 -0.249 8 26 10 21.7 3.12 4.26 0.0413 15.5 0.00172 0.283 9 34 10 21.7 3.12 12.3 0.0413 15.4 0.0143 0.814 10 17 11 25.7 2.84 -8.68 0.0341 15.5 0.00582 -0.574 # … with 40 more rows ``` --- `broom::augment()` is good for plotting those residuals and fitted values, and checking model assumptions. --- class: split-40 count: false .column[.content[ ```r *cars_model ``` ]] .column[.content[ ``` Call: lm(formula = dist ~ speed, data = cars) Coefficients: (Intercept) speed -17.579 3.932 ``` ]] --- class: split-40 count: false .column[.content[ ```r cars_model %>% * broom::augment() ``` ]] .column[.content[ ``` # A tibble: 50 x 9 dist speed .fitted .se.fit .resid .hat .sigma .cooksd .std.resid <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> 1 2 4 -1.85 5.21 3.85 0.115 15.5 0.00459 0.266 2 10 4 -1.85 5.21 11.8 0.115 15.4 0.0435 0.819 3 4 7 9.95 4.11 -5.95 0.0715 15.5 0.00620 -0.401 4 22 7 9.95 4.11 12.1 0.0715 15.4 0.0255 0.813 5 16 8 13.9 3.77 2.12 0.0600 15.5 0.000645 0.142 6 10 9 17.8 3.44 -7.81 0.0499 15.5 0.00713 -0.521 7 18 10 21.7 3.12 -3.74 0.0413 15.5 0.00133 -0.249 8 26 10 21.7 3.12 4.26 0.0413 15.5 0.00172 0.283 9 34 10 21.7 3.12 12.3 0.0413 15.4 0.0143 0.814 10 17 11 25.7 2.84 -8.68 0.0341 15.5 0.00582 -0.574 # … with 40 more rows ``` ]] --- class: split-40 count: false .column[.content[ ```r cars_model %>% broom::augment() %>% * ggplot() ``` ]] .column[.content[ ![](ols_files/figure-html/augment_auto_3_output-1.png)<!-- --> ]] --- class: split-40 count: false .column[.content[ ```r cars_model %>% broom::augment() %>% ggplot() + * labs(title = "Plot of model fit") ``` ]] .column[.content[ ![](ols_files/figure-html/augment_auto_4_output-1.png)<!-- --> ]] --- class: split-40 count: false .column[.content[ ```r cars_model %>% broom::augment() %>% ggplot() + labs(title = "Plot of model fit") + * aes(x = speed) ``` ]] .column[.content[ ![](ols_files/figure-html/augment_auto_5_output-1.png)<!-- --> ]] --- class: split-40 count: false .column[.content[ ```r cars_model %>% broom::augment() %>% ggplot() + labs(title = "Plot of model fit") + aes(x = speed) + * aes(y = dist) ``` ]] .column[.content[ 
![](ols_files/figure-html/augment_auto_6_output-1.png)<!-- --> ]] --- class: split-40 count: false .column[.content[ ```r cars_model %>% broom::augment() %>% ggplot() + labs(title = "Plot of model fit") + aes(x = speed) + aes(y = dist) + * geom_point(col = "steelblue") ``` ]] .column[.content[ ![](ols_files/figure-html/augment_auto_7_output-1.png)<!-- --> ]] --- class: split-40 count: false .column[.content[ ```r cars_model %>% broom::augment() %>% ggplot() + labs(title = "Plot of model fit") + aes(x = speed) + aes(y = dist) + geom_point(col = "steelblue") + * geom_smooth(method = lm, se = F) ``` ]] .column[.content[ ``` `geom_smooth()` using formula 'y ~ x' ``` ![](ols_files/figure-html/augment_auto_8_output-1.png)<!-- --> ]] --- class: split-40 count: false .column[.content[ ```r cars_model %>% broom::augment() %>% ggplot() + labs(title = "Plot of model fit") + aes(x = speed) + aes(y = dist) + geom_point(col = "steelblue") + geom_smooth(method = lm, se = F) + * geom_point(aes(y = .fitted)) ``` ]] .column[.content[ ``` `geom_smooth()` using formula 'y ~ x' ``` ![](ols_files/figure-html/augment_auto_9_output-1.png)<!-- --> ]] --- class: split-40 count: false .column[.content[ ```r cars_model %>% broom::augment() %>% ggplot() + labs(title = "Plot of model fit") + aes(x = speed) + aes(y = dist) + geom_point(col = "steelblue") + geom_smooth(method = lm, se = F) + geom_point(aes(y = .fitted)) + * aes(xend = speed) ``` ]] .column[.content[ ``` `geom_smooth()` using formula 'y ~ x' ``` ![](ols_files/figure-html/augment_auto_10_output-1.png)<!-- --> ]] --- class: split-40 count: false .column[.content[ ```r cars_model %>% broom::augment() %>% ggplot() + labs(title = "Plot of model fit") + aes(x = speed) + aes(y = dist) + geom_point(col = "steelblue") + geom_smooth(method = lm, se = F) + geom_point(aes(y = .fitted)) + aes(xend = speed) + * aes(yend = .fitted) ``` ]] .column[.content[ ``` `geom_smooth()` using formula 'y ~ x' ``` ![](ols_files/figure-html/augment_auto_11_output-1.png)<!-- --> ]] --- class: split-40 count: false .column[.content[ ```r cars_model %>% broom::augment() %>% ggplot() + labs(title = "Plot of model fit") + aes(x = speed) + aes(y = dist) + geom_point(col = "steelblue") + geom_smooth(method = lm, se = F) + geom_point(aes(y = .fitted)) + aes(xend = speed) + aes(yend = .fitted) + * geom_segment(color = "red", * linetype = "dashed") ``` ]] .column[.content[ ``` `geom_smooth()` using formula 'y ~ x' ``` ![](ols_files/figure-html/augment_auto_12_output-1.png)<!-- --> ]] --- class: split-40 count: false .column[.content[ ```r cars_model %>% broom::augment() %>% ggplot() + labs(title = "Plot of model fit") + aes(x = speed) + aes(y = dist) + geom_point(col = "steelblue") + geom_smooth(method = lm, se = F) + geom_point(aes(y = .fitted)) + aes(xend = speed) + aes(yend = .fitted) + geom_segment(color = "red", linetype = "dashed") + # input value of 21 miles per hour * geom_vline(xintercept = 21, * linetype = "dotted") ``` ]] .column[.content[ ``` `geom_smooth()` using formula 'y ~ x' ``` ![](ols_files/figure-html/augment_auto_13_output-1.png)<!-- --> ]] --- class: split-40 count: false .column[.content[ ```r cars_model %>% broom::augment() %>% ggplot() + labs(title = "Plot of model fit") + aes(x = speed) + aes(y = dist) + geom_point(col = "steelblue") + geom_smooth(method = lm, se = F) + geom_point(aes(y = .fitted)) + aes(xend = speed) + aes(yend = .fitted) + geom_segment(color = "red", linetype = "dashed") + # input value of 21 miles per hour geom_vline(xintercept = 21, linetype = 
"dotted") + # yields predicted for y, stopping distance * geom_hline(yintercept = * predict(cars_model, * data.frame(speed = 21)), * linetype = "dotted") ``` ]] .column[.content[ ``` `geom_smooth()` using formula 'y ~ x' ``` ![](ols_files/figure-html/augment_auto_14_output-1.png)<!-- --> ]] --- Checking assumption: residuals not correlated with predictions... --- class: split-40 count: false .column[.content[ ```r *cars_model ``` ]] .column[.content[ ``` Call: lm(formula = dist ~ speed, data = cars) Coefficients: (Intercept) speed -17.579 3.932 ``` ]] --- class: split-40 count: false .column[.content[ ```r cars_model %>% * broom::augment() ``` ]] .column[.content[ ``` # A tibble: 50 x 9 dist speed .fitted .se.fit .resid .hat .sigma .cooksd .std.resid <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> 1 2 4 -1.85 5.21 3.85 0.115 15.5 0.00459 0.266 2 10 4 -1.85 5.21 11.8 0.115 15.4 0.0435 0.819 3 4 7 9.95 4.11 -5.95 0.0715 15.5 0.00620 -0.401 4 22 7 9.95 4.11 12.1 0.0715 15.4 0.0255 0.813 5 16 8 13.9 3.77 2.12 0.0600 15.5 0.000645 0.142 6 10 9 17.8 3.44 -7.81 0.0499 15.5 0.00713 -0.521 7 18 10 21.7 3.12 -3.74 0.0413 15.5 0.00133 -0.249 8 26 10 21.7 3.12 4.26 0.0413 15.5 0.00172 0.283 9 34 10 21.7 3.12 12.3 0.0413 15.4 0.0143 0.814 10 17 11 25.7 2.84 -8.68 0.0341 15.5 0.00582 -0.574 # … with 40 more rows ``` ]] --- class: split-40 count: false .column[.content[ ```r cars_model %>% broom::augment() %>% * ggplot() ``` ]] .column[.content[ ![](ols_files/figure-html/residuals_plot_auto_3_output-1.png)<!-- --> ]] --- class: split-40 count: false .column[.content[ ```r cars_model %>% broom::augment() %>% ggplot() + * labs(title = "Residuals v. Fitted") ``` ]] .column[.content[ ![](ols_files/figure-html/residuals_plot_auto_4_output-1.png)<!-- --> ]] --- class: split-40 count: false .column[.content[ ```r cars_model %>% broom::augment() %>% ggplot() + labs(title = "Residuals v. Fitted") + * aes(x = .fitted) ``` ]] .column[.content[ ![](ols_files/figure-html/residuals_plot_auto_5_output-1.png)<!-- --> ]] --- class: split-40 count: false .column[.content[ ```r cars_model %>% broom::augment() %>% ggplot() + labs(title = "Residuals v. Fitted") + aes(x = .fitted) + * aes(y = .resid) ``` ]] .column[.content[ ![](ols_files/figure-html/residuals_plot_auto_6_output-1.png)<!-- --> ]] --- class: split-40 count: false .column[.content[ ```r cars_model %>% broom::augment() %>% ggplot() + labs(title = "Residuals v. Fitted") + aes(x = .fitted) + aes(y = .resid) + * geom_point(col = "steelblue") ``` ]] .column[.content[ ![](ols_files/figure-html/residuals_plot_auto_7_output-1.png)<!-- --> ]] --- class: split-40 count: false .column[.content[ ```r cars_model %>% broom::augment() %>% ggplot() + labs(title = "Residuals v. Fitted") + aes(x = .fitted) + aes(y = .resid) + geom_point(col = "steelblue") + * geom_hline(yintercept = 0, * lty = "dashed") ``` ]] .column[.content[ ![](ols_files/figure-html/residuals_plot_auto_8_output-1.png)<!-- --> ]] --- class: split-40 count: false .column[.content[ ```r cars_model %>% broom::augment() %>% ggplot() + labs(title = "Residuals v. 
Fitted") + aes(x = .fitted) + aes(y = .resid) + geom_point(col = "steelblue") + geom_hline(yintercept = 0, lty = "dashed") + * geom_smooth(se = F, span = 1, * color = "red") ``` ]] .column[.content[ ``` `geom_smooth()` using method = 'loess' and formula 'y ~ x' ``` ![](ols_files/figure-html/residuals_plot_auto_9_output-1.png)<!-- --> ]] --- # Checking for outliers with Cook's distance --- class: split-40 count: false .column[.content[ ```r *cars_model ``` ]] .column[.content[ ``` Call: lm(formula = dist ~ speed, data = cars) Coefficients: (Intercept) speed -17.579 3.932 ``` ]] --- class: split-40 count: false .column[.content[ ```r cars_model %>% * broom::augment() ``` ]] .column[.content[ ``` # A tibble: 50 x 9 dist speed .fitted .se.fit .resid .hat .sigma .cooksd .std.resid <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> 1 2 4 -1.85 5.21 3.85 0.115 15.5 0.00459 0.266 2 10 4 -1.85 5.21 11.8 0.115 15.4 0.0435 0.819 3 4 7 9.95 4.11 -5.95 0.0715 15.5 0.00620 -0.401 4 22 7 9.95 4.11 12.1 0.0715 15.4 0.0255 0.813 5 16 8 13.9 3.77 2.12 0.0600 15.5 0.000645 0.142 6 10 9 17.8 3.44 -7.81 0.0499 15.5 0.00713 -0.521 7 18 10 21.7 3.12 -3.74 0.0413 15.5 0.00133 -0.249 8 26 10 21.7 3.12 4.26 0.0413 15.5 0.00172 0.283 9 34 10 21.7 3.12 12.3 0.0413 15.4 0.0143 0.814 10 17 11 25.7 2.84 -8.68 0.0341 15.5 0.00582 -0.574 # … with 40 more rows ``` ]] --- class: split-40 count: false .column[.content[ ```r cars_model %>% broom::augment() %>% * mutate(id = row_number()) ``` ]] .column[.content[ ``` # A tibble: 50 x 10 dist speed .fitted .se.fit .resid .hat .sigma .cooksd .std.resid id <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <int> 1 2 4 -1.85 5.21 3.85 0.115 15.5 0.00459 0.266 1 2 10 4 -1.85 5.21 11.8 0.115 15.4 0.0435 0.819 2 3 4 7 9.95 4.11 -5.95 0.0715 15.5 0.00620 -0.401 3 4 22 7 9.95 4.11 12.1 0.0715 15.4 0.0255 0.813 4 5 16 8 13.9 3.77 2.12 0.0600 15.5 0.000645 0.142 5 6 10 9 17.8 3.44 -7.81 0.0499 15.5 0.00713 -0.521 6 7 18 10 21.7 3.12 -3.74 0.0413 15.5 0.00133 -0.249 7 8 26 10 21.7 3.12 4.26 0.0413 15.5 0.00172 0.283 8 9 34 10 21.7 3.12 12.3 0.0413 15.4 0.0143 0.814 9 10 17 11 25.7 2.84 -8.68 0.0341 15.5 0.00582 -0.574 10 # … with 40 more rows ``` ]] --- class: split-40 count: false .column[.content[ ```r cars_model %>% broom::augment() %>% mutate(id = row_number()) %>% * ggplot() ``` ]] .column[.content[ ![](ols_files/figure-html/cooks_auto_4_output-1.png)<!-- --> ]] --- class: split-40 count: false .column[.content[ ```r cars_model %>% broom::augment() %>% mutate(id = row_number()) %>% ggplot() + * labs(title = "Cook's Distance, by ID (row number)") ``` ]] .column[.content[ ![](ols_files/figure-html/cooks_auto_5_output-1.png)<!-- --> ]] --- class: split-40 count: false .column[.content[ ```r cars_model %>% broom::augment() %>% mutate(id = row_number()) %>% ggplot() + labs(title = "Cook's Distance, by ID (row number)") + * labs(subtitle = "Cook's Distance, a measure of influence via a leave one out procedure") ``` ]] .column[.content[ ![](ols_files/figure-html/cooks_auto_6_output-1.png)<!-- --> ]] --- class: split-40 count: false .column[.content[ ```r cars_model %>% broom::augment() %>% mutate(id = row_number()) %>% ggplot() + labs(title = "Cook's Distance, by ID (row number)") + labs(subtitle = "Cook's Distance, a measure of influence via a leave one out procedure") + * aes(x = id) ``` ]] .column[.content[ ![](ols_files/figure-html/cooks_auto_7_output-1.png)<!-- --> ]] --- class: split-40 count: false .column[.content[ ```r cars_model %>% broom::augment() %>% mutate(id = 
row_number()) %>% ggplot() + labs(title = "Cook's Distance, by ID (row number)") + labs(subtitle = "Cook's Distance, a measure of influence via a leave one out procedure") + aes(x = id) + * aes(y = .cooksd) ``` ]] .column[.content[ ![](ols_files/figure-html/cooks_auto_8_output-1.png)<!-- --> ]] --- class: split-40 count: false .column[.content[ ```r cars_model %>% broom::augment() %>% mutate(id = row_number()) %>% ggplot() + labs(title = "Cook's Distance, by ID (row number)") + labs(subtitle = "Cook's Distance, a measure of influence via a leave one out procedure") + aes(x = id) + aes(y = .cooksd) + * geom_point() ``` ]] .column[.content[ ![](ols_files/figure-html/cooks_auto_9_output-1.png)<!-- --> ]] --- class: split-40 count: false .column[.content[ ```r cars_model %>% broom::augment() %>% mutate(id = row_number()) %>% ggplot() + labs(title = "Cook's Distance, by ID (row number)") + labs(subtitle = "Cook's Distance, a measure of influence via a leave one out procedure") + aes(x = id) + aes(y = .cooksd) + geom_point() + * aes(label = id) ``` ]] .column[.content[ ![](ols_files/figure-html/cooks_auto_10_output-1.png)<!-- --> ]] --- class: split-40 count: false .column[.content[ ```r cars_model %>% broom::augment() %>% mutate(id = row_number()) %>% ggplot() + labs(title = "Cook's Distance, by ID (row number)") + labs(subtitle = "Cook's Distance, a measure of influence via a leave one out procedure") + aes(x = id) + aes(y = .cooksd) + geom_point() + aes(label = id) + * geom_text(check_overlap = T, * nudge_y = .015) ``` ]] .column[.content[ ![](ols_files/figure-html/cooks_auto_11_output-1.png)<!-- --> ]] --- --- Are the residuals normally distributed? --- class: split-40 count: false .column[.content[ ```r *cars_model ``` ]] .column[.content[ ``` Call: lm(formula = dist ~ speed, data = cars) Coefficients: (Intercept) speed -17.579 3.932 ``` ]] --- class: split-40 count: false .column[.content[ ```r cars_model %>% * broom::augment() ``` ]] .column[.content[ ``` # A tibble: 50 x 9 dist speed .fitted .se.fit .resid .hat .sigma .cooksd .std.resid <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> 1 2 4 -1.85 5.21 3.85 0.115 15.5 0.00459 0.266 2 10 4 -1.85 5.21 11.8 0.115 15.4 0.0435 0.819 3 4 7 9.95 4.11 -5.95 0.0715 15.5 0.00620 -0.401 4 22 7 9.95 4.11 12.1 0.0715 15.4 0.0255 0.813 5 16 8 13.9 3.77 2.12 0.0600 15.5 0.000645 0.142 6 10 9 17.8 3.44 -7.81 0.0499 15.5 0.00713 -0.521 7 18 10 21.7 3.12 -3.74 0.0413 15.5 0.00133 -0.249 8 26 10 21.7 3.12 4.26 0.0413 15.5 0.00172 0.283 9 34 10 21.7 3.12 12.3 0.0413 15.4 0.0143 0.814 10 17 11 25.7 2.84 -8.68 0.0341 15.5 0.00582 -0.574 # … with 40 more rows ``` ]] --- class: split-40 count: false .column[.content[ ```r cars_model %>% broom::augment() %>% * mutate(expected = qnorm((1:n() - .5)/n())) ``` ]] .column[.content[ ``` # A tibble: 50 x 10 dist speed .fitted .se.fit .resid .hat .sigma .cooksd .std.resid expected <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> 1 2 4 -1.85 5.21 3.85 0.115 15.5 0.00459 0.266 -2.33 2 10 4 -1.85 5.21 11.8 0.115 15.4 0.0435 0.819 -1.88 3 4 7 9.95 4.11 -5.95 0.0715 15.5 0.00620 -0.401 -1.64 4 22 7 9.95 4.11 12.1 0.0715 15.4 0.0255 0.813 -1.48 5 16 8 13.9 3.77 2.12 0.0600 15.5 0.000645 0.142 -1.34 6 10 9 17.8 3.44 -7.81 0.0499 15.5 0.00713 -0.521 -1.23 7 18 10 21.7 3.12 -3.74 0.0413 15.5 0.00133 -0.249 -1.13 8 26 10 21.7 3.12 4.26 0.0413 15.5 0.00172 0.283 -1.04 9 34 10 21.7 3.12 12.3 0.0413 15.4 0.0143 0.814 -0.954 10 17 11 25.7 2.84 -8.68 0.0341 15.5 0.00582 -0.574 -0.878 # … with 40 more rows ``` ]] --- 
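The `expected` column just created holds standard normal quantiles evaluated at evenly spaced probabilities, (i - .5)/n. For a sample of this size those are the probability points that `stats::ppoints()` returns, so an equivalent way to get the theoretical quantiles (an aside, not used in the frames that follow) is:

```r
# theoretical normal quantiles for the 50 observations in cars
qnorm(ppoints(50))

# base R builds the same diagnostic plot directly
qqnorm(rstandard(cars_model))
qqline(rstandard(cars_model))
```

---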
class: split-40 count: false .column[.content[ ```r cars_model %>% broom::augment() %>% mutate(expected = qnorm((1:n() - .5)/n())) %>% * ggplot() ``` ]] .column[.content[ ![](ols_files/figure-html/qqplot_auto_4_output-1.png)<!-- --> ]] --- class: split-40 count: false .column[.content[ ```r cars_model %>% broom::augment() %>% mutate(expected = qnorm((1:n() - .5)/n())) %>% ggplot() + * labs(title = "Normal q-q plot") ``` ]] .column[.content[ ![](ols_files/figure-html/qqplot_auto_5_output-1.png)<!-- --> ]] --- class: split-40 count: false .column[.content[ ```r cars_model %>% broom::augment() %>% mutate(expected = qnorm((1:n() - .5)/n())) %>% ggplot() + labs(title = "Normal q-q plot") + * aes(y = sort(.std.resid)) ``` ]] .column[.content[ ![](ols_files/figure-html/qqplot_auto_6_output-1.png)<!-- --> ]] --- class: split-40 count: false .column[.content[ ```r cars_model %>% broom::augment() %>% mutate(expected = qnorm((1:n() - .5)/n())) %>% ggplot() + labs(title = "Normal q-q plot") + aes(y = sort(.std.resid)) + * aes(x = expected) ``` ]] .column[.content[ ![](ols_files/figure-html/qqplot_auto_7_output-1.png)<!-- --> ]] --- class: split-40 count: false .column[.content[ ```r cars_model %>% broom::augment() %>% mutate(expected = qnorm((1:n() - .5)/n())) %>% ggplot() + labs(title = "Normal q-q plot") + aes(y = sort(.std.resid)) + aes(x = expected) + * geom_rug() ``` ]] .column[.content[ ![](ols_files/figure-html/qqplot_auto_8_output-1.png)<!-- --> ]] --- class: split-40 count: false .column[.content[ ```r cars_model %>% broom::augment() %>% mutate(expected = qnorm((1:n() - .5)/n())) %>% ggplot() + labs(title = "Normal q-q plot") + aes(y = sort(.std.resid)) + aes(x = expected) + geom_rug() + * geom_point() ``` ]] .column[.content[ ![](ols_files/figure-html/qqplot_auto_9_output-1.png)<!-- --> ]] --- class: split-40 count: false .column[.content[ ```r cars_model %>% broom::augment() %>% mutate(expected = qnorm((1:n() - .5)/n())) %>% ggplot() + labs(title = "Normal q-q plot") + aes(y = sort(.std.resid)) + aes(x = expected) + geom_rug() + geom_point() + * coord_equal(ratio = 1, * xlim = c(-4, 4), * ylim = c(-4, 4)) ``` ]] .column[.content[ ![](ols_files/figure-html/qqplot_auto_10_output-1.png)<!-- --> ]] --- class: split-40 count: false .column[.content[ ```r cars_model %>% broom::augment() %>% mutate(expected = qnorm((1:n() - .5)/n())) %>% ggplot() + labs(title = "Normal q-q plot") + aes(y = sort(.std.resid)) + aes(x = expected) + geom_rug() + geom_point() + coord_equal(ratio = 1, xlim = c(-4, 4), ylim = c(-4, 4)) + # add line of equivilance * geom_abline(intercept = 0, * slope = 1, * linetype = "dotted") ``` ]] .column[.content[ ![](ols_files/figure-html/qqplot_auto_11_output-1.png)<!-- --> ]] --- # ```r cars_model ``` ``` Call: lm(formula = dist ~ speed, data = cars) Coefficients: (Intercept) speed -17.579 3.932 ``` --- --- # multiple regression... 
-- ...and multiple models in a single table --- # subsetting to 2002 ```r library(gapminder) gapminder %>% filter(year == 2002) -> gapminder_2002 ``` --- class: split-40 count: false .column[.content[ ```r *lm(lifeExp ~ gdpPercap, * data = gapminder_2002) ``` ]] .column[.content[ ``` Call: lm(formula = lifeExp ~ gdpPercap, data = gapminder_2002) Coefficients: (Intercept) gdpPercap 5.825e+01 7.507e-04 ``` ]] --- class: split-40 count: false .column[.content[ ```r lm(lifeExp ~ gdpPercap, data = gapminder_2002) -> *model1 ``` ]] .column[.content[ ]] --- class: split-40 count: false .column[.content[ ```r lm(lifeExp ~ gdpPercap, data = gapminder_2002) -> model1 *lm(lifeExp ~ gdpPercap + pop, * data = gapminder_2002) ``` ]] .column[.content[ ``` Call: lm(formula = lifeExp ~ gdpPercap + pop, data = gapminder_2002) Coefficients: (Intercept) gdpPercap pop 5.789e+01 7.554e-04 7.516e-09 ``` ]] --- class: split-40 count: false .column[.content[ ```r lm(lifeExp ~ gdpPercap, data = gapminder_2002) -> model1 lm(lifeExp ~ gdpPercap + pop, data = gapminder_2002) -> *model2 ``` ]] .column[.content[ ]] --- class: split-40 count: false .column[.content[ ```r lm(lifeExp ~ gdpPercap, data = gapminder_2002) -> model1 lm(lifeExp ~ gdpPercap + pop, data = gapminder_2002) -> model2 *lm(lifeExp ~ gdpPercap + * continent, * data = gapminder_2002) ``` ]] .column[.content[ ``` Call: lm(formula = lifeExp ~ gdpPercap + continent, data = gapminder_2002) Coefficients: (Intercept) gdpPercap continentAmericas continentAsia 5.224e+01 4.168e-04 1.631e+01 1.275e+01 continentEurope continentOceania 1.541e+01 1.627e+01 ``` ]] --- class: split-40 count: false .column[.content[ ```r lm(lifeExp ~ gdpPercap, data = gapminder_2002) -> model1 lm(lifeExp ~ gdpPercap + pop, data = gapminder_2002) -> model2 lm(lifeExp ~ gdpPercap + continent, data = gapminder_2002) -> *model3 ``` ]] .column[.content[ ]] --- class: split-40 count: false .column[.content[ ```r lm(lifeExp ~ gdpPercap, data = gapminder_2002) -> model1 lm(lifeExp ~ gdpPercap + pop, data = gapminder_2002) -> model2 lm(lifeExp ~ gdpPercap + continent, data = gapminder_2002) -> model3 # *stargazer::stargazer( * model1, model2, model3, # if you want to work interactivevly, change "html" to "text" # note, chunk results option is "asis" * type = "text", * dep.var.labels = "Life Expectancy (years)", * covariate.labels = c("GDP per cap (Thousands $US)", * "Population (Millions)"), * style = "qje", * font.size = "small", * title = "Models of Life Expectancy in 2002") ``` ]] .column[.content[ ``` Models of Life Expectancy in 2002 =================================================================================================== Life Expectancy (years) (1) (2) (3) --------------------------------------------------------------------------------------------------- GDP per cap (Thousands US) 0.001*** 0.001*** 0.0004*** (0.0001) (0.0001) (0.0001) Population (Millions) 0.000 (0.000) continentAmericas 16.309*** (1.676) continentAsia 12.751*** (1.561) continentEurope 15.409*** (1.982) continentOceania 16.270*** (5.051) Constant 58.250*** 57.891*** 52.242*** (1.014) (1.043) (0.937) N 142 142 142 R2 0.465 0.472 0.718 Adjusted R2 0.461 0.465 0.708 Residual Std. Error 9.015 (df = 140) 8.984 (df = 139) 6.640 (df = 136) F Statistic 121.648*** (df = 1; 140) 62.214*** (df = 2; 139) 69.244*** (df = 5; 136) =================================================================================================== Notes: ***Significant at the 1 percent level. **Significant at the 5 percent level. 
*Significant at the 10 percent level. ``` ]] --- class: split-40 count: false .column[.content[ ```r lm(lifeExp ~ gdpPercap, data = gapminder_2002) -> model1 lm(lifeExp ~ gdpPercap + pop, data = gapminder_2002) -> model2 lm(lifeExp ~ gdpPercap + continent, data = gapminder_2002) -> model3 # stargazer::stargazer( model1, model2, model3, # if you want to work interactivevly, change "html" to "text" # note, chunk results option is "asis" type = "text", dep.var.labels = "Life Expectancy (years)", covariate.labels = c("GDP per cap (Thousands $US)", "Population (Millions)"), style = "qje", font.size = "small", title = "Models of Life Expectancy in 2002") -> *formatted_models_table ``` ]] .column[.content[ ``` Models of Life Expectancy in 2002 =================================================================================================== Life Expectancy (years) (1) (2) (3) --------------------------------------------------------------------------------------------------- GDP per cap (Thousands US) 0.001*** 0.001*** 0.0004*** (0.0001) (0.0001) (0.0001) Population (Millions) 0.000 (0.000) continentAmericas 16.309*** (1.676) continentAsia 12.751*** (1.561) continentEurope 15.409*** (1.982) continentOceania 16.270*** (5.051) Constant 58.250*** 57.891*** 52.242*** (1.014) (1.043) (0.937) N 142 142 142 R2 0.465 0.472 0.718 Adjusted R2 0.461 0.465 0.708 Residual Std. Error 9.015 (df = 140) 8.984 (df = 139) 6.640 (df = 136) F Statistic 121.648*** (df = 1; 140) 62.214*** (df = 2; 139) 69.244*** (df = 5; 136) =================================================================================================== Notes: ***Significant at the 1 percent level. **Significant at the 5 percent level. *Significant at the 10 percent level. ``` ]] --- if you change they `type` argument to "html", and the code chunk option to results = "asis" <table style="text-align:center"><caption><strong>Models of Life Expectancy in 2002</strong></caption> <tr><td colspan="4" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left"></td><td colspan="3">Life Expectancy (years)</td></tr> <tr><td style="text-align:left"></td><td>(1)</td><td>(2)</td><td>(3)</td></tr> <tr><td colspan="4" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left">GDP per cap (Thousands US)</td><td>0.001<sup>***</sup></td><td>0.001<sup>***</sup></td><td>0.0004<sup>***</sup></td></tr> <tr><td style="text-align:left"></td><td>(0.0001)</td><td>(0.0001)</td><td>(0.0001)</td></tr> <tr><td style="text-align:left"></td><td></td><td></td><td></td></tr> <tr><td style="text-align:left">Population (Millions)</td><td></td><td>0.000</td><td></td></tr> <tr><td style="text-align:left"></td><td></td><td>(0.000)</td><td></td></tr> <tr><td style="text-align:left"></td><td></td><td></td><td></td></tr> <tr><td style="text-align:left">continentAmericas</td><td></td><td></td><td>16.309<sup>***</sup></td></tr> <tr><td style="text-align:left"></td><td></td><td></td><td>(1.676)</td></tr> <tr><td style="text-align:left"></td><td></td><td></td><td></td></tr> <tr><td style="text-align:left">continentAsia</td><td></td><td></td><td>12.751<sup>***</sup></td></tr> <tr><td style="text-align:left"></td><td></td><td></td><td>(1.561)</td></tr> <tr><td style="text-align:left"></td><td></td><td></td><td></td></tr> <tr><td style="text-align:left">continentEurope</td><td></td><td></td><td>15.409<sup>***</sup></td></tr> <tr><td style="text-align:left"></td><td></td><td></td><td>(1.982)</td></tr> <tr><td 
style="text-align:left"></td><td></td><td></td><td></td></tr> <tr><td style="text-align:left">continentOceania</td><td></td><td></td><td>16.270<sup>***</sup></td></tr> <tr><td style="text-align:left"></td><td></td><td></td><td>(5.051)</td></tr> <tr><td style="text-align:left"></td><td></td><td></td><td></td></tr> <tr><td style="text-align:left">Constant</td><td>58.250<sup>***</sup></td><td>57.891<sup>***</sup></td><td>52.242<sup>***</sup></td></tr> <tr><td style="text-align:left"></td><td>(1.014)</td><td>(1.043)</td><td>(0.937)</td></tr> <tr><td style="text-align:left"></td><td></td><td></td><td></td></tr> <tr><td style="text-align:left"><em>N</em></td><td>142</td><td>142</td><td>142</td></tr> <tr><td style="text-align:left">R<sup>2</sup></td><td>0.465</td><td>0.472</td><td>0.718</td></tr> <tr><td style="text-align:left">Adjusted R<sup>2</sup></td><td>0.461</td><td>0.465</td><td>0.708</td></tr> <tr><td style="text-align:left">Residual Std. Error</td><td>9.015 (df = 140)</td><td>8.984 (df = 139)</td><td>6.640 (df = 136)</td></tr> <tr><td style="text-align:left">F Statistic</td><td>121.648<sup>***</sup> (df = 1; 140)</td><td>62.214<sup>***</sup> (df = 2; 139)</td><td>69.244<sup>***</sup> (df = 5; 136)</td></tr> <tr><td colspan="4" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left"><em>Notes:</em></td><td colspan="3" style="text-align:right"><sup>***</sup>Significant at the 1 percent level.</td></tr> <tr><td style="text-align:left"></td><td colspan="3" style="text-align:right"><sup>**</sup>Significant at the 5 percent level.</td></tr> <tr><td style="text-align:left"></td><td colspan="3" style="text-align:right"><sup>*</sup>Significant at the 10 percent level.</td></tr> </table> --- # the maths! --- class: split-40 count: false .column[.content[ ```r # create artificial data set # set seed for reproducibility *set.seed(501) ``` ]] .column[.content[ ]] --- class: split-40 count: false .column[.content[ ```r # create artificial data set # set seed for reproducibility set.seed(501) # head_size in cm *rnorm(n = 20, mean = 30, sd = 10) ``` ]] .column[.content[ ``` [1] 35.77281 36.17452 34.52007 32.25781 21.55195 15.66631 13.51793 30.35623 [9] 31.84019 42.23762 30.87002 39.05107 32.37906 34.65357 39.27268 28.10562 [17] 28.13692 40.97114 24.25709 36.85942 ``` ]] --- class: split-40 count: false .column[.content[ ```r # create artificial data set # set seed for reproducibility set.seed(501) # head_size in cm rnorm(n = 20, mean = 30, sd = 10) -> * head_size ``` ]] .column[.content[ ]] --- class: split-40 count: false .column[.content[ ```r # create artificial data set # set seed for reproducibility set.seed(501) # head_size in cm rnorm(n = 20, mean = 30, sd = 10) -> head_size # tail_length in kilogram the relationship between head_size # and tail_length is completely random *head_size ``` ]] .column[.content[ ``` [1] 35.77281 36.17452 34.52007 32.25781 21.55195 15.66631 13.51793 30.35623 [9] 31.84019 42.23762 30.87002 39.05107 32.37906 34.65357 39.27268 28.10562 [17] 28.13692 40.97114 24.25709 36.85942 ``` ]] --- class: split-40 count: false .column[.content[ ```r # create artificial data set # set seed for reproducibility set.seed(501) # head_size in cm rnorm(n = 20, mean = 30, sd = 10) -> head_size # tail_length in kilogram the relationship between head_size # and tail_length is completely random head_size * * rnorm(20, 0, sd = 15) ``` ]] .column[.content[ ``` [1] 617.94937 374.08845 78.60973 220.43739 -90.05541 -67.70514 [7] 212.59042 -236.91309 -630.04145 182.98154 
288.24508 290.78784 [13] 267.12589 419.40205 878.78295 218.98375 -115.72531 5.63956 [19] -143.70412 -83.67543 ``` ]] --- class: split-40 count: false .column[.content[ ```r # create artificial data set # set seed for reproducibility set.seed(501) # head_size in cm rnorm(n = 20, mean = 30, sd = 10) -> head_size # tail_length in kilogram the relationship between head_size # and tail_length is completely random head_size * rnorm(20, 0, sd = 15) + * 80 ``` ]] .column[.content[ ``` [1] 697.949367 454.088447 158.609728 300.437392 -10.055414 12.294857 [7] 292.590425 -156.913095 -550.041452 262.981543 368.245079 370.787842 [13] 347.125890 499.402052 958.782945 298.983754 -35.725305 85.639560 [19] -63.704121 -3.675432 ``` ]] --- class: split-40 count: false .column[.content[ ```r # create artificial data set # set seed for reproducibility set.seed(501) # head_size in cm rnorm(n = 20, mean = 30, sd = 10) -> head_size # tail_length in kilogram the relationship between head_size # and tail_length is completely random head_size * rnorm(20, 0, sd = 15) + 80 -> *tail_length ``` ]] .column[.content[ ]] --- class: split-40 count: false .column[.content[ ```r # create artificial data set # set seed for reproducibility set.seed(501) # head_size in cm rnorm(n = 20, mean = 30, sd = 10) -> head_size # tail_length in kilogram the relationship between head_size # and tail_length is completely random head_size * rnorm(20, 0, sd = 15) + 80 -> tail_length # join head_size and tail_length in data frame *data.frame(head_size, tail_length) ``` ]] .column[.content[ ``` head_size tail_length 1 35.77281 697.949367 2 36.17452 454.088447 3 34.52007 158.609728 4 32.25781 300.437392 5 21.55195 -10.055414 6 15.66631 12.294857 7 13.51793 292.590425 8 30.35623 -156.913095 9 31.84019 -550.041452 10 42.23762 262.981543 11 30.87002 368.245079 12 39.05107 370.787842 13 32.37906 347.125890 14 34.65357 499.402052 15 39.27268 958.782945 16 28.10562 298.983754 17 28.13692 -35.725305 18 40.97114 85.639560 19 24.25709 -63.704121 20 36.85942 -3.675432 ``` ]] --- class: split-40 count: false .column[.content[ ```r # create artificial data set # set seed for reproducibility set.seed(501) # head_size in cm rnorm(n = 20, mean = 30, sd = 10) -> head_size # tail_length in kilogram the relationship between head_size # and tail_length is completely random head_size * rnorm(20, 0, sd = 15) + 80 -> tail_length # join head_size and tail_length in data frame data.frame(head_size, tail_length) -> * df ``` ]] .column[.content[ ]] --- class: split-40 count: false .column[.content[ ```r # create artificial data set # set seed for reproducibility set.seed(501) # head_size in cm rnorm(n = 20, mean = 30, sd = 10) -> head_size # tail_length in kilogram the relationship between head_size # and tail_length is completely random head_size * rnorm(20, 0, sd = 15) + 80 -> tail_length # join head_size and tail_length in data frame data.frame(head_size, tail_length) -> df *lm(tail_length ~ head_size, data = df) ``` ]] .column[.content[ ``` Call: lm(formula = tail_length ~ head_size, data = df) Coefficients: (Intercept) head_size -214.66 13.65 ``` ]] --- class: split-40 count: false .column[.content[ ```r # create artificial data set # set seed for reproducibility set.seed(501) # head_size in cm rnorm(n = 20, mean = 30, sd = 10) -> head_size # tail_length in kilogram the relationship between head_size # and tail_length is completely random head_size * rnorm(20, 0, sd = 15) + 80 -> tail_length # join head_size and tail_length in data frame data.frame(head_size, 
tail_length) -> df lm(tail_length ~ head_size, data = df) %>% * summary() ``` ]] .column[.content[ ``` Call: lm(formula = tail_length ~ head_size, data = df) Residuals: Min 1Q Median 3Q Max -770.13 -186.51 32.64 164.75 637.21 Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) -214.656 301.605 -0.712 0.486 head_size 13.654 9.328 1.464 0.160 Residual standard error: 317.9 on 18 degrees of freedom Multiple R-squared: 0.1064, Adjusted R-squared: 0.05673 F-statistic: 2.143 on 1 and 18 DF, p-value: 0.1605 ``` ]] --- class: split-40 count: false .column[.content[ ```r # create artificial data set # set seed for reproducibility set.seed(501) # head_size in cm rnorm(n = 20, mean = 30, sd = 10) -> head_size # tail_length in kilogram the relationship between head_size # and tail_length is completely random head_size * rnorm(20, 0, sd = 15) + 80 -> tail_length # join head_size and tail_length in data frame data.frame(head_size, tail_length) -> df lm(tail_length ~ head_size, data = df) %>% summary() -> *reg ``` ]] .column[.content[ ]] --- beta = ((X'X)^(-1))X'y res = y - beta1 - beta2*X2 VCV = (1/(n-k))res'res(X'X)^(-1) --- class: split-40 count: false .column[.content[ ```r ## build OLS estimator manually # define X matrix and y vector *cbind(1, df$head_size) ``` ]] .column[.content[ ``` [,1] [,2] [1,] 1 35.77281 [2,] 1 36.17452 [3,] 1 34.52007 [4,] 1 32.25781 [5,] 1 21.55195 [6,] 1 15.66631 [7,] 1 13.51793 [8,] 1 30.35623 [9,] 1 31.84019 [10,] 1 42.23762 [11,] 1 30.87002 [12,] 1 39.05107 [13,] 1 32.37906 [14,] 1 34.65357 [15,] 1 39.27268 [16,] 1 28.10562 [17,] 1 28.13692 [18,] 1 40.97114 [19,] 1 24.25709 [20,] 1 36.85942 ``` ]] --- class: split-40 count: false .column[.content[ ```r ## build OLS estimator manually # define X matrix and y vector cbind(1, df$head_size) %>% * as.matrix() ``` ]] .column[.content[ ``` [,1] [,2] [1,] 1 35.77281 [2,] 1 36.17452 [3,] 1 34.52007 [4,] 1 32.25781 [5,] 1 21.55195 [6,] 1 15.66631 [7,] 1 13.51793 [8,] 1 30.35623 [9,] 1 31.84019 [10,] 1 42.23762 [11,] 1 30.87002 [12,] 1 39.05107 [13,] 1 32.37906 [14,] 1 34.65357 [15,] 1 39.27268 [16,] 1 28.10562 [17,] 1 28.13692 [18,] 1 40.97114 [19,] 1 24.25709 [20,] 1 36.85942 ``` ]] --- class: split-40 count: false .column[.content[ ```r ## build OLS estimator manually # define X matrix and y vector cbind(1, df$head_size) %>% as.matrix() -> * X ``` ]] .column[.content[ ]] --- class: split-40 count: false .column[.content[ ```r ## build OLS estimator manually # define X matrix and y vector cbind(1, df$head_size) %>% as.matrix() -> X *as.matrix(df$tail_length) ``` ]] .column[.content[ ``` [,1] [1,] 697.949367 [2,] 454.088447 [3,] 158.609728 [4,] 300.437392 [5,] -10.055414 [6,] 12.294857 [7,] 292.590425 [8,] -156.913095 [9,] -550.041452 [10,] 262.981543 [11,] 368.245079 [12,] 370.787842 [13,] 347.125890 [14,] 499.402052 [15,] 958.782945 [16,] 298.983754 [17,] -35.725305 [18,] 85.639560 [19,] -63.704121 [20,] -3.675432 ``` ]] --- class: split-40 count: false .column[.content[ ```r ## build OLS estimator manually # define X matrix and y vector cbind(1, df$head_size) %>% as.matrix() -> X as.matrix(df$tail_length) -> * y ``` ]] .column[.content[ ]] --- class: split-40 count: false .column[.content[ ```r ## build OLS estimator manually # define X matrix and y vector cbind(1, df$head_size) %>% as.matrix() -> X as.matrix(df$tail_length) -> y # estimate the coeficients beta *t(X) %*% X ``` ]] .column[.content[ ``` [,1] [,2] [1,] 20.000 628.452 [2,] 628.452 20909.292 ``` ]] --- class: split-40 count: false .column[.content[ ```r 
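# road map for the matrix algebra below,
# in the notation of the previous formula slide:
#   beta = (X'X)^(-1) X'y
#   VCV  = (res'res / (n - k)) * (X'X)^(-1)
#   se   = sqrt(diag(VCV))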
## build OLS estimator manually # define X matrix and y vector cbind(1, df$head_size) %>% as.matrix() -> X as.matrix(df$tail_length) -> y # estimate the coeficients beta t(X) %*% X %>% *solve() ``` ]] .column[.content[ ``` [,1] [,2] [1,] 0.89994857 -0.0270489556 [2,] -0.02704896 0.0008608121 ``` ]] --- class: split-40 count: false .column[.content[ ```r ## build OLS estimator manually # define X matrix and y vector cbind(1, df$head_size) %>% as.matrix() -> X as.matrix(df$tail_length) -> y # estimate the coeficients beta t(X) %*% X %>% solve() %*% * t(X) ``` ]] .column[.content[ ``` [,1] [,2] [,3] [,4] [,5] [1,] -0.067668557 -0.078534496 -0.033783400 0.0274084332 0.316990793 [2,] 0.003744711 0.004090511 0.002666342 0.0007189591 -0.008496775 [,6] [,7] [,8] [,9] [,10] [,11] [1,] 0.4761911 0.53430277 0.0788442915 0.0387045902 -0.2425350 0.0649468030 [2,] -0.0135632 -0.01541256 -0.0009179473 0.0003594677 0.0093097 -0.0004756704 [,12] [,13] [,14] [,15] [,16] [1,] -0.156341987 0.024128703 -0.037394248 -0.162336382 0.139721007 [2,] 0.006566674 0.000823334 0.002781254 0.006757441 -0.002855302 [,17] [,18] [,19] [,20] [1,] 0.138874346 -0.208278049 0.243819622 -0.097060375 [2,] -0.002828357 0.008219499 -0.006168159 0.004680083 ``` ]] --- class: split-40 count: false .column[.content[ ```r ## build OLS estimator manually # define X matrix and y vector cbind(1, df$head_size) %>% as.matrix() -> X as.matrix(df$tail_length) -> y # estimate the coeficients beta t(X) %*% X %>% solve() %*% t(X) %*% * y ``` ]] .column[.content[ ``` [,1] [1,] -214.65587 [2,] 13.65406 ``` ]] --- class: split-40 count: false .column[.content[ ```r ## build OLS estimator manually # define X matrix and y vector cbind(1, df$head_size) %>% as.matrix() -> X as.matrix(df$tail_length) -> y # estimate the coeficients beta t(X) %*% X %>% solve() %*% t(X) %*% y -> * beta ``` ]] .column[.content[ ]] --- class: split-40 count: false .column[.content[ ```r ## build OLS estimator manually # define X matrix and y vector cbind(1, df$head_size) %>% as.matrix() -> X as.matrix(df$tail_length) -> y # estimate the coeficients beta t(X) %*% X %>% solve() %*% t(X) %*% y -> beta ## calculate residuals *y ``` ]] .column[.content[ ``` [,1] [1,] 697.949367 [2,] 454.088447 [3,] 158.609728 [4,] 300.437392 [5,] -10.055414 [6,] 12.294857 [7,] 292.590425 [8,] -156.913095 [9,] -550.041452 [10,] 262.981543 [11,] 368.245079 [12,] 370.787842 [13,] 347.125890 [14,] 499.402052 [15,] 958.782945 [16,] 298.983754 [17,] -35.725305 [18,] 85.639560 [19,] -63.704121 [20,] -3.675432 ``` ]] --- class: split-40 count: false .column[.content[ ```r ## build OLS estimator manually # define X matrix and y vector cbind(1, df$head_size) %>% as.matrix() -> X as.matrix(df$tail_length) -> y # estimate the coeficients beta t(X) %*% X %>% solve() %*% t(X) %*% y -> beta ## calculate residuals y - * beta[1] ``` ]] .column[.content[ ``` [,1] [1,] 912.60523 [2,] 668.74431 [3,] 373.26560 [4,] 515.09326 [5,] 204.60045 [6,] 226.95072 [7,] 507.24629 [8,] 57.74277 [9,] -335.38559 [10,] 477.63741 [11,] 582.90095 [12,] 585.44371 [13,] 561.78176 [14,] 714.05792 [15,] 1173.43881 [16,] 513.63962 [17,] 178.93056 [18,] 300.29543 [19,] 150.95175 [20,] 210.98043 ``` ]] --- class: split-40 count: false .column[.content[ ```r ## build OLS estimator manually # define X matrix and y vector cbind(1, df$head_size) %>% as.matrix() -> X as.matrix(df$tail_length) -> y # estimate the coeficients beta t(X) %*% X %>% solve() %*% t(X) %*% y -> beta ## calculate residuals y - beta[1] - * (beta[2] * X[,2]) ``` ]] 
.column[.content[ ``` [,1] [1,] 424.16118 [2,] 174.81524 [3,] -98.07355 [4,] 74.64318 [5,] -89.67117 [6,] 13.04194 [7,] 322.67172 [8,] -356.74297 [9,] -770.13347 [10,] -99.07758 [11,] 161.39988 [12,] 52.23814 [13,] 119.67610 [14,] 240.89605 [15,] 637.20733 [16,] 129.88387 [17,] -205.25257 [18,] -259.12698 [19,] -180.25600 [20,] -292.30033 ``` ]] --- class: split-40 count: false .column[.content[ ```r ## build OLS estimator manually # define X matrix and y vector cbind(1, df$head_size) %>% as.matrix() -> X as.matrix(df$tail_length) -> y # estimate the coeficients beta t(X) %*% X %>% solve() %*% t(X) %*% y -> beta ## calculate residuals y - beta[1] - (beta[2] * X[,2]) %>% *as.matrix() ``` ]] .column[.content[ ``` [,1] [1,] 424.16118 [2,] 174.81524 [3,] -98.07355 [4,] 74.64318 [5,] -89.67117 [6,] 13.04194 [7,] 322.67172 [8,] -356.74297 [9,] -770.13347 [10,] -99.07758 [11,] 161.39988 [12,] 52.23814 [13,] 119.67610 [14,] 240.89605 [15,] 637.20733 [16,] 129.88387 [17,] -205.25257 [18,] -259.12698 [19,] -180.25600 [20,] -292.30033 ``` ]] --- class: split-40 count: false .column[.content[ ```r ## build OLS estimator manually # define X matrix and y vector cbind(1, df$head_size) %>% as.matrix() -> X as.matrix(df$tail_length) -> y # estimate the coeficients beta t(X) %*% X %>% solve() %*% t(X) %*% y -> beta ## calculate residuals y - beta[1] - (beta[2] * X[,2]) %>% as.matrix() -> * res ``` ]] .column[.content[ ]] --- class: split-40 count: false .column[.content[ ```r ## build OLS estimator manually # define X matrix and y vector cbind(1, df$head_size) %>% as.matrix() -> X as.matrix(df$tail_length) -> y # estimate the coeficients beta t(X) %*% X %>% solve() %*% t(X) %*% y -> beta ## calculate residuals y - beta[1] - (beta[2] * X[,2]) %>% as.matrix() -> res ## define the number of observations and the number of # parameters *nrow(df) ``` ]] .column[.content[ ``` [1] 20 ``` ]] --- class: split-40 count: false .column[.content[ ```r ## build OLS estimator manually # define X matrix and y vector cbind(1, df$head_size) %>% as.matrix() -> X as.matrix(df$tail_length) -> y # estimate the coeficients beta t(X) %*% X %>% solve() %*% t(X) %*% y -> beta ## calculate residuals y - beta[1] - (beta[2] * X[,2]) %>% as.matrix() -> res ## define the number of observations and the number of # parameters nrow(df) -> * n ``` ]] .column[.content[ ]] --- class: split-40 count: false .column[.content[ ```r ## build OLS estimator manually # define X matrix and y vector cbind(1, df$head_size) %>% as.matrix() -> X as.matrix(df$tail_length) -> y # estimate the coeficients beta t(X) %*% X %>% solve() %*% t(X) %*% y -> beta ## calculate residuals y - beta[1] - (beta[2] * X[,2]) %>% as.matrix() -> res ## define the number of observations and the number of # parameters nrow(df) -> n *ncol(X) ``` ]] .column[.content[ ``` [1] 2 ``` ]] --- class: split-40 count: false .column[.content[ ```r ## build OLS estimator manually # define X matrix and y vector cbind(1, df$head_size) %>% as.matrix() -> X as.matrix(df$tail_length) -> y # estimate the coeficients beta t(X) %*% X %>% solve() %*% t(X) %*% y -> beta ## calculate residuals y - beta[1] - (beta[2] * X[,2]) %>% as.matrix() -> res ## define the number of observations and the number of # parameters nrow(df) -> n ncol(X) -> * k ``` ]] .column[.content[ ]] --- class: split-40 count: false .column[.content[ ```r ## build OLS estimator manually # define X matrix and y vector cbind(1, df$head_size) %>% as.matrix() -> X as.matrix(df$tail_length) -> y # estimate the coeficients beta t(X) %*% X 
%>% solve() %*% t(X) %*% y -> beta ## calculate residuals y - beta[1] - (beta[2] * X[,2]) %>% as.matrix() -> res ## define the number of observations and the number of # parameters nrow(df) -> n ncol(X) -> k ## calculate the Variance-Covariance matrix VCV *1/(n - k) ``` ]] .column[.content[ ``` [1] 0.05555556 ``` ]] --- class: split-40 count: false .column[.content[ ```r ## build OLS estimator manually # define X matrix and y vector cbind(1, df$head_size) %>% as.matrix() -> X as.matrix(df$tail_length) -> y # estimate the coeficients beta t(X) %*% X %>% solve() %*% t(X) %*% y -> beta ## calculate residuals y - beta[1] - (beta[2] * X[,2]) %>% as.matrix() -> res ## define the number of observations and the number of # parameters nrow(df) -> n ncol(X) -> k ## calculate the Variance-Covariance matrix VCV 1/(n - k) * * as.numeric(t(res) %*% res) ``` ]] .column[.content[ ``` [1] 101078.9 ``` ]] --- class: split-40 count: false .column[.content[ ```r ## build OLS estimator manually # define X matrix and y vector cbind(1, df$head_size) %>% as.matrix() -> X as.matrix(df$tail_length) -> y # estimate the coeficients beta t(X) %*% X %>% solve() %*% t(X) %*% y -> beta ## calculate residuals y - beta[1] - (beta[2] * X[,2]) %>% as.matrix() -> res ## define the number of observations and the number of # parameters nrow(df) -> n ncol(X) -> k ## calculate the Variance-Covariance matrix VCV 1/(n - k) * as.numeric(t(res) %*% res) * * solve(t(X) %*% X) ``` ]] .column[.content[ ``` [,1] [,2] [1,] 90965.84 -2734.07953 [2,] -2734.08 87.00997 ``` ]] --- class: split-40 count: false .column[.content[ ```r ## build OLS estimator manually # define X matrix and y vector cbind(1, df$head_size) %>% as.matrix() -> X as.matrix(df$tail_length) -> y # estimate the coeficients beta t(X) %*% X %>% solve() %*% t(X) %*% y -> beta ## calculate residuals y - beta[1] - (beta[2] * X[,2]) %>% as.matrix() -> res ## define the number of observations and the number of # parameters nrow(df) -> n ncol(X) -> k ## calculate the Variance-Covariance matrix VCV 1/(n - k) * as.numeric(t(res) %*% res) * solve(t(X) %*% X) -> *VCV ``` ]] .column[.content[ ]] --- class: split-40 count: false .column[.content[ ```r ## build OLS estimator manually # define X matrix and y vector cbind(1, df$head_size) %>% as.matrix() -> X as.matrix(df$tail_length) -> y # estimate the coeficients beta t(X) %*% X %>% solve() %*% t(X) %*% y -> beta ## calculate residuals y - beta[1] - (beta[2] * X[,2]) %>% as.matrix() -> res ## define the number of observations and the number of # parameters nrow(df) -> n ncol(X) -> k ## calculate the Variance-Covariance matrix VCV 1/(n - k) * as.numeric(t(res) %*% res) * solve(t(X) %*% X) -> VCV ## calculate standard errors se of coefficients *diag(VCV) ``` ]] .column[.content[ ``` [1] 90965.84022 87.00997 ``` ]] --- class: split-40 count: false .column[.content[ ```r ## build OLS estimator manually # define X matrix and y vector cbind(1, df$head_size) %>% as.matrix() -> X as.matrix(df$tail_length) -> y # estimate the coeficients beta t(X) %*% X %>% solve() %*% t(X) %*% y -> beta ## calculate residuals y - beta[1] - (beta[2] * X[,2]) %>% as.matrix() -> res ## define the number of observations and the number of # parameters nrow(df) -> n ncol(X) -> k ## calculate the Variance-Covariance matrix VCV 1/(n - k) * as.numeric(t(res) %*% res) * solve(t(X) %*% X) -> VCV ## calculate standard errors se of coefficients diag(VCV) %>% * sqrt() ``` ]] .column[.content[ ``` [1] 301.605438 9.327913 ``` ]] --- class: split-40 count: false 
.column[.content[ ```r ## build OLS estimator manually # define X matrix and y vector cbind(1, df$head_size) %>% as.matrix() -> X as.matrix(df$tail_length) -> y # estimate the coeficients beta t(X) %*% X %>% solve() %*% t(X) %*% y -> beta ## calculate residuals y - beta[1] - (beta[2] * X[,2]) %>% as.matrix() -> res ## define the number of observations and the number of # parameters nrow(df) -> n ncol(X) -> k ## calculate the Variance-Covariance matrix VCV 1/(n - k) * as.numeric(t(res) %*% res) * solve(t(X) %*% X) -> VCV ## calculate standard errors se of coefficients diag(VCV) %>% sqrt() -> *se ``` ]] .column[.content[ ]] --- class: split-40 count: false .column[.content[ ```r ## build OLS estimator manually # define X matrix and y vector cbind(1, df$head_size) %>% as.matrix() -> X as.matrix(df$tail_length) -> y # estimate the coeficients beta t(X) %*% X %>% solve() %*% t(X) %*% y -> beta ## calculate residuals y - beta[1] - (beta[2] * X[,2]) %>% as.matrix() -> res ## define the number of observations and the number of # parameters nrow(df) -> n ncol(X) -> k ## calculate the Variance-Covariance matrix VCV 1/(n - k) * as.numeric(t(res) %*% res) * solve(t(X) %*% X) -> VCV ## calculate standard errors se of coefficients diag(VCV) %>% sqrt() -> se ## calculate the p-values *rbind(2*pt(abs(beta[1]/se[1]), df = n - k, * lower.tail = FALSE), * 2*pt(abs(beta[2]/se[2]), df = n - k, * lower.tail = FALSE)) ``` ]] .column[.content[ ``` [,1] [1,] 0.4857664 [2,] 0.1604981 ``` ]] --- class: split-40 count: false .column[.content[ ```r ## build OLS estimator manually # define X matrix and y vector cbind(1, df$head_size) %>% as.matrix() -> X as.matrix(df$tail_length) -> y # estimate the coeficients beta t(X) %*% X %>% solve() %*% t(X) %*% y -> beta ## calculate residuals y - beta[1] - (beta[2] * X[,2]) %>% as.matrix() -> res ## define the number of observations and the number of # parameters nrow(df) -> n ncol(X) -> k ## calculate the Variance-Covariance matrix VCV 1/(n - k) * as.numeric(t(res) %*% res) * solve(t(X) %*% X) -> VCV ## calculate standard errors se of coefficients diag(VCV) %>% sqrt() -> se ## calculate the p-values rbind(2*pt(abs(beta[1]/se[1]), df = n - k, lower.tail = FALSE), 2*pt(abs(beta[2]/se[2]), df = n - k, lower.tail = FALSE)) -> * p_value ``` ]] .column[.content[ ]] --- class: split-40 count: false .column[.content[ ```r ## build OLS estimator manually # define X matrix and y vector cbind(1, df$head_size) %>% as.matrix() -> X as.matrix(df$tail_length) -> y # estimate the coeficients beta t(X) %*% X %>% solve() %*% t(X) %*% y -> beta ## calculate residuals y - beta[1] - (beta[2] * X[,2]) %>% as.matrix() -> res ## define the number of observations and the number of # parameters nrow(df) -> n ncol(X) -> k ## calculate the Variance-Covariance matrix VCV 1/(n - k) * as.numeric(t(res) %*% res) * solve(t(X) %*% X) -> VCV ## calculate standard errors se of coefficients diag(VCV) %>% sqrt() -> se ## calculate the p-values rbind(2*pt(abs(beta[1]/se[1]), df = n - k, lower.tail = FALSE), 2*pt(abs(beta[2]/se[2]), df = n - k, lower.tail = FALSE)) -> p_value ## combine all necessary information *as.data.frame(cbind(c("(Intercept)","head_size"), * beta,se,p_value)) ``` ]] .column[.content[ ``` V1 V2 se V4 1 (Intercept) -214.655866735562 301.605437988381 0.485766378408867 2 head_size 13.6540591513866 9.32791323596427 0.160498082949983 ``` ]] --- class: split-40 count: false .column[.content[ ```r ## build OLS estimator manually # define X matrix and y vector cbind(1, df$head_size) %>% as.matrix() -> 
X as.matrix(df$tail_length) -> y # estimate the coeficients beta t(X) %*% X %>% solve() %*% t(X) %*% y -> beta ## calculate residuals y - beta[1] - (beta[2] * X[,2]) %>% as.matrix() -> res ## define the number of observations and the number of # parameters nrow(df) -> n ncol(X) -> k ## calculate the Variance-Covariance matrix VCV 1/(n - k) * as.numeric(t(res) %*% res) * solve(t(X) %*% X) -> VCV ## calculate standard errors se of coefficients diag(VCV) %>% sqrt() -> se ## calculate the p-values rbind(2*pt(abs(beta[1]/se[1]), df = n - k, lower.tail = FALSE), 2*pt(abs(beta[2]/se[2]), df = n - k, lower.tail = FALSE)) -> p_value ## combine all necessary information as.data.frame(cbind(c("(Intercept)","head_size"), beta,se,p_value)) -> * output ``` ]] .column[.content[ ]] --- class: split-40 count: false .column[.content[ ```r ## build OLS estimator manually # define X matrix and y vector cbind(1, df$head_size) %>% as.matrix() -> X as.matrix(df$tail_length) -> y # estimate the coeficients beta t(X) %*% X %>% solve() %*% t(X) %*% y -> beta ## calculate residuals y - beta[1] - (beta[2] * X[,2]) %>% as.matrix() -> res ## define the number of observations and the number of # parameters nrow(df) -> n ncol(X) -> k ## calculate the Variance-Covariance matrix VCV 1/(n - k) * as.numeric(t(res) %*% res) * solve(t(X) %*% X) -> VCV ## calculate standard errors se of coefficients diag(VCV) %>% sqrt() -> se ## calculate the p-values rbind(2*pt(abs(beta[1]/se[1]), df = n - k, lower.tail = FALSE), 2*pt(abs(beta[2]/se[2]), df = n - k, lower.tail = FALSE)) -> p_value ## combine all necessary information as.data.frame(cbind(c("(Intercept)","head_size"), beta,se,p_value)) -> output *c("Coefficients:","Estimate", * "Std. Error","Pr(>|t|)") ``` ]] .column[.content[ ``` [1] "Coefficients:" "Estimate" "Std. Error" "Pr(>|t|)" ``` ]] --- class: split-40 count: false .column[.content[ ```r ## build OLS estimator manually # define X matrix and y vector cbind(1, df$head_size) %>% as.matrix() -> X as.matrix(df$tail_length) -> y # estimate the coeficients beta t(X) %*% X %>% solve() %*% t(X) %*% y -> beta ## calculate residuals y - beta[1] - (beta[2] * X[,2]) %>% as.matrix() -> res ## define the number of observations and the number of # parameters nrow(df) -> n ncol(X) -> k ## calculate the Variance-Covariance matrix VCV 1/(n - k) * as.numeric(t(res) %*% res) * solve(t(X) %*% X) -> VCV ## calculate standard errors se of coefficients diag(VCV) %>% sqrt() -> se ## calculate the p-values rbind(2*pt(abs(beta[1]/se[1]), df = n - k, lower.tail = FALSE), 2*pt(abs(beta[2]/se[2]), df = n - k, lower.tail = FALSE)) -> p_value ## combine all necessary information as.data.frame(cbind(c("(Intercept)","head_size"), beta,se,p_value)) -> output c("Coefficients:","Estimate", "Std. 
Error","Pr(>|t|)") -> * names(output) ``` ]] .column[.content[ ]] --- class: split-40 count: false .column[.content[ ```r ## build OLS estimator manually # define X matrix and y vector cbind(1, df$head_size) %>% as.matrix() -> X as.matrix(df$tail_length) -> y # estimate the coeficients beta t(X) %*% X %>% solve() %*% t(X) %*% y -> beta ## calculate residuals y - beta[1] - (beta[2] * X[,2]) %>% as.matrix() -> res ## define the number of observations and the number of # parameters nrow(df) -> n ncol(X) -> k ## calculate the Variance-Covariance matrix VCV 1/(n - k) * as.numeric(t(res) %*% res) * solve(t(X) %*% X) -> VCV ## calculate standard errors se of coefficients diag(VCV) %>% sqrt() -> se ## calculate the p-values rbind(2*pt(abs(beta[1]/se[1]), df = n - k, lower.tail = FALSE), 2*pt(abs(beta[2]/se[2]), df = n - k, lower.tail = FALSE)) -> p_value ## combine all necessary information as.data.frame(cbind(c("(Intercept)","head_size"), beta,se,p_value)) -> output c("Coefficients:","Estimate", "Std. Error","Pr(>|t|)") -> names(output) #compare automatic lm and manual output *output ``` ]] .column[.content[ ``` Coefficients: Estimate Std. Error Pr(>|t|) 1 (Intercept) -214.655866735562 301.605437988381 0.485766378408867 2 head_size 13.6540591513866 9.32791323596427 0.160498082949983 ``` ]] --- class: split-40 count: false .column[.content[ ```r ## build OLS estimator manually # define X matrix and y vector cbind(1, df$head_size) %>% as.matrix() -> X as.matrix(df$tail_length) -> y # estimate the coeficients beta t(X) %*% X %>% solve() %*% t(X) %*% y -> beta ## calculate residuals y - beta[1] - (beta[2] * X[,2]) %>% as.matrix() -> res ## define the number of observations and the number of # parameters nrow(df) -> n ncol(X) -> k ## calculate the Variance-Covariance matrix VCV 1/(n - k) * as.numeric(t(res) %*% res) * solve(t(X) %*% X) -> VCV ## calculate standard errors se of coefficients diag(VCV) %>% sqrt() -> se ## calculate the p-values rbind(2*pt(abs(beta[1]/se[1]), df = n - k, lower.tail = FALSE), 2*pt(abs(beta[2]/se[2]), df = n - k, lower.tail = FALSE)) -> p_value ## combine all necessary information as.data.frame(cbind(c("(Intercept)","head_size"), beta,se,p_value)) -> output c("Coefficients:","Estimate", "Std. Error","Pr(>|t|)") -> names(output) #compare automatic lm and manual output output *summary(reg) ``` ]] .column[.content[ ``` Coefficients: Estimate Std. Error Pr(>|t|) 1 (Intercept) -214.655866735562 301.605437988381 0.485766378408867 2 head_size 13.6540591513866 9.32791323596427 0.160498082949983 ``` ``` Length Class Mode call 3 -none- call terms 3 terms call residuals 20 -none- numeric coefficients 8 -none- numeric aliased 2 -none- logical sigma 1 -none- numeric df 3 -none- numeric r.squared 1 -none- numeric adj.r.squared 1 -none- numeric fstatistic 3 -none- numeric cov.unscaled 4 -none- numeric ``` ]] <style type="text/css"> .remark-code{line-head_size: 1.5; font-size: 80%} </style>
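
---

# comparing the manual and built-in estimates

One closing note: `reg` was created with `lm(...) %>% summary() -> reg`, so it is already the summary object, and calling `summary(reg)` again only describes that object's structure (the `Length Class Mode` listing on the previous slide). A sketch of a more direct cross-check, assuming the objects built in the slides above (`output`, `VCV`, `df`, `reg`):

```r
# manually assembled coefficient table
output

# coefficient table stored inside the summary.lm object
coef(reg)

# built-in variance-covariance matrix, to set against the manual VCV
vcov(lm(tail_length ~ head_size, data = df))
VCV
```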