15 R4DS: Data tidying

To run these solutions, you need to load the necessary libraries.

library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

15.1 6.2 Tidy data

15.1.1 Exercise 1

  • table1
    • Observation: the tuberculosis cases and the population for one country in one year.
    • Columns:
      • country, year, cases (tuberculosis cases), population
  • table2
    • Observation: either the tuberculosis cases or the population for one country in one year.
    • Columns:
      • country, year, type (cases or population), count (the value for that type)
  • table3
    • Observation: the tuberculosis rate, stored as a string, for one country in one year.
    • Columns:
      • country, year, rate (a string of the form "cases/population")

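Observe that the rate in table3 is stored as a string of the form "cases/population" (for example "745/19987071"). It is not a numeric value of tuberculosis cases per capita, so it cannot be used in calculations until the two numbers are separated.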
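As a minimal sketch, assuming tidyr 1.3.0 or later (which provides `separate_wider_delim()`), the string can be split at the `/`, converted to numbers, and turned into a rate per 10,000 people. The name `table3_rate` is only illustrative.

```r
library(dplyr)
library(tidyr)

# tidyr's table3 stores `rate` as a character string such as "745/19987071".
# Split it into two columns, convert them to numbers, then compute the rate.
table3_rate <- tidyr::table3 |>
  separate_wider_delim(rate, delim = "/", names = c("cases", "population")) |>
  mutate(
    cases = as.numeric(cases),
    population = as.numeric(population),
    rate = cases / population * 10000  # cases per 10,000 people
  )

table3_rate
```

After the split, the data has the same columns as table1, plus a numeric `rate`.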

15.1.2 Exercise 2

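This is not recommended, but if you are interested, here is a solution for table2. It computes the rate per 10,000 people with plain vector operations instead of tidy-data tools. You do not need to be able to reproduce it.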

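# A small subset of tidyr's table2 (this masks the packaged dataset)
table2 <- tibble(
  country = c("Afghanistan", "Afghanistan", "Afghanistan", "Afghanistan", "Brazil", "Brazil"),
  year = c(1999, 1999, 2000, 2000, 1999, 1999),
  type = c("cases", "population", "cases", "population", "cases", "population"),
  count = c(745, 19987071, 2666, 20595360, 37737, 172006362)
)

# 1. Extract the number of TB cases per country per year
cases <- table2 |> 
  filter(type == "cases") |>
  pull(count)
# 2. Extract the matching population per country per year
#    (assumes rows are ordered so the i-th case matches the i-th population)
population <- table2 |> 
  filter(type == "population") |>
  pull(count)
# 3. Divide cases by population and multiply by 10000
rate <- (cases / population) * 10000
# 4. Store the rate back on both rows (cases and population) of each country-year
table2$rate <- rep(rate, each = 2)
table2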
# A tibble: 6 × 5
  country      year type           count  rate
  <chr>       <dbl> <chr>          <dbl> <dbl>
1 Afghanistan  1999 cases            745 0.373
2 Afghanistan  1999 population  19987071 0.373
3 Afghanistan  2000 cases           2666 1.29 
4 Afghanistan  2000 population  20595360 1.29 
5 Brazil       1999 cases          37737 2.19 
6 Brazil       1999 population 172006362 2.19
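For comparison, here is a tidier alternative, which is not the solution above: pivot table2 wider so that cases and population become columns, after which the rate is a single `mutate()`. This sketch assumes the packaged `tidyr::table2` (referenced explicitly because the code above overwrote `table2`). The name `rates` is only illustrative.

```r
library(dplyr)
library(tidyr)

# One row per country and year, with `cases` and `population` as columns
rates <- tidyr::table2 |>
  pivot_wider(names_from = type, values_from = count) |>
  mutate(rate = cases / population * 10000)  # cases per 10,000 people

rates
```

Unlike `rep(rate, each = 2)`, this does not depend on row order. Cases and population are matched by country and year, and the result has the shape of table1 plus a `rate` column.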