  • Basic operations
    • Question 1
    • Question 2
    • Question 3
    • Question 4
  • Cleaning and counting
    • Question 1
    • Question 2
    • Question 3
    • Question 4
    • Question 5
  • Combining data
    • Question 1
    • Question 2
    • Question 3
    • Question 4
  • Plotting
    • Question 1
    • Question 2
    • Question 3
    • Question 4
    • Question 5
  • Functional programming
    • Question 1
    • Question 2
    • Question 3
    • Question 4
  • Wrapping up
library(tidyverse)
## ── Attaching packages ───────────────────────────────────────────────────────────────────────────── tidyverse 1.3.0 ──
## ✓ ggplot2 3.3.2     ✓ purrr   0.3.4
## ✓ tibble  3.0.3     ✓ dplyr   1.0.2
## ✓ tidyr   1.1.2     ✓ stringr 1.4.0
## ✓ readr   1.3.1     ✓ forcats 0.5.0
## ── Conflicts ──────────────────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()

Basic operations

Question 1

Read the file person.csv and store the result in a tibble called person.

person <- read_csv("https://education.rstudio.com/blog/2020/08/more-example-exams/person.csv")
## Parsed with column specification:
## cols(
##   person_id = col_character(),
##   personal_name = col_character(),
##   family_name = col_character()
## )
class(person)
## [1] "spec_tbl_df" "tbl_df"      "tbl"         "data.frame"

Question 2

Create a tibble containing only family and personal names, in that order. You do not need to assign this tibble or any others to variables unless explicitly asked to do so. However, as noted in the introduction, you must use the pipe operator %>% and code that follows the tidyverse style guide.

# View(person)

person %>%
  select(family_name, personal_name)
## family_name  personal_name
## <chr>        <chr>
## Dyer         William
## Pabodie      Frank
## Lake         Anderson
## Roerich      Valentina
## Danforth     Frank

Question 3

Create a new tibble containing only the rows in which family names come before the letter M. Your solution should work for tables with more rows than the example, i.e., you cannot rely on row numbers or select specific names.

person %>%
  arrange(family_name) %>%
  filter(family_name < "M")
## person_id  personal_name  family_name
## <chr>      <chr>          <chr>
## danforth   Frank          Danforth
## dyer       William        Dyer
## lake       Anderson       Lake

Question 4

Display all the rows in person sorted by family name length with the longest name first.

person %>%
  arrange(desc(str_length(family_name)))
## person_id  personal_name  family_name
## <chr>      <chr>          <chr>
## danforth   Frank          Danforth
## pb         Frank          Pabodie
## roe        Valentina      Roerich
## dyer       William        Dyer
## lake       Anderson       Lake

Cleaning and counting

Question 1

Read the file measurements.csv to create a tibble called measurements. (The strings “rad”, “sal”, and “temp” in the quantity column stand for “radiation”, “salinity”, and “temperature” respectively.)

measurements <- read_csv("https://education.rstudio.com/blog/2020/08/more-example-exams/measurements.csv")
## Parsed with column specification:
## cols(
##   visit_id = col_double(),
##   visitor = col_character(),
##   quantity = col_character(),
##   reading = col_double()
## )

Question 2

Create a tibble containing only rows where none of the values are NA and save in a tibble called cleaned.

cleaned <- measurements %>%
  filter(!is.na(visitor), !is.na(quantity), !is.na(reading))

# other option: use na.omit(measurements)
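
A tidyverse alternative to na.omit() is tidyr::drop_na(). The sketch below uses a small made-up tibble (`toy` is hypothetical, not the exam data) to show the idea. Unlike the filter() call above, drop_na() with no column arguments checks every column, including visit_id.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data standing in for measurements.
toy <- tibble(
  visit_id = c(1, 2, 3, 4),
  visitor  = c("dyer", NA, "lake", "roe"),
  quantity = c("rad", "sal", "sal", "temp"),
  reading  = c(9.82, 0.06, NA, -16)
)

# With no column arguments, drop_na() removes rows with an NA in any column.
toy_cleaned <- toy %>%
  drop_na()

toy_cleaned
```

Rows 2 and 3 each contain an NA, so only the first and last rows remain.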

Question 3

Count the number of measurements of each type of quantity in cleaned. Your result should have one row for each quantity "rad", "sal", and "temp".

cleaned %>%
  group_by(quantity) %>%
  summarize(n())
## `summarise()` ungrouping output (override with `.groups` argument)
## quantity  n()
## <chr>     <int>
## rad       8
## sal       7
## temp      3
# other option: use count()
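
The count() option mentioned above might look like the following sketch. The tibble `toy_cleaned` is made up for illustration. count() groups, tallies, and ungroups in one step, and names the result column n.

```r
library(dplyr)

# Hypothetical toy data standing in for cleaned.
toy_cleaned <- tibble(
  quantity = c("rad", "sal", "rad", "temp", "sal", "rad"),
  reading  = c(9.82, 0.13, 7.80, -16, 0.09, 4.35)
)

# One row per quantity, with the number of rows in column n.
counts <- toy_cleaned %>%
  count(quantity)

counts
```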

Question 4

Display the minimum and maximum value of reading separately for each quantity in cleaned. Your result should have one row for each quantity "rad", "sal", and "temp".

cleaned %>%
  group_by(quantity) %>%
  summarize(min(reading), max(reading))
## `summarise()` ungrouping output (override with `.groups` argument)
## quantity  min(reading)  max(reading)
## <chr>     <dbl>         <dbl>
## rad       1.46          11.25
## sal       0.05          41.60
## temp      -21.50        -16.00

Question 5

Create a tibble in which all salinity ("sal") readings greater than 1 are divided by 100. (This is needed because some people wrote percentages as numbers from 0.0 to 1.0, but others wrote them as 0.0 to 100.0.)

measurements %>%
  filter(quantity == "sal") %>%
  mutate(new_reading = ifelse(reading > 1, reading / 100, reading))
## visit_id  visitor  quantity  reading  new_reading
## <dbl>     <chr>    <chr>     <dbl>    <dbl>
## 619       dyer     sal       0.13     0.130
## 622       dyer     sal       0.09     0.090
## 734       lake     sal       0.05     0.050
## 735       NA       sal       0.06     0.060
## 751       lake     sal       NA       NA
## 752       lake     sal       0.09     0.090
## 752       roe      sal       41.60    0.416
## 837       lake     sal       0.21     0.210
## 837       roe      sal       22.50    0.225
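
# Caution: this variant divides every salinity reading by 100, including
# values already on the 0 to 1 scale, so it over-corrects most rows.
measurements %>%
  filter(quantity == "sal") %>%
  mutate(reading = reading / 100)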
## visit_id  visitor  quantity  reading
## <dbl>     <chr>    <chr>     <dbl>
## 619       dyer     sal       0.0013
## 622       dyer     sal       0.0009
## 734       lake     sal       0.0005
## 735       NA       sal       0.0006
## 751       lake     sal       NA
## 752       lake     sal       0.0009
## 752       roe      sal       0.4160
## 837       lake     sal       0.0021
## 837       roe      sal       0.2250
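
If the goal is to keep every row and rescale only the out-of-range salinity values, one possible sketch combines both conditions inside dplyr::if_else(). The tibble `toy` is made up for illustration, and overwriting reading in place is an assumption about what the question wants.

```r
library(dplyr)

# Hypothetical toy data with mixed quantities and one missing reading.
toy <- tibble(
  quantity = c("rad", "sal", "sal", "sal", "temp"),
  reading  = c(9.82, 0.13, 41.6, NA, -16)
)

# Rescale only "sal" readings above 1; every other row is left unchanged.
# A missing reading makes the condition NA, so the result stays NA.
rescaled <- toy %>%
  mutate(
    reading = if_else(quantity == "sal" & reading > 1, reading / 100, reading)
  )

rescaled
```

The strict type checking of if_else() is harmless here because both branches are doubles.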

Combining data

Question 1

Read visited.csv and drop rows containing any NAs, assigning the result to a new tibble called visited.

visited <-
  read_csv("https://education.rstudio.com/blog/2020/08/more-example-exams/visited.csv") %>%
  filter(!is.na(site_id), !is.na(visit_date))
## Parsed with column specification:
## cols(
##   visit_id = col_double(),
##   site_id = col_character(),
##   visit_date = col_date(format = "")
## )

Question 2

Use an inner join to combine visited with cleaned using the visit_id column for matches.

inner_join(visited, cleaned, by = "visit_id")
## visit_id  site_id  visit_date  visitor  quantity  reading
## <dbl>     <chr>    <date>      <chr>    <chr>     <dbl>
## 619       DR-1     1927-02-08  dyer     rad       9.82
## 619       DR-1     1927-02-08  dyer     sal       0.13
## 622       DR-1     1927-02-10  dyer     rad       7.80
## 622       DR-1     1927-02-10  dyer     sal       0.09
## 734       DR-3     1930-01-07  pb       rad       8.41
## 734       DR-3     1930-01-07  lake     sal       0.05
## 734       DR-3     1930-01-07  pb       temp      -21.50
## 735       DR-3     1930-01-12  pb       rad       7.22
## 751       DR-3     1930-02-26  pb       rad       4.35
## 751       DR-3     1930-02-26  pb       temp      -18.50

Question 3

Find the highest radiation ("rad") reading at each site. (Sites are identified by values in the site_id column.)

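# Caution: without filter(quantity == "rad") this takes the maximum over all
# quantities, which is why MSK-4 reports 22.50 (a "sal" reading).
inner_join(visited, cleaned, by = "visit_id") %>%
  group_by(site_id) %>%
  summarize(max(reading))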
## `summarise()` ungrouping output (override with `.groups` argument)
## site_id  max(reading)
## <chr>    <dbl>
## DR-1     11.25
## DR-3     8.41
## MSK-4    22.50
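
To restrict the summary to radiation, the rows can be filtered before grouping. This is a minimal sketch on made-up data. The tibble `toy_joined`, its site names, and the column name max_reading are hypothetical.

```r
library(dplyr)

# Hypothetical toy data standing in for the joined table.
toy_joined <- tibble(
  site_id  = c("A", "A", "B", "B"),
  quantity = c("rad", "sal", "rad", "sal"),
  reading  = c(9.82, 0.13, 1.46, 22.5)
)

# Keep only radiation rows, then take the maximum reading per site.
max_rad <- toy_joined %>%
  filter(quantity == "rad") %>%
  group_by(site_id) %>%
  summarize(max_reading = max(reading), .groups = "drop")

max_rad
```

In this toy data, site B's larger "sal" value (22.5) no longer leaks into the result.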

Question 4

Find the date of the highest radiation reading at each site.

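# Caution: without filter(quantity == "rad") the MSK-4 row returned is a "sal"
# reading. The visit_date column holds the requested date.
inner_join(visited, cleaned, by = "visit_id") %>%
  group_by(site_id) %>%
  filter(reading == max(reading))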
## visit_id  site_id  visit_date  visitor  quantity  reading
## <dbl>     <chr>    <date>      <chr>    <chr>     <dbl>
## 734       DR-3     1930-01-07  pb       rad       8.41
## 837       MSK-4    1932-01-14  roe      sal       22.50
## 844       DR-1     1932-03-22  roe      rad       11.25
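
Another way to get the date of the highest radiation reading per site is dplyr::slice_max() (available from dplyr 1.0.0). This sketch runs on made-up data. The tibble `toy_joined` and its values are hypothetical.

```r
library(dplyr)

# Hypothetical toy data standing in for the joined table.
toy_joined <- tibble(
  site_id    = c("A", "A", "B", "B"),
  visit_date = as.Date(c("1927-02-08", "1927-02-10", "1932-01-14", "1932-01-14")),
  quantity   = c("rad", "rad", "rad", "sal"),
  reading    = c(9.82, 7.80, 1.46, 22.5)
)

# Keep radiation rows, pick the top reading within each site,
# and return only the columns the question asks about.
highest_rad <- toy_joined %>%
  filter(quantity == "rad") %>%
  group_by(site_id) %>%
  slice_max(reading, n = 1) %>%
  ungroup() %>%
  select(site_id, visit_date, reading)

highest_rad
```

By default slice_max() keeps ties, so a site with two equal maxima would return two rows.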

Plotting

Question 1

The code below is supposed to read the file home-range-database.csv to create a tibble called hra_raw, but contains a bug. Describe and fix the problem. (There are several ways to fix it: please use whichever you prefer.)

hra_raw <- read_csv(here::here("data", "home-range-database.csv"))

According to the documentation, here::here() is a drop-in replacement for file.path() that builds paths relative to the project root. My project has no data directory containing home-range-database.csv, so the path that here() constructs points to a file that does not exist and read_csv() fails. One fix is to move home-range-database.csv into a data folder at the project root. Below I instead read the CSV from the URL provided.

hra_raw <- read_csv("https://education.rstudio.com/blog/2020/08/more-example-exams/home-range-database.csv")
## Parsed with column specification:
## cols(
##   .default = col_character(),
##   mean.mass.g = col_double(),
##   log10.mass = col_double(),
##   mean.hra.m2 = col_double(),
##   log10.hra = col_double(),
##   preymass = col_double(),
##   log10.preymass = col_double(),
##   PPMR = col_double()
## )
## See spec(...) for full column specifications.
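
To see why the original call fails, it can help to look at the path here::here() builds before trying to read it. The sketch below assumes the here package is installed. The variable name `csv_path` is hypothetical.

```r
# Path that here::here() constructs from the project root.
csv_path <- here::here("data", "home-range-database.csv")

# Checking first gives a clearer message than letting read_csv() fail.
if (file.exists(csv_path)) {
  hra_raw <- readr::read_csv(csv_path)
} else {
  message("File not found: ", csv_path)
}
```

Printing csv_path shows the full location being searched, which makes a missing data folder easy to spot.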

Question 2

Convert the class column (which is text) to create a factor column class_fct and assign the result to a tibble hra. Use forcats to order the factor levels as:

  • mammalia
  • reptilia
  • aves
  • actinopterygii

# forcats alternative: fct_relevel(class, "mammalia", "reptilia", "aves", "actinopterygii")
hra <- hra_raw %>%
  mutate(
    class_fct = factor(class, levels = c("mammalia", "reptilia", "aves", "actinopterygii"))
  )
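
Since the question asks for forcats, here is a minimal sketch of the same ordering with forcats::fct_relevel(). The vector `class_chr` is made up for illustration. fct_relevel() accepts a character vector and moves the named levels to the front in the order given.

```r
library(forcats)

# Hypothetical character vector standing in for the class column.
class_chr <- c("aves", "mammalia", "reptilia", "actinopterygii", "mammalia")

# Convert to a factor with the requested level order.
class_fct <- fct_relevel(
  class_chr,
  "mammalia", "reptilia", "aves", "actinopterygii"
)

levels(class_fct)
```

One difference from factor(levels = ...): values not named in the call stay as trailing levels instead of becoming NA.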

Question 3

Create a scatterplot showing the relationship between log10.mass and log10.hra in hra.

ggplot(hra, aes(x = log10.mass, y = log10.hra)) +
  geom_point()

Question 4

Colorize the points in the scatterplot by class_fct.

ggplot(hra, aes(x = log10.mass, y = log10.hra)) +
  geom_point(aes(color = class_fct))

Question 5

Display a scatterplot showing only data for birds (class aves) and fit a linear regression to that data using the lm function.

hra %>% 
  filter(class == "aves") %>%
  ggplot(aes(x = log10.mass, y = log10.hra)) +
  geom_point(aes(color = class_fct)) +
  geom_smooth(method = "lm", se = FALSE)
## `geom_smooth()` using formula 'y ~ x'
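
geom_smooth(method = "lm") fits the model internally with the formula y ~ x. To inspect the fitted coefficients, the same model can be fitted directly with lm(). The data frame `toy_birds` below is made up and is not taken from the home range database.

```r
# Hypothetical toy data with the same column names as hra.
toy_birds <- data.frame(
  log10.mass = c(1, 2, 3, 4),
  log10.hra  = c(3.1, 4.9, 7.2, 8.8)
)

# Same model that geom_smooth(method = "lm") draws: y ~ x.
fit <- lm(log10.hra ~ log10.mass, data = toy_birds)

# Named vector holding the intercept and slope of the fitted line.
coef(fit)
```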

Functional programming

Question 1

Write a function called summarize_table that takes a title string and a tibble as input and returns a string that says something like, “title has # rows and # columns”. For example, summarize_table('our table', person) should return the string "our table has 5 rows and 3 columns".

summarize_table <- function(title, tibble) {
  num_rows <- nrow(tibble)
  num_cols <- ncol(tibble)
  # Return the string as the function's value instead of printing it
  str_c(title, "has", num_rows, "rows and", num_cols, "columns", sep = " ")
}
summarize_table("HRA dataset", hra)
## [1] "HRA dataset has 566 rows and 25 columns"

Question 2

Write another function called show_columns that takes a string and a tibble as input and returns a string that says something like, “table has columns name, name, name”. For example, show_columns('person', person) should return the string "person has columns person_id, personal_name, family_name".

show_columns <- function(title, tibble) {
  col_names <- names(tibble)
  col_names_collapsed <- str_c(col_names, collapse = ", ")
  # Return the string as the function's value instead of printing it
  str_c(title, "has columns", col_names_collapsed, sep = " ")
}
show_columns("HRA", hra)
## [1] "HRA has columns taxon, common.name, class, order, family, genus, species, primarymethod, N, mean.mass.g, log10.mass, alternative.mass.reference, mean.hra.m2, log10.hra, hra.reference, realm, thermoregulation, locomotion, trophic.guild, dimension, preymass, log10.preymass, PPMR, prey.size.reference, class_fct"

Question 3

The function rows_from_file returns the first N rows from a table in a CSV file given the file’s name and the number of rows desired. Modify it so that if no value is specified for the number of rows, a default of 3 is used.

# Background on optional arguments and missing():
# https://www.r-bloggers.com/2015/08/function-argument-lists-and-missing/

rows_from_file <- function(filename, num_rows = 3) {
  # num_rows defaults to 3 when the caller does not supply it
  readr::read_csv(filename) %>%
    head(n = num_rows)
}

# should show 3 rows
rows_from_file("https://education.rstudio.com/blog/2020/08/more-example-exams/measurements.csv")
## Parsed with column specification:
## cols(
##   visit_id = col_double(),
##   visitor = col_character(),
##   quantity = col_character(),
##   reading = col_double()
## )
## visit_id  visitor  quantity  reading
## <dbl>     <chr>    <chr>     <dbl>
## 619       dyer     rad       9.82
## 619       dyer     sal       0.13
## 622       dyer     rad       7.80
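
The post linked in the code comment discusses missing(), which is another way to detect an argument the caller left out. A minimal base R sketch of that approach follows. The function `first_rows` is hypothetical and works on a data frame rather than a file so that it runs without any download.

```r
# Hypothetical helper: return the first num_rows rows of a data frame,
# falling back to 3 when the caller does not supply num_rows.
first_rows <- function(data, num_rows) {
  if (missing(num_rows)) {
    num_rows <- 3
  }
  head(data, n = num_rows)
}

default_rows <- first_rows(mtcars)      # num_rows omitted
two_rows     <- first_rows(mtcars, 2)   # num_rows supplied

nrow(default_rows)
nrow(two_rows)
```

A default in the signature is usually easier to read, because the fallback shows up in the function's documentation and argument list.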

Question 4

The function long_name checks whether a string is longer than 4 characters. Use this function and a function from purrr to create a logical vector that contains the value TRUE where family names in the tibble person are longer than 4 characters, and FALSE where they are 4 characters or less.

    long_name <- function(name) {
      stringr::str_length(name) > 4
    }

person$family_name %>% map_lgl(long_name)
## [1] FALSE  TRUE FALSE  TRUE  TRUE
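
Because stringr::str_length() is vectorized, long_name() also works on the whole vector at once. The sketch below compares the two approaches, using the family names from the person table typed in by hand. The variable names are hypothetical.

```r
library(purrr)
library(stringr)

long_name <- function(name) {
  str_length(name) > 4
}

# Family names copied from the person table shown earlier.
family_names <- c("Dyer", "Pabodie", "Lake", "Roerich", "Danforth")

# purrr applies long_name() to one element at a time.
via_map <- map_lgl(family_names, long_name)

# The vectorized call handles all elements in a single step.
via_vector <- long_name(family_names)

all(via_map == via_vector)
```

map_lgl() is still what the question asks for, and it is the safer choice when the function being applied is not vectorized.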

Wrapping up

Modify the YAML header of this file so that a table of contents is automatically created each time this document is knit, and fix any errors that are preventing the document from knitting cleanly.

---
title: "Tidyverse Exam Version 2.0"
output:
html_document:
    theme: flatly
---

Corrected header, with html_document indented under output and toc: true added:

---
title: "Tidyverse Exam Version 2.0"
output:
  html_document: # this was indented
    theme: flatly
    toc: true    # this was added
---