Create the file p2.R. You should write your solutions in that file.

Some of you may be tempted to use for and while loops to solve some of the problems below (if you’ve used them before). Please don’t: the goal here is to use R the way professional data scientists use it, which usually means no loops.
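As a small illustration of the loop-free style, here is a sketch using a made-up vector x (not part of the assignment). Most R operators and many built-in functions work on a whole vector at once:

```r
# x is a made-up example vector, not part of the assignment
x <- c(3, 8, 1, 6)

# Arithmetic is applied to every element at once
doubled <- x * 2    # 6 16 2 12

# Comparisons also work element by element and give a logical vector
is.big <- x > 2     # TRUE TRUE FALSE TRUE

# Summary functions take the whole vector as input
total <- sum(x)     # 18
```

None of these lines needs a loop; the same idea carries over to the problems below.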

Problem 1: Vectors and Variables

Define the vector c(42, 43, 45, 49, 501), and store it in a variable called my.vec.

Write code to extract the second and fourth elements of the vector. Explain the difference between my.vec and "my.vec".

Problem 2: Vectors

Problem 2(a)

Write a function with the signature elems.below <- function(vec, upper.bound) which takes in a vector of numerics vec, and returns a vector that contains the elements of vec that are smaller than upper.bound.

Problem 2(b)

Write a function with the signature elem.just.below <- function(vec, upper.bound) which takes in a vector of numerics vec, and returns the largest element of vec that is smaller than upper.bound.

Problem 2(c)

Write a function with the signature my.median <- function(vec) which returns the median of the vector vec. Hint: the median is the number at the center of the sorted version of vec. You can assume that the vector vec is of length 5. Test your function. You may not use R’s built-in function median. The expression n %% 2 computes the remainder of the division of n by 2, so for a positive integer n, n %% 2 is equal to 0 when n is even and to 1 when n is odd.
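To make the remainder operator concrete, here is a short sketch with made-up numbers (the variable names are only for illustration):

```r
# Remainders after division by 2
r.even <- 10 %% 2           # 0, since 10 is even
r.odd  <- 7 %% 2            # 1, since 7 is odd

# %% is vectorized, so it can be applied to several numbers at once
r.all <- c(4, 5, 6) %% 2    # 0 1 0
```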

Problem 2(d)

Now, rewrite my.median so that vec can be of arbitrary size. You can look up the rule for handling even-sized vectors on Wikipedia.

Problem 3: Data Frames

Run the following once in the console to install the gapminder package:

install.packages("gapminder")

In your p2.R, include library(gapminder)

Look at the data frame gapminder, and figure out what each column contains. You should be able to explain this to your preceptor. You can run ?gapminder in the console to read the accompanying description to the dataset.
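A few base R functions are handy for getting a first look at any data frame. The sketch below uses a made-up data frame called toy (a hypothetical stand-in, not gapminder), so it runs without any packages installed:

```r
# toy is a made-up data frame, used here only to show the inspection functions
toy <- data.frame(
  city = c("A", "B", "C"),
  temp = c(21.5, 18.0, 25.2)
)

col.names <- names(toy)    # "city" "temp"
n.rows <- nrow(toy)        # 3

str(toy)        # compact overview: one line per column, with its type
head(toy, 2)    # the first two rows
summary(toy)    # a short summary of each column
```

The same calls should work on gapminder once the package is loaded.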

Problem 3(a)

Write a function that computes how many countries in the dataset there are on a given continent. Test this function by querying it with different continent names.

Problem 3(b)

Write a function that takes in a data frame like gapminder and returns the country with the highest life expectancy on a given continent between the years y1 and y2.

Problem 3(c)

Test the function from 3(b) by creating your own gapminder-like data frame.

Problem 3(d)

Write a function that computes the world population in a given year. Test this function. Note: with some versions of R, you will need to modify gapminder$pop using gapminder$pop <- as.numeric(gapminder$pop) first. (Explanation (optional advanced material): this has to do with the fact that if pop is an integer rather than a numeric, the sum of the components of pop is also an integer, and integers are limited in size. To see the maximum possible integer on your computer, you can run .Machine$integer.max.)
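The integer limit can be seen in a small sketch with two made-up values. It uses + rather than sum(), since the way sum() handles overflow may differ between R versions; the variable names are hypothetical:

```r
# The largest value an R integer can hold
max.int <- .Machine$integer.max    # 2147483647

# Two made-up integer values (the L suffix marks an integer literal)
a <- 2000000000L
b <- 2000000000L

# Their true sum, 4e9, does not fit in an integer: R warns and gives NA
int.result <- suppressWarnings(a + b)

# Converting to numeric first avoids the limit
num.result <- as.numeric(a) + as.numeric(b)
```

Here int.result is NA, while num.result holds the expected 4e9.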

Problem 4 (Challenge)

Make a new data frame which contains the increase in life expectancy per year for each country in gapminder. The increase per year is the difference between the life expectancy in the last year and the first year, divided by the number of years between them.