---
title: "SML201 Precept 8 Solutions, Spring 2020"
output: html_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
### Problem 1
Suppose that $X\sim\mathcal{N}(2, 10^2)$. We sample the variable $X$ once (i.e., we obtain a sample from the distribution $\mathcal{N}(2, 10^2)$).
#### Problem 1(a)
Write R code to obtain $P(2.1 < X < 3.1)$. Use `pnorm`.
#### Solution
```{r}
pnorm(q = 3.1, mean = 2, sd = 10) - pnorm(q = 2.1, mean = 2, sd = 10)
```
*Learning goal*: compute probabilities of intervals
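As a sanity check, the same probability can be read as the area under the $\mathcal{N}(2, 10^2)$ density between 2.1 and 3.1. The sketch below uses numerical integration; the names `p_from_cdf` and `p_from_density` are illustrative and not part of the original solution.

```{r}
# Probability from the CDF, as in the solution above
p_from_cdf <- pnorm(q = 3.1, mean = 2, sd = 10) - pnorm(q = 2.1, mean = 2, sd = 10)
# Same probability as the area under the N(2, 10^2) density on (2.1, 3.1);
# integrate() forwards the extra arguments mean and sd to dnorm()
p_from_density <- integrate(dnorm, lower = 2.1, upper = 3.1, mean = 2, sd = 10)$value
c(cdf = p_from_cdf, density_area = p_from_density)
```

The two numbers should agree up to the numerical error of `integrate`.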
#### Problem 1(b)
Write R code to obtain $P(2.1 < X < 3.1)$. Use `pnorm(..., mean = 0, sd = 1)`.
#### Solution
The idea here is that we can "shift" and "shrink" $X$ using $(X-2)/10$, so that
$(X-2)/10 \sim \mathcal{N}(0, 1)$.
```{r}
pnorm(q = (3.1-2)/10, mean = 0, sd = 1) - pnorm(q = (2.1-2)/10, mean = 0, sd = 1)
```
*Learning goal*: transform normal random variables to be $\mathcal{N}(0, 1)$
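The transformation can be written out step by step. In the derivation below, $Z$ and $\Phi$ are notation introduced here for illustration: $Z = (X-2)/10$ is the standardized variable and $\Phi$ is the $\mathcal{N}(0, 1)$ cumulative distribution function, which is what `pnorm(..., mean = 0, sd = 1)` computes.

$$
\begin{aligned}
P(2.1 < X < 3.1) &= P\left(\frac{2.1-2}{10} < \frac{X-2}{10} < \frac{3.1-2}{10}\right) \\
&= P(0.01 < Z < 0.11) \\
&= \Phi(0.11) - \Phi(0.01)
\end{aligned}
$$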
#### Problem 1(c)
Write R code to obtain $P(2.1 < X < 3.1)$. Use `rnorm`. (And not `pnorm`.)
#### Solution
```{r}
x <- rnorm(n = 100000, mean = 2, sd = 10)
mean((2.1 < x) & (x < 3.1))
```
*Learning goal*: compute probabilities via simulation. Understand the connection between samples from a distribution and the cumulative distribution function.
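A simulated probability is only an estimate, so it can help to see how far it is expected to be from the exact value. The sketch below assumes the usual standard-error formula for a proportion, $\sqrt{\hat{p}(1-\hat{p})/n}$; the seed and the variable names are arbitrary choices made for illustration.

```{r}
set.seed(201)  # arbitrary seed, only so the chunk is reproducible
n_sim <- 100000
x_sim <- rnorm(n = n_sim, mean = 2, sd = 10)
# Proportion of simulated draws that land in (2.1, 3.1)
p_hat <- mean((2.1 < x_sim) & (x_sim < 3.1))
# Approximate standard error of that simulated proportion
se_hat <- sqrt(p_hat * (1 - p_hat) / n_sim)
# Exact value from pnorm, for comparison
p_exact <- pnorm(q = 3.1, mean = 2, sd = 10) - pnorm(q = 2.1, mean = 2, sd = 10)
c(estimate = p_hat, std_error = se_hat, exact = p_exact)
```

The estimate should typically fall within a few standard errors of the exact value; increasing `n_sim` shrinks the standard error.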
#### Problem 1(d)
Write R code to obtain $P(2.1 < X < 3.1)$. Use `rnorm(..., mean = 0, sd = 1)`.
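One possible approach, sketched here by reversing the shift-and-shrink idea from 1(b): if $Z \sim \mathcal{N}(0, 1)$, then $2 + 10Z \sim \mathcal{N}(2, 10^2)$. The names `z_draws`, `x_from_z`, and `p_1d` and the seed are illustrative assumptions.

```{r}
set.seed(201)  # arbitrary seed, only so the chunk is reproducible
z_draws <- rnorm(n = 100000, mean = 0, sd = 1)
# If Z ~ N(0, 1), then 2 + 10*Z ~ N(2, 10^2)
x_from_z <- 2 + 10 * z_draws
# Proportion of transformed draws that land in (2.1, 3.1)
p_1d <- mean((2.1 < x_from_z) & (x_from_z < 3.1))
p_1d
```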
### Problem 2
Suppose 65% of Princeton students like Wawa better than World Coffee. We selected a random sample of 100 students, and asked them which they prefer. What is the probability that more than 78 students said "Wawa"?
#### Problem 2(a)
Answer the question using `pbinom`.
#### Solution
```{r}
1 - pbinom(q = 78, size = 100, prob = .65)
```
Another option is to use the `lower.tail` argument, but that is not preferred right now:
```{r}
pbinom(q = 78, size = 100, prob = .65, lower.tail = FALSE)
```
*Learning goal*: map a word problem to a cumulative probability computation.
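To see why `q = 78` is the right cutoff for "more than 78", the tail probability can also be built directly from the probability mass function by adding up $P(X = 79), \dots, P(X = 100)$. This is a sketch for checking the answer; `p_sum` and `p_tail` are illustrative names.

```{r}
# P(X > 78) = P(X = 79) + ... + P(X = 100) for X ~ Binomial(100, 0.65)
p_sum <- sum(dbinom(x = 79:100, size = 100, prob = 0.65))
# Same quantity via the CDF: 1 - P(X <= 78)
p_tail <- 1 - pbinom(q = 78, size = 100, prob = 0.65)
all.equal(p_sum, p_tail)
```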
#### Problem 2(b)
Answer the question using `pnorm`. Use the normal approximation to the Binomial distribution (recall: the mean is $n\times prob$ and the variance is $n\times prob\times (1-prob)$).
#### Solution
```{r}
1 - pnorm(q = 78, mean = 65, sd = sqrt(.65*.35*100))
```
(Note: we do not require a continuity correction here. To match the answer to 2(a), we'd need `q = 78.9`.)
Another option (dispreferred):
```{r}
pnorm(q = 78, mean = 65, sd = sqrt(.65*.35*100), lower.tail = FALSE)
```
*Learning goal*: use the normal approximation to the binomial. Recognize the consequences of not using continuity correction.
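The effect of the cutoff can be seen by placing the exact binomial answer next to the normal approximation for a few values of `q`. In this sketch, `78.5` is the cutoff that a standard continuity correction would use (an assumption, since the solution above does not apply one), and `78.9` is the value from the note; all variable names are illustrative.

```{r}
n_students <- 100
p_wawa <- 0.65
mu_x <- n_students * p_wawa
sd_x <- sqrt(n_students * p_wawa * (1 - p_wawa))
# Exact binomial tail probability P(X > 78)
binom_exact <- 1 - pbinom(q = 78, size = n_students, prob = p_wawa)
# Normal approximation at three candidate cutoffs
q_vals <- c(78, 78.5, 78.9)
normal_approx <- 1 - pnorm(q = q_vals, mean = mu_x, sd = sd_x)
data.frame(q = q_vals, normal_approx = normal_approx, binom_exact = binom_exact)
```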
### Problem 3
Suppose 100 Princeton students were asked whether Harvard or Stanford is the worse online institution of higher learning. 60 students said that Stanford is worse. Compute the p-value for the null hypothesis that Princeton students think that Harvard and Stanford are equally bad, on average. What can you conclude?
#### Solution
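The null hypothesis here is that $P(\text{Stanford}) = 0.5$.
The p-value here is $P(n_{\text{Stanford}} \geq 60 \text{ or } n_{\text{Stanford}} \leq 40)$, where $n_{\text{Stanford}}$ is the number of students who say Stanford is worse. We can compute that using: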
```{r}
pbinom(q = 40, size = 100, prob = 0.5) + (1 - pbinom(q = 59, size = 100, prob = 0.5))
```
If the null hypothesis were true, we would see a value at least as extreme as the one we observed about 5.7% of the time. Since this is above the conventional 5% threshold, we do not reject the null hypothesis: the data we have are consistent with Princeton students thinking that Harvard and Stanford are equally bad online institutions of higher learning.
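The manual two-sided computation can be cross-checked against R's built-in exact binomial test, `binom.test`. This is a sketch for verification only; `p_manual` and `p_builtin` are illustrative names.

```{r}
# Two-sided p-value assembled by hand from the two tails
p_manual <- pbinom(q = 40, size = 100, prob = 0.5) +
  (1 - pbinom(q = 59, size = 100, prob = 0.5))
# Same p-value from the exact binomial test
p_builtin <- binom.test(x = 60, n = 100, p = 0.5, alternative = "two.sided")$p.value
c(manual = p_manual, binom_test = p_builtin)
```

Because the null distribution is symmetric when the success probability is 0.5, the two approaches should give the same value.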
### Problem 4
Answer Problem 2 using only `rnorm(..., mean = 0, sd = 1)`
#### Solution
```{r}
x <- rnorm(n = 100000, mean = 0, sd = 1)
# Now, 65 + x*sqrt(.65*.35*100) ~ N(65, sqrt(.65*.35*100)^2)
y <- 65 + x*sqrt(.65*.35*100)
mean(y > 78)
```
*Learning goal*: compute probability via simulation; flexibly apply variable transformation
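An equivalent way to think about the transformation is to move the threshold instead of the samples: $y > 78$ is the same event as $x > (78 - 65)/\sqrt{.65 \times .35 \times 100}$. The sketch below compares both simulated versions with the `pnorm` answer from 2(b); the seed and the names `z_sim`, `sd_binom`, `p_y`, `p_z`, and `p_norm` are illustrative assumptions.

```{r}
set.seed(201)  # arbitrary seed, only so the chunk is reproducible
z_sim <- rnorm(n = 100000, mean = 0, sd = 1)
sd_binom <- sqrt(.65 * .35 * 100)
# Transform the samples, then compare with 78
p_y <- mean(65 + z_sim * sd_binom > 78)
# Or keep the standard normal samples and transform the threshold
p_z <- mean(z_sim > (78 - 65) / sd_binom)
# Normal-approximation answer from pnorm, for comparison
p_norm <- 1 - pnorm(q = 78, mean = 65, sd = sd_binom)
c(sim_transformed = p_y, sim_threshold = p_z, pnorm = p_norm)
```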