"Children's reading scores increase with their shoe size." This is most likely:
Solution:
Both variables increase with age, so age is the lurking variable. This is a common-cause relationship.
Part B: Thinking [15 marks]
Question 8 [5 marks]
For the data \( x: 2, 4, 6, 8, 10 \) and \( y: 3, 6, 8, 11, 12 \):
a) Compute \( \bar{x}, \bar{y} \), \( s_x, s_y \).
b) Compute \( r \) (Pearson correlation).
c) State the regression line.
d) Predict \( y \) when \( x = 7 \).
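One way to check hand calculations for this question is a short script. The sketch below assumes the sample standard deviation (dividing by \( n - 1 \)) and the least-squares line \( \hat{y} = a + bx \); the function name `summarize` and the dictionary keys are illustrative choices, not part of the test.

```python
import math

def summarize(xs, ys):
    """Return means, sample standard deviations, Pearson r, and the
    least-squares line y_hat = intercept + slope * x for paired data."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n

    # Sums of squared deviations and of cross-products
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

    # Sample standard deviations (assumption: divide by n - 1)
    s_x = math.sqrt(sxx / (n - 1))
    s_y = math.sqrt(syy / (n - 1))

    r = sxy / math.sqrt(sxx * syy)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return {
        "x_bar": x_bar, "y_bar": y_bar,
        "s_x": s_x, "s_y": s_y,
        "r": r, "slope": slope, "intercept": intercept,
    }

x = [2, 4, 6, 8, 10]
y = [3, 6, 8, 11, 12]
stats = summarize(x, y)

# Prediction at x = 7, which lies inside the observed x-range (interpolation)
y_at_7 = stats["intercept"] + stats["slope"] * 7

for name, value in stats.items():
    print(f"{name} = {value:.4f}")
print(f"predicted y at x = 7: {y_at_7:.2f}")
```

If your course defines \( s_x \) and \( s_y \) with division by \( n \) instead, only those two values change; \( r \) and the regression line are unaffected.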
Question 9 [5 marks]
A regression model relates a student's hours of study to grade: \( \hat{G} = 50 + 4h \) (where \( h \) = hours/week, \( G \) = grade %).
a) Predict the grade for 7 hours/week.
b) Interpret the slope.
c) Interpret the y-intercept.
d) Is extrapolation to 25 hours/week reasonable? Justify.
Question 10 [5 marks]
For each, identify the type of relationship (cause-and-effect, common cause, reverse cause-and-effect, accidental, presumed) and justify:
a) Smoking and lung cancer.
b) Number of churches in a city and crime rate.
c) Sales of cell phones and number of vending machines.
d) GDP per capita and life expectancy.
Part C: Communication [15 marks]
Question 11 [4 marks]
Explain the difference between correlation and causation. Use a concrete example to demonstrate why a correlation does NOT prove causation.
Question 12 [4 marks]
A student says: "If \( r = 0.4 \), then 40% of the variation is explained." Is this correct? Explain the role of \( r^2 \) and what it actually represents.
Question 13 [4 marks]
Describe a step-by-step procedure for a two-variable data analysis: from data collection through to interpretation. Include: (a) selecting variables, (b) plotting, (c) computing \( r \), (d) determining the regression line, (e) interpreting in context, (f) discussing causation.
Question 14 [3 marks]
When is interpolation safe but extrapolation NOT safe? Use a real example to illustrate.
Part D: Application [15 marks]
Question 15 [5 marks]
Five students recorded their hours of weekly exercise (\( x \)) and resting heart rate (\( y \), bpm):
\( (3, 80), (5, 75), (7, 70), (10, 65), (12, 60) \).
a) Plot the data.
b) Compute \( r \) and interpret.
c) Find the regression line.
d) Predict the resting heart rate of a student who exercises 8 hours/week.
e) Comment on the validity of the prediction.
Question 16 [5 marks]
A real-estate dataset contains house prices (\( y \), thousands of $) vs square footage (\( x \), thousands of sq ft). The regression is \( \hat{y} = 50 + 150x \) and \( r = 0.85 \).
a) Predict the price of a 2200 sq ft house.
b) State \( r^2 \) and interpret.
c) The model predicts a 5000 sq ft mansion costs $800k. Critique this prediction.
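The arithmetic in this question can be checked with a few lines of code. The sketch below only evaluates the stated model \( \hat{y} = 50 + 150x \) and squares the stated \( r \); the function name `predict_price_thousands` is a hypothetical helper, and the main thing it illustrates is the unit conversion from square feet to thousands of square feet.

```python
def predict_price_thousands(sq_ft):
    """Evaluate y_hat = 50 + 150x, where x is in thousands of sq ft
    and y_hat is in thousands of dollars."""
    x = sq_ft / 1000  # convert sq ft to thousands of sq ft
    return 50 + 150 * x

r = 0.85
r_squared = r ** 2  # proportion of variation in price explained by the linear model

price_2200 = predict_price_thousands(2200)
price_5000 = predict_price_thousands(5000)

print(f"2200 sq ft: about ${price_2200:.0f}k")
print(f"5000 sq ft: about ${price_5000:.0f}k")
print(f"r^2 = {r_squared:.4f}")
```

The code confirms the numbers only. Whether 5000 sq ft lies within the range of the data used to fit the model is not stated in the question, and that is the point the critique in part c) needs to address.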
Question 17 [5 marks]
A dataset on smoking (\( x \) = packs/day) vs life expectancy (\( y \) = years) gives \( r = -0.78 \) for 200 adults.
a) Interpret \( r \).
b) Compute \( r^2 \) and explain.
c) Does this prove smoking causes shorter life? Discuss confounding factors and the difference between an observational study and a controlled experiment.
Evaluation Rubric
Level
Description
%
4
Thorough, insightful, high degree of effectiveness