Mastitis occurs when bacteria get into a cow’s udder and cause an infection. On average, mastitis costs farmers €60 per cow per year. Mastitis accounts for a loss of 20% of the total agricultural revenue in Ireland. Healthy udders are economically profitable, lead to a better-quality product, and improve cow welfare. Somatic cell count (scc) is the total number of cells per millilitre of milk. The scc is composed primarily of leukocytes, or white blood cells, which the cow’s immune system produces to fight a mastitis infection. Since the number of leukocytes in the udder increases as the condition worsens, scc provides an indication of the degree of mastitis in an individual cow.
Automated (robotic) milking systems are becoming more popular in Ireland and provide information on various measures of the composition of milk. Casein and whey protein are the major proteins in milk. Casein constitutes approximately 80% (29.5 g/L) of the total protein in bovine milk, and whey protein accounts for about 20% (6.3 g/L). The objective of this project is to analyze the relationship between the somatic cell count, scc, and the protein levels recorded by the automated (robotic) milking system, protein and casein. We also consider the percentage of concentrate feed (supplements) in the cows’ diet, conc_fed.
This data set contains:
• protein: the recorded protein in the milk for cow i,
• casein: the casein in the milk for cow i,
• scc: the somatic cell count in the milk for cow i,
• conc_fed: the percentage of concentrate feed
(supplements) in the diet of cow i,
for i = 1, . . . , N, where N is the number of cows
recorded in the data set. The observations relate to
individual cows on four farms in Ireland.
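The layout described above can be checked quickly once the data are loaded. The sketch below is illustrative only: it builds a synthetic stand-in called milk (a hypothetical name) with the four documented columns, because the real file is not part of this brief. The values, the units and the conc_fed levels are all assumptions.

```r
# Synthetic stand-in with the four documented columns. All values are made up
# for illustration; units and conc_fed levels are assumptions, not the real data.
set.seed(1)
N <- 50  # hypothetical number of cows
milk <- data.frame(
  protein  = rnorm(N, mean = 3.4, sd = 0.3),
  casein   = rnorm(N, mean = 2.7, sd = 0.25),
  scc      = rlnorm(N, meanlog = 5, sdlog = 1),
  conc_fed = sample(c(10, 20, 30), N, replace = TRUE)
)

str(milk)                      # one row per cow i, one column per variable
colSums(is.na(milk))           # missing values per column
stopifnot(all(milk$scc > 0))   # log(scc) is only defined for positive counts
```

With the real data, the data.frame() call would be replaced by whatever import step suits the file you were given.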
Exploratory Data Analysis (35 marks):
For each question in the EDA section please provide
the lines of R code required to produce your results
and the tables and figures produced by R.
1. Using a boxplot, a histogram and descriptive
statistics (mean, min, max, median, and quantiles),
describe the distribution of the somatic cell count
scc. (5 marks)
2. Using a boxplot, a histogram and descriptive
statistics (mean, min, max, median, and quantiles),
describe the distribution of the log of the somatic
cell count scc. (5 marks)
3. Using a boxplot, a histogram and descriptive
statistics (mean, min, max, median, and quantiles),
describe the distribution of the protein levels
protein. (5 marks)
4. Using a boxplot, a histogram and descriptive
statistics (mean, min, max, median, and quantiles),
describe the distribution of the casein levels casein.
(5 marks)
5. Convert the categorical variable conc_fed to a
factor. Describe and illustrate the frequencies and
proportions of the categorical variable concentrate
feed conc_fed. (5 marks)
6. Using descriptive statistics (mean, standard
deviation, median, mad: median absolute deviation
(from the median), minimum, maximum, skew and standard
error) and a boxplot, describe how the log
of the somatic cell count scc varies with respect to the
variable concentrate feed conc_fed. (5 marks)
7. Using correlations and scatter plots, discuss the
relationship between log(scc) and each of the
variables protein and casein. (3 marks)
8. Based on the results from Q7, which variable,
protein or casein, would be the better predictor
variable in your regression model with log(scc) as the
response? Provide a justification for your
selection. (2 marks)
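The exploratory steps listed above could be sketched in base R roughly as follows. This is a minimal sketch under stated assumptions, not a model answer: the data frame milk and every value in it are synthetic placeholders, the conc_fed levels are hypothetical, and only log(scc) is shown for Q1 to Q4 because the same pattern applies to each variable.

```r
# Synthetic stand-in data (made-up values; replace with the real data set).
set.seed(42)
n <- 200
casein   <- rnorm(n, mean = 2.7, sd = 0.25)
protein  <- casein / 0.8 + rnorm(n, sd = 0.1)
scc      <- round(exp(3 + 0.6 * protein + rnorm(n, sd = 0.8)))
conc_fed <- sample(c(10, 20, 30), n, replace = TRUE)  # hypothetical levels
milk <- data.frame(protein, casein, scc, conc_fed)

# Q1-Q4 pattern: descriptive statistics, boxplot and histogram for one variable.
milk$log_scc <- log(milk$scc)
summary(milk$log_scc)                        # min, quartiles, median, mean, max
quantile(milk$log_scc, probs = c(0.05, 0.25, 0.5, 0.75, 0.95))
boxplot(milk$log_scc, main = "log(scc)", ylab = "log(scc)")
hist(milk$log_scc, main = "log(scc)", xlab = "log(scc)")

# Q5: convert conc_fed to a factor, then tabulate counts and proportions.
milk$conc_fed <- factor(milk$conc_fed)
freq <- table(milk$conc_fed)
prop <- prop.table(freq)
print(freq)
print(round(prop, 3))
barplot(freq, xlab = "conc_fed", ylab = "Number of cows")

# Q6: log(scc) by feed group, using base R only.
# (psych::describeBy would also report skew and standard error.)
by_feed <- aggregate(
  log_scc ~ conc_fed, data = milk,
  FUN = function(x) c(mean = mean(x), sd = sd(x), median = median(x), mad = mad(x))
)
print(by_feed)
boxplot(log_scc ~ conc_fed, data = milk, xlab = "conc_fed", ylab = "log(scc)")

# Q7: correlations and scatter plots of log(scc) against each protein measure.
cors <- c(protein = cor(milk$log_scc, milk$protein),
          casein  = cor(milk$log_scc, milk$casein))
print(cors)
plot(milk$protein, milk$log_scc, xlab = "protein", ylab = "log(scc)")
plot(milk$casein,  milk$log_scc, xlab = "casein",  ylab = "log(scc)")
```

Because the data here are simulated, the printed numbers say nothing about the real herd; only the sequence of calls carries over.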
Regression Model (65 marks):
1. Using R, fit a simple linear regression model to the
data with log(scc) as the response variable and the
variable chosen in Q8 of the exploratory analysis
section as the predictor variable. Define and describe
the mathematical equation for the model. (Also provide
your R code.) (4 marks)
2. Interpret the estimate of the intercept term. (2
marks)
3. Interpret the estimate of the slope term. (2 marks)
4. Calculate the variance of the estimates of the
intercept and slope terms. (2 marks)
5. Calculate and interpret the confidence interval
for β0. (Provide your R code.) (5 marks)
6. Calculate and interpret the confidence interval
for β1. (Provide your R code.) (5 marks)
7. Compute and interpret the hypothesis test
H0: β0 = 0 vs Ha: β0 ≠ 0. State the test statistic.
Compare the test statistic to the correct distribution
value and state your conclusion. Also, report the
p-value and the conclusion in the context of the
problem. (8 marks)
8. Compute and interpret the hypothesis test
H0: β1 = 0 vs Ha: β1 ≠ 0. State the test statistic.
Compare the test statistic to the correct distribution
value and state your conclusion. Also, report the
p-value and the conclusion in the context of the
problem. (8 marks)
9. Interpret the F-statistic in the summary output
of the regression model. Hint: State the
hypothesis being tested, the test statistic, the
p-value and the conclusion in the context of the problem.
(6 marks)
10. Interpret the R-squared value. (2 marks)
11. Interpret the residual standard error of the
simple linear regression model. (2 marks)
12. Calculate, plot and comment on the shape of the
confidence intervals for the estimated values of Y.
(Provide your R code.) (4 marks)
13. List the assumptions of the linear regression
model required for small sample inference. (5 marks)
14. Examine the residuals of the regression model and
comment on whether you think the residuals satisfy
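The regression steps listed above could be sketched in R along these lines. This is a hedged illustration, not a worked solution: the data are simulated, the name milk is hypothetical, and protein is used as the predictor purely as a placeholder (it is not a recommendation for Q8). The assumed model is log(scc_i) = β0 + β1 x_i + ε_i with independent normal errors of constant variance.

```r
# Synthetic stand-in data (made-up values; replace with the real data set).
set.seed(42)
n <- 200
protein <- rnorm(n, mean = 3.4, sd = 0.3)
log_scc <- 3 + 0.6 * protein + rnorm(n, sd = 0.8)
milk <- data.frame(protein, log_scc)

# Q1: fit log(scc)_i = beta0 + beta1 * x_i + e_i by least squares.
fit <- lm(log_scc ~ protein, data = milk)
s <- summary(fit)
print(s$coefficients)  # estimates, standard errors, t values, p-values (Q2, Q3, Q7, Q8)

# Q4: variances of the estimates are the diagonal of the covariance matrix.
print(diag(vcov(fit)))

# Q5, Q6: 95% confidence intervals for beta0 and beta1.
ci <- confint(fit, level = 0.95)
print(ci)

# Q7, Q8: two-sided critical value from the t distribution with n - 2 df.
t_crit <- qt(0.975, df = df.residual(fit))
print(t_crit)

# Q9-Q11: F-statistic, R-squared and residual standard error.
print(s$fstatistic)
print(s$r.squared)
print(s$sigma)

# Q12: confidence band for the mean response over a grid of predictor values.
grid <- data.frame(protein = seq(min(milk$protein), max(milk$protein),
                                 length.out = 100))
band <- predict(fit, newdata = grid, interval = "confidence", level = 0.95)
plot(milk$protein, milk$log_scc, xlab = "protein", ylab = "log(scc)")
lines(grid$protein, band[, "fit"])
lines(grid$protein, band[, "lwr"], lty = 2)
lines(grid$protein, band[, "upr"], lty = 2)

# Q14: basic residual diagnostics.
par(mfrow = c(1, 2))
plot(fitted(fit), resid(fit), xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)
qqnorm(resid(fit))
qqline(resid(fit))
```

In a simple linear regression the band from predict() is narrowest near the mean of the predictor and widens towards the extremes, which is the shape Q12 asks you to comment on.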