06 May, 2022
\[response = intercept + Predictor_1 + Predictor_2\] \[y_i=\beta_0+\beta_1x_{i1}+\beta_2x_{i2}+...+\beta_jx_{ij}+\epsilon_i\] OR \[y_i=\beta_0+\sum_{j=1}^{p}{\beta_j x_{ij}}+\epsilon_i\]
\[growth = intercept + Predictor1 + Predictor2\]
\[y_i=\beta_0+\beta_1x_{i1}+\beta_2x_{i2}+...+\beta_jx_{ij}+\epsilon_i\]
- effect of one predictor holding the other(s) constant
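A minimal simulation sketch of this "partial" interpretation (entirely hypothetical data and coefficient values, for illustration only):

set.seed(1)
x1 <- rnorm(100)                          # first predictor
x2 <- 0.5 * x1 + rnorm(100)               # second predictor, correlated with x1
y  <- 2 + 1.5 * x1 - 1 * x2 + rnorm(100)  # known partial effects: 1.5 and -1
coef(lm(y ~ x1 + x2))  # each slope estimates the effect of one predictor
                       # with the other held constant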
Y | X1 | X2 |
---|---|---|
3.0 | 22.7 | 0.9 |
2.5 | 23.7 | 0.5 |
6.0 | 25.7 | 0.6 |
5.5 | 29.1 | 0.7 |
9.0 | 22.0 | 0.8 |
8.6 | 29.0 | 1.3 |
12.0 | 29.4 | 1.0 |
\[ \begin{alignat*}{1} 3~&=~(1\times\beta_0) ~&+~(\beta_1\times22.7) ~&+~(\beta_2\times0.9)~&+~\varepsilon_1\\ 2.5~&=~(1\times\beta_0) ~&+~(\beta_1\times23.7) ~&+~(\beta_2\times0.5)~&+~\varepsilon_2\\ 6~&=~(1\times\beta_0) ~&+~(\beta_1\times25.7) ~&+~(\beta_2\times0.6)~&+~\varepsilon_3\\ 5.5~&=~(1\times\beta_0) ~&+~(\beta_1\times29.1) ~&+~(\beta_2\times0.7)~&+~\varepsilon_4\\ 9~&=~(1\times\beta_0) ~&+~(\beta_1\times22) ~&+~(\beta_2\times0.8)~&+~\varepsilon_5\\ 8.6~&=~(1\times\beta_0) ~&+~(\beta_1\times29) ~&+~(\beta_2\times1.3)~&+~\varepsilon_6\\ 12~&=~(1\times\beta_0) ~&+~(\beta_1\times29.4) ~&+~(\beta_2\times1)~&+~\varepsilon_7\\ \end{alignat*} \]
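This set of equations can be generated in R via the design (model) matrix; a sketch using the seven rows above (the data frame name `dat` is an assumption):

dat <- data.frame(y  = c(3, 2.5, 6, 5.5, 9, 8.6, 12),
                  x1 = c(22.7, 23.7, 25.7, 29.1, 22, 29, 29.4),
                  x2 = c(0.9, 0.5, 0.6, 0.7, 0.8, 1.3, 1))
model.matrix(~ x1 + x2, dat)   # a column of 1s (the intercept) plus x1 and x2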
\[growth = intercept + Pred1 + Pred2 + Pred1\times Pred2\]
\[y_i=\beta_0+\beta_1x_{i1}+\beta_2x_{i2}+\beta_3x_{i1}x_{i2}+...+\epsilon_i\]
\[ \begin{alignat*}{1} 3~&=~(1\times\beta_0) ~&+~(\beta_1\times22.7) ~&+~(\beta_2\times0.9)~&+~(\beta_3\times22.7\times0.9)~&+~\varepsilon_1\\ 2.5~&=~(1\times\beta_0) ~&+~(\beta_1\times23.7) ~&+~(\beta_2\times0.5)~&+~(\beta_3\times23.7\times0.5)~&+~\varepsilon_2\\ 6~&=~(1\times\beta_0) ~&+~(\beta_1\times25.7) ~&+~(\beta_2\times0.6)~&+~(\beta_3\times25.7\times0.6)~&+~\varepsilon_3\\ 5.5~&=~(1\times\beta_0) ~&+~(\beta_1\times29.1) ~&+~(\beta_2\times0.7)~&+~(\beta_3\times29.1\times0.7)~&+~\varepsilon_4\\ 9~&=~(1\times\beta_0) ~&+~(\beta_1\times22) ~&+~(\beta_2\times0.8)~&+~(\beta_3\times22\times0.8)~&+~\varepsilon_5\\ 8.6~&=~(1\times\beta_0) ~&+~(\beta_1\times29) ~&+~(\beta_2\times1.3)~&+~(\beta_3\times29\times1.3)~&+~\varepsilon_6\\ 12~&=~(1\times\beta_0) ~&+~(\beta_1\times29.4) ~&+~(\beta_2\times1)~&+~(\beta_3\times29.4\times1)~&+~\varepsilon_7\\ \end{alignat*} \]
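Continuing the sketch above, the design matrix for the multiplicative model simply gains a product column:

model.matrix(~ x1 * x2, dat)   # adds an x1:x2 column, the elementwise product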
Normality, homogeneity of variance, linearity
(multi)collinearity
Strength of the relationship between predictors: \(R^2\). Strong when \(R^2 \ge 0.8\)
Variance inflation factor: \[ VIF = \frac{1}{1-R^2} \] Collinear when \(VIF \ge 5\)
Some prefer \(>3\)
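A hand computation of the VIF for the two-predictor case, as a sketch (assuming predictors `x1` and `x2` in a data frame `data`, matching the code below):

r2 <- summary(lm(x1 ~ x2, data))$r.squared   # R^2 among the predictors
1 / (1 - r2)                                 # should match car::vif() for both terms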
(multi)collinearity
library(car)
# additive model - scaled predictors
vif(lm(y ~ scale(x1) + scale(x2), data))
## scale(x1) scale(x2) 
##  1.743817  1.743817
# multiplicative model - raw predictors
vif(lm(y ~ x1 * x2, data))
##        x1        x2     x1:x2 
##  7.259729  5.913254 16.949468
# multiplicative model - scaled predictors
vif(lm(y ~ scale(x1) * scale(x2), data))
##           scale(x1)           scale(x2) scale(x1):scale(x2) 
##            1.769411            1.771994            1.018694
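Scaling (which centres each predictor) removes most of the nonessential collinearity between the main effects and their product term, which is why the VIFs of the scaled multiplicative model fall back below the usual cut-offs.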
Additive model
\(y_i=\beta_0+\beta_1x_{i1}+\beta_2x_{i2}+\epsilon_i\)
data.add.lm <- lm(y~scale(x1)+scale(x2), data)
Multiplicative model
\(y_i=\beta_0+\beta_1x_{i1}+\beta_2x_{i2}+\beta_3x_{i1}x_{i2}+\epsilon_i\)
data.mult.lm <- lm(y~scale(x1)+scale(x2)+scale(x1):scale(x2), data)
# OR
data.mult.lm <- lm(y~scale(x1)*scale(x2), data)
Additive model
plot(data.add.lm)
Multiplicative model
plot(data.mult.lm)
Additive model
summary(data.add.lm)
## 
## Call:
## lm(formula = y ~ scale(x1) + scale(x2), data = data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.39418 -0.75888 -0.02463  0.73688  2.37938 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  -1.5161     0.1055 -14.364  < 2e-16 ***
## scale(x1)     0.7702     0.1401   5.499  3.1e-07 ***
## scale(x2)    -1.5182     0.1401 -10.839  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.055 on 97 degrees of freedom
## Multiple R-squared:  0.5567, Adjusted R-squared:  0.5476 
## F-statistic: 60.91 on 2 and 97 DF,  p-value: < 2.2e-16
Additive model
confint(data.add.lm)
##                  2.5 %    97.5 %
## (Intercept) -1.7255292 -1.306576
## scale(x1)    0.4922116  1.048241
## scale(x2)   -1.7962511 -1.240222
Multiplicative model
summary(data.mult.lm)
## 
## Call:
## lm(formula = y ~ scale(x1) * scale(x2), data = data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.23364 -0.62188  0.01763  0.80912  1.98568 
## 
## Coefficients:
##                     Estimate Std. Error t value Pr(>|t|)    
## (Intercept)          -1.6995     0.1228 -13.836  < 2e-16 ***
## scale(x1)             0.8146     0.1367   5.957 4.22e-08 ***
## scale(x2)            -1.5648     0.1368 -11.435  < 2e-16 ***
## scale(x1):scale(x2)   0.2837     0.1052   2.697  0.00826 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.023 on 96 degrees of freedom
## Multiple R-squared:  0.588,  Adjusted R-squared:  0.5751 
## F-statistic: 45.66 on 3 and 96 DF,  p-value: < 2.2e-16
Additive model
library(effects)
plot(effect("x1", data.add.lm, residuals=TRUE))
Additive model
library(effects)
plot(allEffects(data.add.lm, residuals=TRUE))
Multiplicative model
library(effects)
plot(allEffects(data.mult.lm, residuals=TRUE))
“All models are wrong, but some are useful” George E. P. Box
AIC(data.add.lm, data.mult.lm)
##              df      AIC
## data.add.lm   4 299.5340
## data.mult.lm  5 294.2283
library(MuMIn)
AICc(data.add.lm, data.mult.lm)
##              df     AICc
## data.add.lm   4 299.9551
## data.mult.lm  5 294.8666
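AICc is AIC with a small-sample correction: \[AIC_c = AIC + \frac{2k(k+1)}{n-k-1}\] where \(k\) is the number of estimated parameters (the df column) and \(n\) is the sample size. For the additive model above: \(299.5340 + (2\times 4\times 5)/(100-4-1) = 299.9551\), matching the output.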
library(MuMIn)
data.mult.lm <- lm(y~scale(x1)*scale(x2), data, na.action=na.fail)
dredge(data.mult.lm, rank="AICc", trace=TRUE)
## 0 : lm(formula = y ~ 1, data = data, na.action = na.fail)
## 1 : lm(formula = y ~ scale(x1) + 1, data = data, na.action = na.fail)
## 2 : lm(formula = y ~ scale(x2) + 1, data = data, na.action = na.fail)
## 3 : lm(formula = y ~ scale(x1) + scale(x2) + 1, data = data, na.action = na.fail)
## 7 : lm(formula = y ~ scale(x1) + scale(x2) + scale(x1):scale(x2) + 
##     1, data = data, na.action = na.fail)
## Global model call: lm(formula = y ~ scale(x1) * scale(x2), data = data, na.action = na.fail)
## ---
## Model selection table 
##    (Int) scl(x1) scl(x2) scl(x1):scl(x2) df   logLik  AICc delta weight
## 8 -1.699  0.8146  -1.565          0.2837  5 -142.114 294.9  0.00  0.927
## 4 -1.516  0.7702  -1.518                  4 -145.767 300.0  5.09  0.073
## 3 -1.516          -1.015                  3 -159.333 324.9 30.05  0.000
## 1 -1.516                                  2 -186.446 377.0 82.15  0.000
## 2 -1.516 -0.2213                          3 -185.441 377.1 82.27  0.000
## Models ranked by AICc(x)
library(MuMIn)
data.dredge <- dredge(data.mult.lm, rank="AICc")
model.avg(data.dredge, subset=delta<20)
## 
## Call:
## model.avg(object = data.dredge, subset = delta < 20)
## 
## Component models: 
## '123' '12'
## 
## Coefficients: 
##        (Intercept) scale(x1) scale(x2) scale(x1):scale(x2)
## full     -1.686125 0.8113591 -1.561395           0.2630362
## subset   -1.686125 0.8113591 -1.561395           0.2836934
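In this output, the "full" average treats a coefficient as 0 in models where the term is absent, whereas the "subset" (conditional) average is taken only over the component models that contain the term; that is why only the interaction estimate differs between the two rows.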
Or, preferably:
##   Y   A    dummy1   dummy2   dummy3
## ----  ---  -------  -------  -------
##   2   G1        1        0        0
##   3   G1        1        0        0
##   4   G1        1        0        0
##   6   G2        0        1        0
##   7   G2        0        1        0
##   8   G2        0        1        0
##  10   G3        0        0        1
##  11   G3        0        0        1
##  12   G3        0        0        1
\[y_{ij}=\mu+\beta_1(dummy_1)_{ij} + \beta_2(dummy_2)_{ij} +\beta_3(dummy_3)_{ij} + \varepsilon_{ij}\]
\[y_{ij}=\color{darkorange}{\mu}+\color{darkorange}{\beta_1}(dummy_1)_{ij} + \color{darkorange}{\beta_2}(dummy_2)_{ij} +\color{darkorange}{\beta_3}(dummy_3)_{ij} + \varepsilon_{ij}\]
##   Y   A    Intercept   dummy1   dummy2   dummy3
## ----  ---  ----------  -------  -------  -------
##   2   G1           1        1        0        0
##   3   G1           1        1        0        0
##   4   G1           1        1        0        0
##   6   G2           1        0        1        0
##   7   G2           1        0        1        0
##   8   G2           1        0        1        0
##  10   G3           1        0        0        1
##  11   G3           1        0        0        1
##  12   G3           1        0        0        1
\[y_{i}=\mu+\beta_1(dummy_1)_{i} + \beta_2(dummy_2)_{i} +\beta_3(dummy_3)_{i} + \varepsilon_{i}\]
\[y_{i}=\beta_1(dummy_1)_{i} + \beta_2(dummy_2)_{i} +\beta_3(dummy_3)_{i} + \varepsilon_{i}\] \[y_{ij}=\alpha_i + \varepsilon_{ij} \quad (p~\alpha~\text{parameters})\]
Raw data and matrix representation:

\[ y_i = \alpha_1 D_{1i} + \alpha_2 D_{2i} + \alpha_3 D_{3i} + \varepsilon_i\\ y_i = \alpha_p+\varepsilon_i, \\ \text{where}~p= \text{number of levels of the factor}\\\text{and } D = \text{dummy variables}\]

\[\begin{pmatrix} 2\\3\\4\\6\\7\\8\\10\\11\\12 \end{pmatrix} = \begin{pmatrix} 1&0&0\\1&0&0\\1&0&0\\0&1&0\\0&1&0\\0&1&0\\0&0&1\\0&0&1\\0&0&1 \end{pmatrix} \times \begin{pmatrix} \alpha_1\\\alpha_2\\\alpha_3 \end{pmatrix} + \begin{pmatrix} \varepsilon_1\\\varepsilon_2\\\varepsilon_3\\\varepsilon_4\\\varepsilon_5\\\varepsilon_6\\\varepsilon_7\\\varepsilon_8\\\varepsilon_9 \end{pmatrix} \]
\[\alpha_1 = 3, \alpha_2 = 7, \alpha_3 = 11\]
Parameter | Estimates | Null hypothesis |
---|---|---|
\(\alpha^*_1\) | mean of group 1 (\(\mu_1\)) | H\(_0\): \(\alpha^*_1=\mu_1=0\) |
\(\alpha^*_2\) | mean of group 2 (\(\mu_2\)) | H\(_0\): \(\alpha^*_2=\mu_2=0\) |
\(\alpha^*_3\) | mean of group 3 (\(\mu_3\)) | H\(_0\): \(\alpha^*_3=\mu_3=0\) |
… |
summary(lm(Y~-1+A))$coef
##     Estimate Std. Error   t value     Pr(>|t|)
## AG1        3  0.5773503  5.196152 2.022368e-03
## AG2        7  0.5773503 12.124356 1.913030e-05
## AG3       11  0.5773503 19.052559 1.351732e-06
\[y_{i}=\mu+\beta_1(dummy_1)_{i} + \beta_2(dummy_2)_{i} +\beta_3(dummy_3)_{i} + \varepsilon_{i}\]
\[y_{ij}=\mu+\beta_2(dummy_2)_{ij} + \beta_3(dummy_3)_{ij} + \varepsilon_{ij}\] \[y_{ij}=\mu + \alpha_i + \varepsilon_{ij} \qquad (p-1~\alpha~\text{parameters})\]
\[y_{i}=\alpha + \beta_2(dummy_2)_{i} +\beta_3(dummy_3)_{i} + \varepsilon_{i}\]
##   Y   A    alpha   dummy2   dummy3
## ----  ---  ------  -------  -------
##   2   G1       1        0        0
##   3   G1       1        0        0
##   4   G1       1        0        0
##   6   G2       1        1        0
##   7   G2       1        1        0
##   8   G2       1        1        0
##  10   G3       1        0        1
##  11   G3       1        0        1
##  12   G3       1        0        1
Raw data and matrix representation:

\[y_i = \mu + \alpha_p+\varepsilon_i, \\ \text{where}~p= \text{number of levels minus 1}\]

\[\begin{pmatrix} 2\\3\\4\\6\\7\\8\\10\\11\\12 \end{pmatrix} = \begin{pmatrix} 1&0&0\\1&0&0\\1&0&0\\1&1&0\\1&1&0\\1&1&0\\1&0&1\\1&0&1\\1&0&1 \end{pmatrix} \times \begin{pmatrix} \mu\\\alpha_2\\\alpha_3 \end{pmatrix} + \begin{pmatrix} \varepsilon_1\\\varepsilon_2\\\varepsilon_3\\\varepsilon_4\\\varepsilon_5\\\varepsilon_6\\\varepsilon_7\\\varepsilon_8\\\varepsilon_9 \end{pmatrix} \]
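The default treatment parameterisation yields this matrix directly; continuing the sketch above:

model.matrix(~ A)   # a column of 1s (the intercept) plus p - 1 treatment dummies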
\[\mu = 3, \alpha_2 = 4, \alpha_3 = 8\]
Parameter | Estimates | Null hypothesis |
---|---|---|
\(Intercept\) | mean of control group (\(\mu_1\)) | H\(_0\): \(\mu=\mu_1=0\) |
\(\alpha^*_2\) | mean of group 2 minus mean of control group (\(\mu_2-\mu_1\)) | H\(_0\): \(\alpha^*_2=\mu_2-\mu_1=0\) |
\(\alpha^*_3\) | mean of group 3 minus mean of control group (\(\mu_3-\mu_1\)) | H\(_0\): \(\alpha^*_3=\mu_3-\mu_1=0\) |
… |
contrasts(A) <- contr.treatment
contrasts(A)
##    2 3
## G1 0 0
## G2 1 0
## G3 0 1
summary(lm(Y~A))$coef
##             Estimate Std. Error  t value     Pr(>|t|)
## (Intercept)        3  0.5773503 5.196152 2.022368e-03
## A2                 4  0.8164966 4.898979 2.713682e-03
## A3                 8  0.8164966 9.797959 6.506149e-05
User-defined contrasts
Grp2 vs Grp3
Grp1 vs (Grp2 & Grp3)
Contrast | Grp1 | Grp2 | Grp3 |
---|---|---|---|
\(\alpha_2\) | 0 | 1 | -1 |
\(\alpha_3\) | 1 | -0.5 | -0.5 |
contrasts(A) <- cbind(c(0,1,-1), c(1, -0.5, -0.5))
contrasts(A)
##    [,1] [,2]
## G1    0  1.0
## G2    1 -0.5
## G3   -1 -0.5
\(p-1=3-1=2~\) comparisons
Orthogonality check: the elementwise products of the two contrasts must sum to zero.
\[ \begin{array}{rcl} 0\times 1.0 &=& 0\\ 1\times -0.5 &=& -0.5\\ -1\times -0.5 &=& 0.5\\ sum &=& 0 \end{array} \]
crossprod(contrasts(A))
##      [,1] [,2]
## [1,]    2  0.0
## [2,]    0  1.5
summary(lm(Y~A))$coef
##             Estimate Std. Error   t value     Pr(>|t|)
## (Intercept)        7  0.3333333 21.000000 7.595904e-07
## A1                -2  0.4082483 -4.898979 2.713682e-03
## A2                -4  0.4714045 -8.485281 1.465426e-04
contrasts(A) <- cbind(c(1, -0.5, -0.5), c(1,-1,0))
contrasts(A)
##    [,1] [,2]
## G1  1.0    1
## G2 -0.5   -1
## G3 -0.5    0
crossprod(contrasts(A))
##      [,1] [,2]
## [1,]  1.5  1.5
## [2,]  1.5  2.0
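The non-zero off-diagonal (1.5) confirms that this pair of contrasts is not orthogonal.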
anova(lm(Y~A))
## Analysis of Variance Table
## 
## Response: Y
##           Df Sum Sq Mean Sq F value    Pr(>F)    
## A          2     96      48      48 0.0002035 ***
## Residuals  6      6       1                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
No. of Groups | No. of comparisons | Familywise Type I error probability |
---|---|---|
3 | 3 | 0.14 |
5 | 10 | 0.40 |
10 | 45 | 0.90 |
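These familywise probabilities follow from \(1-(1-\alpha)^m\) for \(m\) (assumed independent) comparisons at \(\alpha=0.05\); a quick check in R:

m <- c(3, 10, 45)            # number of comparisons
round(1 - (1 - 0.05)^m, 2)   # 0.14 0.40 0.90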
Bonferroni
summary(lm(Y~A))$coef
##             Estimate Std. Error   t value     Pr(>|t|)
## (Intercept)        7  0.3333333 21.000000 7.595904e-07
## A1                -8  0.9428090 -8.485281 1.465426e-04
## A2                 4  0.8164966  4.898979 2.713682e-03
0.05/3
## [1] 0.01666667
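Alternatively, leave the threshold at 0.05 and adjust the p-values themselves; one way to do this in base R (a sketch using the `Y` and `A` from above):

pairwise.t.test(Y, A, p.adjust.method = "bonferroni")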
Tukey’s test
library(emmeans)
data.lm <- lm(Y~A)
emmeans(data.lm, pairwise~A)
## $emmeans
##  A  emmean    SE df lower.CL upper.CL
##  G1      3 0.577  6     1.59     4.41
##  G2      7 0.577  6     5.59     8.41
##  G3     11 0.577  6     9.59    12.41
## 
## Confidence level used: 0.95 
## 
## $contrasts
##  contrast estimate    SE df t.ratio p.value
##  G1 - G2        -4 0.816  6  -4.899  0.0065
##  G1 - G3        -8 0.816  6  -9.798  0.0002
##  G2 - G3        -4 0.816  6  -4.899  0.0065
## 
## P value adjustment: tukey method for comparing a family of 3 estimates
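For comparison, a base-R route to the same pairwise Tukey comparisons (refitting via `aov()`):

TukeyHSD(aov(Y ~ A))   # Tukey honest significant differences for factor A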