06 May, 2022

Statistical modelling

What is a statistical model?

Mathematical model

Statistical model

  • stochastic mathematical expression
  • low-dimensional summary
  • relates one or more dependent random variables to one or more independent variables

A random variable is one whose values depend on a set of random events and are described by a probability distribution

  • embodies a data generation process along with the distributional assumptions underlying this generation

  • incorporates error (uncertainty)

  • \(response = model + error\)

What is the purpose of statistical modelling?

  • describe relationships / effects
  • estimate effects
  • predict outcomes

Statistical models

How do we estimate model parameters? - \(y_i \sim{} \beta_0 + \beta_1 x_i\)

What criterion do we use to assess best fit?

  • Depends on how we assume Y is distributed

Statistical models

If we assume \(y_i\) is drawn from a normal (Gaussian) distribution…

  • Ordinary Least Squares (OLS)

Estimation

  • parameters
    • location (mean)
    • spread (variance) - uncertainty

Estimation

Least squares

Estimation

Least squares estimates

  • Minimize sum of the squared residuals
     y    x
  9.64    1
  3.79    2
 11.00    3
 27.88    4
 32.84    5
 32.56    6
 37.84    7
 29.86    8
 45.05    9
 47.65   10

\[ y_i = \beta_0\times 1 + \beta_1\times x_i \]

\[ \underbrace{\begin{bmatrix} y_1\\ y_2\\ y_3\\ ...\\ y_n \end{bmatrix}}_{Y} = \underbrace{\begin{bmatrix} 1&x_1\\ 1&x_2\\ 1&x_3\\ ...&...\\ 1&x_n \end{bmatrix}}_{\boldsymbol{X}} \underbrace{\begin{bmatrix} \beta_0\\ \beta_1 \end{bmatrix}}_{\boldsymbol{\beta}} \]

\[ \begin{align} Y =& \boldsymbol{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}\\ \boldsymbol{\varepsilon} =& Y - \boldsymbol{X}\boldsymbol{\beta} \end{align} \]

Minimize the residual sum of squares (\(\boldsymbol{\varepsilon}^T\boldsymbol{\varepsilon}\))

\[ \tiny \begin{bmatrix} \varepsilon_1&\varepsilon_2&\varepsilon_3&...&\varepsilon_n \end{bmatrix} \begin{bmatrix} \varepsilon_1\\ \varepsilon_2\\ \varepsilon_3\\ ...\\ \varepsilon_n\\ \end{bmatrix} = \begin{bmatrix} \varepsilon_1\times\varepsilon_1 + \varepsilon_2\times\varepsilon_2 + \varepsilon_3\times\varepsilon_3 + ... + \varepsilon_n\times\varepsilon_n \end{bmatrix} \]

\[ \begin{align} \boldsymbol{\varepsilon}^T\boldsymbol{\varepsilon} =& (Y - \boldsymbol{X}\boldsymbol{\beta})^T(Y - \boldsymbol{X}\boldsymbol{\beta})\\ \boldsymbol{\beta} =& (\boldsymbol{X}^T\boldsymbol{X})^{-1} \boldsymbol{X}^T Y \end{align} \]

## Generate the model matrix
X = model.matrix(~1 + x, data = dat)
## Solve for beta
beta = solve(t(X) %*% X) %*% t(X) %*% dat$y
beta
##                 [,1]
## (Intercept) 2.650667
## x           4.574606
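As a cross-check, R's built-in `lm()` (re-creating `dat` from the table above) gives the same estimates as the normal equations:

```r
## Re-create the example data from the table above
dat <- data.frame(x = 1:10,
                  y = c(9.64, 3.79, 11.00, 27.88, 32.84, 32.56,
                        37.84, 29.86, 45.05, 47.65))
## Fit by least squares; coefficients match beta above (2.650667, 4.574606)
coef(lm(y ~ x, data = dat))
```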

  • Minimize sum of the squared residuals
  • Simple matrix algebra

This works provided the data (and residuals) are:

  • Gaussian
  • Equally varied
  • Independent

Gaussian distribution

\(f(x\mid\mu, \sigma^2) = \frac{1}{\sqrt{2\sigma^2\pi}}e^{-\frac{(x-\mu)^2}{2\sigma^2}}\)
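This density is implemented in R as `dnorm()`; a quick check of the formula against it (the values of \(\mu\), \(\sigma\), and \(x\) here are arbitrary):

```r
## Compare the Gaussian density formula against R's dnorm()
mu <- 0; sigma <- 1; x <- 1.5   # arbitrary illustrative values
manual <- 1 / sqrt(2 * sigma^2 * pi) * exp(-(x - mu)^2 / (2 * sigma^2))
all.equal(manual, dnorm(x, mean = mu, sd = sigma))
```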

Linear model assumptions

  • Normality
  • Homogeneity of variance
  • Linearity
  • Independence

Linear model assumptions

What do we do, if the data do not satisfy the assumptions?

Scale transformations

Linear model

\[ \begin{align} y_i =& \beta_0 + \beta_1 x_i + \varepsilon_i\\ OR\\ y_i \sim{}& N(\mu_i, \sigma^2)\\ \mu_i =& \beta_0 + \beta_1 x_i \end{align} \]

  • model embodies data generation processes
  • pertains to:
    • effects (linear predictor)
    • distribution
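The second (generative) formulation can be simulated directly, which makes the "data generation process" idea concrete. A minimal sketch, with hypothetical parameter values:

```r
## Simulate from y_i ~ N(mu_i, sigma^2), with mu_i = beta0 + beta1 * x_i
set.seed(1)
beta0 <- 2.65; beta1 <- 4.57; sigma <- 5   # hypothetical values
x  <- 1:10
mu <- beta0 + beta1 * x       # linear predictor (the "effects")
y  <- rnorm(length(x), mean = mu, sd = sigma)   # the distribution
```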

Data types

Type           Example            Distribution        Range
Measurements   length, weight     Gaussian            real, \(-\infty < x < \infty\)
                                  logNormal           real, \(0 < x < \infty\)
                                  Gamma               real, \(0 < x < \infty\)
Counts         Abundance          Poisson             discrete, \(0 \le x < \infty\)
                                  Negative Binomial   discrete, \(0 \le x < \infty\)
Binary         Presence/Absence   Binomial            discrete, \(x = 0, 1\)
Proportions    Ratio              Binomial            discrete, \(0 \le x \le n\)
Percentages    Percent cover      Binomial            real, \(0 \le x \le 1\)
                                  Beta                real, \(0 < x < 1\)
Categorical    Severity           Ordinal             discrete, \(x = 0, 1, \ldots\)

  • measurements typically away from zero

What about density (count / area)?

  • no mass at 0

Gamma

zero-bounded variables with large variance

\(f(x\mid s, a) = \frac{1}{s^a\Gamma(a)}\, x^{a-1} e^{-x/s}\)

\(a = shape,\ s = scale\)

\(\mu = as, \quad \sigma^2 = as^2\)
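These moment relations are easy to verify by simulation (the shape and scale values here are arbitrary):

```r
## Gamma moments: mu = a*s, sigma^2 = a*s^2
set.seed(1)
a <- 2; s <- 3                        # hypothetical shape and scale
x <- rgamma(1e5, shape = a, scale = s)
c(mean = mean(x), var = var(x))       # close to a*s = 6 and a*s^2 = 18
```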


What about abundance?

Poisson distribution

Count data

\(f(x\mid \lambda) = \frac{e^{-\lambda}\lambda^x}{x!}\)

\(\mu = \sigma^2 = \lambda\)

\(\theta~(dispersion) = \frac{\sigma^2}{\mu} = 1\)
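The mean-variance equality (dispersion of 1) can be checked by simulation (the rate used here is arbitrary):

```r
## Poisson: mean and variance are both lambda, so dispersion = var/mean = 1
set.seed(1)
lambda <- 4                           # hypothetical rate
x <- rpois(1e5, lambda)
c(mean = mean(x), var = var(x), dispersion = var(x) / mean(x))
```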

Negative Binomial

Count data

\(f(x\mid \mu, \omega) = \frac{\Gamma(x + \omega)}{\Gamma(\omega)x!}\times\frac{\mu^x\omega^\omega}{(\mu + \omega)^{x+\omega}}\)

\(\theta~(dispersion) = 1/\omega\)

\(\omega = \frac{\mu^2}{\sigma^2 - \mu} \hspace{1cm} \theta \rightarrow 0~~as~~\omega \rightarrow \infty\)
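The extra-Poisson variance is easy to see by simulation (parameter values here are arbitrary; \(\omega\) corresponds to the `size` argument of `rnbinom()`):

```r
## Negative binomial: var = mu + mu^2/omega  (overdispersed relative to Poisson)
set.seed(1)
mu <- 4; omega <- 2                   # hypothetical values
x <- rnbinom(1e5, mu = mu, size = omega)
c(mean = mean(x), var = var(x))       # variance close to 4 + 16/2 = 12
```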


What about binary and proportions?

Binomial distribution

Proportions or Presence/absence

\(f(x\mid n, p) = \binom{n}{x}p^x(1-p)^{n-x}\)

\(\mu = np, \sigma^2 = np(1-p)\)

for presence/absence \(n=1\)
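A simulation check of the binomial moments (the trial count and probability here are arbitrary):

```r
## Binomial moments: mu = n*p, sigma^2 = n*p*(1-p)
set.seed(1)
n <- 20; p <- 0.3                     # hypothetical values
x <- rbinom(1e5, size = n, prob = p)
c(mean = mean(x), var = var(x))       # close to n*p = 6 and n*p*(1-p) = 4.2
```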


What about percentages?

Beta

Continuous between 0 and 1

\(f(x\mid a, b) = \frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)} x^{a-1} (1-x)^{b-1}\)

\(\mu = \frac{a}{a+b}, \quad \sigma^2 = \frac{ab}{(a + b)^2(a + b + 1)}\)

  • zero-one inflation must be considered (the Beta has no mass at exactly 0 or 1)
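The Beta moments can be verified by simulation (the shape parameters here are arbitrary):

```r
## Beta moments: mu = a/(a+b), sigma^2 = a*b/((a+b)^2 * (a+b+1))
set.seed(1)
a <- 2; b <- 5                        # hypothetical shape parameters
x <- rbeta(1e5, a, b)
c(mean = mean(x), var = var(x))       # close to 2/7 and 10/392
```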

Generalized linear models

\[Y = \beta_0 + \beta_1x_1~+~...~+~\beta_px_p~+~e\] \[\underbrace{g(\mu)}_{Link~~function} = \underbrace{\beta_0 + \beta_1x_1~+~...~+~\beta_px_p}_{Systematic}\]

  • Random component \[Y \sim{} Dist(\mu, ...)\]
  • Systematic component (ranges over \((-\infty,\infty)\))
  • Link function (\(g()\)) \[g(\mu) = \beta_0 + \beta_1x_1~+~...~+~\beta_px_p\]
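In R, the random component and link function are specified through `glm()`'s `family` argument. A minimal sketch with simulated (hypothetical) count data:

```r
## Poisson GLM: Poisson random component, log link,
## linear predictor beta0 + beta1 * x
set.seed(1)
x <- runif(50, 0, 10)
y <- rpois(50, lambda = exp(0.5 + 0.2 * x))   # true values are hypothetical
fit <- glm(y ~ x, family = poisson(link = "log"))
coef(fit)   # estimates should be near 0.5 and 0.2
```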

Generalized linear models

Linear model is just a special case

\[Y = \beta_0 + \beta_1x_1~+~...~+~\beta_px_p~+~e\] \[\underbrace{g(\mu)}_{Link~~function} = \underbrace{\beta_0 + \beta_1x_1~+~...~+~\beta_px_p}_{Systematic}\]

  • Random component \[Y \sim{} N(\mu,\sigma^2)\]
  • Systematic component (ranges over \((-\infty,\infty)\))
  • Link function: identity \[\mu = \beta_0 + \beta_1x_1~+~...~+~\beta_px_p\]
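Because of this, `glm()` with a Gaussian family and identity link reproduces `lm()` exactly (the data here are the ten observations from the earlier table):

```r
dat <- data.frame(x = 1:10,
                  y = c(9.64, 3.79, 11.00, 27.88, 32.84, 32.56,
                        37.84, 29.86, 45.05, 47.65))
## Gaussian GLM with identity link gives the same coefficients as lm()
all.equal(coef(glm(y ~ x, family = gaussian(), data = dat)),
          coef(lm(y ~ x, data = dat)))
```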

Generalized linear models

Response variable         Probability distribution   Link function                                   Model name
Continuous measurements   Gaussian                   identity: \(\mu\)                               Linear regression
                          Gamma                      inverse: \(\frac{1}{\mu}\)                      Gamma regression
                                                     log: \(\log(\mu)\)
Counts                    Poisson                    log: \(\log(\mu)\)                              Poisson regression (log-linear model)
                          Negative binomial          log: \(\log(\mu)\)                              Negative binomial regression
                          Quasi-Poisson              log: \(\log(\mu)\)                              Quasi-Poisson regression
Binary, proportions       Binomial                   logit: \(\log\left(\frac{\pi}{1-\pi}\right)\)   Logistic regression
                                                     probit: \(\Phi^{-1}(\pi)\)                      Probit regression
                                                     complementary log-log: \(\log(-\log(1-\pi))\)   Complementary log-log regression
Percentages               Beta                       logit: \(\log\left(\frac{\pi}{1-\pi}\right)\)   Beta regression

Maximum Likelihood

Gaussian: \[ P(x_i\mid\mu, \sigma) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{1}{2}\left(\frac{x_i-\mu}{\sigma}\right)^2\right) \]

Likelihood:

\[ L(X|\theta) = \prod^n_{i=1} f(x_i|\theta) \]

The product of the likelihoods of the individual observations, given the parameters.

Maximum Likelihood

Log-likelihood:

\[ LL(X|\theta) = \sum^n_{i=1} log(f(x_i|\theta)) \]
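The optimisation examples below call a function `LL.gaus()`, whose definition is not shown in these slides. Judging from the outputs, it returns the negative Gaussian log-likelihood; a minimal version consistent with that could be:

```r
## Negative log-likelihood of a Gaussian sample.
## par = c(mu, sigma); x = vector of observations.
## (A sketch -- the LL.gaus used on the following slides is assumed similar.)
LL.gaus <- function(par, x) {
  -sum(dnorm(x, mean = par[1], sd = par[2], log = TRUE))
}
```

At the maximum-likelihood estimates reported below (\(\mu \approx 27.81\), \(\sigma \approx 14.26\)) this returns approximately 40.763, matching the `optim()` and `nlm()` outputs.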

Optimization

  • most optimizers find minima
    • so we work with the negative log-likelihood
  • Brute force

##            mu    sigma       LL
## 4252 27.87879 14.24242 40.76342

Empirical mean and sd:

## [1] "Mean= 27.811  SD= 15.0303407590558"
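The grid itself is not shown; a brute-force search consistent with the output above might look like this (the grid ranges and resolution are assumptions):

```r
y <- c(9.64, 3.79, 11.00, 27.88, 32.84, 32.56, 37.84, 29.86, 45.05, 47.65)
## Evaluate the negative log-likelihood over a grid of candidate parameters
grid <- expand.grid(mu    = seq(20, 35, length.out = 100),
                    sigma = seq(10, 20, length.out = 100))
grid$LL <- apply(grid, 1, function(p)
  -sum(dnorm(y, mean = p[1], sd = p[2], log = TRUE)))
grid[which.min(grid$LL), ]   # best cell: mu ~ 27.88, sigma ~ 14.24
```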

Optimization

  • Brute force
  • Simplex based

Nelder-Mead algorithm

optim(par=c(20,12), LL.gaus, x=dat$y)
## $par
## [1] 27.81143 14.25771
## 
## $value
## [1] 40.76329
## 
## $counts
## function gradient 
##       53       NA 
## 
## $convergence
## [1] 0
## 
## $message
## NULL

Optimization

  • Brute force
  • Simplex based
  • Derivative based

Newton-Raphson method


nlm(LL.gaus, c(20,12), x=dat$y, hessian=TRUE, gradtol=1e-03)
## $minimum
## [1] 40.76329
## 
## $estimate
## [1] 27.81205 14.25769
## 
## $gradient
## [1]  5.249938e-05 -1.312170e-04
## 
## $hessian
##               [,1]          [,2]
## [1,]  4.919280e-02 -1.686155e-05
## [2,] -1.686155e-05  9.836416e-02
## 
## $code
## [1] 1
## 
## $iterations
## [1] 9

Optimization

  • Brute force
  • Simplex based
  • Derivative based
  • Stochastic global: Simulated Annealing
optim(par=c(20,12), LL.gaus, x=dat$y, method='SANN')
## $par
## [1] 27.82430 14.25626
## 
## $value
## [1] 40.7633
## 
## $counts
## function gradient 
##    10000       NA 
## 
## $convergence
## [1] 0
## 
## $message
## NULL

Optimization

  • Brute force
  • Simplex based
  • Derivative based
  • Stochastic global
  • Iteratively re-weighted least squares (IRLS)

\[ \boldsymbol{\beta} = (\boldsymbol{X}^T\boldsymbol{W}\boldsymbol{X})^{-1}\boldsymbol{X}^T\boldsymbol{W}\boldsymbol{z} \]
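A minimal IRLS sketch for a Poisson GLM with log link (the data are simulated and the fixed iteration count is an assumption; `glm()` uses a convergence criterion instead):

```r
## IRLS: repeatedly solve a weighted least-squares problem
set.seed(1)
x <- runif(50, 0, 10)
y <- rpois(50, exp(0.5 + 0.2 * x))     # hypothetical count data
X <- cbind(1, x)                       # model matrix
beta <- c(log(mean(y)), 0)             # starting values
for (i in 1:25) {
  eta <- X %*% beta                    # linear predictor
  mu  <- exp(eta)                      # inverse link
  z   <- eta + (y - mu) / mu           # working response
  W   <- diag(as.vector(mu))           # working weights (Poisson, log link)
  beta <- solve(t(X) %*% W %*% X) %*% t(X) %*% W %*% z
}
cbind(IRLS = as.vector(beta), glm = coef(glm(y ~ x, family = poisson)))
```

The two columns should agree, since `glm()` itself fits by IRLS.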

glm(y~1, data=dat)
## 
## Call:  glm(formula = y ~ 1, data = dat)
## 
## Coefficients:
## (Intercept)  
##       27.81  
## 
## Degrees of Freedom: 9 Total (i.e. Null);  9 Residual
## Null Deviance:       2033 
## Residual Deviance: 2033  AIC: 85.53