06 May, 2022

Statistical modelling

What is a statistical model?

Mathematical model

Statistical model

  • stochastic mathematical expression
  • low-dimensional summary
  • relates one or more dependent random variables to one or more independent variables

A random variable is one whose values depend on a set of random events and are described by a probability distribution

  • embodies a data generation process along with the distributional assumptions underlying this generation

  • incorporates error (uncertainty)

  • \(response = model + error\)

What is the purpose of statistical modelling?

  • describe relationships / effects
  • estimate effects
  • predict outcomes

Statistical models

How do we estimate model parameters? - \(y_i \sim{} \beta_0 + \beta_1 x_i\)

What criterion do we use to assess best fit?

  • Depends on how we assume Y is distributed

Statistical models

If we assume \(y_i\) is drawn from a normal (Gaussian) distribution…

  • Ordinary Least Squares (OLS)

Estimation

  • parameters
    • location (mean)
    • spread (variance) - uncertainty

Estimation

Least squares

Estimation

Least squares estimates

  • Minimize sum of the squared residuals
     y    x
  9.64    1
  3.79    2
 11.00    3
 27.88    4
 32.84    5
 32.56    6
 37.84    7
 29.86    8
 45.05    9
 47.65   10

\[ y_i = \beta_0\times 1 + \beta_1\times x_i \]

\[ \underbrace{\begin{bmatrix} y_1\\ y_2\\ y_3\\ ...\\ y_n \end{bmatrix}}_{Y} = \underbrace{\begin{bmatrix} 1&x_1\\ 1&x_2\\ 1&x_3\\ ...&...\\ 1&x_n \end{bmatrix}}_{\boldsymbol{X}} \underbrace{\begin{bmatrix} \beta_0\\ \beta_1 \end{bmatrix}}_{\boldsymbol{\beta}} \]

\[ \begin{align} Y =& \boldsymbol{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}\\ \boldsymbol{\varepsilon} =& Y - \boldsymbol{X}\boldsymbol{\beta} \end{align} \]

Minimize the residual sum of squares (\(\boldsymbol{\varepsilon}^T\boldsymbol{\varepsilon}\))

\[ \tiny \begin{bmatrix} \varepsilon_1&\varepsilon_2&\varepsilon_3&...&\varepsilon_n \end{bmatrix} \begin{bmatrix} \varepsilon_1\\ \varepsilon_2\\ \varepsilon_3\\ ...\\ \varepsilon_n\\ \end{bmatrix} = \begin{bmatrix} \varepsilon_1\times\varepsilon_1 + \varepsilon_2\times\varepsilon_2 + \varepsilon_3\times\varepsilon_3 + ... + \varepsilon_n\times\varepsilon_n \end{bmatrix} \]

\[ \begin{align} \boldsymbol{\varepsilon}^T\boldsymbol{\varepsilon} =& (Y - \boldsymbol{X}\boldsymbol{\beta})^T(Y - \boldsymbol{X}\boldsymbol{\beta})\\ \boldsymbol{\beta} =& (\boldsymbol{X}^T\boldsymbol{X})^{-1} \boldsymbol{X}^T Y \end{align} \]

## Generate the model matrix
X = model.matrix(~1 + x, data = dat)
## Solve for beta
beta = solve(t(X) %*% X) %*% t(X) %*% dat$y
beta
##                 [,1]
## (Intercept) 2.650667
## x           4.574606
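As a cross-check, R's built-in `lm()` (re-creating `dat` from the table above) gives the same estimates as the normal equations:

```r
## Re-create the example data from the table above
dat <- data.frame(x = 1:10,
                  y = c(9.64, 3.79, 11.00, 27.88, 32.84, 32.56,
                        37.84, 29.86, 45.05, 47.65))
## Fit by least squares; coefficients match beta above (2.650667, 4.574606)
coef(lm(y ~ x, data = dat))
```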

  • Minimize sum of the squared residuals
  • Simple matrix algebra

This works provided the data (and residuals) are:

  • Gaussian
  • Equally varied
  • Independent

Gaussian distribution

\(f(x\mid\mu, \sigma^2) = \frac{1}{\sqrt{2\sigma^2\pi}}e^{-\frac{(x-\mu)^2}{2\sigma^2}}\)
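This density is implemented in R as `dnorm()`; a quick check of the formula against it (the values of \(\mu\), \(\sigma\), and \(x\) here are arbitrary):

```r
## Compare the Gaussian density formula against R's dnorm()
mu <- 0; sigma <- 1; x <- 1.5   # arbitrary illustrative values
manual <- 1 / sqrt(2 * sigma^2 * pi) * exp(-(x - mu)^2 / (2 * sigma^2))
all.equal(manual, dnorm(x, mean = mu, sd = sigma))
```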

Linear model assumptions

  • Normality
  • Homogeneity of variance
  • Linearity
  • Independence

Linear model assumptions

What do we do, if the data do not satisfy the assumptions?

Scale transformations

Linear model

\[ \begin{align} y_i =& \beta_0 + \beta_1 x_i + \varepsilon_i\\ OR\\ y_i \sim{}& N(\mu_i, \sigma^2)\\ \mu_i =& \beta_0 + \beta_1 x_i \end{align} \]

  • model embodies data generation processes
  • pertains to:
    • effects (linear predictor)
    • distribution
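The second (generative) formulation can be simulated directly, which makes the "data generation process" idea concrete. A minimal sketch, with hypothetical parameter values:

```r
## Simulate from y_i ~ N(mu_i, sigma^2), with mu_i = beta0 + beta1 * x_i
set.seed(1)
beta0 <- 2.65; beta1 <- 4.57; sigma <- 5   # hypothetical values
x  <- 1:10
mu <- beta0 + beta1 * x       # linear predictor (the "effects")
y  <- rnorm(length(x), mean = mu, sd = sigma)   # the distribution
```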

Data types

Type           Example            Distribution        Range
Measurements   length, weight     Gaussian            real, \(-\infty < x < \infty\)
                                  logNormal           real, \(0 < x < \infty\)
                                  Gamma               real, \(0 < x < \infty\)
Counts         Abundance          Poisson             discrete, \(0 \le x < \infty\)
                                  Negative Binomial   discrete, \(0 \le x < \infty\)
Binary         Presence/Absence   Binomial            discrete, \(x = 0, 1\)
Proportions    Ratio              Binomial            discrete, \(0 \le x \le n\)
Percentages    Percent cover      Binomial            real, \(0 \le x \le 1\)
                                  Beta                real, \(0 < x < 1\)
Categorical    Severity           Ordinal             discrete, \(x = 0, 1, \ldots\)

  • measurements typically away from zero

What about density (count / area)?

  • no mass at 0

Gamma

zero-bounded variables with large variance

\(f(x\mid s, a) = \frac{1}{s^a\Gamma(a)}\, x^{a-1} e^{-x/s}\)

\(a = shape,\ s = scale\)

\(\mu = as, \quad \sigma^2 = as^2\)
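These moment relations are easy to verify by simulation (the shape and scale values here are arbitrary):

```r
## Gamma moments: mu = a*s, sigma^2 = a*s^2
set.seed(1)
a <- 2; s <- 3                        # hypothetical shape and scale
x <- rgamma(1e5, shape = a, scale = s)
c(mean = mean(x), var = var(x))       # close to a*s = 6 and a*s^2 = 18
```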


What about abundance?

Poisson distribution

Count data

\(f(x\mid \lambda) = \frac{e^{-\lambda}\lambda^x}{x!}\)

\(\mu = \sigma^2 = \lambda\)

\(\theta~(dispersion) = \frac{\sigma^2}{\mu} = 1\)
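The mean-variance equality (dispersion of 1) can be checked by simulation (the rate used here is arbitrary):

```r
## Poisson: mean and variance are both lambda, so dispersion = var/mean = 1
set.seed(1)
lambda <- 4                           # hypothetical rate
x <- rpois(1e5, lambda)
c(mean = mean(x), var = var(x), dispersion = var(x) / mean(x))
```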

Negative Binomial

Count data

\(f(x\mid \mu, \omega) = \frac{\Gamma(x + \omega)}{\Gamma(\omega)x!}\times\frac{\mu^x\omega^\omega}{(\mu + \omega)^{x+\omega}}\)

\(\theta~(dispersion) = 1/\omega\)

\(\omega = \frac{\mu^2}{\sigma^2 - \mu} \hspace{1cm} \theta \rightarrow 0~~as~~\omega \rightarrow \infty\)
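The extra-Poisson variance is easy to see by simulation (parameter values here are arbitrary; \(\omega\) corresponds to the `size` argument of `rnbinom()`):

```r
## Negative binomial: var = mu + mu^2/omega  (overdispersed relative to Poisson)
set.seed(1)
mu <- 4; omega <- 2                   # hypothetical values
x <- rnbinom(1e5, mu = mu, size = omega)
c(mean = mean(x), var = var(x))       # variance close to 4 + 16/2 = 12
```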


What about binary and proportions?

Binomial distribution

Proportions or Presence/absence

\(f(x\mid n, p) = \binom{n}{x}p^x(1-p)^{n-x}\)

\(\mu = np, \sigma^2 = np(1-p)\)

for presence/absence \(n=1\)
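A simulation check of the binomial moments (the trial count and probability here are arbitrary):

```r
## Binomial moments: mu = n*p, sigma^2 = n*p*(1-p)
set.seed(1)
n <- 20; p <- 0.3                     # hypothetical values
x <- rbinom(1e5, size = n, prob = p)
c(mean = mean(x), var = var(x))       # close to n*p = 6 and n*p*(1-p) = 4.2
```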


What about percentages?

Beta

Continuous between 0 and 1

\(f(x\mid a, b) = \frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)} x^{a-1} (1-x)^{b-1}\)

\(\mu = \frac{a}{a+b}, \quad \sigma^2 = \frac{ab}{(a + b)^2(a + b + 1)}\)

  • zero-one inflation must be considered (the Beta has no mass at exactly 0 or 1)
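The Beta moments can be verified by simulation (the shape parameters here are arbitrary):

```r
## Beta moments: mu = a/(a+b), sigma^2 = a*b/((a+b)^2 * (a+b+1))
set.seed(1)
a <- 2; b <- 5                        # hypothetical shape parameters
x <- rbeta(1e5, a, b)
c(mean = mean(x), var = var(x))       # close to 2/7 and 10/392
```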

Generalized linear models

\[Y = \beta_0 + \beta_1x_1~+~...~+~\beta_px_p~+~e\] \[\underbrace{g(\mu)}_{Link~~function} = \underbrace{\beta_0 + \beta_1x_1~+~...~+~\beta_px_p}_{Systematic}\]

  • Random component \[Y \sim{} Dist(\mu, ...)\]
  • Systematic component (ranges over \((-\infty,\infty)\))
  • Link function (\(g()\)) \[g(\mu) = \beta_0 + \beta_1x_1~+~...~+~\beta_px_p\]
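In R, the random component and link function are specified through `glm()`'s `family` argument. A minimal sketch with simulated (hypothetical) count data:

```r
## Poisson GLM: Poisson random component, log link,
## linear predictor beta0 + beta1 * x
set.seed(1)
x <- runif(50, 0, 10)
y <- rpois(50, lambda = exp(0.5 + 0.2 * x))   # true values are hypothetical
fit <- glm(y ~ x, family = poisson(link = "log"))
coef(fit)   # estimates should be near 0.5 and 0.2
```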

Generalized linear models

Linear model is just a special case

\[Y = \beta_0 + \beta_1x_1~+~...~+~\beta_px_p~+~e\] \[\underbrace{g(\mu)}_{Link~~function} = \underbrace{\beta_0 + \beta_1x_1~+~...~+~\beta_px_p}_{Systematic}\]

  • Random component \[Y \sim{} N(\mu,\sigma^2)\]
  • Systematic component (ranges over \((-\infty,\infty)\))
  • Link function: identity \[\mu = \beta_0 + \beta_1x_1~+~...~+~\beta_px_p\]
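Because of this, `glm()` with a Gaussian family and identity link reproduces `lm()` exactly (the data here are the ten observations from the earlier table):

```r
dat <- data.frame(x = 1:10,
                  y = c(9.64, 3.79, 11.00, 27.88, 32.84, 32.56,
                        37.84, 29.86, 45.05, 47.65))
## Gaussian GLM with identity link gives the same coefficients as lm()
all.equal(coef(glm(y ~ x, family = gaussian(), data = dat)),
          coef(lm(y ~ x, data = dat)))
```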

Generalized linear models

Response variable         Probability distribution   Link function                                   Model name
Continuous measurements   Gaussian                   identity: \(\mu\)                               Linear regression
                          Gamma                      inverse: \(\frac{1}{\mu}\)                      Gamma regression
                                                     log: \(\log(\mu)\)
Counts                    Poisson                    log: \(\log(\mu)\)                              Poisson regression (log-linear model)
                          Negative binomial          log: \(\log(\mu)\)                              Negative binomial regression
                          Quasi-Poisson              log: \(\log(\mu)\)                              Quasi-Poisson regression
Binary, proportions       Binomial                   logit: \(\log\left(\frac{\pi}{1-\pi}\right)\)   Logistic regression
                                                     probit: \(\Phi^{-1}(\pi)\)                      Probit regression
                                                     complementary log-log: \(\log(-\log(1-\pi))\)   Complementary log-log regression
Percentages               Beta                       logit: \(\log\left(\frac{\pi}{1-\pi}\right)\)   Beta regression

Maximum Likelihood

Gaussian: \[ P(x_i\mid\mu, \sigma) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{1}{2}\left(\frac{x_i-\mu}{\sigma}\right)^2\right) \]

Likelihood:

\[ L(X|\theta) = \prod^n_{i=1} f(x_i|\theta) \]

The product of the likelihoods of the individual observations, given the parameters.

Maximum Likelihood

Log-likelihood:

\[ LL(X|\theta) = \sum^n_{i=1} log(f(x_i|\theta)) \]
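The optimisation examples below call a function `LL.gaus()`, whose definition is not shown in these slides. Judging from the outputs, it returns the negative Gaussian log-likelihood; a minimal version consistent with that could be:

```r
## Negative log-likelihood of a Gaussian sample.
## par = c(mu, sigma); x = vector of observations.
## (A sketch -- the LL.gaus used on the following slides is assumed similar.)
LL.gaus <- function(par, x) {
  -sum(dnorm(x, mean = par[1], sd = par[2], log = TRUE))
}
```

At the maximum-likelihood estimates reported below (\(\mu \approx 27.81\), \(\sigma \approx 14.26\)) this returns approximately 40.763, matching the `optim()` and `nlm()` outputs.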

Optimization

  • most optimizers find minima
    • so we work with the negative log-likelihood
  • Brute force

##            mu    sigma       LL
## 4252 27.87879 14.24242 40.76342

Empirical mean and sd:

## [1] "Mean= 27.811  SD= 15.0303407590558"
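The grid itself is not shown; a brute-force search consistent with the output above might look like this (the grid ranges and resolution are assumptions):

```r
y <- c(9.64, 3.79, 11.00, 27.88, 32.84, 32.56, 37.84, 29.86, 45.05, 47.65)
## Evaluate the negative log-likelihood over a grid of candidate parameters
grid <- expand.grid(mu    = seq(20, 35, length.out = 100),
                    sigma = seq(10, 20, length.out = 100))
grid$LL <- apply(grid, 1, function(p)
  -sum(dnorm(y, mean = p[1], sd = p[2], log = TRUE)))
grid[which.min(grid$LL), ]   # best cell: mu ~ 27.88, sigma ~ 14.24
```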

Optimization

  • Brute force
  • Simplex based

Nelder-Mead algorithm

optim(par=c(20,12), LL.gaus, x=dat$y)
## $par
## [1] 27.81143 14.25771
## 
## $value
## [1] 40.76329
## 
## $counts
## function gradient 
##       53       NA 
## 
## $convergence
## [1] 0
## 
## $message
## NULL

Optimization

  • Brute force
  • Simplex based
  • Derivative based

Newton-Raphson method


nlm(LL.gaus, c(20,12), x=dat$y, hessian=TRUE, gradtol=1e-03)
## $minimum
## [1] 40.76329
## 
## $estimate
## [1] 27.81205 14.25769
## 
## $gradient
## [1]  5.249938e-05 -1.312170e-04
## 
## $hessian
##               [,1]          [,2]
## [1,]  4.919280e-02 -1.686155e-05
## [2,] -1.686155e-05  9.836416e-02
## 
## $code
## [1] 1
## 
## $iterations
## [1] 9

Optimization

  • Brute force
  • Simplex based
  • Derivative based
  • Stochastic global: Simulated Annealing
optim(par=c(20,12), LL.gaus, x=dat$y, method='SANN')
## $par
## [1] 27.82430 14.25626
## 
## $value
## [1] 40.7633
## 
## $counts
## function gradient 
##    10000       NA 
## 
## $convergence
## [1] 0
## 
## $message
## NULL

Optimization

  • Brute force
  • Simplex based
  • Derivative based
  • Stochastic global
  • Iteratively re-weighted least squares (IRLS)

\[ \boldsymbol{\beta} = (\boldsymbol{X}^T\boldsymbol{W}\boldsymbol{X})^{-1}\boldsymbol{X}^T\boldsymbol{W}\boldsymbol{z} \]
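A minimal IRLS sketch for a Poisson GLM with log link (the data are simulated and the fixed iteration count is an assumption; `glm()` uses a convergence criterion instead):

```r
## IRLS: repeatedly solve a weighted least-squares problem
set.seed(1)
x <- runif(50, 0, 10)
y <- rpois(50, exp(0.5 + 0.2 * x))     # hypothetical count data
X <- cbind(1, x)                       # model matrix
beta <- c(log(mean(y)), 0)             # starting values
for (i in 1:25) {
  eta <- X %*% beta                    # linear predictor
  mu  <- exp(eta)                      # inverse link
  z   <- eta + (y - mu) / mu           # working response
  W   <- diag(as.vector(mu))           # working weights (Poisson, log link)
  beta <- solve(t(X) %*% W %*% X) %*% t(X) %*% W %*% z
}
cbind(IRLS = as.vector(beta), glm = coef(glm(y ~ x, family = poisson)))
```

The two columns should agree, since `glm()` itself fits by IRLS.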

glm(y~1, data=dat)
## 
## Call:  glm(formula = y ~ 1, data = dat)
## 
## Coefficients:
## (Intercept)  
##       27.81  
## 
## Degrees of Freedom: 9 Total (i.e. Null);  9 Residual
## Null Deviance:       2033 
## Residual Deviance: 2033  AIC: 85.53