06 May, 2022
What is a statistical model?
What is a statistical model?
Mathematical model
Statistical model
What is a statistical model?
A random variable is one whose values depend on a set of random events and are described by a probability distribution
What is a statistical model?
embodies a data generation process along with the distributional assumptions underlying this generation
incorporates uncertainty
\(response = model + error\)
What is the purpose of statistical modelling?
How do we estimate model parameters? - \(y_i \sim{} \beta_0 + \beta_1 x_i\)
What criterion do we use to assess best fit?
If we assume \(y_i\) is drawn from a normal (gaussian) distribution…
| y | x | 
|---|---|
| 9.64 | 1 | 
| 3.79 | 2 | 
| 11.00 | 3 | 
| 27.88 | 4 | 
| 32.84 | 5 | 
| 32.56 | 6 | 
| 37.84 | 7 | 
| 29.86 | 8 | 
| 45.05 | 9 | 
| 47.65 | 10 | 
\[ y_i = \beta_0\times 1 + \beta_1\times x_i \]
\[ \underbrace{\begin{bmatrix} y_1\\ y_2\\ y_3\\ ...\\ y_i \end{bmatrix}}_{Y} = \underbrace{\begin{bmatrix} 1&x_1\\ 1&x_2\\ 1&x_3\\ ...\\ 1&x_i \end{bmatrix}}_{\boldsymbol{X}} \underbrace{\vphantom{\begin{bmatrix} y_1\\ y_2\\ y_3\\ ...\\ y_i \end{bmatrix}}\begin{bmatrix} \beta_0&\beta \end{bmatrix}}_{\boldsymbol{\beta}} \]
\[ \begin{align} Y =& \boldsymbol{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}\\ \boldsymbol{\varepsilon} =& Y - \boldsymbol{X}\boldsymbol{\beta} \end{align} \]
Minimize resid sum of squares (\(\boldsymbol{\varepsilon}^T\boldsymbol{\varepsilon}\))
\[ \tiny \begin{bmatrix} \varepsilon_1&\varepsilon_2&\varepsilon_3&...&\varepsilon_n \end{bmatrix} \begin{bmatrix} \varepsilon_1\\ \varepsilon_2\\ \varepsilon_3\\ ...\\ \varepsilon_n\\ \end{bmatrix} = \begin{bmatrix} \varepsilon_1\times\varepsilon_1 + \varepsilon_2\times\varepsilon_2 + \varepsilon_3\times\varepsilon_3 + ... + \varepsilon_n\times\varepsilon_n \end{bmatrix} \]
\[ \begin{align} \boldsymbol{\varepsilon}^T\boldsymbol{\varepsilon} =& (Y - \boldsymbol{X}\boldsymbol{\beta})^T(Y - \boldsymbol{X}\boldsymbol{\beta})\\ \boldsymbol{\beta} =& (\boldsymbol{X}^T\boldsymbol{X})^{-1} \boldsymbol{X}^T Y \end{align} \]
## Generate the model matrix X = model.matrix(~1 + x, data = dat) ## Solve for beta beta = solve(t(X) %*% X) %*% t(X) %*% dat$y beta
## [,1] ## (Intercept) 2.650667 ## x 4.574606
Provided data (and residuals):
\(f(x\mid\mu, \sigma^2) = \frac{1}{\sqrt{2\sigma^2\pi}}e^{-\frac{(x-\mu)^2}{2\sigma^2}}\)
What do we do, if the data do not satisfy the assumptions?
\[ \begin{align} y_i =& \beta_0 + \beta_1 x_i + \varepsilon_i\\ OR\\ y_i \sim{}& N(\mu_i, \sigma^2)\\ \mu_i =& \beta_0 + \beta_1 x_i \end{align} \]
| Type | Example | Distribution | Range | 
|---|---|---|---|
| Measurements | length, weight | Gaussian | real, \(-\infty\le x\ge\infty\) | 
| logNormal | real, \(0< x\ge\infty\) | ||
| Gamma | real, \(0< x\ge\infty\) | ||
| Counts | Abundance | Poisson | discrete, \(0\ge x\le\infty\) | 
| Negative Binomial | discrete, \(0\ge x\le\infty\) | ||
| Binary | Presence/Absence | Binomial | discrete, \(x=0,1\) | 
| Proportions | Ratio | Binomial | discrete, \(0\ge x\le n\) | 
| Percentages | Percent cover | Binomial | real, \(0\le x\ge 1\) | 
| Beta | real, \(0< x >1\) | ||
| Categorical | Severity | Ordinal | discrete, \(x=0,1\) | 
What about density (count / area)?
| Type | Example | Distribution | Range | 
|---|---|---|---|
| Measurements | length, weight | Gaussian | real, \(-\infty\le x\ge\infty\) | 
| logNormal | real, \(0< x\ge\infty\) | ||
| Gamma | real, \(0< x\ge\infty\) | ||
| Counts | Abundance | Poisson | discrete, \(0\ge x\le\infty\) | 
| Negative Binomial | discrete, \(0\ge x\le\infty\) | ||
| Binary | Presence/Absence | Binomial | discrete, \(x=0,1\) | 
| Proportions | Ratio | Binomial | discrete, \(0\ge x\le n\) | 
| Percentages | Percent cover | Binomial | real, \(0\le x\ge 1\) | 
| Beta | real, \(0< x> 1\) | ||
| Categorical | Severity | Ordinal | discrete, \(x=0,1\) | 
zero-bound variables with large var.
\(f(x\mid s, a) = \frac{1}{(s^a\Gamma(a))} x^(a-1) e^{-(x/s)}\)
\(a=shape, s=scale\)
\(\mu = as, \sigma^2 = as^2\)
| Type | Example | Distribution | Range | 
|---|---|---|---|
| Measurements | length, weight | Gaussian | real, \(-\infty\le x\ge\infty\) | 
| logNormal | real, \(0< x\ge\infty\) | ||
| Gamma | real, \(0< x\ge\infty\) | ||
| Counts | Abundance | Poisson | discrete, \(0\ge x\le\infty\) | 
| Negative Binomial | discrete, \(0\ge x\le\infty\) | ||
| Binary | Presence/Absence | Binomial | discrete, \(x=0,1\) | 
| Proportions | Ratio | Binomial | discrete, \(0\ge x\le n\) | 
| Percentages | Percent cover | Binomial | real, \(0\le x\ge 1\) | 
| Beta | real, \(0< x> 1\) | ||
| Categorical | Severity | Ordinal | discrete, \(x=0,1\) | 
What about abundance?
Count data
\(f(x\mid \lambda) = \frac{e^{-\lambda}\lambda^x}{x!}\)
\(\mu = \sigma^2 = \lambda = df\)
\(\theta~(dispersion) = \frac{\sigma^2}{\mu} = 1\)
Count data
\(f(x\mid \mu, \omega) = \frac{\Gamma(x + \omega)}{\Gamma(\omega)x!}\times\frac{\mu^x\omega^\omega}{(\mu + \omega)^{\mu+\omega}}\)
\(\theta~(dispersion) = 1/\omega\)
\(\omega = -\frac{\mu^2}{\mu - \sigma^2} \hspace{1cm} \theta = 0~(when~\omega=\infty)\)
| Type | Example | Distribution | Range | 
|---|---|---|---|
| Measurements | length, weight | Gaussian | real, \(-\infty\le x\ge\infty\) | 
| logNormal | real, \(0< x\ge\infty\) | ||
| Gamma | real, \(0< x\ge\infty\) | ||
| Counts | Abundance | Poisson | discrete, \(0\ge x\le\infty\) | 
| Negative Binomial | discrete, \(0\ge x\le\infty\) | ||
| Binary | Presence/Absence | Binomial | discrete, \(x=0,1\) | 
| Proportions | Ratio | Binomial | discrete, \(0\ge x\le n\) | 
| Percentages | Percent cover | Binomial | real, \(0\le x\ge 1\) | 
| Beta | real, \(0< x> 1\) | ||
| Categorical | Severity | Ordinal | discrete, \(x=0,1\) | 
What about binary and proportions?
Proportions or Presence/absence
\(f(x\mid n, p) = \binom{n}{p}p^x(1-p)^{n-x}\)
\(\mu = np, \sigma^2 = np(1-p)\)
for presence/absence \(n=1\)
| Type | Example | Distribution | Range | 
|---|---|---|---|
| Measurements | length, weight | Gaussian | real, \(-\infty\le x\ge\infty\) | 
| logNormal | real, \(0< x\ge\infty\) | ||
| Gamma | real, \(0< x\ge\infty\) | ||
| Counts | Abundance | Poisson | discrete, \(0\ge x\le\infty\) | 
| Negative Binomial | discrete, \(0\ge x\le\infty\) | ||
| Binary | Presence/Absence | Binomial | discrete, \(x=0,1\) | 
| Proportions | Ratio | Binomial | discrete, \(0\ge x\le n\) | 
| Percentages | Percent cover | Binomial | real, \(0\le x\ge 1\) | 
| Beta | real, \(0< x> 1\) | ||
| Categorical | Severity | Ordinal | discrete, \(x=0,1\) | 
What about percentages?
Continuous between 0 and 1
\(f(x\mid a, b) = \frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)} x^{a-1} (1-x)^{b-1}\)
\(\mu = \frac{a}{a+b}, \sigma^2 = \frac{ab}{(a + b)^2.(a + b + 1)}\)
\[Y = \beta_0 + \beta_1x_1~+~...~+~\beta_px_p~+~e\] \[\underbrace{g(\mu)}_{Link~~function} = \underbrace{\beta_0 + \beta_1x_1~+~...~+~\beta_px_p}_{Systematic}\]
Linear model is just a special case
\[Y = \beta_0 + \beta_1x_1~+~...~+~\beta_px_p~+~e\] \[\underbrace{g(\mu)}_{Link~~function} = \underbrace{\beta_0 + \beta_1x_1~+~...~+~\beta_px_p}_{Systematic}\]
| Response variable | Probability distribution | Link function | Model name | 
|---|---|---|---|
| Continuous measurements | Gaussian | identity: \(\mu\) | Linear regression | 
| Gamma | inverse: \(\frac{1}{\mu}\) | Gamma regression | |
| log: \(log(\mu)\) | |||
| Counts | Poisson | log: \(log(\mu)\) | Poisson regression | 
| log-linear model | |||
| Negative binomial | log: \(log(\mu)\) | Negative binomial regression | |
| Quasi-poisson | log: \(log(\mu)\) | Poisson regression | |
| Binary, proportions | Binomial | logit: | Logistic regression | 
| \(log\left(\frac{\pi}{1-\pi}\right)\) | |||
| probit: \(\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\alpha+\beta.X} exp\left(-\frac{1}{2}Z^2\right)dZ\) | Probit regression | ||
| Complimentary log-log: \(log(-log(1-\pi))\) | Logistic regression | ||
| Percentages | Beta | logit: \(log\left(\frac{\pi}{1-\pi}\right)\) | Beta regression | 
Gaussian: \[ P(x_i|\mu, \sigma) = \frac{1}{\sqrt{\sigma^2 2\pi}} exp^{-\frac{1}{2}((x_i-\mu)/\sigma)^2} \]
Likelihood:
\[ L(X|\theta) = \prod^n_{i=1} f'(x_i|\theta) \]
Product of the likelihood of each observation given parameters.
Log-likelihood:
 \[
LL(X|\theta) = \sum^n_{i=1} log(f'(x_i|\theta))
\]  
## mu sigma LL ## 4252 27.87879 14.24242 40.76342
Empirical mean and sd:
## [1] "Mean= 27.811 SD= 15.0303407590558"
Nelder-Mead algorithm
optim(par=c(20,12), LL.gaus, x=dat$y)
## $par ## [1] 27.81143 14.25771 ## ## $value ## [1] 40.76329 ## ## $counts ## function gradient ## 53 NA ## ## $convergence ## [1] 0 ## ## $message ## NULL
Newton-Raphson method 
nlm(LL.gaus, c(20,12), x=dat$y, hessian=TRUE, gradtol=1e-03)
## $minimum ## [1] 40.76329 ## ## $estimate ## [1] 27.81205 14.25769 ## ## $gradient ## [1] 5.249938e-05 -1.312170e-04 ## ## $hessian ## [,1] [,2] ## [1,] 4.919280e-02 -1.686155e-05 ## [2,] -1.686155e-05 9.836416e-02 ## ## $code ## [1] 1 ## ## $iterations ## [1] 9
optim(par=c(20,12), LL.gaus, x=dat$y, method='SANN')
## $par ## [1] 27.82430 14.25626 ## ## $value ## [1] 40.7633 ## ## $counts ## function gradient ## 10000 NA ## ## $convergence ## [1] 0 ## ## $message ## NULL
\[ \boldsymbol{\beta} = (\boldsymbol{X}^T\boldsymbol{W}\boldsymbol{X}^{-1})\boldsymbol{X}^T\boldsymbol{W}\boldsymbol{z} \]
glm(y~1, data=dat)
## ## Call: glm(formula = y ~ 1, data = dat) ## ## Coefficients: ## (Intercept) ## 27.81 ## ## Degrees of Freedom: 9 Total (i.e. Null); 9 Residual ## Null Deviance: 2033 ## Residual Deviance: 2033 AIC: 85.53