library(splines)
library(ggfortify)
library(ggeffects)
library(mgcv)
library(gratia)
library(emmeans)
library(gridExtra)
library(tidyverse)
To illustrate non-parametric models (and GAMs in particular), let's simulate two non-linear trends:
We will also add random noise to both trends. We will then plot the sine wave (as a dashed line) along with the realised samples (solid dots).
set.seed(123)
x = seq(0, 2*pi, len=50)
yp = sin(x)
y = sin(x) + rnorm(length(x), mean=0, sd=0.5)
dat = data.frame(yp=yp, y=y, x=x)
g1 = ggplot(dat, aes(y=y, x=x)) + geom_point() +
    geom_line(aes(y=yp), linetype='dashed') + ggtitle('Example 1')
set.seed(123)
x = seq(0, 3*pi, len=50)
yp = 2 - 0.5*x + sin(x)
y = 2 - 0.5*x + sin(x) + rnorm(length(x), mean=0, sd=0.5)
dat2 = data.frame(yp=yp, y=y, x=x)
g2 = ggplot(dat2, aes(y=y, x=x)) + geom_point() +
    geom_line(aes(y=yp), linetype='dashed') + ggtitle('Example 2')
grid.arrange(g1,g2, nrow=1)
A simple (generalized) linear model models a response variable against a linear predictor (a linear combination of predictor variables) according to the familiar form:
\[ g(E(y)) = \beta_0 + x_1\beta_1 \]
where \(E(y)\) is the expected value of y (which is dependent on a nominated family), \(g()\) represents the link function that links the expected values of y back to the nominated family, \(\beta_0\) is the y-intercept and \(\beta_1\) is the slope associated with the predictor (\(x_1\)).
If we fit a simple linear model to our example data, we will see that a straight line is not a good fit for either dataset - there are curvy patterns left in the residuals.
dat.lm = lm(y~x, data=dat)
dat2.lm = lm(y~x, data=dat2)
autoplot(list(dat.lm, dat2.lm), which=1)
grid.arrange(
ggpredict(dat.lm, ~x) %>% plot(rawdata=TRUE),
ggpredict(dat2.lm, ~x) %>% plot(rawdata=TRUE),
nrow=1
)
Although the model above (\(g(E(y)) = \beta_0 + x_1\beta_1\)) is a linear model, it does not mean that it is restricted to modelling purely linear trends. We could, for example, fit a model in which \(y\) is modelled against a polynomial for \(x\). A third order polynomial includes \(x\), \(x^2\) and \(x^3\):
\[ g(E(y)) = \beta_0 + x_1\beta_1 + x_1^2\beta_2 + x_1^3\beta_3 \]
When fitting such a model, we need to wrap the squared and cubed terms within I() to ensure that they are each treated as separate terms. If the model were specified as y ~ x + x^2 + x^3, the ^ operator would be interpreted in its special formula (crossing) sense rather than arithmetically, and the squared and cubed terms would simply collapse back into a single linear term for x - not what we want.
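As a quick check (reusing the dat data simulated above), we can compare the columns of the two model matrices - without I(), the higher-order terms collapse away:
# model matrix when ^ is interpreted as formula crossing - only a single linear x term remains
colnames(model.matrix(~ x + x^2 + x^3, data=dat))
# model matrix when the polynomial terms are protected with I()
colnames(model.matrix(~ x + I(x^2) + I(x^3), data=dat))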
dat.lm2 = lm(y~x+I(x^2)+I(x^3), data=dat)
dat2.lm2 = lm(y~x+I(x^2)+I(x^3), data=dat2)
autoplot(list(dat.lm2, dat2.lm2), which=1)
grid.arrange(
ggpredict(dat.lm2, ~x) %>% plot(rawdata=TRUE),
ggpredict(dat2.lm2, ~x) %>% plot(rawdata=TRUE),
nrow=1)
The model fitted in the right-hand panel (linear decline with sine wave example) appears a little oversimplified. While it has captured some elements of the trend (a general decline that is not purely linear), it has not captured the sine wave properly. We will return to this point after exploring the left-hand panel.
Although the overall fit for the left-hand panel (simple sine wave) is fine, the individual terms (linear, quadratic and cubic) are not orthogonal (independent) - they are not identifiable. Recall the (multi)collinearity assumption of multiple linear regression, in which no covariate should be correlated with the others. Hence, we cannot examine the estimated \(\beta\) parameters in order to describe each of the linear, quadratic and cubic components of the trend.
The poly() function can be used to generate orthogonal polynomials:
dat.lm3 = lm(y~poly(x,3), data=dat)
autoplot(dat.lm3, which=1)
ggpredict(dat.lm3, ~x) %>% plot(rawdata=TRUE)
summary(dat.lm3)
##
## Call:
## lm(formula = y ~ poly(x, 3), data = dat)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.84027 -0.27405 -0.09366 0.31064 1.13336
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.01720 0.06557 0.262 0.794
## poly(x, 3)1 -3.80897 0.46365 -8.215 1.43e-10 ***
## poly(x, 3)2 0.33194 0.46365 0.716 0.478
## poly(x, 3)3 2.91779 0.46365 6.293 1.05e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4637 on 46 degrees of freedom
## Multiple R-squared: 0.7005, Adjusted R-squared: 0.681
## F-statistic: 35.87 on 3 and 46 DF, p-value: 4.198e-12
We can see that there is a significant cubic component to the trend.
Whilst the fit for the Example 1 data was reasonable, the same could not be said for the Example 2 data. For the latter, a simple third order polynomial might be inadequate. What if, instead of fitting a single polynomial through the entire data cloud, we fit multiple shorter polynomials through shorter windows of the data? That is, we fit piecewise cubic polynomials.
We could naively start by splitting the range of \(x\) into multiple parts (let's start with two, with a split just left of center), and fit a separate third order polynomial to each part. One of the windows will span from the minimum \(x\) value to the mean of \(x\) minus 0.5 (arbitrarily selected) and the other will span from that point to the maximum value of \(x\). The point where the windows meet is called the breakpoint.
Rather than refer to windows of the data, we could instead refer to knots. In this case, the knots could represent the bounds of the windows. Hence, in this case, there are three knots: one at the start, one near the middle and one at the end.
mid = mean(dat2$x) - 0.5
dat3a = dat2 %>% filter(x <= mid)
dat3a.lm = lm(y~poly(x,3), data=dat3a)
dat3b = dat2 %>% filter(x > mid)
dat3b.lm = lm(y~poly(x,3), data=dat3b)

fita = data.frame(x=seq(min(dat3a$x), mid, len=100)) %>%
    cbind(predict(dat3a.lm, newdata=., interval='confidence'))
fitb = data.frame(x=seq(mid, max(dat3b$x), len=100)) %>%
    cbind(predict(dat3b.lm, newdata=., interval='confidence'))
ggplot() +
geom_ribbon(data=fita, aes(ymin=lwr, ymax=upr, x=x, fill='1'), alpha=0.3, show.legend=FALSE) +
geom_ribbon(data=fitb, aes(ymin=lwr, ymax=upr, x=x, fill='2'), alpha=0.3, show.legend=FALSE) +
geom_line(data=fita, aes(y=fit, x=x, color='1'), show.legend=FALSE) +
geom_line(data=fitb, aes(y=fit, x=x, color='2'), show.legend=FALSE) +
geom_point(data=dat2, aes(y=y, x=x))
Or, equivalently:
mid = mean(dat2$x) - 0.5
dat3 = dat2 %>% mutate(x2=ifelse(x-mid > 0, x, 0))
dat3.lm = lm(y~poly(x,3) + poly(x2,3), data=dat3)
newdata = with(dat2, data.frame(x=seq(min(x), max(x), len=1000))) %>%
    mutate(x2=ifelse(x-mid > 0, x, 0))
dat3 = newdata %>% cbind(predict(dat3.lm, newdata=newdata, interval='confidence')) %>%
    mutate(Part=ifelse(x2==0, '1', '2'))
ggplot(dat3, aes(y=fit, x=x)) +
geom_ribbon(aes(ymin=lwr, ymax=upr, fill=Part), alpha=0.3, show.legend=FALSE) +
geom_line(aes(color=Part), show.legend=FALSE) +
geom_point(data=dat2, aes(y=y, x=x))
Hmmm.. Although this does a reasonable job of fitting the data, there is an issue. There is an abrupt change between the two windows. In this case, the two lines do line up reasonably well, but this need not be the case. If the split of the windows happened to coincide with an abrupt change in the data cloud, it is likely that the two lines will be displaced substantially (leading to two different predictions at the same value of the covariate).
More obvious in this case is the very abrupt change in trend trajectory (and uncertainty) at the boundary of the two windows. What we need to do is fit the piecewise cubic polynomials in a way that ensures that the ends of the lines 1) meet up and 2) have the same trajectory. In essence, we need to be sure that the first and second derivatives at the point where two lines meet (the breakpoint) are consistent.
This is exactly what splines achieve. Splines are decomposed into a set of basis functions.
B-spline basis matrices for polynomial splines can be easily created with the bs() function. We simply provide the continuous predictor variable (in this case x) along with the internal knots (boundary knots are included by default) and the polynomial degree (e.g. 1, 2 or 3 - 3, the default, gives a cubic spline). In this case, we will nominate a single internal knot (our off-center point).
A third order polynomial spline with one internal knot will decompose into a matrix with four variables (representing the four basis functions). We can also visualize this set of basis functions.
k = mid
basis = with(dat2, data.frame(x=x, bs(x, knots=k, degree=3))) %>%
    gather(key=Basis, value=Value, -x)
basis %>% ggplot(aes(y=Value, x=x, color=Basis)) + geom_line()
The basis matrix forms a model matrix that can be applied to a linear model,
\[ g(E(y)) = \beta_0 + \sum^k_{j=1} b_j(x_1)\beta_j \]
where \(b_j(x_1)\) represent the \(j\) columns of the basis matrix and \(\beta_j\) represent the contribution of each basis function. \(\beta_j\) can be estimated via a linear model (along with the intercept).
k = mid
dat2.lm4 = lm(y~bs(x, knots=k, degree=3), data=dat2)
summary(dat2.lm4)
##
## Call:
## lm(formula = y ~ bs(x, knots = k, degree = 3), data = dat2)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.89902 -0.27432 -0.03338 0.38772 1.06711
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.7101 0.2996 5.708 8.45e-07 ***
## bs(x, knots = k, degree = 3)1 2.9489 0.5996 4.918 1.21e-05 ***
## bs(x, knots = k, degree = 3)2 -7.4907 0.5150 -14.544 < 2e-16 ***
## bs(x, knots = k, degree = 3)3 -0.1992 0.5621 -0.354 0.725
## bs(x, knots = k, degree = 3)4 -4.6648 0.3916 -11.911 1.65e-15 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4768 on 45 degrees of freedom
## Multiple R-squared: 0.925, Adjusted R-squared: 0.9183
## F-statistic: 138.7 on 4 and 45 DF, p-value: < 2.2e-16
So having estimated the intercept (\(\beta_0\)) and each of the basis parameters (\(\beta_j\)), we can multiply each basis function by its respective parameter so as to see the contribution of each basis function (colored lines) to the overall model. When doing so, we should also include the intercept (V1 in the graph below). If we then sum up each of the contributions (including the intercept), we will generate the model fit (black line).
basis = cbind(1, bs(dat2$x, knots=k, degree=3))
basis = sweep(basis, MARGIN=2, coef(dat2.lm4), '*')
#basis = basis + coef(dat2.lm4)[1]
fit = rowSums(basis)
basis = basis %>% as.data.frame %>%
    gather(key=Basis, value=Value) %>%
    cbind(x=dat2$x)
ggplot() + geom_line(data=basis, aes(y=Value, x=x,color=Basis)) +
geom_line(data=NULL, aes(y=fit, x=dat2$x)) +
geom_point(data=dat2, aes(y=y, x=x))
The full prediction from the fitted model could have been yielded more simply by:
ggpredict(dat2.lm4, ~x) %>% plot(rawdata=TRUE)
That looks ok, but what if we selected more knots? We will try 10 and 20 evenly spaced internal knots.
k = with(dat2, seq(min(x), max(x), len=12))
k = k[c(-1, -length(k))]
dat2.lm5 = lm(y~bs(x, knots=k, degree=3), data=dat2)

k = with(dat2, seq(min(x), max(x), len=22))
k = k[c(-1, -length(k))]
dat2.lm6 = lm(y~bs(x, knots=k, degree=3), data=dat2)
grid.arrange(
ggpredict(dat2.lm5, ~x) %>% plot(rawdata=TRUE),
ggpredict(dat2.lm6, ~x) %>% plot(rawdata=TRUE),
nrow=1)
It may be that we have too many knots and are starting to force the model to learn about features that are specific to the sample rather than purely about the underlying process. This is referred to as overfitting. In this context, the spline is under-smoothed (or too wiggly).
Of course, in this case we have the luxury of knowing the true underlying process. Normally we would not have this oversight, so how can we select the most appropriate number of internal knots (and thus degree of smoothing)?
Essentially, we are seeking a degree of smoothing (number of knots) that strikes a balance between bias (over-smoothing) and variance (over-fitting).
Models are naturally geared to estimate parameters in a manner that minimizes the residuals (or maximizes likelihood), so to counter this, we need to penalize for wiggliness. If we impose a smoothing penalty on the basis functions, we can attempt to strike the above balance by estimating the optimal smoothing parameter. This can be achieved by either generalized cross-validation (GCV) or (restricted) maximum likelihood (ML/REML).
Generalized Additive Models (GAMs), via the mgcv package, help us achieve this.
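As a rough sketch (reusing the dat2 data from above), the smoothing parameter can be estimated either by generalized cross-validation or by REML - in mgcv this is controlled via the method= argument:
# estimate the smoothing parameter by GCV
dat2.gam.gcv = gam(y ~ s(x, bs='cr'), data=dat2, method='GCV.Cp')
# estimate the smoothing parameter by REML
dat2.gam.reml = gam(y ~ s(x, bs='cr'), data=dat2, method='REML')
# compare the estimated smoothing parameters
c(GCV=dat2.gam.gcv$sp, REML=dat2.gam.reml$sp)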
A simple (generalized) linear model models a response variable against a linear predictor (a linear combination of predictor variables) according to the familiar form:
\[ g(E(y)) = \beta_0 + x_1\beta_1 \]
where \(E(y)\) is the expected value of y (which is dependent on a nominated family), \(g()\) represents the link function that links the expected values of y back to the nominated family, \(\beta_0\) is the y-intercept and \(\beta_1\) is the slope associated with the predictor (\(x_1\)).
In a GAM, we can add a smoothing term:
\[ g(E(y)) = \beta_0 + f_1(x_1) \]
where \(f_1(x_1)\) represents a smooth term. It actually represents a set of basis functions.
\[ f_1(x_1) = \sum^k_{j=1}{b_j(x_1)\beta_j} \]
That is, \(f_1(x_1)\) is the sum of \(b_j(x_1)\) basis functions of \(x_1\) multiplied by a set of parameters that determine the contribution of each basis function. The basis functions are determined by the data (\(x_1\)) and the form of basis function selected (e.g. penalized splines, cubic regression splines etc). The \(\beta_j\) parameters are estimated during the modelling process.
We are going to attempt to fit a smoother (a cubic regression spline to start with) to the samples. However, to get a better appreciation for how GAMs work, let's visualize what the cubic regression splines look like. This can be performed with the help of the smoothCon() function in the mgcv package.
We will:
- use the smoothCon() function to construct a set of cubic regression spline basis functions for our predictor variable (x)
- extract the basis function matrix (X) from the resulting list. This matrix will be a (\(n\times k\)) matrix, where \(n\) is the number of x values and \(k\) is the number of basis functions (for a cubic regression spline, there are 10 basis functions by default)
- bind the matrix to the predictor (x) and reshape the result into a long data frame to facilitate plotting
bs = smoothCon(s(x, bs='cr'), data=dat2)
B = bs[[1]]$X
B = cbind(x=dat2$x, B) %>%
    as.data.frame %>%
    gather(key=Basis, value=y, -x) %>%
    mutate(Basis = factor(Basis, levels=unique(Basis)))
We can now plot these 10 basis functions.
ggplot(B, aes(y=y, x=x)) + geom_line() + facet_wrap(~Basis)
For a cubic regression spline, each basis function is the same, just shifted along the x-axis. The peak of each function corresponds to each of the 10 knots (which in this case are evenly spaced along the axis, because the x data are uniformly distributed along the axis).
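As an aside, we can confirm where those knots are placed - for a cubic regression spline, mgcv stores the knot locations in the xp component of the smooth object (the bs object constructed above with smoothCon()):
# knot locations for the cubic regression spline basis
bs[[1]]$xp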
We could alternatively plot each of the basis functions on the same graph (with different colors).
ggplot(B, aes(y=y, x=x)) + geom_line(aes(color=Basis))
When we then fit the GAM with a cubic regression spline, we can fix the number of knots (functions) to a set number. Normally we would not want to do this, but it can be useful for the purpose of illustration. Let's start by fixing the number of knots to be 10 (since this is what we have been visualizing).
To do so, we will use the gam() function within the mgcv package. The gam() function generates a model matrix for the smoothing term (in a similar way to what we did above with the smoothCon() function) and then uses this as the linear predictor in a GLM. Indeed, we can (and will) do this for ourselves.
However, before we do, we need to make an adjustment. While the basis functions (and thus the model matrix) we created earlier are identifiable amongst themselves, they will not be orthogonal (independent or distinguishable) to the intercept (which is always included in the linear model). Hence, they should be reparameterized such that each smooth term is constrained to sum to zero (\(\sum f(x_1) = 0\)). That way, each term is orthogonal to the intercept.
We will therefore regenerate the smooth, this time indicating that the identifiability constraints should be absorbed into the basis to ensure full orthogonality.
bs = smoothCon(s(x, bs='cr'), absorb.cons=TRUE, data=dat2)
dat2.glm = glm(y ~ bs[[1]]$X, data=dat2)
coef(dat2.glm)
## (Intercept) bs[[1]]$X1 bs[[1]]$X2 bs[[1]]$X3 bs[[1]]$X4 bs[[1]]$X5
## -0.131671794 1.932725294 1.402661653 0.006710458 -1.880284432 -2.044086090
## bs[[1]]$X6 bs[[1]]$X7 bs[[1]]$X8 bs[[1]]$X9
## -1.239461463 -1.374067840 -1.828866369 -2.927270326
Now, we will use the gam()
function to perform the same analysis (remember, we are going to fix the number of basis functions \(k\) to be 10).
dat2.gam = gam(y ~ s(x, bs='cr', k=10, fx=TRUE), data=dat2)
coefs = coef(dat2.gam)
coefs
## (Intercept) s(x).1 s(x).2 s(x).3 s(x).4 s(x).5
## -0.131671794 1.932725294 1.402661653 0.006710458 -1.880284432 -2.044086090
## s(x).6 s(x).7 s(x).8 s(x).9
## -1.239461463 -1.374067840 -1.828866369 -2.927270326
The estimated coefficients represent the \(\beta_j\) parameters.
We could now scale the 10 cubic regression spline basis functions by multiplying by their corresponding \(\beta\) coefficient. Then, for each value of \(x\), we could sum up the 10 functions to get the overal spline. In R, the sweep()
function is used to multiply the columns (MARGIN=2) of each row of a matrix (in this case the matrix representing the basis functions) by a vector (in this case the vector of \(\beta\) parameters). The rowSums
function can then be used to sum each row (each row represents an x value).
We will plot the 10 basis functions (colored lines), the sum of these functions (the fitted GAM smoother - black line), the true trend (dashed line) and the original samples (solid dots).
newdata = data.frame(x=seq(0, 3*pi, len=100))
Xmat = predict(dat2.gam, type='lpmatrix', newdata=newdata)
fits = sweep(Xmat, MARGIN=2, coefs, '*')
newdata = newdata %>% mutate(fit=rowSums(fits))
fits %>% as.data.frame %>%
    bind_cols(newdata) %>%
gather(key=Basis, value=y, -x,-fit) %>%
ggplot(aes(y=y, x=x)) + geom_line(aes(color=Basis)) +
geom_line(aes(y=fit)) +
geom_point(data=dat2) +
geom_line(data=dat2, aes(y=yp), linetype='dashed')
The trend (black line) appears to be overfit (not smooth enough, too wiggly). That is, the trend seems to be starting to focus on the more stochastic elements of the sample rather than the pure underlying trend. What if instead we ask for three knots instead of 10?
dat2.gam = gam(y ~ s(x, bs='cr', k=3, fx=TRUE), data=dat2)
coefs = coef(dat2.gam)
newdata = data.frame(x=seq(0, 3*pi, len=100))
Xmat = predict(dat2.gam, type='lpmatrix', newdata=newdata)
fits = sweep(Xmat, MARGIN=2, coefs, '*')
newdata = newdata %>% mutate(fit=rowSums(fits))
fits %>% as.data.frame %>%
    bind_cols(newdata) %>%
gather(key=Basis, value=y, -x,-fit) %>%
ggplot(aes(y=y, x=x)) + geom_line(aes(color=Basis)) +
geom_line(aes(y=fit)) +
geom_point(data=dat2) +
geom_line(data=dat2, aes(y=yp), linetype='dashed')
This is not a particularly good match either - the trend now appears to be too smooth.
If the number of basis functions (knots) is not fixed, the gam() function will attempt to estimate the 'optimum' number of knots (and thus degree of smoothing). This is essentially a balance between bias and variance.
It does this via a smoothing penalty that penalises for wiggliness by maximising the penalized likelihood function:
\[ 2L(\boldsymbol{\beta}) - \boldsymbol{\lambda} (\boldsymbol{\beta^T}\boldsymbol{S}\boldsymbol{\beta}) \]
where \(\boldsymbol{S}\) is a matrix representing the wiggliness of basis functions (typically from the squared second order derivatives of the basis functions) to form the penalty matrix (\(\boldsymbol{\beta^T}\boldsymbol{S}\boldsymbol{\beta}\)) and \(\boldsymbol{\lambda}\) is the smoothing parameter that controls the trade off between bias and variance.
The higher the smoothing parameter (\(\boldsymbol{\lambda}\)), the smoother (flatter) the trend. As \(\lambda\) approaches \(\infty\), the trend approaches a straight line. A \(\lambda\) of 0 results in the equivalent of fixing the number of knots, as above.
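Purely as an illustration (not part of the analysis proper), we could fix the smoothing parameter ourselves via the sp= argument to see these two extremes - a very large value flattens the trend towards a straight line, whereas a value of 0 leaves the wiggliness unpenalized:
# heavily penalized (very smooth) fit
dat2.gam.flat = gam(y ~ s(x, bs='cr'), data=dat2, sp=10000)
# unpenalized (maximally wiggly, given k) fit
dat2.gam.wiggly = gam(y ~ s(x, bs='cr'), data=dat2, sp=0)
grid.arrange(
    ggpredict(dat2.gam.flat, ~x) %>% plot(rawdata=TRUE),
    ggpredict(dat2.gam.wiggly, ~x) %>% plot(rawdata=TRUE),
    nrow=1)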
dat2.gam = gam(y ~ s(x, bs='cr'), data=dat2, method='REML')
coefs = coef(dat2.gam)
newdata = data.frame(x=seq(0, 3*pi, len=100))
Xmat = predict(dat2.gam, type='lpmatrix', newdata=newdata)
fits = sweep(Xmat, MARGIN=2, coefs, '*')
newdata = newdata %>% mutate(fit=rowSums(fits))
fits %>% as.data.frame %>%
    bind_cols(newdata) %>%
gather(key=Basis, value=y, -x,-fit) %>%
ggplot(aes(y=y, x=x)) + geom_line(aes(color=Basis)) +
geom_line(aes(y=fit)) +
geom_point(data=dat2) +
geom_line(data=dat2, aes(y=yp), linetype='dashed')
That’s better.
However, before getting too excited, (just like all analysis) we should check the various model diagnostics so as to validate the model.
Firstly, we should establish whether the maximum number of knots (basis functions) is enough or whether there is any evidence that the smoother may be under fit. This is done via the k.check()
function.
k.check(dat2.gam)
## k' edf k-index p-value
## s(x) 9 6.768293 1.165484 0.8225
For each smooth term (in this case there is only one), it lists:
- k' - the maximum number of knots. In this case there are 9 - there was a total of 10, however one is used for the intercept.
- edf - the estimated degrees of freedom. This is a measure of the complexity of the smooth term. It is the number of knots minus any penalized likelihood constraints (and is typically not an integer).
- k-index - a measure of the amount of pattern left in the residuals. Technically, it is the difference between neighbouring residuals standardized by the residual variance. The further the k-index is below 1, the greater the chance that there is still some pattern remaining in the residuals (and thus the term is over smoothed).
- p-value - the proportion of k-index values from repeated randomizations of neighbouring residuals that are less than the actual k-index.
If the k-index is less than 1, the p-value is low and the edf is similar to k', there is evidence that the initial number of knots (\(k\)) was too low. It is recommended that the number of knots be doubled and the model refit.
In the current case, there is no issue with \(k\).
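Had the checks suggested otherwise (a k-index well below 1, a low p-value and an edf close to k'), a sketch of the recommended remedy would be to double the basis dimension and refit, for example:
# refit with double the basis dimension and re-check
dat2.gam.k20 = gam(y ~ s(x, bs='cr', k=20), data=dat2, method='REML')
k.check(dat2.gam.k20)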
Now we should turn our attention to diagnostics about the actual fit.
Note, in all cases, deviance residuals are used.
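For reference, these deviance residuals can also be extracted directly (a minimal aside):
# deviance residuals (the default residual type for gam objects)
head(residuals(dat2.gam, type='deviance'))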
gratia::appraise(dat2.gam)
In this case, all diagnostics appear reasonable.
In multiple linear models, (multi)collinearity arises when one or more of the continuous predictors are correlated with the other predictor(s). When this is the case, the offending predictors compete within the model and the individual partial slope estimates can be either under- or overestimated (slopes may even change polarity). In GAMs, the nonlinear equivalent is called concurvity. Concurvity measures how well each smooth could be approximated by either a combination of the other smooth(s) or each individual other smooth.
concurvity(dat2.gam)
## para s(x)
## worst 3.155444e-31 1.117389e-31
## observed 3.155444e-31 3.459132e-34
## estimate 3.155444e-31 6.755856e-33
concurvity(dat2.gam, full=FALSE)
## $worst
## para s(x)
## para 1.000000e+00 1.117389e-31
## s(x) 3.155444e-31 1.000000e+00
##
## $observed
## para s(x)
## para 1.000000e+00 3.459132e-34
## s(x) 3.155444e-31 1.000000e+00
##
## $estimate
## para s(x)
## para 1.000000e+00 6.755856e-33
## s(x) 3.155444e-31 1.000000e+00
In this case, we only have a single parametric term (the intercept) and a single smooth term. And since the smoother is always constructed to be orthogonal to the intercept, there should not be an issue - this is confirmed in the output above.
Before looking at the estimated model parameters, it is often useful to have a look at a partial effects plot to get a feel for the model. As with other models, it can really help in interpreting the model output(s).
gratia::draw(dat2.gam)
Note that the plot is always zero centered on the y-axis. Recall that all smoothers are sum-to-zero. Hence, the partial effects plot represents the trend, not the absolute values. In this particular case, our actual response was approximately zero centered anyway (its mean is close to zero).
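To relate the zero-centered partial effect back to the response scale, a small sketch (reusing dat2.gam) is to add the intercept onto the smooth contribution obtained via predict(..., type='terms'):
newdata.part = with(dat2, data.frame(x=seq(min(x), max(x), len=100)))
# contribution of the (sum-to-zero) smooth at each new x value
sm = predict(dat2.gam, newdata=newdata.part, type='terms')[, 's(x)']
# adding the intercept shifts the partial effect onto the response scale
head(coef(dat2.gam)['(Intercept)'] + sm)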
Now we can explore the estimated parameters.
summary(dat2.gam)
##
## Family: gaussian
## Link function: identity
##
## Formula:
## y ~ s(x, bs = "cr")
##
## Parametric coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.13167 0.06474 -2.034 0.0483 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Approximate significance of smooth terms:
## edf Ref.df F p-value
## s(x) 6.768 7.862 76.61 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## R-sq.(adj) = 0.925 Deviance explained = 93.5%
## -REML = 42.322 Scale est. = 0.20954 n = 50
In this case, the smooth term for x is significant (with an estimated degrees of freedom of around 6.8), indicating that the trend is significantly non-linear, and the model explains roughly 93% of the deviance. We can now plot the fitted trend (with its confidence interval), for example via emmeans:
data.list = with(dat2, list(x=seq(min(x), max(x), len=100)))
newdata = emmeans(dat2.gam, ~x, at=data.list) %>% as.data.frame
newdata %>%
    ggplot(aes(y=emmean, x=x)) +
geom_ribbon(aes(ymin=lower.CL, ymax=upper.CL), fill='blue', color=NA, alpha=0.3) +
geom_line() +
geom_point(data=dat2, aes(y=y)) +
theme_bw()
We can also perform all the necessary underlying calculations manually.
newdata = with(dat2, data.frame(x=seq(min(x), max(x), len=100)))
Xmat = predict(dat2.gam, newdata=newdata, type='lpmatrix')
coefs = coef(dat2.gam)
Fit = as.vector(coefs %*% t(Xmat))
SE = sqrt(diag(Xmat %*% vcov(dat2.gam) %*% t(Xmat)))
Q = qt(0.975, df=df.residual(dat2.gam))
newdata = newdata %>% mutate(fit = Fit,
    lower = Fit - Q*SE,
    upper = Fit + Q*SE)
newdata %>%
    ggplot(aes(y=fit, x=x)) +
geom_ribbon(aes(ymin=lower, ymax=upper), fill='blue', color=NA, alpha=0.3) +
geom_line() +
geom_point(data=dat2, aes(y=y)) +
theme_bw()
The mgcv package provides a variety of basis functions, nominated via the bs= argument of the smoother definition:
Basis function | Name | Properties |
---|---|---|
tp | Thin plate spline | Low rank (far fewer parameters than data), isotropic (equal smoothing in any direction) regression splines |
ts | Thin plate spline (with shrinkage) | Thin plate spline with penalties on the null space such that it can be shrunk to zero. Useful in combination with by= as this does not penalize the null space. |
cr | Cubic regression spline | Splines with knots spread evenly throughout the covariate domain |
cc | Cyclic cubic regression spline | Cubic regression splines with ends that meet (have the same second order derivatives) |
ps | Penalized B-spline | Allow the distribution of knots to be based on data distribution |
cp | Cyclic penalized B-spline | Penalized B-splines with ends that meet |
gp | Gaussian process | Gaussian process smooths with five sets of correlation structures |
mrf | Markov random fields | Useful for modelling of discrete space. Uses penalties based on neighbourhood matrices (pairwise distances between discrete locations) |
re | Random effects | Parametric terms penalized via identity matrices. Equivalent to i.i.d. random effects in mixed effects models |
fs | Factor smooth interaction | For single dimension smoothers. Duplicates the basis functions for each level of the categorical covariate, yet uses a single smoothing parameter across all. |
sos | Splines on the sphere | Isotropic 2D splines in spherical space. Useful for large spatial domains. |
so | Soap film smooths | Smooths within polygon boundaries. These are useful for modelling complex spatial areas. |
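To nominate one of these bases, we simply pass its code to the bs= argument of the smoother definition. A brief sketch (reusing the dat2 data purely for illustration):
# thin plate regression spline (the default basis)
dat2.gam.tp = gam(y ~ s(x, bs='tp'), data=dat2, method='REML')
# penalized B-spline basis
dat2.gam.ps = gam(y ~ s(x, bs='ps'), data=dat2, method='REML')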
Smoother definition functions | Type | Properties |
---|---|---|
s() | General spline smoothers | For multidimensional smooths, assumes that each component is on the same scale, as there is only a single smoothing parameter for the smooth |
te() | Tensor product smooth | Smooth functions of numerous covariates that are built as the tensor product of the comprising smooths (and penalties). Represents the interaction between numerous terms, each with their own smoothing parameter that penalizes the average wiggliness of that term. |
ti() | Tensor product interaction smooth | Smooths that include only the highest order interactions (exclude the basis functions associated with the main effects of the comprising smooths) |
t2() | Alternative tensor product smooth with non-overlapping … | Creates basis functions and penalties for paired combinations of separate penalized and unpenalized components of each term |
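To illustrate how these smoother definitions differ, here is a hypothetical sketch - the covariates x1 and x2 (and their response) are simulated below purely for demonstration and are not part of the examples above:
set.seed(1)
dat.xy = data.frame(x1=runif(100), x2=runif(100)*100)
dat.xy$y = with(dat.xy, sin(2*pi*x1) + 0.01*x2 + rnorm(100, sd=0.3))
# single isotropic smooth - assumes x1 and x2 are on the same scale
gam(y ~ s(x1, x2), data=dat.xy, method='REML')
# tensor product smooth - each margin gets its own smoothing parameter
gam(y ~ te(x1, x2), data=dat.xy, method='REML')
# main effect smooths plus a pure interaction smooth
gam(y ~ s(x1) + s(x2) + ti(x1, x2), data=dat.xy, method='REML')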