library(tidyverse) #for data wrangling and plotting
library(DHARMa) #for simulated residuals
library(performance) #for model diagnostics
library(see) #for model diagnostics
library(brms) #for Bayesian models
library(tidybayes) #for exploring Bayesian models
library(rstan) #for diagnostics plots
library(bayesplot) #for diagnostic plots
library(patchwork) #for arranging multiple plots
library(gridGraphics) #for arranging multiple plots - needed for some patchwork plots
library(HDInterval) #for HPD intervals
library(bayestestR) #for ROPE
library(emmeans) #for estimated marginal means
library(standist) #for plotting distributions
library(cmdstanr) #for the backend
source("helperfunctions.R")

Bayesian generalised linear models

1 Preparations

The necessary libraries are loaded above.
Many biologists and ecologists get a little twitchy and nervous around mathematical and statistical formulae and nomenclature. Whilst it is possible to perform basic statistics without too much regard for the actual equation (model) being employed, as the complexity of the analysis increases, the need to understand the underlying model becomes increasingly important. Moreover, model specification in BUGS (the language used to program Bayesian modelling) aligns very closely to the underlying formulae. Hence a good understanding of the underlying model is vital to be able to create a sensible Bayesian model. Consequently, I will always present the linear model formulae along with the analysis. If you start to feel some form of disorder starting to develop, you might like to run through the Tutorials and Workshops twice (the first time ignoring the formulae).
This tutorial will introduce the concept of Bayesian (generalised) linear models and demonstrate how to fit simple models to a set of simple fabricated data sets, each representing major data types encountered in ecological research. Subsequent tutorials will build on these fundamentals with increasingly more complex data and models.
2 A philosophical note
To introduce the philosophical and mathematical differences between classical (frequentist) and Bayesian statistics, Wade (2000) presented a provocative yet compelling trend analysis of two hypothetical populations. The temporal trend of one of the populations shows very little variability around a very subtle linear decline. By contrast, the second population appears to decline more dramatically, yet has substantially more variability.
Wade (2000) neatly illustrates the contrasting conclusions (particularly with respect to interpreting probability) that would be drawn by the frequentist and Bayesian approaches and in so doing highlights how and why the Bayesian approach provides outcomes that are more aligned with management requirements.
This tutorial will start by replicating the demonstration of Wade (2000).
| Population | n | Slope | t | p-value |
|---|---|---|---|---|
| A | 10 | -0.1022 | -2.3252 | 0.0485 |
| B | 10 | -10.2318 | -2.2115 | 0.0579 |
| C | 100 | -10.4713 | -6.6457 | <0.0001 |
From a traditional frequentist perspective, we would conclude that there is a 'significant' relationship in Populations A and C (\(p<0.05\)), yet not in Population B (\(p>0.05\)). Note, Populations B and C were both generated from the same random distribution; Population C simply has a substantially larger number of observations.
The above illustrates a couple of things:

- statistical significance does not necessarily translate into biological importance. The percentage decline for Population A is 0.46, whereas the percentage decline for Population B is 45.26. That is, Population B is declining at nearly 100 times the rate of Population A. That sounds rather important, yet on the basis of the hypothesis test, we would dismiss the decline in Population B.
- a p-value largely reflects statistical power - essentially, the probability that the sample size is large enough to detect an effect or relationship.
Let us now look at it from a Bayesian perspective. I will just provide the posterior distributions (densities scaled to 0-1 so that they can be plotted together) for the slope for each population.
Focusing on Populations A and B, we would conclude:
the mean (plus or minus CI) slopes for Population A and B are -0.1 (-0.22,0.01) and -10.4 (-21.35,-0.04) respectively.
the Bayesian approach allows us to query the posterior distribution in many other ways in order to ask sensible biological questions. For example, we might consider that a rate of change of 5% or greater represents an important biological impact. For Populations A and B, the probability that the rate is 5% or greater is 0 and 0.85 respectively.
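As a sketch of this style of posterior query: with hypothetical draws standing in for the MCMC samples of Population B's slope (the mean and spread below are assumptions chosen to roughly match the interval quoted above; real draws would be extracted from the fitted model), the probability of a decline of 5 or more is simply the proportion of draws at or below -5.

```r
# Hypothetical posterior draws for the slope of Population B
# (assumed values; in practice these come from the fitted Bayesian model)
set.seed(1)
draws <- rnorm(4000, mean = -10.4, sd = 5.4)

# Probability that the decline is 5 (units) or greater
mean(draws <= -5)
```

This is the key practical payoff: any question that can be phrased as a condition on the parameters can be answered as a simple proportion of posterior draws.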
3 Review of (generalised) linear models
I would highly recommend reviewing the information in the tutorial on generalised linear models, particularly the sections describing linear models, assumption checking and generalised linear models (GLMs). Whilst there are philosophical differences between frequentist and Bayesian statistics that have implications for how models are fit and interpreted, model choice and assumption checking principles are common between the two approaches. Hence, many of these topics will be assumed, and not fully described in the current tutorial.
Recall from the tutorial on generalised linear models that simple linear regression is a linear modelling process that models a single response variable against one or more predictors with a linear combination of coefficients and (in the case of a Gaussian model) can be expressed as:
\[y_i = \beta_0+ \beta_1 x_i+\epsilon_i \hspace{1cm}\epsilon\sim{}N(0,\sigma^2)\]
where:
\(y_i\) is the response value for each of the \(i\) observations
\(\beta_0\) is the y-intercept (value of \(y\) when \(x=0\))
\(\beta_1\) is the slope (rate of change in \(y\) per unit change in \(x\))
\(x_i\) is the predictor value for each of the \(i\) observations
\(\epsilon_i\) is the residual value of each of the \(i\) observations. A residual is the difference between the observed value and the value expected by the model.
\(\epsilon\sim{}N(0,\sigma^2)\) indicates that the residuals are normally distributed with a constant amount of variance
The above can be re-expressed and generalised as:
\[ \begin{align} y_i&\sim{}Dist(\mu_i, ...) \\ g(\mu_i) &= \beta_0+ \beta_1 x_i \end{align} \]
where:
- \(Dist\) represents a distribution from the exponential family (such as Gaussian, Poisson, Binomial, etc)
- \(...\) represents additional parameters relevant to the nominated distribution (such as \(\sigma^2\): Gaussian, \(n\): Binomial and \(\phi\): Negative Binomial, etc)
- \(g()\) represents the link function (e.g. log: Poisson, logit: Binomial, etc)
The reliability of any model depends on the degree to which the data adheres to the model assumptions. Hence, as with frequentist models, exploratory data analysis (EDA) is a vital component of Bayesian modelling and since the model structures are similar between frequentist and Bayesian approaches, so too is EDA.
4 Bayesian (generalised) linear models
For the purpose of introduction, we will start by exploring a Gaussian model with a very simple fabricated data set representing the relationship between a response (\(y\)) and a continuous predictor (\(x = [1,2,3,4,5,6,7,8,9,10]\)). The fabricated data set will comprise 10 observations each drawn from normal distributions with a set standard deviation of 4. The means of the 10 populations will be determined by the following equation:
\[ \mu_i = 2 + 5\times x_i \]
Let us generate these data.
set.seed(234)
dat <- data.frame(x = 1:10) |>
mutate(y = round(rnorm(n = 10, mean = 2 + (5 * x), sd = 4), digits = 2))
dat
    x     y
1 1 9.64
2 2 3.79
3 3 11.00
4 4 27.88
5 5 32.84
6 6 32.56
7 7 37.84
8 8 29.86
9 9 45.05
10 10 47.65
The model we will be fitting is:
\[ \begin{align} y_i&\sim{}N(\mu_i, \sigma^2)\\ \mu_i &= \beta_0+ \beta_1 x_i \end{align} \]
The parameters that we are going to attempt to estimate are the y-intercept (\(\beta_0\)), the slope (\(\beta_1\)) and the underlying variance (\(\sigma^2\)). Recall (from tutorials on statistical philosophies and estimation) that Bayesian models calculate posterior probabilities (\(P(H|D)\)) from the likelihood (\(P(D|H)\)) and prior expectations (\(P(H)\)). Therefore, in preparation for fitting a Bayesian model, we must consider what our prior expectations are for all parameters.
The individual responses (\(y_i\), the observed values) are each expected to have been independently drawn from normal (Gaussian) distributions (\(\mathcal{N}\)). These distributions represent all the possible values of \(y\) we could have obtained at the specific (\(i^{th}\)) level of \(x\). Hence the \(i^{th}\) \(y\) observation is expected to have been drawn from a normal distribution with a mean of \(\mu_i\).
Although each distribution is expected to come from populations that differ in their means, we assume that all of these distributions have the same variance (\(\sigma^2\)).
4.1 Priors
We need to supply priors for each of the parameters to be estimated (\(\beta_0\), \(\beta_1\) and \(\sigma\)). Whilst we want these priors to be sufficiently vague as to not influence the outcomes of the analysis (and thus be equivalent to the frequentist analysis), we do not want the priors to be so vague (wide) that they permit the MCMC sampler to drift off into parameter space that is both illogical as well as numerically awkward.
Proffering sensible priors is one of the most difficult aspects of performing Bayesian analyses. For instances where there is some previous knowledge available and a desire to incorporate those data, the difficulty is in ensuring that the information is incorporated correctly. However, for instances where there is no previous relevant information and thus a desire to have the posteriors driven entirely by the new data, the difficulty is in defining priors that are vague enough (so as not to bias results in their direction) yet not so vague as to allow the MCMC sampler to drift off into unsupported regions (and thus get stuck and yield spurious estimates).
For early implementations of MCMC sampling routines (such as Metropolis-Hastings and Gibbs), it was fairly common to see very vague priors being defined. For example, the priors on effects were typically normal priors with a mean of 0 and a variance of 1e+06 (1,000,000). These are very vague priors. Yet for some samplers (e.g. NUTS), such vague priors can encourage poor behaviour of the sampler - particularly if the posterior is complex. It is now generally advised that priors should (where possible) be somewhat weakly informative and, to some extent, represent the bounds of what are feasible and sensible estimates.
The degree to which priors influence an outcome (whether by having a pulling effect on the estimates or by encouraging the sampler to drift off into unsupported regions of the posterior) is dependent on:
- the relative sparsity of the data - the larger the data, the less weight the priors have and thus less influence they exert.
- the complexity of the model (and thus posterior) - the more parameters, the more sensitive the sampler is to the priors.
The sampled posterior is the product of both the likelihood and the prior - all of which are multidimensional. For most applications, it would be virtually impossible to define a sensible multidimensional prior. Hence, our only option is to define priors on individual parameters (e.g. the intercept, slope(s), variance etc) and to hope that if they are individually sensible, they will remain collectively sensible.
So having (hopefully) impressed upon the notion that priors are an important consideration, I will now attempt to synthesise some of the approaches that can be employed to arrive at weakly informative priors that have been gleaned from various sources. Largely, this advice has come from the following resources:
- https://github.com/stan-dev/stan/wiki/Prior-Choice-Recommendations
- http://svmiller.com/blog/2021/02/thinking-about-your-priors-bayesian-analysis/
I will outline some of the current main recommendations before summarising some approaches in a table.
- weakly informative priors should contain enough information so as to regularise (discourage unreasonable parameter estimates whilst allowing all reasonable estimates).
- for effects parameters on scaled (standardised) data, an argument could be made for a normal distribution with a standard deviation of 1 (e.g. normal(0,1)), although some prefer a t distribution with 3 degrees of freedom and a standard deviation of 1 (e.g. student_t(3,0,1)) - a flatter t distribution is apparently more robust than a normal as an uninformative prior
- for un-scaled data, the above priors can be scaled by using the standard deviation of the data as the prior standard deviation (e.g. student_t(3,0,sd(y)), or student_t(3,0,sd(y)/sd(x)))
- for priors on hierarchical standard deviations, priors should encourage shrinkage towards 0 (particularly if the number of groups is small, since otherwise the sampler will tend to be more responsive to "noise")
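Expressed as brms prior specifications, the recommendations above might look like the following sketch (the class names are standard brms ones; the value 5 is a placeholder standing in for sd(y)/sd(x) computed from the data at hand):

```r
library(brms)

# weakly informative prior on effects ('b') for standardised data
p_scaled <- prior(student_t(3, 0, 1), class = "b")

# for un-scaled data, widen the scale by sd(y)/sd(x)
# (5 here is a hypothetical value of that ratio)
p_unscaled <- prior(student_t(3, 0, 5), class = "b")

# hierarchical standard deviations: a half-t encouraging shrinkage towards 0
p_sd <- prior(student_t(3, 0, 1), class = "sd")
```

These objects would then be combined (with `+`) and passed to `brm()` via its `prior` argument.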
In this tutorial series, we will perform Bayesian analysis in the STAN language via an R interface. Two popular interfaces that greatly simplify the specification of Bayesian models are brms and rstanarm. We will exclusively focus on the former as it is far more flexible.
| Family | Parameter | brms | rstanarm |
|---|---|---|---|
| Gaussian | Intercept | student_t(3, median(y), mad(y)) | normal(mean(y), 2.5*sd(y)) |
| | 'Population effects' (slopes, betas) | flat, improper priors | normal(0, 2.5*sd(y)/sd(x)) |
| | Sigma | student_t(3, 0, mad(y)) | exponential(1/sd(y)) |
| | 'Group-level effects' | student_t(3, 0, mad(y)) | decov(1,1,1,1) |
| | Correlation on group-level effects | lkj_corr_cholesky(1) | |
| Poisson | Intercept | student_t(3, median(y), mad(y)) | normal(mean(y), 2.5*sd(y)) |
| | 'Population effects' (slopes, betas) | flat, improper priors | normal(0, 2.5*sd(y)/sd(x)) |
| | 'Group-level effects' | student_t(3, 0, mad(y)) | decov(1,1,1,1) |
| | Correlation on group-level effects | lkj_corr_cholesky(1) | |
| Negative binomial | Intercept | student_t(3, median(y), mad(y)) | normal(mean(y), 2.5*sd(y)) |
| | 'Population effects' (slopes, betas) | flat, improper priors | normal(0, 2.5*sd(y)/sd(x)) |
| | Shape | gamma(0.01, 0.01) | exponential(1/sd(y)) |
| | 'Group-level effects' | student_t(3, 0, mad(y)) | decov(1,1,1,1) |
| | Correlation on group-level effects | lkj_corr_cholesky(1) | |
Notes:
brms
https://github.com/paul-buerkner/brms/blob/c2b24475d727c8afd8bfc95947c18793b8ce2892/R/priors.R
- In the above, for non-Gaussian families, y is first transformed according to the family link. If the family link is log, then 0.1 is first added to 0 values.
- In brms, the minimum standard deviation for the Intercept prior is 2.5.
- In brms, the minimum standard deviation for group-level priors is 10.
rstanarm
http://mc-stan.org/rstanarm/articles/priors.html
- In rstanarm, priors on the standard deviation and correlation associated with group-level effects are packaged up into a single prior (decov, which is a decomposition of the variance-covariance matrix).
In my experience, I find that the above priors tend to be a little bit too wide for many ecological applications and I often prefer to use 1.5 rather than 2.5 as the multiplier.
In Bayesian models, centering of predictors offers huge numerical advantages. So important is it to center that brms automatically centers any continuous predictors for you. However, since the user has not necessarily centered the predictors, the user might misinterpret the outputs from a brms model. Consequently, when fitting a model, brms also generates y-intercept values that are consistent with un-centered values and these are the estimates returned to the user.
Nevertheless, I would recommend that you always explicitly center continuous predictors to provide more meaningful interpretations of the y-intercept. I would also highly recommend standardising continuous predictors - this will not only help speed up and stabilise the model, it will also simplify the specification of priors - see the specific examples later in this tutorial.
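A sketch of centring and standardising the predictor prior to fitting, using the fabricated data from above (plain base R; `cx` and `sx` are just illustrative column names):

```r
dat <- data.frame(x = 1:10,
                  y = c(9.64, 3.79, 11.00, 27.88, 32.84,
                        32.56, 37.84, 29.86, 45.05, 47.65))

# centred predictor: the y-intercept now corresponds to the mean of x
dat$cx <- dat$x - mean(dat$x)

# standardised predictor: mean 0, standard deviation 1
dat$sx <- (dat$x - mean(dat$x)) / sd(dat$x)
```

With `sx`, a slope prior like student_t(3, 0, 1) is immediately reasonable, since a one-unit change in `sx` is one standard deviation of `x`.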
Based on the above, for our fabricated data, let's assign the following priors:

- \(\beta_0\): Normal prior centred at 31.21 (the median of \(y\)) with a standard deviation of 15.17 (the MAD of \(y\))
- \(\beta_1\): t distribution (3 degrees of freedom) prior centred at 0 with a standard deviation of 4.09 (MAD of \(y\) divided by MAD of \(x\)). Since effects are differences and we may not want to pre-determine the direction (polarity) of any trends, it is typical to define effects priors with means of 0.
- \(\sigma\): (half) t distribution (3 degrees of freedom) centred at 0 with a standard deviation of 15.17
Note, again, when fitting models through either rstanarm or brms, the priors assume that the predictor(s) have been centred and are to be applied on the link scale. In this case the link scale is an identity.
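To make the origin of these values concrete, here is a sketch of how they can be computed from the data and assembled into a brms prior specification (the prior classes are the standard brms ones):

```r
library(brms)

dat <- data.frame(x = 1:10,
                  y = c(9.64, 3.79, 11.00, 27.88, 32.84,
                        32.56, 37.84, 29.86, 45.05, 47.65))

median(dat$y)            # 31.21: centre of the intercept prior
mad(dat$y)               # ~15.17: scale of the intercept and sigma priors
mad(dat$y) / mad(dat$x)  # ~4.09: scale of the slope prior

priors <- prior(normal(31.21, 15.17), class = "Intercept") +
  prior(student_t(3, 0, 4.09), class = "b") +
  prior(student_t(3, 0, 15.17), class = "sigma")
```

Note that brms automatically folds a half-t prior on `sigma` (the positive constraint is applied for you), so the same student_t specification serves for both bounded and unbounded parameters.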
Similar logic can be applied for models that employ different distributions. In the following sections, we will define numerous sets of data (each of which represents a different major form of ecological data) and see how we can set appropriate priors in each case. In working through these examples, it is worth reflecting on how much simpler prior specification is if we use standardised predictors.
5 Example data
This tutorial will blend theoretical discussions with actual calculations and model fits. I believe that by bridging the divide between theory and application, we all gain better understanding. The applied components of this tutorial will be motivated by numerous fabricated data sets. The advantage of simulated data over real data is that with simulated data, we know the ‘truth’ and can therefore gauge the accuracy of estimates.
The motivating examples are:
- Example 1 - simulated samples drawn from a Gaussian (normal) distribution reminiscent of data collected on measurements (such as body mass)
- Example 2 - simulated Gaussian samples drawn from three different populations representing three different treatment levels (e.g. body masses of three different species)
- Example 3 - simulated samples drawn from a Poisson distribution reminiscent of count data (such as number of individuals of a species within quadrats)
- Example 4 - simulated samples drawn from a Negative Binomial distribution reminiscent of over-dispersed count data (such as number of individuals of a species that tends to aggregate in groups)
- Example 5 - simulated samples drawn from a Bernoulli (binomial with \(n = 1\)) distribution reminiscent of binary data (such as the presence/absence of a species within sites)
- Example 6 - simulated samples drawn from a Binomial distribution reminiscent of proportional data (such as counts of a particular taxa out of a total number of individuals)
Lets formally simulate the data illustrated above. The underlying process dictates that on average a one unit change in the predictor (x) will be associated with a five unit change in response (y) and when the predictor has a value of 0, the response will typically be 2. Hence, the response (y) will be related to the predictor (x) via the following:
\[ y = 2 + 5x \]
This is a deterministic model; it has no uncertainty. In order to simulate actual data, we need to add some random noise. We will assume that the residuals are drawn from a Gaussian distribution with a mean of zero and standard deviation of 4. The predictor will comprise 10 uniformly distributed integer values between 1 and 10. We will round the response to two decimal places.
For repeatability, a seed will be employed on the random number generator. Note, the smaller the dataset, the less it is likely to represent the underlying deterministic equation, so we should keep this in mind when we look at how closely our estimated parameters approximate the ‘true’ values. Hence, the seed has been chosen to yield data that maintain a general trend that is consistent with the defining parameters.
set.seed(234)
dat <- data.frame(x = 1:10) |>
mutate(y = round(2 + 5*x + rnorm(n = 10, mean = 0, sd = 4), digits = 2))
dat
    x     y
1 1 9.64
2 2 3.79
3 3 11.00
4 4 27.88
5 5 32.84
6 6 32.56
7 7 37.84
8 8 29.86
9 9 45.05
10 10 47.65
We will use these data in two ways. Firstly, to estimate the mean and variance of the response (y) ignoring the predictor (x), and secondly to estimate the relationship between the response and predictor.
For the former, we know that the mean and variance of the response (y) can be calculated as:
\[ \begin{align} \bar{y} &= \frac{1}{n}\sum^n_{i=1}y_i\\ var(y) &= \frac{1}{n-1}\sum^n_{i=1}(y_i-\bar{y})^2\\ sd(y) &= \sqrt{var(y)} \end{align} \]
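These quantities can be computed directly in R (note that R's `var()` and `sd()` use the n - 1 denominator, i.e. the sample rather than population estimators):

```r
y <- c(9.64, 3.79, 11.00, 27.88, 32.84, 32.56, 37.84, 29.86, 45.05, 47.65)

mean(y)  # ~27.81
var(y)   # sample variance (n - 1 denominator)
sd(y)    # square root of the variance
```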
As previously described, categorical predictors are transformed into dummy codes prior to the fitting of the linear model. We will simulate a small data set with a single categorical predictor comprising a control and two treatment levels ('medium', 'high'). To simplify things we will assume a Gaussian distribution; however, most of the modelling steps would be the same regardless of the chosen distribution.
The data will be drawn from three Gaussian distributions with a standard deviation of 4 and means of 20, 15 and 10. We will draw a total of 12 observations, four from each of the three populations.
set.seed(123)
beta_0 <- 20
beta <- c(-2, -10)
sigma <- 4
n <- 12
x <- gl(3, 4, 12, labels = c('control', 'medium', 'high'))
y <- (model.matrix(~x) %*% c(beta_0, beta)) + rnorm(12, 0, sigma)
dat2 <- data.frame(x = x, y = y)
dat2
         x         y
1 control 17.758097
2 control 19.079290
3 control 26.234833
4 control 20.282034
5 medium 18.517151
6 medium 24.860260
7 medium 19.843665
8 medium 12.939755
9 high 7.252589
10 high 8.217352
11 high 14.896327
12 high 11.439255
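The dummy coding that `model.matrix()` performed in the simulation above can be inspected directly. With R's default treatment contrasts, the first level ('control') is absorbed into the intercept and the remaining levels each get an indicator column:

```r
# Rebuild the factor used above and expand it into a design matrix
x <- gl(3, 4, 12, labels = c('control', 'medium', 'high'))
X <- model.matrix(~x)
head(X, 6)
# columns: (Intercept), xmedium, xhigh
# control rows: 1 0 0;  medium rows: 1 1 0;  high rows: 1 0 1
```

Multiplying this matrix by the coefficient vector (as the simulation code does with `%*%`) yields the population mean for each observation.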
The Poisson distribution is only parameterized by a single parameter (\(\lambda\)) which represents both the mean and variance. Furthermore, Poisson data can only be positive integers.
Unlike a simple trend between Gaussian variables, modelling against a Poisson distribution shifts the scale to logarithms (via the log link). This needs to be taken into account when we simulate the data. The parameters that we use to define the underlying processes need to either be on a logarithmic scale, or else converted to a logarithmic scale prior to using them for generating the random data.
Moreover, for any model that involves a non-identity link function (such as a logarithmic link function for Poisson models), ‘slope’ is only constant on the scale of the link function. When it is back transformed onto the natural scale (scale of the data), it takes on a different meaning and interpretation.
We will choose \(\beta_0\) to represent a value of 1 when x=0. As for the 'effect' of the predictor on the response, let's say that for every one unit increase in the predictor the response increases by 40% (on the natural scale). Hence, on the log scale, the slope will be \(log(1.4)=\) 0.3364722.
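To illustrate the back-transformation (the slope is additive on the log scale but multiplicative on the natural scale):

```r
beta_1 <- log(1.4)  # slope on the log (link) scale
beta_1              # 0.3364722...
exp(beta_1)         # 1.4: each unit of x multiplies the expected response by 1.4

# expected response at x = 0, 1, 2 with an intercept of log(1)
exp(log(1) + beta_1 * 0:2)  # 1, 1.4, 1.96
```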
In theory, count data should follow a Poisson distribution and therefore have properties like the mean equal to the variance (e.g. \(\textnormal{Dispersion}=\frac{\sigma^2}{\mu}=1\)). However, as simple linear models are low dimensional representations of a system, it is often unlikely that such a simple model can capture all the variability in the response (counts). For example, if we were modelling the abundance of a species of intertidal snail within quadrats in relation to water depth, it is highly unlikely that water depth alone drives snail abundance. There are countless other influences that the model has not accounted for. As a result, the observed data might be more variable than a Poisson (of a particular mean) would expect; in such cases, the model is over-dispersed (more variance than expected).
Over-dispersed models under-estimate the variability, and thus over-state the precision, of estimates, resulting in inflated confidence in outcomes (elevated Type I errors).
There are numerous causes of over-dispersed count data (one of which is alluded to above). These are:
- additional sources of variability not being accounted for in the model (see above)
- when the items being counted aggregate together. Although the underlying items may have been generated by a Poisson process, the items clump together. When the items are counted, they are more likely to be either in relatively low or relatively high numbers - hence the data are more varied than would be expected from their overall mean.
- imperfect detection resulting in excessive zeros. Again the underlying items may have been generated by a Poisson process, however detecting and counting the items might not be completely straight forward (particularly for more cryptic items). Hence, the researcher may have recorded no individuals in a quadrat and yet there was one or more present, they were just not obvious and were not detected. That is, layered over the Poisson process is another process that determines the detectability. So while the Poisson might expect a certain proportion of zeros, the observed data might have a substantially higher proportion of zeros - and thus higher variance.
This example will generate data that is drawn from a negative binomial distribution so as to broadly represent any one of the above causes.
We will choose \(\beta_0\) to represent a value of 1 when x=0. As for the 'effect' of the predictor on the response, let's say that for every one unit increase in the predictor the response increases by 40% (on the natural scale). Hence, on the log scale, the slope will be \(log(1.4)=\) 0.3364722. Finally, the dispersion ('size') parameter will be 10.
set.seed(234)
beta <- c(1, 1.40)
beta <- log(beta)
n <- 10
size <- 10
dat4 <- data.frame(x = seq(from = 1, to = 10, len = n)) |>
mutate(
mu = exp(beta[1] + beta[2] * x),
y = rnbinom(n, size = size, mu = mu)
)
dat4
    x        mu  y
1 1 1.400000 0
2 2 1.960000 3
3 3 2.744000 7
4 4 3.841600 3
5 5 5.378240 5
6 6 7.529536 9
7 7 10.541350 13
8 8 14.757891 10
9 9 20.661047 17
10 10 28.925465 26
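The extra variance introduced by the negative binomial can be seen from its mean-variance relationship, \(Var(y) = \mu + \mu^2/size\), whereas the Poisson expects \(Var(y) = \mu\):

```r
mu <- 10
size <- 10  # the 'size' (dispersion) parameter used in rnbinom() above

mu                 # variance a Poisson would expect at this mean
mu + mu^2 / size   # negative binomial variance: 20, double the Poisson
```

So the smaller the `size`, the greater the over-dispersion; as `size` grows very large, the negative binomial converges on the Poisson.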
Binary data (presence/absence, dead/alive, yes/no, heads/tails, etc) pose unique challenges for linear modeling. Linear regression, designed for continuous outcomes, may not be directly applicable to binary responses. The nature of binary data violates assumptions of normality and homoscedasticity, which are fundamental to linear regression. Furthermore, linear models may predict probabilities outside the [0, 1] range, leading to unrealistic predictions.
This example will generate data that are drawn from a Bernoulli distribution so as to broadly represent presence/absence data.
We will choose \(\beta_0\) to represent the odds of a value of 1 when \(x=0\), equal to \(0.02\). This is equivalent to a probability of \(y\) being one when \(x=0\) of \(\frac{0.02}{1+0.02}=0.0196\). That is, at low \(x\), the response is likely to be 0. For every one unit increase in \(x\), we will stipulate a 2 times increase in the odds that the expected response is equal to 1.
Similar to binary data, proportional (binomial) data tend to violate normality and homogeneity of variance (particularly as mean proportions approach either 0% or 100%).
This example will generate data that is drawn from a binomial distribution so as to broadly represent proportion data.
We will choose \(\beta_0\) to represent the odds of a particular trial (e.g. an individual) being of a particular type (e.g. species 1) when \(x=0\), equal to \(0.02\). This is equivalent to a probability of \(y\) being of the focal type when \(x=0\) of \(\frac{0.02}{1+0.02}=0.0196\). That is, at low \(x\), the probability that an individual is of taxa 1 is likely to be close to 0. For every one unit increase in \(x\), we will stipulate a 2.5 times increase in the odds that the expected response is equal to 1.
For this example, we will also convert the counts into proportions (\(y2\)) by division with the number of trials (\(5\)).
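The conversions between odds, probability and the logit (link) scale used above can be sketched with base R's `plogis()`/`qlogis()`:

```r
odds <- 0.02             # odds of 'success' when x = 0
p <- odds / (1 + odds)   # probability: 0.0196...

log(odds)                # log-odds: the value on the logit link scale
qlogis(p)                # identical: logit(p) = log(p / (1 - p))
plogis(log(odds))        # back-transform the log-odds to the probability
```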
set.seed(123)
beta <- c(0.02, 2.5)
beta <- log(beta)
n <- 10
trials <- 5
dat6 <- data.frame(x = seq(from = 1, to = 10, len = n)) |>
mutate(
count = as.numeric(rbinom(n, size = trials, prob = plogis(beta[1] + beta[2] * x))),
total = trials,
y = count/total
)
dat6
    x count total   y
1 1 0 5 0.0
2 2 1 5 0.2
3 3 1 5 0.2
4 4 4 5 0.8
5 5 2 5 0.4
6 6 5 5 1.0
7 7 5 5 1.0
8 8 4 5 0.8
9 9 5 5 1.0
10 10 5 5 1.0
6 Exploratory data analysis
Statistical models utilize data and the inherent statistical properties of distributions to discern patterns, relationships, and trends, enabling the extraction of meaningful insights, predictions, or inferences about the phenomena under investigation. To do so, statistical models make assumptions about the likely distributions from which the data were collected. Consequently, the reliability and validity of any statistical model depend upon adherence to these underlying assumptions.
Exploratory Data Analysis (EDA) and assumption checking therefore play pivotal roles in the process of statistical analysis, offering essential tools to glean insights, assess the reliability of statistical methods, and ensure the validity of conclusions drawn from data. EDA involves visually and statistically examining datasets to understand their underlying patterns, distributions, and potential outliers. These initial steps provide an intuitive understanding of the data's structure and guide subsequent analyses. By scrutinizing assumptions, such as normality, homoscedasticity, and independence, researchers can identify potential limitations or violations that may impact the accuracy and reliability of their findings.
Exploratory Data Analysis within the context of ecological statistical models usually comprises a set of targeted graphical summaries. These are not to be considered definitive diagnostics of the model assumptions, but rather a first pass to assess the obvious violations prior to the fitting of models. More definitive diagnostics can only be achieved after a model has been fit.
In addition to graphical summaries, there are numerous statistical tests to help explore possible violations of various statistical assumptions. These tests are less commonly used in ecology since they are often more sensitive to deviations from ideal than are the models that we are seeking to ensure.
Simple classic regression models are often the easiest models to fit and interpret and as such often represent a standard by which alternative models are gauged. As you will see later in this tutorial, such models can actually be fit using closed-form (exact solution) matrix algebra that can be performed by hand. Nevertheless, and perhaps as a result, they also impose some of the strictest assumptions. Although these collective assumptions are specific to Gaussian models, they do provide a good introduction to model assumptions in general, so we will use them to motivate the wider discussion.
Simple (Gaussian) linear models (represented below) make the following assumptions:
The data depicted above were generated using the following R code:
The observations represent
- single observations drawn from 10 normal populations
- each population had a standard deviation of 4
- the mean of each population varied linearly according to the value of x (\(2 + 5x\))
- normality: the residuals (and thus observations) must be drawn from populations that are normally distributed. The right hand figure underlays the fictitious normally distributed populations from which the observed values have been sampled.
Estimation and inference testing in linear regression assumes that the response is normally distributed in each of the populations. In this case, the populations are all possible measurements that could be collected at each level of \(x\) - hence there are 16 populations. Typically, however, we only collect a single observation from each population (as is also the case here). How then can we evaluate whether each of these populations is likely to have been normal?
For a given response, the population distributions should follow much the same distribution shapes. Therefore provided the single samples from each population are unbiased representations of those populations, a boxplot of all observations should reflect the population distributions.
The two figures above show the relationships between the individual population distributions and the overall distribution. The left hand figure shows a distribution drawn from single representatives of each of the 16 populations. Since the 16 individual populations were normally distributed, the distribution of the 16 observations is also normal.
By contrast, the right hand figure shows 16 log-normally distributed populations and the resulting distribution of 16 single observations drawn from these populations. The overall boxplot mirrors each of the individual population distributions.
Whilst traditionally, non-normal data would typically be normalised via a scale transformation (such as a logarithmic transformation), these days it is arguably more appropriate to attempt to match the data to a more suitable distribution (see later in this tutorial).
You may have noticed that we have only explored the distribution of the response (y-axis). What about the distribution of the predictor (independent, x-axis) variable - does it matter? The distributional assumption applies to the residuals (which are purely in the direction of the y-axis). Indeed, technically it is assumed that there is no uncertainty associated with the predictor variable: the predictor values are assumed to be set by the researcher, and thus there is no error associated with the values observed. Whilst this might not always be reasonable, it is an assumption nonetheless.
Given that the predictor values are expected to be set rather than measured, we technically assume that they are uniformly distributed. In practice, the exact distribution of the predictor values is not that important, provided it is reasonably symmetrical and does not give rise to outliers (unusually small or large values).
As with exploring the distribution of the response variable, boxplots, histograms and density plots can be useful means of exploring the distribution of predictor variable(s). When such diagnostics reveal distributional issues, scale transformations (such as logarithmic transformations) are appropriate.
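These graphical diagnostics can be sketched in base R. The data below are fabricated for illustration (they are not one of the tutorial's data sets): a right-skewed (log-normal) response whose skew is revealed by simple summaries and largely resolved by a logarithmic transformation.

```r
# Fabricated example: a right-skewed (log-normal) variable
set.seed(123)
y <- rlnorm(100, meanlog = 2, sdlog = 1)

# for a symmetric distribution, the mean and median should be similar;
# for right-skewed data the mean is dragged above the median
skew_raw <- mean(y) - median(y)
skew_log <- mean(log(y)) - median(log(y))

# graphical diagnostics: boxplot, histogram and density plot
par(mfrow = c(1, 3))
boxplot(y, main = "Boxplot")
hist(y, main = "Histogram")
plot(density(y), main = "Density")
```

On the raw scale the mean sits well above the median (asymmetry); after the log transformation the two are nearly identical.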
- homogeneity of variance: the residuals (and thus observations) must be drawn from populations that are equally varied. The model as shown only estimates a single variance (\(\sigma^2\)) parameter - it is assumed that this is a good overall representation of all the underlying populations. The right hand figure underlays the fictitious normally distributed and equally varied populations from which the observations have been sampled.
Moreover, since the expected values (obtained by solving the deterministic component of the model) and the variance must be estimated from the same data, they need to be independent (not related to one another).
Simple linear regression also assumes that each of the populations is equally varied. Actually, it is the prospect of a relationship between the mean and variance of the y-values across the x-values that is of greatest concern. Strictly, the assumption is that the distributions of y-values at each x-value are equally varied and that there is no relationship between mean and variance.
However, as we only have a single y-value for each x-value, it is difficult to directly determine whether the assumption of homogeneity of variance is likely to have been violated (mean of one value is meaningless and variability can’t be assessed from a single value). The figure below depicts the ideal (and almost never realistic) situation in which (left hand figure) the populations are all equally varied. The middle figure simulates drawing a single observation from each of the populations. When the populations are equally varied, the spread of observed values around the trend line is fairly even - that is, there is no trend in the spread of values along the line.
If we then plot the residuals (the differences between the observed values and those predicted by the trendline) against the predicted values, there is a definite lack of pattern. This lack of pattern indicates that homogeneity of variance is not an issue.
If we now contrast the above with a situation where the population variance is related to the mean (unequal variance), we see that the observations drawn from these populations are not evenly distributed along the trendline (they get more spread out as the mean predicted value increases). This pattern is emphasised in the residual plot, which displays a characteristic "wedge" shape.
Hence, looking at the spread of values around a trendline on a scatterplot of \(y\) against \(x\) is a useful way of identifying gross violations of homogeneity of variance. Residual plots provide an even better diagnostic: the presence of a wedge shape indicates that the population mean and variance are related.
- linearity: the underlying relationships must be simple linear trends, since the line of best fit through the data (of which the slope is estimated) is linear. The right hand figure depicts a linear trend through the underlying populations.
It is important to clarify the meaning of the word "linear" in the term "linear regression". Technically, it refers to a linear combination of regression coefficients. For example, the following are all linear models:
- \(y_i = \beta_0 + \beta_1 x_i\)
- \(y_i = \beta_0 + \beta_1 x_i + \beta_2 z_i\)
- \(y_i = \beta_0 + \beta_1 x_i + \beta_2 x^2_i\)
All of the coefficients (\(\beta_0\), \(\beta_1\), \(\beta_2\)) enter as linear terms. Note that the last of the above examples is still a linear model, even though it describes a non-linear trend. Contrast the above models with the following non-linear model:
- \(y_i = \beta_0 + x_i^{\beta_1}\)
In this case, the model is not a linear combination of the coefficients (one of them enters as an exponent).
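The distinction is easy to demonstrate in R. The data below are fabricated for illustration: a model containing a quadratic term is still fitted with `lm()` (it is linear in its coefficients) even though the fitted trend is a curve.

```r
# Fabricated data with a curved (quadratic) trend
set.seed(42)
x <- seq(1, 10, length.out = 50)
y <- 2 + 5 * x - 0.4 * x^2 + rnorm(50, sd = 1)

# y ~ x + I(x^2) is a *linear* model - a linear combination of
# beta_0, beta_1 and beta_2 - despite describing a non-linear trend
fit <- lm(y ~ x + I(x^2))
coef(fit)  # three linear coefficients
```

By contrast, a model such as \(y_i = \beta_0 + x_i^{\beta_1}\) cannot be fitted with `lm()` and would require a non-linear fitting routine such as `nls()`.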
That said, a simple linear regression usually fits a straight (linear) line through the data. Therefore, prior to fitting such a model, it is necessary to establish whether this really is the most sensible way of describing the relationship. That is, does the relationship appear to be linearly related or could some other non-linear function describe the relationship better. Scatterplots and residual plots are useful diagnostics.
To see how a residual plot can be useful, consider the following. The first row of figures illustrates the residuals resulting from data drawn from a linear trend. The residuals are effectively random noise. By contrast, the second row shows the residuals resulting from data drawn from a non-linear relationship that has nevertheless been modelled as a linear trend. There is still a clear pattern remaining in the residuals.
The above might be an obvious and somewhat overly contrived example, yet it does illustrate the point - that a pattern in the residuals could point to a mis-specified model.
If non-linearity does exist (as in the second case above), then fitting a straight line through what is obviously not a straight-line relationship is likely to poorly represent the true nature of the relationship. There are numerous causes of non-linearity:
- underlying distributional issues can result in non-linearity. For example, if we assume a gaussian distribution and the data are non-normal, the relationships will often appear non-linear. Addressing the distributional issues can therefore also resolve the linearity issues.
- the underlying relationship might truly be non-linear in which case this should be reflected in some way by the model formula. If the model formula fails to describe the non-linear trend, then problems will persist.
- the model proposed is missing an important covariate that might help standardise the data in a way that results in linearity
- independence: the residuals (and thus observations) must be drawn independently from the populations. That is, the correlation between all pairs of observations is assumed to be 0 (the off-diagonals of the covariance matrix). More practically, there should be no pattern to the correlations between observations.
Random sampling and random treatment assignment are experimental design elements that are intended to mitigate many types of sampling biases that cause dependencies between observations. Nevertheless, there are aspects of sampling designs that are either logistically difficult to randomise or, in some cases, not logically possible to randomise. For example, the residuals from observations sampled closer together in space and time will likely be more similar to one another than those of observations spaced further apart. Since neither space nor time can be randomised, data collected from sampling designs that involve sampling over space and/or time need to be assessed for spatial and temporal dependencies. These concepts will be explored in a later tutorial, in the context of introducing designs that are susceptible to them.
The above is only a very brief overview of the model assumptions that apply to just one specific model (simple linear gaussian regression). For the remainder of this section, we will graphically explore the two motivating example data sets so as to gain insights into which distributional assumptions might be most valid, and thus help guide modelling choices. Similarly, for subsequent tutorials in this series (which introduce progressively more complex models), all the associated assumptions will be explored and detailed.
Conclusions
- there are no obvious violations of the linear regression model assumptions
- we can now fit the suggested model
- full confirmation about the model’s goodness of fit should be reserved until after exploring the additional diagnostics that are only available after fitting the model.
Conclusions
- the spread of noise in each group seems reasonably similar
- more importantly, there does not seem to be a relationship between the mean (as approximated by the position of the boxplots along the y-axis) and the variance (as approximated by the spread of the boxplots).
- that is, the size of the boxplots does not vary with the elevation of the boxplots.
Linearity is not an issue for categorical predictors, since the model effectively fits separate lines between pairs of points (and a line between two points can only ever be linear).
Conclusions
- no evidence of non-normality
- no evidence of non-homogeneity of variance
Conclusions
- the spread of noise does not look random along the line of best fit.
- homogeneity of variance is difficult to assess in the presence of distributional issues (such as non-normality in this case) as they can result in non-linearity (apparent here)
Conclusions
- the data do not appear to be linear
- the red line is a loess smoother and it is clear that the data are not linear
Conclusions
- there are obvious violations of the linear regression model assumptions
- we should consider a different model that does not assume normality
Conclusions
- the spread of noise does not look random along the line of best fit.
- homogeneity of variance is difficult to assess in the presence of distributional issues (such as non-normality in this case) as they can result in non-linearity (apparent here)
Conclusions
- the data do not appear to be linear
- the red line is a loess smoother and it is clear that the data are not linear
Conclusions
- there are obvious violations of the linear regression model assumptions
- we should consider a different model that does not assume normality
Conclusions
- the data are clearly not linear
- the red line is a loess smoother and it is clear that the data are not linear
Conclusions
- there are obvious violations of the linear regression model assumptions
- we should consider a different model that does not assume normality
Conclusions
- although there is no evidence of non-linearity from this small data set, it is worth noting that the line of best fit does extend outside the logical response range [0, 1] within the range of observed \(x\) values. That is, a simple linear model would predict proportions higher than 100% at high values of \(x\)
- this is a common issue with binomial data and is often addressed by fitting a logistic regression model
Conclusions
- there are obvious violations of the linear regression model assumptions
- we should consider a different model that does not assume normality
7 Fitting models
One way to assess the priors is to have the MCMC sampler draw purely from the prior predictive distribution, without conditioning on the observed data. Doing so provides a glimpse of the range of predictions possible under the priors. On the one hand, wide-ranging predictions would ensure that the priors are unlikely to influence the actual predictions once they are conditioned on the data. On the other hand, if they are too wide, the sampler is permitted to traverse regions of parameter space that are not logically possible in the actual underlying ecological context. Not only does this mean that illogical parameter estimates are possible, but when the sampler traverses regions of parameter space that are not supported by the actual data, it can become unstable and have difficulty converging.
In brms, we can instruct the sampler to draw from the prior predictive distribution, instead of conditioning on the response, by running the model with the sample_prior = 'only' argument. Unfortunately, this cannot be applied when there are flat priors (since the prior predictions would necessarily extend to negative and positive infinity). Therefore, in order to use this useful routine, we need to make sure that we have defined a proper prior for every parameter.
Earlier we suggested the following priors might be useful:
- \(\beta_0\): Normal prior centred at 31.21 with a variance of 15.17
mean:
variance:
- \(\beta_1\): t distribution (3 degrees of freedom) prior centred at 0 with a variance of 4.09
mean: since effects are differences and we may not want to pre-determine the direction (polarity) of any trends, it is typical to define effects priors with means of 0
variance:
- \(\sigma\): (half) t distribution (3 degrees of freedom) centred at 0 with a variance of 15.17
variance:
It might be useful to understand what some of these distributions look like. For example, we have used a normal (Gaussian) distribution and a flatter t distribution for the y-intercept and slope respectively. This was a somewhat arbitrary choice - we could easily have gone with either normal or t distributions for all of the above parameters. To visualise prior distributions for the slope based on both normal and t distributions:
Evidently, the t distribution (with 3 degrees of freedom) is wider than the normal distribution. The t distribution should therefore be more robust to values that are less concentrated around the mean.
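A comparison along these lines can be sketched in base R. The scale value 4.09 is taken from the slope prior suggested above (an assumption on my part that the stated value is used as the scale); note that a location-scale t density is obtained as `dt(x / s, df) / s`.

```r
# Compare a normal prior with a t prior (3 df) of the same scale
xx <- seq(-15, 15, length.out = 500)
s  <- 4.09  # the slope prior scale suggested above (assumed)

plot(xx, dnorm(xx, 0, s), type = "l", lwd = 2,
     xlab = expression(beta[1]), ylab = "Density")
lines(xx, dt(xx / s, df = 3) / s, lwd = 2, lty = 2)  # scaled t density
legend("topright",
       legend = c("normal(0, 4.09)", "student_t(3, 0, 4.09)"),
       lwd = 2, lty = c(1, 2))
```

The t density has a lower peak and heavier tails than the normal density of the same scale, which is what makes it the more permissive (flatter) prior.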
\[ \begin{align} y_i \sim{}& N(\mu_i, \sigma^2)\\ \mu_i =& \beta_0 + \beta_1 x_i\\ \end{align} \]
- start by fitting the model and sampling from the priors only
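The code behind this step might look something like the following sketch. It assumes the data frame is called `dat` (with response `y` and predictor `x`), and it passes the suggested prior values directly as the scale arguments of `normal()` and `student_t()` - note that brms parameterises these by standard deviation rather than variance.

```r
library(brms)

# suggested priors (values assumed to be used as scales)
priors <- c(
  prior(normal(31.21, 15.17), class = Intercept),
  prior(student_t(3, 0, 4.09), class = b),
  prior(student_t(3, 0, 15.17), class = sigma)
)

# sample from the prior predictive distribution only
# (the response is ignored when sample_prior = "only")
dat.brm <- brm(bf(y ~ x),
  data = dat,
  prior = priors,
  sample_prior = "only",
  iter = 2000, warmup = 1000, chains = 4,
  backend = "cmdstanr"
)
```

The iteration and chain settings above simply mirror those visible in the sampler output that follows.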
Compiling Stan program...
Start sampling
SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 1).
Chain 1:
Chain 1: Gradient evaluation took 7e-06 seconds
Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 0.07 seconds.
Chain 1: Adjust your expectations accordingly!
Chain 1:
Chain 1:
Chain 1: Iteration: 1 / 2000 [ 0%] (Warmup)
Chain 1: Iteration: 200 / 2000 [ 10%] (Warmup)
Chain 1: Iteration: 400 / 2000 [ 20%] (Warmup)
Chain 1: Iteration: 600 / 2000 [ 30%] (Warmup)
Chain 1: Iteration: 800 / 2000 [ 40%] (Warmup)
Chain 1: Iteration: 1000 / 2000 [ 50%] (Warmup)
Chain 1: Iteration: 1001 / 2000 [ 50%] (Sampling)
Chain 1: Iteration: 1200 / 2000 [ 60%] (Sampling)
Chain 1: Iteration: 1400 / 2000 [ 70%] (Sampling)
Chain 1: Iteration: 1600 / 2000 [ 80%] (Sampling)
Chain 1: Iteration: 1800 / 2000 [ 90%] (Sampling)
Chain 1: Iteration: 2000 / 2000 [100%] (Sampling)
Chain 1:
Chain 1: Elapsed Time: 0.012 seconds (Warm-up)
Chain 1: 0.012 seconds (Sampling)
Chain 1: 0.024 seconds (Total)
Chain 1:
SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 2).
Chain 2:
Chain 2: Gradient evaluation took 4e-06 seconds
Chain 2: 1000 transitions using 10 leapfrog steps per transition would take 0.04 seconds.
Chain 2: Adjust your expectations accordingly!
Chain 2:
Chain 2:
Chain 2: Iteration: 1 / 2000 [ 0%] (Warmup)
Chain 2: Iteration: 200 / 2000 [ 10%] (Warmup)
Chain 2: Iteration: 400 / 2000 [ 20%] (Warmup)
Chain 2: Iteration: 600 / 2000 [ 30%] (Warmup)
Chain 2: Iteration: 800 / 2000 [ 40%] (Warmup)
Chain 2: Iteration: 1000 / 2000 [ 50%] (Warmup)
Chain 2: Iteration: 1001 / 2000 [ 50%] (Sampling)
Chain 2: Iteration: 1200 / 2000 [ 60%] (Sampling)
Chain 2: Iteration: 1400 / 2000 [ 70%] (Sampling)
Chain 2: Iteration: 1600 / 2000 [ 80%] (Sampling)
Chain 2: Iteration: 1800 / 2000 [ 90%] (Sampling)
Chain 2: Iteration: 2000 / 2000 [100%] (Sampling)
Chain 2:
Chain 2: Elapsed Time: 0.011 seconds (Warm-up)
Chain 2: 0.012 seconds (Sampling)
Chain 2: 0.023 seconds (Total)
Chain 2:
SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 3).
Chain 3:
Chain 3: Gradient evaluation took 4e-06 seconds
Chain 3: 1000 transitions using 10 leapfrog steps per transition would take 0.04 seconds.
Chain 3: Adjust your expectations accordingly!
Chain 3:
Chain 3:
Chain 3: Iteration: 1 / 2000 [ 0%] (Warmup)
Chain 3: Iteration: 200 / 2000 [ 10%] (Warmup)
Chain 3: Iteration: 400 / 2000 [ 20%] (Warmup)
Chain 3: Iteration: 600 / 2000 [ 30%] (Warmup)
Chain 3: Iteration: 800 / 2000 [ 40%] (Warmup)
Chain 3: Iteration: 1000 / 2000 [ 50%] (Warmup)
Chain 3: Iteration: 1001 / 2000 [ 50%] (Sampling)
Chain 3: Iteration: 1200 / 2000 [ 60%] (Sampling)
Chain 3: Iteration: 1400 / 2000 [ 70%] (Sampling)
Chain 3: Iteration: 1600 / 2000 [ 80%] (Sampling)
Chain 3: Iteration: 1800 / 2000 [ 90%] (Sampling)
Chain 3: Iteration: 2000 / 2000 [100%] (Sampling)
Chain 3:
Chain 3: Elapsed Time: 0.012 seconds (Warm-up)
Chain 3: 0.013 seconds (Sampling)
Chain 3: 0.025 seconds (Total)
Chain 3:
SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 4).
Chain 4:
Chain 4: Gradient evaluation took 3e-06 seconds
Chain 4: 1000 transitions using 10 leapfrog steps per transition would take 0.03 seconds.
Chain 4: Adjust your expectations accordingly!
Chain 4:
Chain 4:
Chain 4: Iteration: 1 / 2000 [ 0%] (Warmup)
Chain 4: Iteration: 200 / 2000 [ 10%] (Warmup)
Chain 4: Iteration: 400 / 2000 [ 20%] (Warmup)
Chain 4: Iteration: 600 / 2000 [ 30%] (Warmup)
Chain 4: Iteration: 800 / 2000 [ 40%] (Warmup)
Chain 4: Iteration: 1000 / 2000 [ 50%] (Warmup)
Chain 4: Iteration: 1001 / 2000 [ 50%] (Sampling)
Chain 4: Iteration: 1200 / 2000 [ 60%] (Sampling)
Chain 4: Iteration: 1400 / 2000 [ 70%] (Sampling)
Chain 4: Iteration: 1600 / 2000 [ 80%] (Sampling)
Chain 4: Iteration: 1800 / 2000 [ 90%] (Sampling)
Chain 4: Iteration: 2000 / 2000 [100%] (Sampling)
Chain 4:
Chain 4: Elapsed Time: 0.012 seconds (Warm-up)
Chain 4: 0.012 seconds (Sampling)
Chain 4: 0.024 seconds (Total)
Chain 4:
Family: gaussian
Links: mu = identity; sigma = identity
Formula: y ~ x
Data: dat (Number of observations: 10)
Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
total post-warmup draws = 4000
Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept 0.27 0.35 -0.41 0.99 1.00 2634 2171
x -0.09 0.45 -0.99 0.81 1.00 2536 2195
Family Specific Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma 1.08 0.32 0.65 1.88 1.00 2161 2142
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
- explore the range of posterior predictions resulting from the priors alone
Conclusions:
- the grey ribbon above represents the credible range of the posterior predictions
- this ribbon is substantially wider than the observed data (black points), suggesting that the priors are not overly narrow
- the range of posterior predictions is not wildly unreasonable (although we could argue that negative predictions might be illogical), since the maximums are not multiple orders of magnitude above the observed data.
- now refit the model such that it samples from both the priors and likelihood (that is allow the data to have an impact on the estimates)
- re-explore the range of posterior predictions resulting from the fitted model
Conclusions:
- the range of the posteriors is now substantially reduced (now that the model includes both data and priors)
- this suggests that the patterns are being driven predominantly by the data
- compare the priors and posteriors to further confirm that the priors are not overly influential
When we have indicated that the posterior should be informed by both the prior and the likelihood, both prior draws (governed by the priors alone) and posterior draws (governed by both priors and data/likelihood) are returned. These can be compared by exploring the probabilities associated with specific hypotheses - the most obvious of which is that of no effect (that the parameter = 0).
When doing so, ideally the posterior should be distinct from the prior. By distinct, I mean that the posteriors should be clearly distinguishable from the priors; ideally, the posteriors should be less variable than the priors. If this is not the case, it might suggest that the posteriors are being too strongly driven by the priors.
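One way to perform this comparison is sketched below. It assumes the model object is called `dat.brm` and was fitted with `sample_prior = "yes"`; the draw names (`b_x`, `prior_b`) would need to match those reported by `variables()` for your model.

```r
library(brms)
library(tidyverse)

# overlay prior and posterior densities for the slope
dat.brm |>
  as_draws_df() |>
  select(b_x, prior_b) |>
  pivot_longer(everything(), names_to = "Type", values_to = "Value") |>
  ggplot(aes(x = Value, colour = Type)) +
  geom_density()

# alternatively, hypothesis() evaluates a specific hypothesis (here, no
# effect) and its plot method displays prior and posterior together
h <- hypothesis(dat.brm, "x = 0")
plot(h)
```

Either display makes it easy to judge whether the posterior is distinct from (and narrower than) the prior.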
[1] "b_Intercept" "b_x" "sigma" "prior_Intercept"
[5] "prior_b" "prior_sigma" "lprior" "lp__"
[9] "accept_stat__" "stepsize__" "treedepth__" "n_leapfrog__"
[13] "divergent__" "energy__"
Unfortunately, it is not possible to do this comparison sensibly for the intercept. The reason is that the prior for the intercept was applied to an intercept associated with centred continuous predictors (predictors are centred behind the scenes). Since we did not centre the predictor ourselves, the intercept returned is on the uncentred scale. Hence, the prior and posterior are on different scales.
[1] "b_Intercept" "b_x" "sigma" "prior_Intercept"
[5] "prior_b" "prior_sigma" "lprior" "lp__"
[9] "accept_stat__" "stepsize__" "treedepth__" "n_leapfrog__"
[13] "divergent__" "energy__"
Conclusions:
- each of the priors are substantially wider than the posteriors
- the posteriors are definitely distinct from their respective priors, and thus we can conclude that the priors are not driving the posteriors (i.e. they are not influencing the outcomes)
- the priors are simply regularising the parameters such that they are only sampled from plausible regions
Unless you explicitly direct brm to include a user-defined intercept, the priors on the default intercept should assume that the predictor(s) are centred (because brm will automatically centre all continuous predictors).
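If you do want the intercept prior to apply on the original (uncentred) scale, brms offers the `0 + Intercept` formula syntax, under which the intercept is treated as an ordinary population-level coefficient and the predictors are not centred behind the scenes. A sketch only, reusing the earlier suggested prior values and an assumed data frame `dat`:

```r
library(brms)

# with 0 + Intercept, the intercept is a regular "b" coefficient named
# "Intercept", so its prior is set via class = b, coef = Intercept
priors <- c(
  prior(normal(31.21, 15.17), class = b, coef = Intercept),
  prior(student_t(3, 0, 4.09), class = b),
  prior(student_t(3, 0, 15.17), class = sigma)
)

dat.brm2 <- brm(bf(y ~ 0 + Intercept + x),
  data = dat,
  prior = priors,
  sample_prior = "yes",
  backend = "cmdstanr"
)
```

Under this parameterisation the prior and posterior for the intercept are on the same (uncentred) scale and can be compared directly.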
Let's try the following priors:
- \(\beta_0\): Normal prior centred at 0.61 with a variance of 0.48
mean:
variance:
- \(\beta_1\): t distribution (3 degrees of freedom) prior centred at 0 with a variance of 0.75
mean: since effects are differences and we may not want to pre-determine the direction (polarity) of any trends, it is typical to define effects priors with means of 0
variance:
- \(\sigma\): (half) t distribution (3 degrees of freedom) centred at 0 with a variance of 0.48
variance:
\[ \begin{align} y_i \sim{}& N(\mu_i, \sigma^2)\\ \mu_i =& \beta_0 + \beta_1 (x_i - \bar{x})\\ \end{align} \]
- start by fitting the model and sampling from the priors only
- explore the range of posterior predictions resulting from the priors alone
Conclusions:
- the grey ribbon above represents the credible range of the posterior predictions
- this ribbon is substantially wider than the observed data (black points), suggesting that the priors are not overly narrow
- the range of posterior predictions is not wildly unreasonable (although we could argue that negative predictions might be illogical), since the maximums are not multiple orders of magnitude above the observed data.
- now refit the model such that it samples from both the priors and likelihood (that is allow the data to have an impact on the estimates)
- re-explore the range of posterior predictions resulting from the fitted model
Conclusions:
- the range of the posteriors is now substantially reduced (now that the model includes both data and priors)
- this suggests that the patterns are being driven predominantly by the data
- compare the priors and posteriors to further confirm that the priors are not overly influential
When we have indicated that the posterior should be informed by both the prior and the likelihood, both prior draws (governed by the priors alone) and posterior draws (governed by both priors and data/likelihood) are returned. These can be compared by exploring the probabilities associated with specific hypotheses - the most obvious of which is that of no effect (that the parameter = 0).
When doing so, ideally the posterior should be distinct from the prior. By distinct, I mean that the posteriors should be clearly distinguishable from the priors; ideally, the posteriors should be less variable than the priors. If this is not the case, it might suggest that the posteriors are being too strongly driven by the priors.
[1] "b_Intercept" "b_scalexscaleEQFALSE" "sigma"
[4] "prior_Intercept" "prior_b" "prior_sigma"
[7] "lprior" "lp__" "accept_stat__"
[10] "stepsize__" "treedepth__" "n_leapfrog__"
[13] "divergent__" "energy__"
Unfortunately, it is not possible to do this comparison sensibly for the intercept. The reason is that the prior for the intercept was applied to an intercept associated with centred continuous predictors (predictors are centred behind the scenes). Since we did not centre the predictor ourselves, the intercept returned is on the uncentred scale. Hence, the prior and posterior are on different scales.
[1] "b_Intercept" "b_scalexscaleEQFALSE" "sigma"
[4] "prior_Intercept" "prior_b" "prior_sigma"
[7] "lprior" "lp__" "accept_stat__"
[10] "stepsize__" "treedepth__" "n_leapfrog__"
[13] "divergent__" "energy__"
Conclusions:
- each of the priors are substantially wider than the posteriors
- the posteriors are definitely distinct from their respective priors, and thus we can conclude that the priors are not driving the posteriors (i.e. they are not influencing the outcomes)
- the priors are simply regularising the parameters such that they are only sampled from plausible regions
When the predictor is standardised, it simplifies prior definition because we no longer need to consider the scale of the predictor.
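A quick base-R reminder (with fabricated values) of what standardising does, and why it frees the slope prior from the predictor's original units:

```r
# Standardising centres a predictor and rescales it to unit standard
# deviation: z = (x - mean(x)) / sd(x)
x <- c(2, 4, 6, 8, 10)
z <- as.numeric(scale(x))

round(mean(z), 10)  # centred: mean is 0
sd(z)               # scaled: standard deviation is 1
```

Because a standardised predictor always has mean 0 and standard deviation 1, a slope prior expresses "change in response per standard deviation of \(x\)" regardless of what units \(x\) was measured in.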
Let's try the following priors:
- \(\beta_0\): Normal prior centred at 0.61 with a variance of 0.48
mean:
variance:
- \(\beta_1\): t distribution (3 degrees of freedom) prior centred at 0 with a variance of 0.48
mean: since effects are differences and we may not want to pre-determine the direction (polarity) of any trends, it is typical to define effects priors with means of 0
variance:
- \(\sigma\): (half) t distribution (3 degrees of freedom) centred at 0 with a variance of 0.48
variance:
\[ \begin{align} y_i \sim{}& N(\mu_i, \sigma^2)\\ \mu_i =& \beta_0 + \beta_1 (x_i - \bar{x})/\sigma_{x}\\ \end{align} \]
- start by fitting the model and sampling from the priors only
- explore the range of posterior predictions resulting from the priors alone
Conclusions:
- the grey ribbon above represents the credible range of the posterior predictions
- this ribbon is substantially wider than the observed data (black points), suggesting that the priors are not overly narrow
- the range of posterior predictions is not wildly unreasonable (although we could argue that negative predictions might be illogical), since the maximums are not multiple orders of magnitude above the observed data.
- now refit the model such that it samples from both the priors and likelihood (that is allow the data to have an impact on the estimates)
- re-explore the range of posterior predictions resulting from the fitted model
Conclusions:
- the range of the posteriors is now substantially reduced (now that the model includes both data and priors)
- this suggests that the patterns are being driven predominantly by the data
- compare the priors and posteriors to further confirm that the priors are not overly influential
When we have indicated that the posterior should be informed by both the prior and the likelihood, both prior draws (governed by the priors alone) and posterior draws (governed by both priors and data/likelihood) are returned. These can be compared by exploring the probabilities associated with specific hypotheses - the most obvious of which is that of no effect (that the parameter = 0).
When doing so, ideally the posterior should be distinct from the prior. By distinct, I mean that the posteriors should be clearly distinguishable from the priors; ideally, the posteriors should be less variable than the priors. If this is not the case, it might suggest that the posteriors are being too strongly driven by the priors.
[1] "b_Intercept" "b_scalex" "sigma" "prior_Intercept"
[5] "prior_b" "prior_sigma" "lprior" "lp__"
[9] "accept_stat__" "stepsize__" "treedepth__" "n_leapfrog__"
[13] "divergent__" "energy__"
Unfortunately, it is not possible to do this comparison sensibly for the intercept. The reason is that the prior for the intercept was applied to an intercept associated with centred continuous predictors (predictors are centred behind the scenes). Since we did not centre the predictor ourselves, the intercept returned is on the uncentred scale. Hence, the prior and posterior are on different scales.
[1] "b_Intercept" "b_scalex" "sigma" "prior_Intercept"
[5] "prior_b" "prior_sigma" "lprior" "lp__"
[9] "accept_stat__" "stepsize__" "treedepth__" "n_leapfrog__"
[13] "divergent__" "energy__"
Conclusions:
- each of the priors are substantially wider than the posteriors
- the posteriors are definitely distinct from their respective priors, and thus we can conclude that the priors are not driving the posteriors (i.e. they are not influencing the outcomes)
- the priors are simply regularising the parameters such that they are only sampled from plausible regions
Let's try the following priors:
- \(\beta_0\): Normal prior centred at 19.68 with a variance of 1.87
mean:
variance:
- \(\beta_1\): t distribution (3 degrees of freedom) prior centred at 0 with a variance of 9.35
mean: since effects are differences and we may not want to pre-determine the direction (polarity) of any trends, it is typical to define effects priors with means of 0
variance:
- \(\sigma\): (half) t distribution (3 degrees of freedom) centred at 0 with a variance of 4.7
variance:
\[ \begin{align} y_i \sim{}& N(\mu_i, \sigma^2)\\ \mu_i =& \beta_0 + \sum{\beta_j x_{ij}}\\ \end{align} \]
- start by fitting the model and sampling from the priors only
- explore the range of posterior predictions resulting from the priors alone
Conclusions:
- the grey ribbon above represents the credible range of the posterior predictions
- this ribbon is substantially wider than the observed data (black points), suggesting that the priors are not overly narrow
- the range of posterior predictions is not wildly unreasonable (although we could argue that negative predictions might be illogical), since the maximums are not multiple orders of magnitude above the observed data.
- now refit the model such that it samples from both the priors and likelihood (that is allow the data to have an impact on the estimates)
- re-explore the range of posterior predictions resulting from the fitted model
Conclusions:
- the range of the posteriors is now substantially reduced (now that the model includes both data and priors)
- this suggests that the patterns are being driven predominantly by the data
- compare the priors and posteriors to further confirm that the priors are not overly influential
When we have indicated that the posterior should be informed by both the prior and the likelihood, both prior draws (governed by the priors alone) and posterior draws (governed by both priors and data/likelihood) are returned. These can be compared by exploring the probabilities associated with specific hypotheses - the most obvious of which is that of no effect (that the parameter = 0).
When doing so, ideally the posterior should be distinct from the prior. By distinct, I mean that the posteriors should be clearly distinguishable from the priors; ideally, the posteriors should be less variable than the priors. If this is not the case, it might suggest that the posteriors are being too strongly driven by the priors.
[1] "b_Intercept" "b_xmedium" "b_xhigh" "sigma"
[5] "prior_Intercept" "prior_b" "prior_sigma" "lprior"
[9] "lp__" "accept_stat__" "stepsize__" "treedepth__"
[13] "n_leapfrog__" "divergent__" "energy__"
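One way to visualise this comparison is to gather the matching prior and posterior draws (using the variable names listed above) and overlay their densities. This is a sketch; the fitted object name `dat1.brm` is a placeholder for the model fitted in this section.

```r
library(tidyverse)
library(brms)

## Overlay the posterior for one effect against its prior draws
dat1.brm |>
  as_draws_df() |>
  select(b_xmedium, prior_b) |>
  pivot_longer(everything(), names_to = "type", values_to = "value") |>
  ggplot(aes(x = value, colour = type)) +
  geom_density()
```

A posterior density that is much narrower than, and clearly shifted relative to, its prior indicates the data are doing the work.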
Unfortunately, it is not possible to do this comparison sensibly for the intercept. The reason for this is that the prior for intercept was applied to an intercept that is associated with centred continuous predictors (predictors are centred behind the scenes). Since we did not centre the predictor, the intercept returned is as if uncentred. Hence, the prior and posterior are on different scales.
[1] "b_Intercept" "b_xmedium" "b_xhigh" "sigma"
[5] "prior_Intercept" "prior_b" "prior_sigma" "lprior"
[9] "lp__" "accept_stat__" "stepsize__" "treedepth__"
[13] "n_leapfrog__" "divergent__" "energy__"
Conclusions:
- each of the priors is substantially wider than its corresponding posterior
- the posteriors are clearly distinct from their respective priors, so we can conclude that the priors are not driving the posteriors (i.e. they are not unduly influencing the outcomes)
- the priors are simply regularising the parameters such that they are only sampled from plausible regions
Let's try the following priors:
- \(\beta\): t distribution (3 degrees of freedom) prior centred at 18.14 with a variance of 6.26
mean: since each group's mean is being estimated separately, they could either all have different priors, or more commonly, the same priors.
variance:
- \(\sigma\): (half) t distribution (3 degrees of freedom) centred at 0 with a variance of 4.7
variance:
\[ \begin{align} y_i \sim{}& N(\mu_i, \sigma^2)\\ \mu_i =& \beta_0 + \sum{\beta_j x_{ij}}\\ \end{align} \]
- start by fitting the model and sampling from the priors only
- explore the range of posterior predictions resulting from the priors alone
Conclusions:
- the grey ribbon above represents the credible range of the posterior predictions
- this ribbon is substantially wider than the observed data (black points), suggesting that the priors are not overly narrow
- the range of posterior predictions is not wildly unreasonable (although we could argue that negative predictions might be illogical), since the maxima are not multiple orders of magnitude above the observed data.
- now refit the model such that it samples from both the priors and likelihood (that is allow the data to have an impact on the estimates)
- re-explore the range of posterior predictions resulting from the fitted model
Conclusions:
- the range of the posteriors is now substantially reduced (now that the model includes both data and priors)
- this suggests that the patterns are being driven predominantly by the data
- compare the priors and posteriors to further confirm that the priors are not overly influential
When we indicate that the posterior should be informed by both the prior and the likelihood, draws are returned for both the prior (governed by the priors alone) and the posterior (governed by both priors and data/likelihood). These can be compared by exploring the probabilities associated with specific hypotheses - the most obvious of which is that of no effect (that the parameter = 0).
When doing so, ideally the posterior should be distinct from the prior. By distinct, I mean that the posteriors should be clearly distinguishable from the priors and, ideally, less variable than them. If this is not the case, it might suggest that the posteriors are being too strongly driven by the priors.
[1] "b_xcontrol" "b_xmedium" "b_xhigh" "sigma"
[5] "prior_b" "prior_sigma" "lprior" "lp__"
[9] "accept_stat__" "stepsize__" "treedepth__" "n_leapfrog__"
[13] "divergent__" "energy__"
Note that since this model uses a means parameterisation (there is no separate intercept; each group mean has its own parameter drawn against the same class `b` prior, as the variable list above shows), this comparison can be performed directly for all parameters.
[1] "b_xcontrol" "b_xmedium" "b_xhigh" "sigma"
[5] "prior_b" "prior_sigma" "lprior" "lp__"
[9] "accept_stat__" "stepsize__" "treedepth__" "n_leapfrog__"
[13] "divergent__" "energy__"
Conclusions:
- each of the priors is substantially wider than its corresponding posterior
- the posteriors are clearly distinct from their respective priors, so we can conclude that the priors are not driving the posteriors (i.e. they are not unduly influencing the outcomes)
- the priors are simply regularising the parameters such that they are only sampled from plausible regions
\[ \begin{align} y_i \sim{}& Pois(\lambda_i)\\ log(\lambda_i) =& \beta_0 + \beta_1 x_i\\ \end{align} \]
When considering priors, it is important to remember that they apply to parameters on the link scale. For Poisson models, the link scale is log. So the following priors might be appropriate:
- \(\beta_0\): Normal prior centred at 1.99 with a variance of 1.33
mean:
variance:
- \(\beta_1\): t distribution (3 degrees of freedom) prior centred at 0 with a variance of 2.05
mean: since effects are differences and we may not want to pre-determine the direction (polarity) of any trends, it is typical to define effects priors with means of 0
variance:
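Because these priors are defined on the log link scale, it can help to back-transform them to the response scale when judging whether they are sensible. For example, an intercept prior centred at 1.99 corresponds to an expected count of roughly 7.3:

```r
## Back-transform the prior centre from the log (link) scale to the response (count) scale
exp(1.99)
#> [1] 7.3155  (approximately)
```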
- start by fitting the model and sampling from the priors only
- explore the range of posterior predictions resulting from the priors alone
Conclusions:
- the grey ribbon above represents the credible range of the posterior predictions
- this ribbon is substantially wider than the observed data (black points), suggesting that the priors are not overly narrow
- the range of posterior predictions is not wildly unreasonable (although we could argue that negative predictions might be illogical), since the maxima are not multiple orders of magnitude above the observed data.
- now refit the model such that it samples from both the priors and likelihood (that is allow the data to have an impact on the estimates)
- re-explore the range of posterior predictions resulting from the fitted model
Conclusions:
- the range of the posteriors is now substantially reduced (now that the model includes both data and priors)
- this suggests that the patterns are being driven predominantly by the data
- compare the priors and posteriors to further confirm that the priors are not overly influential
When we indicate that the posterior should be informed by both the prior and the likelihood, draws are returned for both the prior (governed by the priors alone) and the posterior (governed by both priors and data/likelihood). These can be compared by exploring the probabilities associated with specific hypotheses - the most obvious of which is that of no effect (that the parameter = 0).
When doing so, ideally the posterior should be distinct from the prior. By distinct, I mean that the posteriors should be clearly distinguishable from the priors and, ideally, less variable than them. If this is not the case, it might suggest that the posteriors are being too strongly driven by the priors.
[1] "b_Intercept" "b_x" "prior_Intercept" "prior_b"
[5] "lprior" "lp__" "accept_stat__" "stepsize__"
[9] "treedepth__" "n_leapfrog__" "divergent__" "energy__"
Unfortunately, it is not possible to do this comparison sensibly for the intercept. The reason is that the prior for the intercept was applied to an intercept associated with centred predictors (brms centres the predictors behind the scenes). Since we did not centre the predictor ourselves, the intercept returned is on the uncentred scale. Hence, the prior and posterior are on different scales.
[1] "b_Intercept" "b_x" "prior_Intercept" "prior_b"
[5] "lprior" "lp__" "accept_stat__" "stepsize__"
[9] "treedepth__" "n_leapfrog__" "divergent__" "energy__"
Conclusions:
- each of the priors is substantially wider than its corresponding posterior
- the posteriors are clearly distinct from their respective priors, so we can conclude that the priors are not driving the posteriors (i.e. they are not unduly influencing the outcomes)
- the priors are simply regularising the parameters such that they are only sampled from plausible regions
\[ \begin{align} y_i \sim{}& Pois(\lambda_i)\\ log(\lambda_i) =& \beta_0 + \beta_1 (x_i - \bar{x})\\ \end{align} \]
When considering priors, it is important to remember that they apply to parameters on the link scale. For Poisson models, the link scale is log. So the following priors might be appropriate:
- \(\beta_0\): Normal prior centred at 1.99 with a variance of 1.33
mean:
variance:
- \(\beta_1\): t distribution (3 degrees of freedom) prior centred at 0 with a variance of 2.05
mean: since effects are differences and we may not want to pre-determine the direction (polarity) of any trends, it is typical to define effects priors with means of 0
variance:
- start by fitting the model and sampling from the priors only
- explore the range of posterior predictions resulting from the priors alone
Conclusions:
- the grey ribbon above represents the credible range of the posterior predictions
- this ribbon is substantially wider than the observed data (black points), suggesting that the priors are not overly narrow
- the range of posterior predictions is not wildly unreasonable (although we could argue that negative predictions might be illogical), since the maxima are not multiple orders of magnitude above the observed data.
- now refit the model such that it samples from both the priors and likelihood (that is allow the data to have an impact on the estimates)
- re-explore the range of posterior predictions resulting from the fitted model
Conclusions:
- the range of the posteriors is now substantially reduced (now that the model includes both data and priors)
- this suggests that the patterns are being driven predominantly by the data
- compare the priors and posteriors to further confirm that the priors are not overly influential
When we indicate that the posterior should be informed by both the prior and the likelihood, draws are returned for both the prior (governed by the priors alone) and the posterior (governed by both priors and data/likelihood). These can be compared by exploring the probabilities associated with specific hypotheses - the most obvious of which is that of no effect (that the parameter = 0).
When doing so, ideally the posterior should be distinct from the prior. By distinct, I mean that the posteriors should be clearly distinguishable from the priors and, ideally, less variable than them. If this is not the case, it might suggest that the posteriors are being too strongly driven by the priors.
[1] "b_Intercept" "b_scalexscaleEQFALSE" "prior_Intercept"
[4] "prior_b" "lprior" "lp__"
[7] "accept_stat__" "stepsize__" "treedepth__"
[10] "n_leapfrog__" "divergent__" "energy__"
Unfortunately, it is not possible to do this comparison sensibly for the intercept. The reason for this is that the prior for intercept was applied to an intercept that is associated with centred continuous predictors (predictors are centred behind the scenes). Since we did not centre the predictor, the intercept returned is as if uncentred. Hence, the prior and posterior are on different scales.
[1] "b_Intercept" "b_scalexscaleEQFALSE" "prior_Intercept"
[4] "prior_b" "lprior" "lp__"
[7] "accept_stat__" "stepsize__" "treedepth__"
[10] "n_leapfrog__" "divergent__" "energy__"
Conclusions:
- each of the priors is substantially wider than its corresponding posterior
- the posteriors are clearly distinct from their respective priors, so we can conclude that the priors are not driving the posteriors (i.e. they are not unduly influencing the outcomes)
- the priors are simply regularising the parameters such that they are only sampled from plausible regions
\[ \begin{align} y_i \sim{}& Pois(\lambda_i)\\ log(\lambda_i) =& \beta_0 + \beta_1 (x_i - \bar{x})/\sigma_x\\ \end{align} \]
When considering priors, it is important to remember that they apply to parameters on the link scale. For Poisson models, the link scale is log. So the following priors might be appropriate:
- \(\beta_0\): Normal prior centred at 1.99 with a variance of 1.33
mean:
variance:
- \(\beta_1\): t distribution (3 degrees of freedom) prior centred at 0 with a variance of 1.33
mean: since effects are differences and we may not want to pre-determine the direction (polarity) of any trends, it is typical to define effects priors with means of 0
variance:
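The `scale()` calls used to express these centred and standardised models both subtract the predictor's mean; standardising additionally divides by its standard deviation. The following illustrates the difference on some arbitrary values:

```r
x <- c(2, 4, 6, 8, 10)
scale(x, scale = FALSE)[, 1]  # centred only: -4 -2  0  2  4
scale(x)[, 1]                 # centred and divided by sd(x) = 3.162
```

Centring only changes the interpretation of the intercept; standardising also rescales the slope (hence the different prior variance for \(\beta_1\) above).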
- start by fitting the model and sampling from the priors only
- explore the range of posterior predictions resulting from the priors alone
Conclusions:
- the grey ribbon above represents the credible range of the posterior predictions
- this ribbon is substantially wider than the observed data (black points), suggesting that the priors are not overly narrow
- the range of posterior predictions is not wildly unreasonable (although we could argue that negative predictions might be illogical), since the maxima are not multiple orders of magnitude above the observed data.
- now refit the model such that it samples from both the priors and likelihood (that is allow the data to have an impact on the estimates)
- re-explore the range of posterior predictions resulting from the fitted model
Conclusions:
- the range of the posteriors is now substantially reduced (now that the model includes both data and priors)
- this suggests that the patterns are being driven predominantly by the data
- compare the priors and posteriors to further confirm that the priors are not overly influential
When we indicate that the posterior should be informed by both the prior and the likelihood, draws are returned for both the prior (governed by the priors alone) and the posterior (governed by both priors and data/likelihood). These can be compared by exploring the probabilities associated with specific hypotheses - the most obvious of which is that of no effect (that the parameter = 0).
When doing so, ideally the posterior should be distinct from the prior. By distinct, I mean that the posteriors should be clearly distinguishable from the priors and, ideally, less variable than them. If this is not the case, it might suggest that the posteriors are being too strongly driven by the priors.
[1] "b_Intercept" "b_scalex" "prior_Intercept" "prior_b"
[5] "lprior" "lp__" "accept_stat__" "stepsize__"
[9] "treedepth__" "n_leapfrog__" "divergent__" "energy__"
Unfortunately, it is not possible to do this comparison sensibly for the intercept. The reason for this is that the prior for intercept was applied to an intercept that is associated with centred continuous predictors (predictors are centred behind the scenes). Since we did not centre the predictor, the intercept returned is as if uncentred. Hence, the prior and posterior are on different scales.
[1] "b_Intercept" "b_scalex" "prior_Intercept" "prior_b"
[5] "lprior" "lp__" "accept_stat__" "stepsize__"
[9] "treedepth__" "n_leapfrog__" "divergent__" "energy__"
Conclusions:
- each of the priors is substantially wider than its corresponding posterior
- the posteriors are clearly distinct from their respective priors, so we can conclude that the priors are not driving the posteriors (i.e. they are not unduly influencing the outcomes)
- the priors are simply regularising the parameters such that they are only sampled from plausible regions
\[ \begin{align} y_i \sim{}& NB(\lambda_i, \phi)\\ log(\lambda_i) =& \beta_0 + \beta_1 x_i\\ \end{align} \]
When considering priors, it is important to remember that they apply to parameters on the link scale. For Negative Binomial models, the link scale is log. So the following priors might be appropriate:
- \(\beta_0\): Normal prior centred at 2.07 with a variance of 0.93
mean:
variance:
- \(\beta_1\): t distribution (3 degrees of freedom) prior centred at 0 with a variance of 1.43
mean: since effects are differences and we may not want to pre-determine the direction (polarity) of any trends, it is typical to define effects priors with means of 0
variance:
- start by fitting the model and sampling from the priors only
dat4a.form <- bf(y ~ x, family = negbinomial(link = "log"))
dat4a.brm <- brm(dat4a.form,
                 data = dat4,
                 prior = priors,
                 sample_prior = "only",
                 iter = 5000,
                 warmup = 1000,
                 chains = 3, cores = 3,
                 thin = 5,
                 backend = "rstan",
                 refresh = 0)
Compiling Stan program...
Start sampling
Warning: There were 1614 divergent transitions after warmup. See
https://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
to find out why this is a problem and how to eliminate them.
Warning: Examine the pairs() plot to diagnose sampling problems
Warning: Tail Effective Samples Size (ESS) is too low, indicating posterior variances and tail quantiles may be unreliable.
Running the chains for more iterations may help. See
https://mc-stan.org/misc/warnings.html#tail-ess
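Divergent transitions like these are not unusual when sampling from vague, heavy-tailed priors alone and are less of a concern here than they would be for a posterior fit. They can be tallied directly from the fitted object (a sketch, assuming the `dat4a.brm` object above) via bayesplot's `nuts_params()`:

```r
library(bayesplot)

## Extract NUTS sampler diagnostics and count the divergent transitions
np <- nuts_params(dat4a.brm)
sum(subset(np, Parameter == "divergent__")$Value)
```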
- explore the range of posterior predictions resulting from the priors alone
Warning in scale_y_log10(): log-10 transformation introduced infinite values.
Conclusions:
- the grey ribbon above represents the credible range of the posterior predictions
- this ribbon is substantially wider than the observed data (black points), suggesting that the priors are not overly narrow
- the range of posterior predictions is not wildly unreasonable (although we could argue that negative predictions might be illogical), since the maxima are not multiple orders of magnitude above the observed data.
- now refit the model such that it samples from both the priors and likelihood (that is allow the data to have an impact on the estimates)
- re-explore the range of posterior predictions resulting from the fitted model
Conclusions:
- the range of the posteriors is now substantially reduced (now that the model includes both data and priors)
- this suggests that the patterns are being driven predominantly by the data
- compare the priors and posteriors to further confirm that the priors are not overly influential
When we indicate that the posterior should be informed by both the prior and the likelihood, draws are returned for both the prior (governed by the priors alone) and the posterior (governed by both priors and data/likelihood). These can be compared by exploring the probabilities associated with specific hypotheses - the most obvious of which is that of no effect (that the parameter = 0).
When doing so, ideally the posterior should be distinct from the prior. By distinct, I mean that the posteriors should be clearly distinguishable from the priors and, ideally, less variable than them. If this is not the case, it might suggest that the posteriors are being too strongly driven by the priors.
[1] "b_Intercept" "b_x" "shape" "prior_Intercept"
[5] "prior_b" "prior_shape" "lprior" "lp__"
[9] "accept_stat__" "stepsize__" "treedepth__" "n_leapfrog__"
[13] "divergent__" "energy__"
Unfortunately, it is not possible to do this comparison sensibly for the intercept. The reason is that the prior for the intercept was applied to an intercept associated with centred predictors (brms centres the predictors behind the scenes). Since we did not centre the predictor ourselves, the intercept returned is on the uncentred scale. Hence, the prior and posterior are on different scales.
[1] "b_Intercept" "b_x" "shape" "prior_Intercept"
[5] "prior_b" "prior_shape" "lprior" "lp__"
[9] "accept_stat__" "stepsize__" "treedepth__" "n_leapfrog__"
[13] "divergent__" "energy__"
Conclusions:
- each of the priors is substantially wider than its corresponding posterior
- the posteriors are clearly distinct from their respective priors, so we can conclude that the priors are not driving the posteriors (i.e. they are not unduly influencing the outcomes)
- the priors are simply regularising the parameters such that they are only sampled from plausible regions
\[ \begin{align} y_i \sim{}& NB(\lambda_i, \phi)\\ log(\lambda_i) =& \beta_0 + \beta_1 (x_i - \bar{x})\\ \end{align} \]
When considering priors, it is important to remember that they apply to parameters on the link scale. For Negative Binomial models, the link scale is log. So the following priors might be appropriate:
- \(\beta_0\): Normal prior centred at 2.07 with a variance of 0.93
mean:
variance:
- \(\beta_1\): t distribution (3 degrees of freedom) prior centred at 0 with a variance of 1.43
mean: since effects are differences and we may not want to pre-determine the direction (polarity) of any trends, it is typical to define effects priors with means of 0
variance:
- start by fitting the model and sampling from the priors only
dat4b.form <- bf(y ~ scale(x, scale = FALSE), family = negbinomial(link = "log"))
dat4b.brm <- brm(dat4b.form,
                 data = dat4,
                 prior = priors,
                 sample_prior = "only",
                 iter = 5000,
                 warmup = 1000,
                 chains = 3, cores = 3,
                 thin = 5,
                 backend = "rstan",
                 refresh = 0)
Compiling Stan program...
Start sampling
Warning: There were 1695 divergent transitions after warmup. See
https://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
to find out why this is a problem and how to eliminate them.
Warning: Examine the pairs() plot to diagnose sampling problems
- explore the range of posterior predictions resulting from the priors alone
Warning in scale_y_log10(): log-10 transformation introduced infinite values.
Conclusions:
- the grey ribbon above represents the credible range of the posterior predictions
- this ribbon is substantially wider than the observed data (black points), suggesting that the priors are not overly narrow
- the range of posterior predictions is not wildly unreasonable (although we could argue that negative predictions might be illogical), since the maxima are not multiple orders of magnitude above the observed data.
- now refit the model such that it samples from both the priors and likelihood (that is allow the data to have an impact on the estimates)
- re-explore the range of posterior predictions resulting from the fitted model
Conclusions:
- the range of the posteriors is now substantially reduced (now that the model includes both data and priors)
- this suggests that the patterns are being driven predominantly by the data
- compare the priors and posteriors to further confirm that the priors are not overly influential
When we indicate that the posterior should be informed by both the prior and the likelihood, draws are returned for both the prior (governed by the priors alone) and the posterior (governed by both priors and data/likelihood). These can be compared by exploring the probabilities associated with specific hypotheses - the most obvious of which is that of no effect (that the parameter = 0).
When doing so, ideally the posterior should be distinct from the prior. By distinct, I mean that the posteriors should be clearly distinguishable from the priors and, ideally, less variable than them. If this is not the case, it might suggest that the posteriors are being too strongly driven by the priors.
[1] "b_Intercept" "b_scalexscaleEQFALSE" "shape"
[4] "prior_Intercept" "prior_b" "prior_shape"
[7] "lprior" "lp__" "accept_stat__"
[10] "stepsize__" "treedepth__" "n_leapfrog__"
[13] "divergent__" "energy__"
Unfortunately, it is not possible to do this comparison sensibly for the intercept. The reason for this is that the prior for intercept was applied to an intercept that is associated with centred continuous predictors (predictors are centred behind the scenes). Since we did not centre the predictor, the intercept returned is as if uncentred. Hence, the prior and posterior are on different scales.
[1] "b_Intercept" "b_scalexscaleEQFALSE" "shape"
[4] "prior_Intercept" "prior_b" "prior_shape"
[7] "lprior" "lp__" "accept_stat__"
[10] "stepsize__" "treedepth__" "n_leapfrog__"
[13] "divergent__" "energy__"
Conclusions:
- each of the priors is substantially wider than its corresponding posterior
- the posteriors are clearly distinct from their respective priors, so we can conclude that the priors are not driving the posteriors (i.e. they are not unduly influencing the outcomes)
- the priors are simply regularising the parameters such that they are only sampled from plausible regions
\[ \begin{align} y_i \sim{}& NB(\lambda_i, \phi)\\ log(\lambda_i) =& \beta_0 + \beta_1 (x_i - \bar{x})/\sigma_x\\ \end{align} \]
When considering priors, it is important to remember that they apply to parameters on the link scale. For Negative Binomial models, the link scale is log. So the following priors might be appropriate:
- \(\beta_0\): Normal prior centred at 2.07 with a variance of 0.93
mean:
variance:
- \(\beta_1\): t distribution (3 degrees of freedom) prior centred at 0 with a variance of 0.93
mean: since effects are differences and we may not want to pre-determine the direction (polarity) of any trends, it is typical to define effects priors with means of 0
variance:
- start by fitting the model and sampling from the priors only
dat4c.form <- bf(y ~ scale(x), family = negbinomial(link = "log"))
dat4c.brm <- brm(dat4c.form,
                 data = dat4,
                 prior = priors,
                 sample_prior = "only",
                 iter = 5000,
                 warmup = 1000,
                 chains = 3, cores = 3,
                 thin = 5,
                 backend = "rstan",
                 refresh = 0)
Compiling Stan program...
Start sampling
Warning: There were 1170 divergent transitions after warmup. See
https://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
to find out why this is a problem and how to eliminate them.
Warning: Examine the pairs() plot to diagnose sampling problems
- explore the range of posterior predictions resulting from the priors alone
Warning in scale_y_log10(): log-10 transformation introduced infinite values.
Conclusions:
- the grey ribbon above represents the credible range of the posterior predictions
- this ribbon is substantially wider than the observed data (black points), suggesting that the priors are not overly narrow
- the range of posterior predictions is not wildly unreasonable (although we could argue that negative predictions might be illogical), since the maxima are not multiple orders of magnitude above the observed data.
- now refit the model such that it samples from both the priors and likelihood (that is allow the data to have an impact on the estimates)
- re-explore the range of posterior predictions resulting from the fitted model
Conclusions:
- the range of the posteriors is now substantially reduced (now that the model includes both data and priors)
- this suggests that the patterns are being driven predominantly by the data
- compare the priors and posteriors to further confirm that the priors are not overly influential
When we indicate that the posterior should be informed by both the prior and the likelihood, draws are returned for both the prior (governed by the priors alone) and the posterior (governed by both priors and data/likelihood). These can be compared by exploring the probabilities associated with specific hypotheses - the most obvious of which is that of no effect (that the parameter = 0).
When doing so, ideally the posterior should be distinct from the prior. By distinct, I mean that the posteriors should be clearly distinguishable from the priors and, ideally, less variable than them. If this is not the case, it might suggest that the posteriors are being too strongly driven by the priors.
[1] "b_Intercept" "b_scalex" "shape" "prior_Intercept"
[5] "prior_b" "prior_shape" "lprior" "lp__"
[9] "accept_stat__" "stepsize__" "treedepth__" "n_leapfrog__"
[13] "divergent__" "energy__"
Unfortunately, it is not possible to do this comparison sensibly for the intercept. The reason for this is that the prior for intercept was applied to an intercept that is associated with centred continuous predictors (predictors are centred behind the scenes). Since we did not centre the predictor, the intercept returned is as if uncentred. Hence, the prior and posterior are on different scales.
[1] "b_Intercept" "b_scalex" "shape" "prior_Intercept"
[5] "prior_b" "prior_shape" "lprior" "lp__"
[9] "accept_stat__" "stepsize__" "treedepth__" "n_leapfrog__"
[13] "divergent__" "energy__"
Conclusions:
- each of the priors is substantially wider than its corresponding posterior
- the posteriors are clearly distinct from their respective priors, so we can conclude that the priors are not driving the posteriors (i.e. they are not unduly influencing the outcomes)
- the priors are simply regularising the parameters such that they are only sampled from plausible regions
\[ \begin{align} y_i \sim{}& Bin(\pi_i, 1)\\ log(\frac{\pi_i}{1-\pi_i}) =& \beta_0 + \beta_1 x_i\\ \end{align} \]
When considering priors, it is important to remember that they apply to parameters on the link scale. For Binary models, the link scale is logit. Binomial data are notoriously difficult to define priors for. Nevertheless, the following considerations are useful:
- the observed response values are only ever either 0 or 1
- a linear model is exploring whether the probability of a 1 changes from high to low or low to high according to the linear predictor
- the switch in probability is likely to be somewhere near the middle of the \(x\) range
- with a centred predictor, the mean response is expected to be approximately 0.5
- on a logit (log odds) scale, this corresponds to a value of 0.
- on a logit (log odds) scale, values of -3 and 3 are considered very wide
- on a logit scale, values between -1 and 1 are reasonable.
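The correspondence between the logit and probability scales underlying these rules of thumb can be checked directly by back-transforming with the inverse logit:

```r
## Back-transform logit (log odds) values to probabilities
plogis(0)           # 0.5   - the centre of the probability scale
plogis(c(-3, 3))    # about 0.047 and 0.953 - nearly the full range
plogis(c(-1, 1))    # about 0.269 and 0.731 - a moderate range
```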
So the following priors might be appropriate:
- \(\beta_0\): Normal prior centred at 0 with a variance of 1
- \(\beta_1\): t distribution (3 degrees of freedom) prior centred at 0 with a variance of 1
- start by fitting the model and sampling from the priors only
dat5a.form <- bf(y | trials(1) ~ x, family = binomial(link = "logit"))
dat5a.brm <- brm(dat5a.form,
                 data = dat5,
                 prior = priors,
                 sample_prior = "only",
                 iter = 5000,
                 warmup = 1000,
                 chains = 3, cores = 3,
                 thin = 5,
                 backend = "rstan",
                 refresh = 0)
Only 2 levels detected so that family 'bernoulli' might be a more efficient choice.
Compiling Stan program...
Start sampling
- explore the range of posterior predictions resulting from the priors alone
For Binary data, it is often more useful to explore the predictions on the link scale. Ribbons that extend much beyond -3 and 3 would definitely be considered wide enough.
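In brms, link-scale (logit) predictions can be extracted with `posterior_linpred()`. This is a sketch, assuming the prior-only fit `dat5a.brm` from above:

```r
## Draws of the linear predictor on the logit (link) scale
eta <- posterior_linpred(dat5a.brm)

## For suitably vague priors this range should comfortably span beyond -3 and 3
range(eta)
```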
Conclusions:
- the grey ribbon above represents the credible range of the posterior predictions
- this ribbon is substantially wider than the observed data (black points), suggesting that the priors are not overly narrow
- the range of posterior predictions is not wildly unreasonable (although we could argue that negative predictions might be illogical), since the maxima are not multiple orders of magnitude above the observed data.
- now refit the model such that it samples from both the priors and likelihood (that is allow the data to have an impact on the estimates)
- re-explore the range of posterior predictions resulting from the fitted model
Conclusions:
- the range of the posteriors is now substantially reduced (now that the model includes both data and priors)
- this suggests that the patterns are being driven predominantly by the data
- compare the priors and posteriors to further confirm that the priors are not overly influential
When we indicate that the posterior should be informed by both the prior and the likelihood, draws are returned for both the prior (governed by the priors alone) and the posterior (governed by both priors and data/likelihood). These can be compared by exploring the probabilities associated with specific hypotheses - the most obvious of which is that of no effect (that the parameter = 0).
When doing so, ideally the posterior should be distinct from the prior. By distinct, I mean that the posteriors should be clearly distinguishable from the priors and, ideally, less variable than them. If this is not the case, it might suggest that the posteriors are being too strongly driven by the priors.
[1] "b_Intercept" "b_x" "prior_Intercept" "prior_b"
[5] "lprior" "lp__" "accept_stat__" "stepsize__"
[9] "treedepth__" "n_leapfrog__" "divergent__" "energy__"
Unfortunately, it is not possible to do this comparison sensibly for the intercept. The reason is that the prior for the intercept was applied to an intercept associated with centred predictors (brms centres the predictors behind the scenes). Since we did not centre the predictor ourselves, the intercept returned is on the uncentred scale. Hence, the prior and posterior are on different scales.
[1] "b_Intercept" "b_x" "prior_Intercept" "prior_b"
[5] "lprior" "lp__" "accept_stat__" "stepsize__"
[9] "treedepth__" "n_leapfrog__" "divergent__" "energy__"
Conclusions:
- each of the priors are substantially wider than the posteriors
- the posteriors are definitely distinct from their respective priors and thus we can conclude that the priors are not driving the posteriors (i.e. they are not unduly influencing the outcomes)
- the priors are simply regularising the parameters such that they are only sampled from plausible regions
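The prior/posterior comparison described above can be sketched by extracting both sets of draws and overlaying their densities. This is only a sketch: it assumes a brms model (here called `mod`) that was fitted with `sample_prior = 'yes'`, so that the draws contain both the posterior (`b_x`) and prior (`prior_b`) columns listed earlier.

```r
library(brms)
library(tidyverse)
# assumes `mod` is a fitted brmsfit with sample_prior = "yes",
# so the draws include b_x (posterior) and prior_b (prior)
mod |>
  as_draws_df() |>
  select(b_x, prior_b) |>
  pivot_longer(everything(), names_to = "type", values_to = "value") |>
  ggplot(aes(x = value, colour = type)) +
  geom_density()

# a numerical assessment of the "no effect" hypothesis (parameter = 0)
hypothesis(mod, "x = 0")
```

If the priors are only regularising, the `prior_b` density should be much wider than, and clearly distinguishable from, the `b_x` density.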
\[ \begin{align} y_i \sim{}& Bin(\pi_i, 1)\\ log(\frac{\pi_i}{1 -\pi_i}) =& \beta_0 + \beta_1 (x_i - \bar{x})\\ \end{align} \]
- \(\beta_0\): Normal prior centred at 0 with a variance of 1
- \(\beta_1\): t distribution (3 degrees of freedom) prior centred at 0 with a variance of 1
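The priors listed above could be encoded as follows. This is a sketch only - the object name `priors` is chosen to match the one referenced in the `brm()` calls, and brms parameterises these distributions by standard deviation (here a variance of 1 corresponds to a standard deviation of 1):

```r
library(brms)
# Normal(0, 1) prior on the intercept and a Student-t(3, 0, 1) prior
# on the slope, matching the priors described above
priors <- prior(normal(0, 1), class = "Intercept") +
  prior(student_t(3, 0, 1), class = "b")
priors
```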
- start by fitting the model and sampling from the priors only
dat5b.form <- bf(y | trials(1) ~ scale(x, scale = FALSE), family = binomial(link = "logit"))
dat5b.brm <- brm(dat5b.form,
data=dat5,
prior=priors,
sample_prior = 'only',
iter = 5000,
warmup = 1000,
chains = 3, cores = 3,
thin = 5,
backend = "rstan",
refresh = 0)
Only 2 levels detected so that family 'bernoulli' might be a more efficient choice.
Compiling Stan program...
Start sampling
- explore the range of posterior predictions resulting from the priors alone
Conclusions:
- the grey ribbon above represents the credible range of the posterior predictions
- this ribbon is substantially wider than the observed data (black points), suggesting that the priors are not overly narrow
- the range of posterior predictions is not wildly unreasonable (although we could argue that negative predictions might be illogical) since the maxima are not multiple orders of magnitude above the observed data.
- now refit the model such that it samples from both the priors and likelihood (that is allow the data to have an impact on the estimates)
- re-explore the range of posterior predictions resulting from the fitted model
Conclusions:
- the range of the posteriors is now substantially reduced (now that the model includes both data and priors)
- this suggests that the patterns are being driven predominantly by the data
- compare the priors and posteriors to further confirm that the priors are not overly influential
When we have indicated that the posterior should be informed by both the prior and the likelihood, both prior (governed by the priors alone) and posterior (governed by both priors and data/likelihood) draws are returned. These can be compared by exploring the probabilities associated with specific hypotheses - the most obvious of which is that of no effect (that the parameter = 0).
When doing so, ideally the posterior should be distinct from the prior - that is, clearly distinguishable from it. The posteriors should also be less variable than the priors. If this is not the case, it might suggest that the posteriors are being too strongly driven by the priors.
[1] "b_Intercept" "b_scalexscaleEQFALSE" "prior_Intercept"
[4] "prior_b" "lprior" "lp__"
[7] "accept_stat__" "stepsize__" "treedepth__"
[10] "n_leapfrog__" "divergent__" "energy__"
Unfortunately, it is not possible to do this comparison sensibly for the intercept. The reason is that the prior for the intercept was applied to an intercept associated with centred continuous predictors (predictors are centred behind the scenes). Since we did not centre the predictor, the intercept returned is on the uncentred scale. Hence, the prior and posterior are on different scales.
[1] "b_Intercept" "b_scalexscaleEQFALSE" "prior_Intercept"
[4] "prior_b" "lprior" "lp__"
[7] "accept_stat__" "stepsize__" "treedepth__"
[10] "n_leapfrog__" "divergent__" "energy__"
Conclusions:
- each of the priors are substantially wider than the posteriors
- the posteriors are definitely distinct from their respective priors and thus we can conclude that the priors are not driving the posteriors (i.e. they are not unduly influencing the outcomes)
- the priors are simply regularising the parameters such that they are only sampled from plausible regions
\[ \begin{align} y_i \sim{}& Bin(\pi_i, 1)\\ log(\frac{\pi_i}{1 -\pi_i}) =& \beta_0 + \beta_1 (x_i - \bar{x})/\sigma_x\\ \end{align} \]
- \(\beta_0\): Normal prior centred at 0 with a variance of 1
- \(\beta_1\): t distribution (3 degrees of freedom) prior centred at 0 with a variance of 1
- start by fitting the model and sampling from the priors only
dat5c.form <- bf(y | trials(1) ~ scale(x), family = binomial(link = "logit"))
dat5c.brm <- brm(dat5c.form,
data=dat5,
prior=priors,
sample_prior = 'only',
iter = 5000,
warmup = 1000,
chains = 3, cores = 3,
thin = 5,
backend = "rstan",
refresh = 0)
Only 2 levels detected so that family 'bernoulli' might be a more efficient choice.
Compiling Stan program...
Start sampling
- explore the range of posterior predictions resulting from the priors alone
Conclusions:
- the grey ribbon above represents the credible range of the posterior predictions
- this ribbon is substantially wider than the observed data (black points), suggesting that the priors are not overly narrow
- the range of posterior predictions is not wildly unreasonable (although we could argue that negative predictions might be illogical) since the maxima are not multiple orders of magnitude above the observed data.
- now refit the model such that it samples from both the priors and likelihood (that is allow the data to have an impact on the estimates)
- re-explore the range of posterior predictions resulting from the fitted model
Conclusions:
- the range of the posteriors is now substantially reduced (now that the model includes both data and priors)
- this suggests that the patterns are being driven predominantly by the data
- compare the priors and posteriors to further confirm that the priors are not overly influential
When we have indicated that the posterior should be informed by both the prior and the likelihood, both prior (governed by the priors alone) and posterior (governed by both priors and data/likelihood) draws are returned. These can be compared by exploring the probabilities associated with specific hypotheses - the most obvious of which is that of no effect (that the parameter = 0).
When doing so, ideally the posterior should be distinct from the prior - that is, clearly distinguishable from it. The posteriors should also be less variable than the priors. If this is not the case, it might suggest that the posteriors are being too strongly driven by the priors.
[1] "b_Intercept" "b_scalex" "prior_Intercept" "prior_b"
[5] "lprior" "lp__" "accept_stat__" "stepsize__"
[9] "treedepth__" "n_leapfrog__" "divergent__" "energy__"
Unfortunately, it is not possible to do this comparison sensibly for the intercept. The reason is that the prior for the intercept was applied to an intercept associated with centred continuous predictors (predictors are centred behind the scenes). Since we did not centre the predictor, the intercept returned is on the uncentred scale. Hence, the prior and posterior are on different scales.
[1] "b_Intercept" "b_scalex" "prior_Intercept" "prior_b"
[5] "lprior" "lp__" "accept_stat__" "stepsize__"
[9] "treedepth__" "n_leapfrog__" "divergent__" "energy__"
Conclusions:
- each of the priors are substantially wider than the posteriors
- the posteriors are definitely distinct from their respective priors and thus we can conclude that the priors are not driving the posteriors (i.e. they are not unduly influencing the outcomes)
- the priors are simply regularising the parameters such that they are only sampled from plausible regions
\[ \begin{align} y_i \sim{}& Bin(\pi_i, n_i)\\ log(\frac{\pi_i}{1-\pi_i}) =& \beta_0 + \beta_1 x_i\\ \end{align} \]
When considering priors, it is important to remember that they apply to parameters on the link scale. For Binomial models, the link scale is logit. Binomial data is notoriously difficult to define priors for. Nevertheless the following considerations are useful:
- the expected \(\pi\) values are only ever between 0 and 1
- on a logit (log odds) scale, values of -3 and 3 are considered very wide
- on a logit scale, values between -1 and 1 are reasonable.
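These logit-scale intuitions can be checked directly in base R, where `plogis()` is the inverse-logit (converting log odds back to probabilities):

```r
# inverse-logit: map logit (log odds) values back to the probability scale
plogis(0)   # 0.5   - a logit of 0 is an even chance
plogis(-3)  # ~0.047
plogis(3)   # ~0.953 - so [-3, 3] spans nearly the whole probability scale
plogis(-1)  # ~0.269
plogis(1)   # ~0.731 - [-1, 1] covers the "reasonable" middle range
```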
So the following priors might be appropriate:
- \(\beta_0\): Normal prior centred at 0 with a variance of 1
- \(\beta_1\): t distribution (3 degrees of freedom) prior centred at 0 with a variance of 0.51
mean: since effects are differences and we may not want to pre-determine the direction (polarity) of any trends, it is typical to define effects priors with means of 0
- start by fitting the model and sampling from the priors only
- explore the range of posterior predictions resulting from the priors alone
For Binary data, it is often more useful to explore the predictions on the link scale. Ribbons that extend much beyond -3 and 3 would definitely be considered wide enough.
Setting all 'trials' variables to 1 by default if not specified otherwise.
dat6a.brm |>
conditional_effects(conditions = data.frame(total = dat6$total)) |>
plot(points = TRUE)
Conclusions:
- the grey ribbon above represents the credible range of the posterior predictions
- this ribbon is substantially wider than the observed data (black points), suggesting that the priors are not overly narrow
- the range of posterior predictions is not wildly unreasonable (although we could argue that negative predictions might be illogical) since the maxima are not multiple orders of magnitude above the observed data.
- now refit the model such that it samples from both the priors and likelihood (that is allow the data to have an impact on the estimates)
- re-explore the range of posterior predictions resulting from the fitted model
dat6a.brm2 |>
conditional_effects(conditions = data.frame(total = dat6$total)) |>
plot(points = TRUE)
Conclusions:
- the range of the posteriors is now substantially reduced (now that the model includes both data and priors)
- this suggests that the patterns are being driven predominantly by the data
- compare the priors and posteriors to further confirm that the priors are not overly influential
When we have indicated that the posterior should be informed by both the prior and the likelihood, both prior (governed by the priors alone) and posterior (governed by both priors and data/likelihood) draws are returned. These can be compared by exploring the probabilities associated with specific hypotheses - the most obvious of which is that of no effect (that the parameter = 0).
When doing so, ideally the posterior should be distinct from the prior - that is, clearly distinguishable from it. The posteriors should also be less variable than the priors. If this is not the case, it might suggest that the posteriors are being too strongly driven by the priors.
[1] "b_Intercept" "b_x" "prior_Intercept" "prior_b"
[5] "lprior" "lp__" "accept_stat__" "stepsize__"
[9] "treedepth__" "n_leapfrog__" "divergent__" "energy__"
Unfortunately, it is not possible to do this comparison sensibly for the intercept. The reason is that the prior for the intercept was applied to an intercept associated with centred continuous predictors (predictors are centred behind the scenes). Since we did not centre the predictor, the intercept returned is on the uncentred scale. Hence, the prior and posterior are on different scales.
[1] "b_Intercept" "b_x" "prior_Intercept" "prior_b"
[5] "lprior" "lp__" "accept_stat__" "stepsize__"
[9] "treedepth__" "n_leapfrog__" "divergent__" "energy__"
Conclusions:
- each of the priors are substantially wider than the posteriors
- the posteriors are definitely distinct from their respective priors and thus we can conclude that the priors are not driving the posteriors (i.e. they are not unduly influencing the outcomes)
- the priors are simply regularising the parameters such that they are only sampled from plausible regions
\[ \begin{align} y_i \sim{}& Bin(\pi_i, n_i)\\ log(\frac{\pi_i}{1-\pi_i}) =& \beta_0 + \beta_1 (x_i - \bar{x})\\ \end{align} \]
When considering priors, it is important to remember that they apply to parameters on the link scale. For Binomial models, the link scale is logit. Binomial data is notoriously difficult to define priors for. Nevertheless the following considerations are useful:
- the expected \(\pi\) values are only ever between 0 and 1
- on a logit (log odds) scale, values of -3 and 3 are considered very wide
- on a logit scale, values between -1 and 1 are reasonable.
So the following priors might be appropriate:
- \(\beta_0\): Normal prior centred at 0 with a variance of 1
- \(\beta_1\): t distribution (3 degrees of freedom) prior centred at 0 with a variance of 0.51
mean: since effects are differences and we may not want to pre-determine the direction (polarity) of any trends, it is typical to define effects priors with means of 0
- start by fitting the model and sampling from the priors only
dat6b.form <- bf(count | trials(total) ~ scale(x, scale = FALSE),
family = binomial(link = "logit"))
dat6b.brm <- brm(dat6b.form,
data=dat6,
prior=priors,
sample_prior = 'only',
iter = 5000,
warmup = 1000,
chains = 3, cores = 3,
thin = 5,
backend = "rstan",
refresh = 0)
Compiling Stan program...
Start sampling
- explore the range of posterior predictions resulting from the priors alone
For Binomial data, it is often more useful to explore the predictions on the link scale. Ribbons that extend much beyond -3 and 3 would definitely be considered wide enough.
Setting all 'trials' variables to 1 by default if not specified otherwise.
dat6b.brm |>
conditional_effects(conditions = data.frame(total = dat6$total)) |>
plot(points = TRUE)
Conclusions:
- the grey ribbon above represents the credible range of the posterior predictions
- this ribbon is substantially wider than the observed data (black points), suggesting that the priors are not overly narrow
- the range of posterior predictions is not wildly unreasonable (although we could argue that negative predictions might be illogical) since the maxima are not multiple orders of magnitude above the observed data.
- now refit the model such that it samples from both the priors and likelihood (that is allow the data to have an impact on the estimates)
- re-explore the range of posterior predictions resulting from the fitted model
dat6b.brm2 |>
conditional_effects(conditions = data.frame(total = dat6$total)) |>
plot(points = TRUE)
Conclusions:
- the range of the posteriors is now substantially reduced (now that the model includes both data and priors)
- this suggests that the patterns are being driven predominantly by the data
- compare the priors and posteriors to further confirm that the priors are not overly influential
When we have indicated that the posterior should be informed by both the prior and the likelihood, both prior (governed by the priors alone) and posterior (governed by both priors and data/likelihood) draws are returned. These can be compared by exploring the probabilities associated with specific hypotheses - the most obvious of which is that of no effect (that the parameter = 0).
When doing so, ideally the posterior should be distinct from the prior - that is, clearly distinguishable from it. The posteriors should also be less variable than the priors. If this is not the case, it might suggest that the posteriors are being too strongly driven by the priors.
[1] "b_Intercept" "b_scalexscaleEQFALSE" "prior_Intercept"
[4] "prior_b" "lprior" "lp__"
[7] "accept_stat__" "stepsize__" "treedepth__"
[10] "n_leapfrog__" "divergent__" "energy__"
Unfortunately, it is not possible to do this comparison sensibly for the intercept. The reason is that the prior for the intercept was applied to an intercept associated with centred continuous predictors (predictors are centred behind the scenes). Since we did not centre the predictor, the intercept returned is on the uncentred scale. Hence, the prior and posterior are on different scales.
[1] "b_Intercept" "b_scalexscaleEQFALSE" "prior_Intercept"
[4] "prior_b" "lprior" "lp__"
[7] "accept_stat__" "stepsize__" "treedepth__"
[10] "n_leapfrog__" "divergent__" "energy__"
Conclusions:
- each of the priors are substantially wider than the posteriors
- the posteriors are definitely distinct from their respective priors and thus we can conclude that the priors are not driving the posteriors (i.e. they are not unduly influencing the outcomes)
- the priors are simply regularising the parameters such that they are only sampled from plausible regions
\[ \begin{align} y_i \sim{}& Bin(\pi_i, n_i)\\ log(\frac{\pi_i}{1-\pi_i}) =& \beta_0 + \beta_1 (x_i - \bar{x})/\sigma_x\\ \end{align} \]
When considering priors, it is important to remember that they apply to parameters on the link scale. For Binomial models, the link scale is logit. Binomial data is notoriously difficult to define priors for. Nevertheless the following considerations are useful:
- the expected \(\pi\) values are only ever between 0 and 1
- on a logit (log odds) scale, values of -3 and 3 are considered very wide
- on a logit scale, values between -1 and 1 are reasonable.
So the following priors might be appropriate:
- \(\beta_0\): Normal prior centred at 0 with a variance of 1
- \(\beta_1\): t distribution (3 degrees of freedom) prior centred at 0 with a variance of 0.33
mean: since effects are differences and we may not want to pre-determine the direction (polarity) of any trends, it is typical to define effects priors with means of 0
- start by fitting the model and sampling from the priors only
- explore the range of posterior predictions resulting from the priors alone
For Binomial data, it is often more useful to explore the predictions on the link scale. Ribbons that extend much beyond -3 and 3 would definitely be considered wide enough.
Setting all 'trials' variables to 1 by default if not specified otherwise.
dat6c.brm |>
conditional_effects(conditions = data.frame(total = dat6$total)) |>
plot(points = TRUE)
Conclusions:
- the grey ribbon above represents the credible range of the posterior predictions
- this ribbon is substantially wider than the observed data (black points), suggesting that the priors are not overly narrow
- the range of posterior predictions is not wildly unreasonable (although we could argue that negative predictions might be illogical) since the maxima are not multiple orders of magnitude above the observed data.
- now refit the model such that it samples from both the priors and likelihood (that is allow the data to have an impact on the estimates)
- re-explore the range of posterior predictions resulting from the fitted model
dat6c.brm2 |>
conditional_effects(conditions = data.frame(total = dat6$total)) |>
plot(points = TRUE)
Conclusions:
- the range of the posteriors is now substantially reduced (now that the model includes both data and priors)
- this suggests that the patterns are being driven predominantly by the data
- compare the priors and posteriors to further confirm that the priors are not overly influential
When we have indicated that the posterior should be informed by both the prior and the likelihood, both prior (governed by the priors alone) and posterior (governed by both priors and data/likelihood) draws are returned. These can be compared by exploring the probabilities associated with specific hypotheses - the most obvious of which is that of no effect (that the parameter = 0).
When doing so, ideally the posterior should be distinct from the prior - that is, clearly distinguishable from it. The posteriors should also be less variable than the priors. If this is not the case, it might suggest that the posteriors are being too strongly driven by the priors.
[1] "b_Intercept" "b_scalex" "prior_Intercept" "prior_b"
[5] "lprior" "lp__" "accept_stat__" "stepsize__"
[9] "treedepth__" "n_leapfrog__" "divergent__" "energy__"
Unfortunately, it is not possible to do this comparison sensibly for the intercept. The reason is that the prior for the intercept was applied to an intercept associated with centred continuous predictors (predictors are centred behind the scenes). Since we did not centre the predictor, the intercept returned is on the uncentred scale. Hence, the prior and posterior are on different scales.
[1] "b_Intercept" "b_scalex" "prior_Intercept" "prior_b"
[5] "lprior" "lp__" "accept_stat__" "stepsize__"
[9] "treedepth__" "n_leapfrog__" "divergent__" "energy__"
Conclusions:
- each of the priors are substantially wider than the posteriors
- the posteriors are definitely distinct from their respective priors and thus we can conclude that the priors are not driving the posteriors (i.e. they are not unduly influencing the outcomes)
- the priors are simply regularising the parameters such that they are only sampled from plausible regions
8 MCMC sampling diagnostics
MCMC sampling behaviour
Since the purpose of the MCMC sampling is to estimate the posterior of an unknown joint likelihood, it is important that we explore a range of diagnostics designed to help identify when the resulting likelihood might not be accurate.
- traceplots - plots of the individual draws in sequence. Traces that resemble noise suggest that all likelihood features are likely to have been traversed. Obvious steps or blocks of noise are likely to represent distinct features and could imply that there are yet other features that have not yet been traversed - necessitating additional iterations. Furthermore, each chain should be indistinguishable from the others
- autocorrelation function - plots of the degree of correlation between pairs of draws for a range of lags (distances along the chains). High levels of correlation (after a lag of 0, which correlates each draw with itself) suggest a lack of independence between the draws and, therefore, that summaries such as the mean and median will be biased estimates. Ideally, all non-zero lag correlations should be less than 0.2. The left hand figure below demonstrates a clear pattern of autocorrelation, whereas the right hand figure shows no autocorrelation.
- convergence diagnostics - there are a range of diagnostics aimed at exploring whether the multiple chains are likely to have converged upon similar posteriors
- R hat - this metric compares between and within chain model parameter estimates, with the expectation that if the chains have converged, the between and within rank normalised estimates should be very similar (and Rhat should be close to 1). The more one chain deviates from the others, the higher the Rhat value. Values less than 1.05 are considered evidence of convergence.
- Bulk ESS - this is a measure of the effective sample size from the whole (bulk) of the posterior and is a good measure of the sampling efficiency of draws across the entire posterior
- Tail ESS - this is a measure of the effective sample size from the 5% and 95% quantiles (tails) of the posterior and is a good measure of the sampling efficiency of draws from the tail (areas of the posterior with least support and where samplers can get stuck).
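All three convergence diagnostics can be obtained together via the posterior package. This is a sketch that assumes a fitted brms model named `mod`; `summarise_draws()` accepts any object coercible to draws:

```r
library(brms)
library(posterior)
# one row per parameter, with Rhat, bulk ESS and tail ESS side by side;
# assumes `mod` is a fitted brmsfit
mod |>
  as_draws_df() |>
  summarise_draws(rhat, ess_bulk, ess_tail)
```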
There are numerous packages in R that support MCMC diagnostics. Popular packages include:
bayesplot, rstan and ggmcmc
Some of the most useful diagnostics are presented in the following table.
| Package | Description | function | rstanarm | brms |
|---|---|---|---|---|
| bayesplot | Traceplot | mcmc_trace | plot(mod, plotfun='trace') | mcmc_plot(mod, type='trace') |
| | Density plot | mcmc_dens | plot(mod, plotfun='dens') | mcmc_plot(mod, type='dens') |
| | Density & Trace | mcmc_combo | plot(mod, plotfun='combo') | mcmc_plot(mod, type='combo') |
| | ACF | mcmc_acf_bar | plot(mod, plotfun='acf_bar') | mcmc_plot(mod, type='acf_bar') |
| | Rhat hist | mcmc_rhat_hist | plot(mod, plotfun='rhat_hist') | mcmc_plot(mod, type='rhat_hist') |
| | No. Effective | mcmc_neff_hist | plot(mod, plotfun='neff_hist') | mcmc_plot(mod, type='neff_hist') |
| rstan | Traceplot | stan_trace | stan_trace(mod) | stan_trace(mod) |
| | ACF | stan_ac | stan_ac(mod) | stan_ac(mod) |
| | Rhat | stan_rhat | stan_rhat(mod) | stan_rhat(mod) |
| | No. Effective | stan_ess | stan_ess(mod) | stan_ess(mod) |
| | Density plot | stan_dens | stan_dens(mod) | stan_dens(mod) |
| ggmcmc | Traceplot | ggs_traceplot | ggs_traceplot(ggs(mod)) | ggs_traceplot(ggs(mod)) |
| | ACF | ggs_autocorrelation | ggs_autocorrelation(ggs(mod)) | ggs_autocorrelation(ggs(mod)) |
| | Rhat | ggs_Rhat | ggs_Rhat(ggs(mod)) | ggs_Rhat(ggs(mod)) |
| | No. Effective | ggs_effective | ggs_effective(ggs(mod)) | ggs_effective(ggs(mod)) |
| | Cross correlation | ggs_crosscorrelation | ggs_crosscorrelation(ggs(mod)) | ggs_crosscorrelation(ggs(mod)) |
| | Scale reduction | ggs_grb | ggs_grb(ggs(mod)) | ggs_grb(ggs(mod)) |
I personally prefer the rstan version of plots and thus these are the ones I will showcase.
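As a sketch of the workflow, the four rstan-style diagnostics can be generated and arranged with patchwork (assuming a fitted model named `mod`; per the table above, the stan_* plotting functions accept brms models directly):

```r
library(rstan)
library(patchwork)
# trace and autocorrelation on the top row; Rhat and effective sample
# size on the bottom row; assumes `mod` is a fitted model
(stan_trace(mod) + stan_ac(mod)) /
  (stan_rhat(mod) + stan_ess(mod))
```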
Bayesian samplers involve many calls to randomisation functions. As a result, the estimates will vary slightly each time the routines are run. You should expect that the outputs that you obtain will differ slightly from those that I am displaying. Nevertheless, the main conclusions should remain robust across subsequent runs.
Plots of the estimates of each parameter over the post-warmup length of each MCMC chain. Each chain is plotted in a different colour, with each parameter in its own facet. Ideally, each trace should just look like noise without any discernible drift and each of the traces for a specific parameter should look the same (i.e., should not be displaced above or below any other trace for that parameter).
Conclusions:
- the chains appear well mixed and very similar
Conclusions:
- there is no evidence of autocorrelation in the MCMC samples
Rhat is a scale reduction factor measure of convergence between the chains. The closer the values are to 1, the more the chains have converged. Values greater than 1.05 indicate a lack of convergence. There will be an Rhat value for each parameter estimated.
Conclusions:
- all Rhat values are below 1.05, suggesting the chains have converged.
The number of effective samples - the ratio of the number of effective samples (those not rejected by the sampler) to the number of samples provides an indication of the effectiveness (and efficiency) of the MCMC sampler. Ratios that are less than 0.5 for a parameter suggest that the sampler spent considerable time in difficult areas of the sampling domain and rejected more than half of the samples (replacing them with the previous effective sample).
If the ratios are low, tightening the priors may help.
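The ratio described above can be computed and visualised with bayesplot (again a sketch, assuming a fitted brms model named `mod`):

```r
library(brms)
library(bayesplot)
# ratio of effective sample size to total draws, per parameter;
# assumes `mod` is a fitted brmsfit
ratios <- neff_ratio(mod)
mcmc_neff(ratios)  # ratios below ~0.5 are highlighted as problematic
```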
Conclusions:
- ratios are all very high
Conclusions:
- all the diagnostics appear reasonable
- we can conclude that the chains are all well mixed and have converged on a stable posterior.
Plots of the estimates of each parameter over the post-warmup length of each MCMC chain. Each chain is plotted in a different colour, with each parameter in its own facet. Ideally, each trace should just look like noise without any discernible drift and each of the traces for a specific parameter should look the same (i.e., should not be displaced above or below any other trace for that parameter).
Conclusions:
- the chains appear well mixed and very similar
Conclusions:
- there is no evidence of autocorrelation in the MCMC samples
Rhat is a scale reduction factor measure of convergence between the chains. The closer the values are to 1, the more the chains have converged. Values greater than 1.05 indicate a lack of convergence. There will be an Rhat value for each parameter estimated.
Conclusions:
- all Rhat values are below 1.05, suggesting the chains have converged.
The number of effective samples - the ratio of the number of effective samples (those not rejected by the sampler) to the number of samples provides an indication of the effectiveness (and efficiency) of the MCMC sampler. Ratios that are less than 0.5 for a parameter suggest that the sampler spent considerable time in difficult areas of the sampling domain and rejected more than half of the samples (replacing them with the previous effective sample).
If the ratios are low, tightening the priors may help.
Conclusions:
- ratios are all very high
Conclusions:
- all the diagnostics appear reasonable
- we can conclude that the chains are all well mixed and have converged on a stable posterior.
Plots of the estimates of each parameter over the post-warmup length of each MCMC chain. Each chain is plotted in a different colour, with each parameter in its own facet. Ideally, each trace should just look like noise without any discernible drift and each of the traces for a specific parameter should look the same (i.e., should not be displaced above or below any other trace for that parameter).
Conclusions:
- the chains appear well mixed and very similar
Conclusions:
- there is no evidence of autocorrelation in the MCMC samples
Rhat is a scale reduction factor measure of convergence between the chains. The closer the values are to 1, the more the chains have converged. Values greater than 1.05 indicate a lack of convergence. There will be an Rhat value for each parameter estimated.
Conclusions:
- all Rhat values are below 1.05, suggesting the chains have converged.
The number of effective samples - the ratio of the number of effective samples (those not rejected by the sampler) to the number of samples provides an indication of the effectiveness (and efficiency) of the MCMC sampler. Ratios that are less than 0.5 for a parameter suggest that the sampler spent considerable time in difficult areas of the sampling domain and rejected more than half of the samples (replacing them with the previous effective sample).
If the ratios are low, tightening the priors may help.
Conclusions:
- ratios are all very high
Conclusions:
- all the diagnostics appear reasonable
- we can conclude that the chains are all well mixed and have converged on a stable posterior.
Plots of the estimates of each parameter over the post-warmup length of each MCMC chain. Each chain is plotted in a different colour, with each parameter in its own facet. Ideally, each trace should just look like noise without any discernible drift and each of the traces for a specific parameter should look the same (i.e., should not be displaced above or below any other trace for that parameter).
Conclusions:
- the chains appear well mixed and very similar
Conclusions:
- there is no evidence of autocorrelation in the MCMC samples
Rhat is a scale reduction factor measure of convergence between the chains. The closer the values are to 1, the more the chains have converged. Values greater than 1.05 indicate a lack of convergence. There will be an Rhat value for each parameter estimated.
Conclusions:
- all Rhat values are below 1.05, suggesting the chains have converged.
The number of effective samples - the ratio of the number of effective samples (those not rejected by the sampler) to the number of samples provides an indication of the effectiveness (and efficiency) of the MCMC sampler. Ratios that are less than 0.5 for a parameter suggest that the sampler spent considerable time in difficult areas of the sampling domain and rejected more than half of the samples (replacing them with the previous effective sample).
If the ratios are low, tightening the priors may help.
Conclusions:
- ratios are all very high
Conclusions:
- all the diagnostics appear reasonable
- we can conclude that the chains are all well mixed and have converged on a stable posterior.
Plots of the estimates of each parameter over the post-warmup length of each MCMC chain. Each chain is plotted in a different colour, with each parameter in its own facet. Ideally, each trace should just look like noise without any discernible drift and each of the traces for a specific parameter should look the same (i.e., should not be displaced above or below any other trace for that parameter).
Conclusions:
- the chains appear well mixed and very similar
Conclusions:
- there is no evidence of autocorrelation in the MCMC samples
Rhat is a scale reduction factor measure of convergence between the chains. The closer the values are to 1, the more the chains have converged. Values greater than 1.05 indicate a lack of convergence. There will be an Rhat value for each parameter estimated.
Conclusions:
- all Rhat values are below 1.05, suggesting the chains have converged.
The number of effective samples - the ratio of the number of effective samples (those not rejected by the sampler) to the number of samples provides an indication of the effectiveness (and efficiency) of the MCMC sampler. Ratios that are less than 0.5 for a parameter suggest that the sampler spent considerable time in difficult areas of the sampling domain and rejected more than half of the samples (replacing them with the previous effective sample).
If the ratios are low, tightening the priors may help.
Conclusions:
- ratios are all very high
Conclusions:
- all the diagnostics appear reasonable
- we can conclude that the chains are all well mixed and have converged on a stable posterior.
Plots of the estimates of each parameter over the post-warmup length of each MCMC chain. Each chain is plotted in a different colour, with each parameter in its own facet. Ideally, each trace should just look like noise without any discernible drift and each of the traces for a specific parameter should look the same (i.e, should not be displaced above or below any other trace for that parameter).
Conclusions:
- the chains appear well mixed and very similar
Conclusions:
- there is no evidence of autocorrelation in the MCMC samples
Rhat is a scale reduction factor measure of convergence between the chains. The closer the values are to 1, the more the chains have converged. Values greater than 1.05 indicate a lack of convergence. There will be an Rhat value for each parameter estimated.
Conclusions:
- all Rhat values are below 1.05, suggesting the chains have converged.
The number of effective samples - the ratio of the number of effective samples (those not rejected by the sampler) to the number of samples provides an indication of the effectiveness (and efficiency) of the MCMC sampler. Ratios that are less than 0.5 for a parameter suggest that the sampler spent considerable time in difficult areas of the sampling domain and rejected more than half of the samples (replacing them with the previous effective sample).
If the ratios are low, tightening the priors may help.
Conclusions:
- ratios are all very high
Conclusions:
- all the diagnostics appear reasonable
- we can conclude that the chains are all well mixed and have converged on a stable posterior.
Plots of the estimates of each parameter over the post-warmup length of each MCMC chain. Each chain is plotted in a different colour, with each parameter in its own facet. Ideally, each trace should just look like noise without any discernible drift and each of the traces for a specific parameter should look the same (i.e, should not be displaced above or below any other trace for that parameter).
Conclusions:
- the chains appear well mixed and very similar
Conclusions:
- there is no evidence of autocorrelation in the MCMC samples
Rhat is a scale reduction factor measure of convergence between the chains. The closer the values are to 1, the more the chains have converged. Values greater than 1.05 indicate a lack of convergence. There will be an Rhat value for each parameter estimated.
Conclusions:
- all Rhat values are below 1.05, suggesting the chains have converged.
The number of effective samples - the ratio of the number of effective samples (those not rejected by the sampler) to the number of samples provides an indication of the effectiveness (and efficiency) of the MCMC sampler. Ratios that are less than 0.5 for a parameter suggest that the sampler spent considerable time in difficult areas of the sampling domain and rejected more than half of the samples (replacing them with the previous effective sample).
If the ratios are low, tightening the priors may help.
Conclusions:
- ratios are all very high
Conclusions:
- all the diagnostics appear reasonable
- we can conclude that the chains are all well mixed and have converged on a stable posterior.
Plots of the estimates of each parameter over the post-warmup length of each MCMC chain. Each chain is plotted in a different colour, with each parameter in its own facet. Ideally, each trace should just look like noise without any discernible drift and each of the traces for a specific parameter should look the same (i.e, should not be displaced above or below any other trace for that parameter).
Conclusions:
- the chains appear well mixed and very similar
Conclusions:
- there is no evidence of autocorrelation in the MCMC samples
Rhat is a scale reduction factor measure of convergence between the chains. The closer the values are to 1, the more the chains have converged. Values greater than 1.05 indicate a lack of convergence. There will be an Rhat value for each parameter estimated.
Conclusions:
- all Rhat values are below 1.05, suggesting the chains have converged.
The number of effective samples - the ratio of the number of effective samples (those not rejected by the sampler) to the number of samples provides an indication of the effectiveness (and efficiency) of the MCMC sampler. Ratios that are less than 0.5 for a parameter suggest that the sampler spent considerable time in difficult areas of the sampling domain and rejected more than half of the samples (replacing them with the previous effective sample).
If the ratios are low, tightening the priors may help.
Conclusions:
- ratios are all very high
Conclusions:
- all the diagnostics appear reasonable
- we can conclude that the chains are all well mixed and have converged on a stable posterior.
Plots of the estimates of each parameter over the post-warmup length of each MCMC chain. Each chain is plotted in a different colour, with each parameter in its own facet. Ideally, each trace should just look like noise without any discernible drift and each of the traces for a specific parameter should look the same (i.e, should not be displaced above or below any other trace for that parameter).
Conclusions:
- the chains appear well mixed and very similar
Conclusions:
- there is no evidence of autocorrelation in the MCMC samples
Rhat is a scale reduction factor measure of convergence between the chains. The closer the values are to 1, the more the chains have converged. Values greater than 1.05 indicate a lack of convergence. There will be an Rhat value for each parameter estimated.
Conclusions:
- all Rhat values are below 1.05, suggesting the chains have converged.
The number of effective samples - the ratio of the number of effective samples (those not rejected by the sampler) to the number of samples provides an indication of the effectiveness (and efficiency) of the MCMC sampler. Ratios that are less than 0.5 for a parameter suggest that the sampler spent considerable time in difficult areas of the sampling domain and rejected more than half of the samples (replacing them with the previous effective sample).
If the ratios are low, tightening the priors may help.
Conclusions:
- ratios are all very high
Conclusions:
- all the diagnostics appear reasonable
- we can conclude that the chains are all well mixed and have converged on a stable posterior.
Plots of the estimates of each parameter over the post-warmup length of each MCMC chain. Each chain is plotted in a different colour, with each parameter in its own facet. Ideally, each trace should just look like noise without any discernible drift and each of the traces for a specific parameter should look the same (i.e, should not be displaced above or below any other trace for that parameter).
Conclusions:
- the chains appear well mixed and very similar
Conclusions:
- there is no evidence of autocorrelation in the MCMC samples
Rhat is a scale reduction factor measure of convergence between the chains. The closer the values are to 1, the more the chains have converged. Values greater than 1.05 indicate a lack of convergence. There will be an Rhat value for each parameter estimated.
Conclusions:
- all Rhat values are below 1.05, suggesting the chains have converged.
The number of effective samples - the ratio of the number of effective samples (those not rejected by the sampler) to the number of samples provides an indication of the effectiveness (and efficiency) of the MCMC sampler. Ratios that are less than 0.5 for a parameter suggest that the sampler spent considerable time in difficult areas of the sampling domain and rejected more than half of the samples (replacing them with the previous effective sample).
If the ratios are low, tightening the priors may help.
Conclusions:
- ratios are all very high
Conclusions:
- all the diagnostics appear reasonable
- we can conclude that the chains are all well mixed and have converged on a stable posterior.
Plots of the estimates of each parameter over the post-warmup length of each MCMC chain. Each chain is plotted in a different colour, with each parameter in its own facet. Ideally, each trace should just look like noise without any discernible drift and each of the traces for a specific parameter should look the same (i.e, should not be displaced above or below any other trace for that parameter).
Conclusions:
- the chains appear well mixed and very similar
Conclusions:
- there is no evidence of autocorrelation in the MCMC samples
Rhat is a scale reduction factor measure of convergence between the chains. The closer the values are to 1, the more the chains have converged. Values greater than 1.05 indicate a lack of convergence. There will be an Rhat value for each parameter estimated.
Conclusions:
- all Rhat values are below 1.05, suggesting the chains have converged.
The number of effective samples - the ratio of the number of effective samples (those not rejected by the sampler) to the number of samples provides an indication of the effectiveness (and efficiency) of the MCMC sampler. Ratios that are less than 0.5 for a parameter suggest that the sampler spent considerable time in difficult areas of the sampling domain and rejected more than half of the samples (replacing them with the previous effective sample).
If the ratios are low, tightening the priors may help.
Conclusions:
- ratios are all very high
Conclusions:
- all the diagnostics appear reasonable
- we can conclude that the chains are all well mixed and have converged on a stable posterior.
Plots of the estimates of each parameter over the post-warmup length of each MCMC chain. Each chain is plotted in a different colour, with each parameter in its own facet. Ideally, each trace should just look like noise without any discernible drift and each of the traces for a specific parameter should look the same (i.e, should not be displaced above or below any other trace for that parameter).
Conclusions:
- the chains appear well mixed and very similar
Conclusions:
- there is no evidence of autocorrelation in the MCMC samples
Rhat is a scale reduction factor measure of convergence between the chains. The closer the values are to 1, the more the chains have converged. Values greater than 1.05 indicate a lack of convergence. There will be an Rhat value for each parameter estimated.
Conclusions:
- all Rhat values are below 1.05, suggesting the chains have converged.
The number of effective samples - the ratio of the number of effective samples (those not rejected by the sampler) to the number of samples provides an indication of the effectiveness (and efficiency) of the MCMC sampler. Ratios that are less than 0.5 for a parameter suggest that the sampler spent considerable time in difficult areas of the sampling domain and rejected more than half of the samples (replacing them with the previous effective sample).
If the ratios are low, tightening the priors may help.
Conclusions:
- ratios are all very high
Conclusions:
- all the diagnostics appear reasonable
- we can conclude that the chains are all well mixed and have converged on a stable posterior.
Plots of the estimates of each parameter over the post-warmup length of each MCMC chain. Each chain is plotted in a different colour, with each parameter in its own facet. Ideally, each trace should just look like noise without any discernible drift and each of the traces for a specific parameter should look the same (i.e, should not be displaced above or below any other trace for that parameter).
Conclusions:
- the chains appear well mixed and very similar
Conclusions:
- there is no evidence of autocorrelation in the MCMC samples
Rhat is a scale reduction factor measure of convergence between the chains. The closer the values are to 1, the more the chains have converged. Values greater than 1.05 indicate a lack of convergence. There will be an Rhat value for each parameter estimated.
Conclusions:
- all Rhat values are below 1.05, suggesting the chains have converged.
The number of effective samples - the ratio of the number of effective samples (those not rejected by the sampler) to the number of samples provides an indication of the effectiveness (and efficiency) of the MCMC sampler. Ratios that are less than 0.5 for a parameter suggest that the sampler spent considerable time in difficult areas of the sampling domain and rejected more than half of the samples (replacing them with the previous effective sample).
If the ratios are low, tightening the priors may help.
Conclusions:
- ratios are all very high
Conclusions:
- all the diagnostics appear reasonable
- we can conclude that the chains are all well mixed and have converged on a stable posterior.
Plots of the estimates of each parameter over the post-warmup length of each MCMC chain. Each chain is plotted in a different colour, with each parameter in its own facet. Ideally, each trace should just look like noise without any discernible drift and each of the traces for a specific parameter should look the same (i.e, should not be displaced above or below any other trace for that parameter).
Conclusions:
- the chains appear well mixed and very similar
Conclusions:
- there is no evidence of autocorrelation in the MCMC samples
Rhat is a scale reduction factor measure of convergence between the chains. The closer the values are to 1, the more the chains have converged. Values greater than 1.05 indicate a lack of convergence. There will be an Rhat value for each parameter estimated.
Conclusions:
- all Rhat values are below 1.05, suggesting the chains have converged.
The number of effective samples - the ratio of the number of effective samples (those not rejected by the sampler) to the number of samples provides an indication of the effectiveness (and efficiency) of the MCMC sampler. Ratios that are less than 0.5 for a parameter suggest that the sampler spent considerable time in difficult areas of the sampling domain and rejected more than half of the samples (replacing them with the previous effective sample).
If the ratios are low, tightening the priors may help.
Conclusions:
- ratios are all very high
Conclusions:
- all the diagnostics appear reasonable
- we can conclude that the chains are all well mixed and have converged on a stable posterior.
Plots of the estimates of each parameter over the post-warmup length of each MCMC chain. Each chain is plotted in a different colour, with each parameter in its own facet. Ideally, each trace should just look like noise without any discernible drift and each of the traces for a specific parameter should look the same (i.e, should not be displaced above or below any other trace for that parameter).
Conclusions:
- the chains appear well mixed and very similar
Conclusions:
- there is no evidence of autocorrelation in the MCMC samples
Rhat is a scale reduction factor measure of convergence between the chains. The closer the values are to 1, the more the chains have converged. Values greater than 1.05 indicate a lack of convergence. There will be an Rhat value for each parameter estimated.
Conclusions:
- all Rhat values are below 1.05, suggesting the chains have converged.
The number of effective samples - the ratio of the number of effective samples (those not rejected by the sampler) to the number of samples provides an indication of the effectiveness (and efficiency) of the MCMC sampler. Ratios that are less than 0.5 for a parameter suggest that the sampler spent considerable time in difficult areas of the sampling domain and rejected more than half of the samples (replacing them with the previous effective sample).
If the ratios are low, tightening the priors may help.
Conclusions:
- ratios are all very high
Conclusions:
- all the diagnostics appear reasonable
- we can conclude that the chains are all well mixed and have converged on a stable posterior.
Plots of the estimates of each parameter over the post-warmup length of each MCMC chain. Each chain is plotted in a different colour, with each parameter in its own facet. Ideally, each trace should just look like noise without any discernible drift and each of the traces for a specific parameter should look the same (i.e, should not be displaced above or below any other trace for that parameter).
Conclusions:
- the chains appear well mixed and very similar
Conclusions:
- there is no evidence of autocorrelation in the MCMC samples
Rhat is a scale reduction factor measure of convergence between the chains. The closer the values are to 1, the more the chains have converged. Values greater than 1.05 indicate a lack of convergence. There will be an Rhat value for each parameter estimated.
Conclusions:
- all Rhat values are below 1.05, suggesting the chains have converged.
The number of effective samples - the ratio of the number of effective samples (those not rejected by the sampler) to the number of samples provides an indication of the effectiveness (and efficiency) of the MCMC sampler. Ratios that are less than 0.5 for a parameter suggest that the sampler spent considerable time in difficult areas of the sampling domain and rejected more than half of the samples (replacing them with the previous effective sample).
If the ratios are low, tightening the priors may help.
Conclusions:
- ratios are all very high
Conclusions:
- all the diagnostics appear reasonable
- we can conclude that the chains are all well mixed and have converged on a stable posterior.
Plots of the estimates of each parameter over the post-warmup length of each MCMC chain. Each chain is plotted in a different colour, with each parameter in its own facet. Ideally, each trace should just look like noise without any discernible drift and each of the traces for a specific parameter should look the same (i.e, should not be displaced above or below any other trace for that parameter).
Conclusions:
- the chains appear well mixed and very similar
Conclusions:
- there is no evidence of autocorrelation in the MCMC samples
Rhat is a scale reduction factor measure of convergence between the chains. The closer the values are to 1, the more the chains have converged. Values greater than 1.05 indicate a lack of convergence. There will be an Rhat value for each parameter estimated.
Conclusions:
- all Rhat values are below 1.05, suggesting the chains have converged.
The number of effective samples - the ratio of the number of effective samples (those not rejected by the sampler) to the number of samples provides an indication of the effectiveness (and efficiency) of the MCMC sampler. Ratios that are less than 0.5 for a parameter suggest that the sampler spent considerable time in difficult areas of the sampling domain and rejected more than half of the samples (replacing them with the previous effective sample).
If the ratios are low, tightening the priors may help.
Conclusions:
- ratios are all very high
Conclusions:
- all the diagnostics appear reasonable
- we can conclude that the chains are all well mixed and have converged on a stable posterior.
9 Model validation
Model validation involves exploring the model diagnostics and fit to ensure that the model is broadly appropriate for the data. As such, exploration of the residuals should be routine.
For more complex models (those that contain multiple effects), it is also advisable to plot the residuals against each of the individual predictors. For sampling designs that involve sample collection over space or time, it is also a good idea to explore whether there are any temporal or spatial patterns in the residuals.
There are numerous situations (e.g. when applying specific variance-covariance structures to a model) where raw residuals do not reflect the interior workings of the model. Typically, this is because they do not take into account the variance-covariance matrix or assume a very simple variance-covariance matrix. Since the purpose of exploring residuals is to evaluate the model, for these cases, it is arguably better to draw conclusions based on standardized (or studentized) residuals.
Unfortunately, the definitions of standardised and studentised residuals appear to vary and the two terms get used interchangeably. I will adopt the following definitions:
- Standardized residuals
- the raw residuals divided by the true standard deviation of the residuals (which of course is rarely known).
- Studentized residuals
- the raw residuals divided by the estimated standard deviation of the residuals. Note that externally studentised residuals are calculated by dividing each raw residual by a standard deviation estimated from a regression that leaves that observation out.
- Pearson residuals
- the raw residuals divided by the standard deviation of the response variable.
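To make these definitions concrete, here is a small illustration using an ordinary linear model on fabricated data (for Bayesian models the same logic applies, just with posterior draws):

```r
set.seed(1)
dat <- data.frame(x = 1:30)
dat$y <- 2 + 3 * dat$x + rnorm(30, sd = 2)
fit <- lm(y ~ x, data = dat)
raw <- resid(fit)

# Internally studentised: raw residuals scaled by their estimated standard
# deviation (equivalent to rstandard(fit))
studentised <- raw / (sigma(fit) * sqrt(1 - hatvalues(fit)))

# Externally studentised: the scaling standard deviation for each observation
# comes from a refit that leaves that observation out (rstudent(fit))
ext_studentised <- rstudent(fit)

# Pearson (as defined above): raw residuals scaled by the standard deviation
# of the response
pearson <- raw / sd(dat$y)
```
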
The mark of a good model is being able to predict well. In an ideal world, we would have a sufficiently large sample size to permit us to hold a fraction (such as 25%) back, allowing us to train the model on the remaining 75% of the data and then see how well the model can predict the withheld 25%. Unfortunately, such a luxury is still rare in ecology.
The next best option is to see how well the model can predict the observed data. Models tend to struggle most with the extremes of trends and have particular issues when the extremes approach logical boundaries (such as zero for count data and standard deviations). We can use the fitted model to generate random predicted observations and then explore some properties of these compared to the actual observed data.
| Package | Description | Function | rstanarm | brms |
|---|---|---|---|---|
| bayesplot | Density overlay | ppc_dens_overlay | pp_check(mod, plotfun='dens_overlay') | pp_check(mod, type='dens_overlay') |
| bayesplot | Obs vs Pred error | ppc_error_scatter_avg | pp_check(mod, plotfun='error_scatter_avg') | pp_check(mod, type='error_scatter_avg') |
| bayesplot | Pred error vs x | ppc_error_scatter_avg_vs_x | pp_check(mod, x=, plotfun='error_scatter_avg_vs_x') | pp_check(mod, x=, type='error_scatter_avg_vs_x') |
| bayesplot | Preds vs x | ppc_intervals | pp_check(mod, x=, plotfun='intervals') | pp_check(mod, x=, type='intervals') |
| bayesplot | Partial plot | ppc_ribbon | pp_check(mod, x=, plotfun='ribbon') | pp_check(mod, x=, type='ribbon') |
The bayesplot PPC module provides the following functions:
ppc_bars
ppc_bars_grouped
ppc_boxplot
ppc_dens
ppc_dens_overlay
ppc_dens_overlay_grouped
ppc_ecdf_overlay
ppc_ecdf_overlay_grouped
ppc_error_binned
ppc_error_hist
ppc_error_hist_grouped
ppc_error_scatter
ppc_error_scatter_avg
ppc_error_scatter_avg_grouped
ppc_error_scatter_avg_vs_x
ppc_freqpoly
ppc_freqpoly_grouped
ppc_hist
ppc_intervals
ppc_intervals_grouped
ppc_km_overlay
ppc_km_overlay_grouped
ppc_loo_intervals
ppc_loo_pit
ppc_loo_pit_overlay
ppc_loo_pit_qq
ppc_loo_ribbon
ppc_pit_ecdf
ppc_pit_ecdf_grouped
ppc_ribbon
ppc_ribbon_grouped
ppc_rootogram
ppc_scatter
ppc_scatter_avg
ppc_scatter_avg_grouped
ppc_stat
ppc_stat_2d
ppc_stat_freqpoly
ppc_stat_freqpoly_grouped
ppc_stat_grouped
ppc_violin_grouped
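In brms, any of these can be requested through pp_check() by passing the function name (minus the ppc_ prefix) as type. For example, assuming a fitted brmsfit called mod with a predictor named x (placeholder names):

```r
library(brms)  # a fitted brmsfit called `mod` is assumed to already exist

# Density overlay of 50 posterior draws against the observed data
pp_check(mod, type = "dens_overlay", ndraws = 50)

# Average prediction errors against the predictor x
pp_check(mod, type = "error_scatter_avg_vs_x", x = "x")

# Observed data over posterior predictive intervals
pp_check(mod, type = "intervals", x = "x")
```

Note that the _vs_x, intervals and ribbon types require the x= argument naming a predictor in the model's data.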
Bayesian samplers involve many calls to randomisation functions. As a result, the estimates will vary slightly each time the routines are run. You should expect that the outputs that you obtain will differ slightly from those that I am displaying. Nevertheless, the main conclusions should remain robust across subsequent runs.
resid <- resid(dat1a.brm2)[, "Estimate"]
fit <- fitted(dat1a.brm2)[, "Estimate"]
ggplot() +
    geom_point(data = NULL, aes(y = resid, x = fit))

We should also plot the residuals against each of the predictor variables (as well as any other important unmodelled predictors - particularly time and space if relevant and available).
resid <- resid(dat1a.brm2)[, "Estimate"]
ggplot() +
    geom_point(data = NULL, aes(y = resid, x = dat$x))

Conclusions:
- there does not appear to be any pattern in the residuals
Posterior predictive checks provide additional diagnostics about the fit of the model. Specifically, they provide a comparison between predictions drawn from the model and the observed data used to train the model.
Density overlay
These are plots of the density distribution of the observed data (black line) overlaid on 50 density distributions generated from draws from the model (light blue). Ideally, the 50 realisations should be roughly consistent with the observed data.
Conclusions:
- the model draws appear to be consistent with the observed data
Error scatter
These are plots of the observed values against the average residuals. Similar to a residual plot, we do not want to see any patterns in this plot.
Using all posterior draws for ppc type 'error_scatter_avg' by default.
Conclusions:
- there is no obvious pattern in the residuals.
Intervals
These are plots of the observed data overlaid on posterior predictions associated with each level of the predictor. Ideally, the observed data should all fall within the predictive intervals.
Using all posterior draws for ppc type 'intervals' by default.
Conclusions:
- the posterior predictions are not inconsistent with the observed data
Ribbon
These are just an alternative way of expressing the interval plot.
Using all posterior draws for ppc type 'ribbon' by default.
Conclusions:
- the posterior predictions are not inconsistent with the observed data
DHARMa residuals provide very useful diagnostics. Unfortunately, we cannot directly use the simulateResiduals() function to generate the simulated residuals. However, if we are willing to calculate some of the components ourselves, we can still obtain simulated residuals from the fitted stan model.
We need to supply:
- simulated (predicted) responses associated with each observation.
- observed values
- fitted (predicted) responses (averaged) associated with each observation
If you are using RStudio (particularly with a quarto document), the above code may produce a graphic that is too large for the currently available device. The document should still render without problems; it is just the live display within RStudio that cannot accommodate the figure.
To address this, you can either:
- break the multi-panel figure up into four separate figures by removing the outer wrap_elements() and plot_layout() functions.
- copy the above code into the console and view it in a larger graphics device
Conclusions:
- the Q-Q plot looks reasonable (points broadly follow the angled line)
- there are no flagged issues with the following tests:
- KS test: conformity to the nominated distribution (family)
- Dispersion test: would not normally expect this to be an issue for a Gaussian family unless there are other issues with the residuals
- Outlier test: the influence of each observation
- there does not appear to be any pattern in the residuals - each of the three quantile trends is considered flat and centered around 0.25, 0.5 and 0.75
- the observed dispersion is well within the simulated range indicating that there is no issue with dispersion (again this is expected for a Gaussian model)
Conclusions:
- there is no evidence of a lack of fit
- the model is likely to be reliable
resid <- resid(dat1b.brm2)[, "Estimate"]
fit <- fitted(dat1b.brm2)[, "Estimate"]
ggplot() +
    geom_point(data = NULL, aes(y = resid, x = fit))

We should also plot the residuals against each of the predictor variables (as well as any other important unmodelled predictors - particularly time and space if relevant and available).
resid <- resid(dat1b.brm2)[, "Estimate"]
ggplot() +
    geom_point(data = NULL, aes(y = resid, x = dat$x))

Conclusions:
- there does not appear to be any pattern in the residuals
Posterior predictive checks provide additional diagnostics about the fit of the model. Specifically, they provide a comparison between predictions drawn from the model and the observed data used to train the model.
Density overlay
These are plots of the density distribution of the observed data (black line) overlaid on 50 density distributions generated from draws from the model (light blue). Ideally, the 50 realisations should be roughly consistent with the observed data.
Conclusions:
- the model draws appear to be consistent with the observed data
Error scatter
These are plots of the observed values against the average residuals. Similar to a residual plot, we do not want to see any patterns in this plot.
Using all posterior draws for ppc type 'error_scatter_avg' by default.
Conclusions:
- there is no obvious pattern in the residuals.
Intervals
These are plots of the observed data overlaid on posterior predictions associated with each level of the predictor. Ideally, the observed data should all fall within the predictive intervals.
Using all posterior draws for ppc type 'intervals' by default.
Conclusions:
- the posterior predictions are not inconsistent with the observed data
Ribbon
These are just an alternative way of expressing the interval plot.
Using all posterior draws for ppc type 'ribbon' by default.
Conclusions:
- the posterior predictions are not inconsistent with the observed data
DHARMa residuals provide very useful diagnostics. Unfortunately, we cannot directly use the simulateResiduals() function to generate the simulated residuals. However, if we are willing to calculate some of the components ourselves, we can still obtain simulated residuals from the fitted stan model.
We need to supply:
- simulated (predicted) responses associated with each observation.
- observed values
- fitted (predicted) responses (averaged) associated with each observation
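The helper used below, make_brms_dharma_res() (defined in helperfunctions.R), assembles those three components into a DHARMa residuals object. A sketch of roughly what such a helper might do, assuming a fitted brmsfit called mod (placeholder name):

```r
library(DHARMa)
library(brms)

# Posterior predictive draws: one row per draw, one column per observation
preds <- posterior_predict(mod)

dharma_res <- createDHARMa(
    simulatedResponse = t(preds),                     # observations in rows, draws in columns
    observedResponse = standata(mod)$Y,               # the observed response values
    fittedPredictedResponse = apply(preds, 2, mean),  # average prediction per observation
    integerResponse = FALSE                           # set TRUE for count or binary responses
)
```

The resulting object can then be passed to the usual DHARMa functions (plotResiduals(), testUniformity(), testDispersion(), etc.).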
dat1b.resids <- make_brms_dharma_res(dat1b.brm2, integerResponse = FALSE)
wrap_elements(~ testUniformity(dat1b.resids)) +
    wrap_elements(~ plotResiduals(dat1b.resids)) +
    wrap_elements(~ testDispersion(dat1b.resids)) +
    plot_layout(nrow = 1)

If you are using RStudio (particularly with a quarto document), the above code may produce a graphic that is too large for the currently available device. The document should still render without problems; it is just the live display within RStudio that cannot accommodate the figure.
To address this, you can either:
- break the multi-panel figure up into four separate figures by removing the outer wrap_elements() and plot_layout() functions.
- copy the above code into the console and view it in a larger graphics device
Conclusions:
- the Q-Q plot looks reasonable (points broadly follow the angled line)
- there are no flagged issues with the following tests:
- KS test: conformity to the nominated distribution (family)
- Dispersion test: would not normally expect this to be an issue for a Gaussian family unless there are other issues with the residuals
- Outlier test: the influence of each observation
- there does not appear to be any pattern in the residuals - each of the three quantile trends is considered flat and centered around 0.25, 0.5 and 0.75
- the observed dispersion is well within the simulated range indicating that there is no issue with dispersion (again this is expected for a Gaussian model)
Conclusions:
- there is no evidence of a lack of fit
- the model is likely to be reliable
resid <- resid(dat1c.brm2)[, "Estimate"]
fit <- fitted(dat1c.brm2)[, "Estimate"]
ggplot() +
geom_point(data = NULL, aes(y = resid, x = fit))
We should also plot the residuals against each of the predictor variables (as well as any other important unmodelled predictors - particularly time and space, if relevant and available).
resid <- resid(dat1c.brm2)[, "Estimate"]
ggplot() +
geom_point(data = NULL, aes(y = resid, x = dat$x))
Conclusions:
- there does not appear to be any pattern in the residuals
Posterior predictive checks provide additional diagnostics about the fit of the model. Specifically, they provide a comparison between predictions drawn from the model and the observed data used to train the model.
Density overlay
These are plots of the density distribution of the observed data (black line) overlaid on 50 density distributions generated from draws from the model (light blue). Ideally, the 50 realisations should be roughly consistent with the observed data.
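In brms, this plot can be produced via pp_check(); a minimal sketch using the model fitted in this section (ndraws, in recent versions of brms, sets the number of posterior realisations drawn):

```r
# Density overlay posterior predictive check (sketch)
pp_check(dat1c.brm2, type = "dens_overlay", ndraws = 50)
```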
Conclusions:
- the model draws appear to be consistent with the observed data
Error scatter
These are plots of the observed values against the average residuals. Similar to a residual plot, we do not want to see any patterns in this plot.
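A sketch of the corresponding pp_check() call (same model object as above):

```r
# Average error vs observed value posterior predictive check (sketch)
pp_check(dat1c.brm2, type = "error_scatter_avg")
```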
Using all posterior draws for ppc type 'error_scatter_avg' by default.
Conclusions:
- there is no obvious pattern in the residuals.
Intervals
These are plots of the observed data overlaid on posterior predictions associated with each level of the predictor. Ideally, the observed data should all fall within the predictive intervals.
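A sketch of the corresponding call (the x argument names the predictor on the horizontal axis; "x" follows this tutorial's variable naming):

```r
# Interval posterior predictive check against the predictor (sketch)
pp_check(dat1c.brm2, type = "intervals", x = "x")
```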
Using all posterior draws for ppc type 'intervals' by default.
Conclusions:
- the posterior predictions are not inconsistent with the observed data
Ribbon
These are just an alternative way of expressing the interval plot.
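A sketch of the corresponding call (same assumptions as for the interval plot):

```r
# Ribbon posterior predictive check against the predictor (sketch)
pp_check(dat1c.brm2, type = "ribbon", x = "x")
```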
Using all posterior draws for ppc type 'ribbon' by default.
Conclusions:
- the posterior predictions are not inconsistent with the observed data
DHARMa residuals provide very useful diagnostics. Unfortunately, we cannot directly use the simulateResiduals() function to generate the simulated residuals. However, if we are willing to calculate some of the components ourselves, we can still obtain the simulated residuals from the fitted Stan model.
We need to supply:
- simulated (predicted) responses associated with each observation.
- observed values
- fitted (predicted) responses (averaged) associated with each observation
dat1c.resids <- make_brms_dharma_res(dat1c.brm2, integerResponse = FALSE)
wrap_elements(~ testUniformity(dat1c.resids)) +
wrap_elements(~ plotResiduals(dat1c.resids)) +
wrap_elements(~ testDispersion(dat1c.resids)) +
plot_layout(nrow = 1)
If you are using RStudio (particularly with a quarto document), the above code may produce a graphic that is too large for the currently available device. The document should still render without problems; it is just the live display within RStudio that cannot accommodate the figure.
To address this, you can either:
- break the multi-panel figure up into separate figures by removing the outer wrap_elements() and plot_layout() functions
- copy the above code into the console and view it in a larger graphics device
Conclusions:
- the Q-Q plot looks reasonable (points broadly follow the angled line)
- there are no flagged issues with any of the tests:
- KS test: conformity to the nominated distribution (family)
- Dispersion test: would not normally expect this to be an issue for a Gaussian family unless there are other issues with the residuals
- Outlier test: the influence of each observation
- there does not appear to be any pattern in the residuals - each of the three quantile trends is approximately flat and centered around 0.25, 0.5 and 0.75
- the observed dispersion is well within the simulated range, indicating that there is no issue with dispersion (again, this is expected for a Gaussian model)
Conclusions:
- there is no evidence of a lack of fit
- the model is likely to be reliable
resid <- resid(dat2a.brm2)[, "Estimate"]
fit <- fitted(dat2a.brm2)[, "Estimate"]
ggplot() +
geom_point(data = NULL, aes(y = resid, x = fit))
We should also plot the residuals against each of the predictor variables (as well as any other important unmodelled predictors - particularly time and space, if relevant and available).
resid <- resid(dat2a.brm2)[, "Estimate"]
ggplot() +
geom_point(data = NULL, aes(y = resid, x = dat2$x))
Conclusions:
- there does not appear to be any pattern in the residuals
Posterior predictive checks provide additional diagnostics about the fit of the model. Specifically, they provide a comparison between predictions drawn from the model and the observed data used to train the model.
Density overlay
These are plots of the density distribution of the observed data (black line) overlaid on 50 density distributions generated from draws from the model (light blue). Ideally, the 50 realisations should be roughly consistent with the observed data.
Conclusions:
- the model draws appear to be consistent with the observed data
Error scatter
These are plots of the observed values against the average residuals. Similar to a residual plot, we do not want to see any patterns in this plot.
Using all posterior draws for ppc type 'error_scatter_avg' by default.
Conclusions:
- there is no obvious pattern in the residuals.
DHARMa residuals provide very useful diagnostics. Unfortunately, we cannot directly use the simulateResiduals() function to generate the simulated residuals. However, if we are willing to calculate some of the components ourselves, we can still obtain the simulated residuals from the fitted Stan model.
We need to supply:
- simulated (predicted) responses associated with each observation.
- observed values
- fitted (predicted) responses (averaged) associated with each observation
dat2a.resids <- make_brms_dharma_res(dat2a.brm2, integerResponse = FALSE)
wrap_elements(~ testUniformity(dat2a.resids)) +
wrap_elements(~ plotResiduals(dat2a.resids)) +
wrap_elements(~ testDispersion(dat2a.resids)) +
plot_layout(nrow = 1)
If you are using RStudio (particularly with a quarto document), the above code may produce a graphic that is too large for the currently available device. The document should still render without problems; it is just the live display within RStudio that cannot accommodate the figure.
To address this, you can either:
- break the multi-panel figure up into separate figures by removing the outer wrap_elements() and plot_layout() functions
- copy the above code into the console and view it in a larger graphics device
Conclusions:
- the Q-Q plot looks reasonable (points broadly follow the angled line)
- there are no flagged issues with any of the tests:
- KS test: conformity to the nominated distribution (family)
- Dispersion test: would not normally expect this to be an issue for a Gaussian family unless there are other issues with the residuals
- Outlier test: the influence of each observation
- there does not appear to be any pattern in the residuals - each of the three quantile trends is approximately flat and centered around 0.25, 0.5 and 0.75
- the observed dispersion is well within the simulated range, indicating that there is no issue with dispersion (again, this is expected for a Gaussian model)
Conclusions:
- there is no evidence of a lack of fit
- the model is likely to be reliable
resid <- resid(dat2b.brm2)[, "Estimate"]
fit <- fitted(dat2b.brm2)[, "Estimate"]
ggplot() +
geom_point(data = NULL, aes(y = resid, x = fit))
We should also plot the residuals against each of the predictor variables (as well as any other important unmodelled predictors - particularly time and space, if relevant and available).
resid <- resid(dat2b.brm2)[, "Estimate"]
ggplot() +
geom_point(data = NULL, aes(y = resid, x = dat2$x))
Conclusions:
- there does not appear to be any pattern in the residuals
Posterior predictive checks provide additional diagnostics about the fit of the model. Specifically, they provide a comparison between predictions drawn from the model and the observed data used to train the model.
Density overlay
These are plots of the density distribution of the observed data (black line) overlaid on 50 density distributions generated from draws from the model (light blue). Ideally, the 50 realisations should be roughly consistent with the observed data.
Conclusions:
- the model draws appear to be consistent with the observed data
Error scatter
These are plots of the observed values against the average residuals. Similar to a residual plot, we do not want to see any patterns in this plot.
Using all posterior draws for ppc type 'error_scatter_avg' by default.
Conclusions:
- there is no obvious pattern in the residuals.
DHARMa residuals provide very useful diagnostics. Unfortunately, we cannot directly use the simulateResiduals() function to generate the simulated residuals. However, if we are willing to calculate some of the components ourselves, we can still obtain the simulated residuals from the fitted Stan model.
We need to supply:
- simulated (predicted) responses associated with each observation.
- observed values
- fitted (predicted) responses (averaged) associated with each observation
dat2b.resids <- make_brms_dharma_res(dat2b.brm2, integerResponse = FALSE)
wrap_elements(~ testUniformity(dat2b.resids)) +
wrap_elements(~ plotResiduals(dat2b.resids)) +
wrap_elements(~ testDispersion(dat2b.resids)) +
plot_layout(nrow = 1)
If you are using RStudio (particularly with a quarto document), the above code may produce a graphic that is too large for the currently available device. The document should still render without problems; it is just the live display within RStudio that cannot accommodate the figure.
To address this, you can either:
- break the multi-panel figure up into separate figures by removing the outer wrap_elements() and plot_layout() functions
- copy the above code into the console and view it in a larger graphics device
Conclusions:
- the Q-Q plot looks reasonable (points broadly follow the angled line)
- there are no flagged issues with any of the tests:
- KS test: conformity to the nominated distribution (family)
- Dispersion test: would not normally expect this to be an issue for a Gaussian family unless there are other issues with the residuals
- Outlier test: the influence of each observation
- there does not appear to be any pattern in the residuals - each of the three quantile trends is approximately flat and centered around 0.25, 0.5 and 0.75
- the observed dispersion is well within the simulated range, indicating that there is no issue with dispersion (again, this is expected for a Gaussian model)
Conclusions:
- there is no evidence of a lack of fit
- the model is likely to be reliable
resid <- resid(dat3a.brm2)[, "Estimate"]
fit <- fitted(dat3a.brm2)[, "Estimate"]
ggplot() +
geom_point(data = NULL, aes(y = resid, x = fit))
We should also plot the residuals against each of the predictor variables (as well as any other important unmodelled predictors - particularly time and space, if relevant and available).
resid <- resid(dat3a.brm2)[, "Estimate"]
ggplot() +
geom_point(data = NULL, aes(y = resid, x = dat3$x))
Conclusions:
- there does not appear to be any pattern in the residuals
Density overlay
Conclusions:
- the model draws appear to be consistent with the observed data
Error scatter
These are plots of the observed values against the average residuals. Similar to a residual plot, we do not want to see any patterns in this plot.
Using all posterior draws for ppc type 'error_scatter_avg' by default.
Conclusions:
- there is no obvious pattern in the residuals.
Intervals
These are plots of the observed data overlaid on posterior predictions associated with each level of the predictor. Ideally, the observed data should all fall within the predictive intervals.
Using all posterior draws for ppc type 'intervals' by default.
Conclusions:
- the posterior predictions are not inconsistent with the observed data
Ribbon
These are just an alternative way of expressing the interval plot.
Using all posterior draws for ppc type 'ribbon' by default.
Conclusions:
- the posterior predictions are not inconsistent with the observed data
DHARMa residuals provide very useful diagnostics. Unfortunately, we cannot directly use the simulateResiduals() function to generate the simulated residuals. However, if we are willing to calculate some of the components ourselves, we can still obtain the simulated residuals from the fitted Stan model.
We need to supply:
- simulated (predicted) responses associated with each observation.
- observed values
- fitted (predicted) responses (averaged) associated with each observation
dat3a.resids <- make_brms_dharma_res(dat3a.brm2, integerResponse = FALSE)
wrap_elements(~ testUniformity(dat3a.resids)) +
wrap_elements(~ plotResiduals(dat3a.resids)) +
wrap_elements(~ testDispersion(dat3a.resids)) +
plot_layout(nrow = 1)
If you are using RStudio (particularly with a quarto document), the above code may produce a graphic that is too large for the currently available device. The document should still render without problems; it is just the live display within RStudio that cannot accommodate the figure.
To address this, you can either:
- break the multi-panel figure up into separate figures by removing the outer wrap_elements() and plot_layout() functions
- copy the above code into the console and view it in a larger graphics device
Conclusions:
- the Q-Q plot looks reasonable (points broadly follow the angled line)
- there are no flagged issues with any of the tests:
- KS test: conformity to the nominated distribution (family)
- Dispersion test: would not normally expect this to be an issue for a Gaussian family unless there are other issues with the residuals
- Outlier test: the influence of each observation
- there does not appear to be any pattern in the residuals - each of the three quantile trends is approximately flat and centered around 0.25, 0.5 and 0.75
- the observed dispersion is well within the simulated range, indicating that there is no issue with dispersion (again, this is expected for a Gaussian model)
Conclusions:
- there is no evidence of a lack of fit
- the model is likely to be reliable
resid <- resid(dat3b.brm2)[, "Estimate"]
fit <- fitted(dat3b.brm2)[, "Estimate"]
ggplot() +
geom_point(data = NULL, aes(y = resid, x = fit))
We should also plot the residuals against each of the predictor variables (as well as any other important unmodelled predictors - particularly time and space, if relevant and available).
resid <- resid(dat3b.brm2)[, "Estimate"]
ggplot() +
geom_point(data = NULL, aes(y = resid, x = dat3$x))
Conclusions:
- there does not appear to be any pattern in the residuals
Density overlay
Conclusions:
- the model draws appear to be consistent with the observed data
Error scatter
These are plots of the observed values against the average residuals. Similar to a residual plot, we do not want to see any patterns in this plot.
Using all posterior draws for ppc type 'error_scatter_avg' by default.
Conclusions:
- there is no obvious pattern in the residuals.
Intervals
These are plots of the observed data overlaid on posterior predictions associated with each level of the predictor. Ideally, the observed data should all fall within the predictive intervals.
Using all posterior draws for ppc type 'intervals' by default.
Conclusions:
- the posterior predictions are not inconsistent with the observed data
Ribbon
These are just an alternative way of expressing the interval plot.
Using all posterior draws for ppc type 'ribbon' by default.
Conclusions:
- the posterior predictions are not inconsistent with the observed data
DHARMa residuals provide very useful diagnostics. Unfortunately, we cannot directly use the simulateResiduals() function to generate the simulated residuals. However, if we are willing to calculate some of the components ourselves, we can still obtain the simulated residuals from the fitted Stan model.
We need to supply:
- simulated (predicted) responses associated with each observation.
- observed values
- fitted (predicted) responses (averaged) associated with each observation
dat3b.resids <- make_brms_dharma_res(dat3b.brm2, integerResponse = FALSE)
wrap_elements(~ testUniformity(dat3b.resids)) +
wrap_elements(~ plotResiduals(dat3b.resids)) +
wrap_elements(~ testDispersion(dat3b.resids)) +
plot_layout(nrow = 1)
If you are using RStudio (particularly with a quarto document), the above code may produce a graphic that is too large for the currently available device. The document should still render without problems; it is just the live display within RStudio that cannot accommodate the figure.
To address this, you can either:
- break the multi-panel figure up into separate figures by removing the outer wrap_elements() and plot_layout() functions
- copy the above code into the console and view it in a larger graphics device
Conclusions:
- the Q-Q plot looks reasonable (points broadly follow the angled line)
- there are no flagged issues with any of the tests:
- KS test: conformity to the nominated distribution (family)
- Dispersion test: would not normally expect this to be an issue for a Gaussian family unless there are other issues with the residuals
- Outlier test: the influence of each observation
- there does not appear to be any pattern in the residuals - each of the three quantile trends is approximately flat and centered around 0.25, 0.5 and 0.75
- the observed dispersion is well within the simulated range, indicating that there is no issue with dispersion (again, this is expected for a Gaussian model)
Conclusions:
- there is no evidence of a lack of fit
- the model is likely to be reliable
resid <- resid(dat3c.brm2)[, "Estimate"]
fit <- fitted(dat3c.brm2)[, "Estimate"]
ggplot() +
geom_point(data = NULL, aes(y = resid, x = fit))
We should also plot the residuals against each of the predictor variables (as well as any other important unmodelled predictors - particularly time and space, if relevant and available).
resid <- resid(dat3c.brm2)[, "Estimate"]
ggplot() +
geom_point(data = NULL, aes(y = resid, x = dat3$x))
Conclusions:
- there does not appear to be any pattern in the residuals
Density overlay
Conclusions:
- the model draws appear to be consistent with the observed data
Error scatter
These are plots of the observed values against the average residuals. Similar to a residual plot, we do not want to see any patterns in this plot.
Using all posterior draws for ppc type 'error_scatter_avg' by default.
Conclusions:
- there is no obvious pattern in the residuals.
Intervals
These are plots of the observed data overlaid on posterior predictions associated with each level of the predictor. Ideally, the observed data should all fall within the predictive intervals.
Using all posterior draws for ppc type 'intervals' by default.
Conclusions:
- the posterior predictions are not inconsistent with the observed data
Ribbon
These are just an alternative way of expressing the interval plot.
Using all posterior draws for ppc type 'ribbon' by default.
Conclusions:
- the posterior predictions are not inconsistent with the observed data
DHARMa residuals provide very useful diagnostics. Unfortunately, we cannot directly use the simulateResiduals() function to generate the simulated residuals. However, if we are willing to calculate some of the components ourselves, we can still obtain the simulated residuals from the fitted Stan model.
We need to supply:
- simulated (predicted) responses associated with each observation.
- observed values
- fitted (predicted) responses (averaged) associated with each observation
dat3c.resids <- make_brms_dharma_res(dat3c.brm2, integerResponse = FALSE)
wrap_elements(~ testUniformity(dat3c.resids)) +
wrap_elements(~ plotResiduals(dat3c.resids)) +
wrap_elements(~ testDispersion(dat3c.resids)) +
plot_layout(nrow = 1)
If you are using RStudio (particularly with a quarto document), the above code may produce a graphic that is too large for the currently available device. The document should still render without problems; it is just the live display within RStudio that cannot accommodate the figure.
To address this, you can either:
- break the multi-panel figure up into separate figures by removing the outer wrap_elements() and plot_layout() functions
- copy the above code into the console and view it in a larger graphics device
Conclusions:
- the Q-Q plot looks reasonable (points broadly follow the angled line)
- there are no flagged issues with any of the tests:
- KS test: conformity to the nominated distribution (family)
- Dispersion test: would not normally expect this to be an issue for a Gaussian family unless there are other issues with the residuals
- Outlier test: the influence of each observation
- there does not appear to be any pattern in the residuals - each of the three quantile trends is approximately flat and centered around 0.25, 0.5 and 0.75
- the observed dispersion is well within the simulated range, indicating that there is no issue with dispersion (again, this is expected for a Gaussian model)
Conclusions:
- there is no evidence of a lack of fit
- the model is likely to be reliable
resid <- resid(dat4a.brm2)[, "Estimate"]
fit <- fitted(dat4a.brm2)[, "Estimate"]
ggplot() +
geom_point(data = NULL, aes(y = resid, x = fit))
We should also plot the residuals against each of the predictor variables (as well as any other important unmodelled predictors - particularly time and space, if relevant and available).
resid <- resid(dat4a.brm2)[, "Estimate"]
ggplot() +
geom_point(data = NULL, aes(y = resid, x = dat4$x))
Conclusions:
- there does not appear to be any pattern in the residuals
Density overlay
Conclusions:
- the model draws appear to be consistent with the observed data
Error scatter
These are plots of the observed values against the average residuals. Similar to a residual plot, we do not want to see any patterns in this plot.
Using all posterior draws for ppc type 'error_scatter_avg' by default.
Conclusions:
- there is no obvious pattern in the residuals.
Intervals
These are plots of the observed data overlaid on posterior predictions associated with each level of the predictor. Ideally, the observed data should all fall within the predictive intervals.
Using all posterior draws for ppc type 'intervals' by default.
Conclusions:
- the posterior predictions are not inconsistent with the observed data
Ribbon
These are just an alternative way of expressing the interval plot.
Using all posterior draws for ppc type 'ribbon' by default.
Conclusions:
- the posterior predictions are not inconsistent with the observed data
DHARMa residuals provide very useful diagnostics. Unfortunately, we cannot directly use the simulateResiduals() function to generate the simulated residuals. However, if we are willing to calculate some of the components ourselves, we can still obtain the simulated residuals from the fitted Stan model.
We need to supply:
- simulated (predicted) responses associated with each observation.
- observed values
- fitted (predicted) responses (averaged) associated with each observation
dat4a.resids <- make_brms_dharma_res(dat4a.brm2, integerResponse = FALSE)
wrap_elements(~ testUniformity(dat4a.resids)) +
wrap_elements(~ plotResiduals(dat4a.resids)) +
wrap_elements(~ testDispersion(dat4a.resids)) +
plot_layout(nrow = 1)
If you are using RStudio (particularly with a quarto document), the above code may produce a graphic that is too large for the currently available device. The document should still render without problems; it is just the live display within RStudio that cannot accommodate the figure.
To address this, you can either:
- break the multi-panel figure up into separate figures by removing the outer wrap_elements() and plot_layout() functions
- copy the above code into the console and view it in a larger graphics device
Conclusions:
- the Q-Q plot looks reasonable (points broadly follow the angled line)
- there are no flagged issues with any of the tests:
- KS test: conformity to the nominated distribution (family)
- Dispersion test: would not normally expect this to be an issue for a Gaussian family unless there are other issues with the residuals
- Outlier test: the influence of each observation
- there does not appear to be any pattern in the residuals - each of the three quantile trends is approximately flat and centered around 0.25, 0.5 and 0.75
- the observed dispersion is well within the simulated range, indicating that there is no issue with dispersion (again, this is expected for a Gaussian model)
Conclusions:
- there is no evidence of a lack of fit
- the model is likely to be reliable
resid <- resid(dat4b.brm2)[, "Estimate"]
fit <- fitted(dat4b.brm2)[, "Estimate"]
ggplot() +
geom_point(data = NULL, aes(y = resid, x = fit))
We should also plot the residuals against each of the predictor variables (as well as any other important unmodelled predictors - particularly time and space, if relevant and available).
resid <- resid(dat4b.brm2)[, "Estimate"]
ggplot() +
geom_point(data = NULL, aes(y = resid, x = dat4$x))
Conclusions:
- there does not appear to be any pattern in the residuals
Density overlay
Conclusions:
- the model draws appear to be consistent with the observed data
Error scatter
These are plots of the observed values against the average residuals. Similar to a residual plot, we do not want to see any patterns in this plot.
Using all posterior draws for ppc type 'error_scatter_avg' by default.
Conclusions:
- there is no obvious pattern in the residuals.
Intervals
These are plots of the observed data overlaid on posterior predictions associated with each level of the predictor. Ideally, the observed data should all fall within the predictive intervals.
Using all posterior draws for ppc type 'intervals' by default.
Conclusions:
- the posterior predictions are not inconsistent with the observed data
Ribbon
These are just an alternative way of expressing the interval plot.
Using all posterior draws for ppc type 'ribbon' by default.
Conclusions:
- the posterior predictions are not inconsistent with the observed data
DHARMa residuals provide very useful diagnostics. Unfortunately, we cannot directly use the simulateResiduals() function to generate the simulated residuals. However, if we are willing to calculate some of the components ourselves, we can still obtain the simulated residuals from the fitted Stan model.
We need to supply:
- simulated (predicted) responses associated with each observation.
- observed values
- fitted (predicted) responses (averaged) associated with each observation
dat4b.resids <- make_brms_dharma_res(dat4b.brm2, integerResponse = FALSE)
wrap_elements(~ testUniformity(dat4b.resids)) +
wrap_elements(~ plotResiduals(dat4b.resids)) +
wrap_elements(~ testDispersion(dat4b.resids)) +
plot_layout(nrow = 1)
If you are using RStudio (particularly with a quarto document), the above code may produce a graphic that is too large for the currently available device. The document should still render without problems; it is just the live display within RStudio that cannot accommodate the figure.
To address this, you can either:
- break the multi-panel figure up into separate figures by removing the outer wrap_elements() and plot_layout() functions
- copy the above code into the console and view it in a larger graphics device
Conclusions:
- the Q-Q plot looks reasonable (points broadly follow the angled line)
- there are no flagged issues with any of the tests:
- KS test: conformity to the nominated distribution (family)
- Dispersion test: would not normally expect this to be an issue for a Gaussian family unless there are other issues with the residuals
- Outlier test: the influence of each observation
- there does not appear to be any pattern in the residuals - each of the three quantile trends is approximately flat and centered around 0.25, 0.5 and 0.75
- the observed dispersion is well within the simulated range, indicating that there is no issue with dispersion (again, this is expected for a Gaussian model)
Conclusions:
- there is no evidence of a lack of fit
- the model is likely to be reliable
resid <- resid(dat4c.brm2)[, "Estimate"]
fit <- fitted(dat4c.brm2)[, "Estimate"]
ggplot() +
geom_point(data = NULL, aes(y = resid, x = fit))
We should also plot the residuals against each of the predictor variables (as well as any other important unmodelled predictors - particularly time and space, if relevant and available).
resid <- resid(dat4c.brm2)[, "Estimate"]
ggplot() +
geom_point(data = NULL, aes(y = resid, x = dat4$x))
Conclusions:
- there does not appear to be any pattern in the residuals
Density overlay
Conclusions:
- the model draws appear to be consistent with the observed data
Error scatter
These are plots of the observed values against the average residuals. Similar to a residual plot, we do not want to see any patterns in this plot.
Using all posterior draws for ppc type 'error_scatter_avg' by default.
Conclusions:
- there is no obvious pattern in the residuals.
Intervals
These are plots of the observed data overlaid on posterior predictions associated with each level of the predictor. Ideally, the observed data should all fall within the predictive intervals.
Using all posterior draws for ppc type 'intervals' by default.
Conclusions:
- the posterior predictions are not inconsistent with the observed data
Ribbon
These are just an alternative way of expressing the interval plot.
Using all posterior draws for ppc type 'ribbon' by default.
Conclusions:
- the posterior predictions are not inconsistent with the observed data
DHARMa residuals provide very useful diagnostics. Unfortunately, we cannot directly use the simulateResiduals() function to generate the simulated residuals. However, if we are willing to calculate some of the components ourselves, we can still obtain the simulated residuals from the fitted Stan model.
We need to supply:
- simulated (predicted) responses associated with each observation.
- observed values
- fitted (predicted) responses (averaged) associated with each observation
dat4c.resids <- make_brms_dharma_res(dat4c.brm2, integerResponse = FALSE)
wrap_elements(~ testUniformity(dat4c.resids)) +
wrap_elements(~ plotResiduals(dat4c.resids)) +
wrap_elements(~ testDispersion(dat4c.resids)) +
plot_layout(nrow = 1)
If you are using RStudio (particularly with a quarto document), the above code may produce a graphic that is too large for the currently available device. The document should still render without problems; it is just the live display within RStudio that cannot accommodate the figure.
To address this, you can either:
- break the multi-panel figure up into separate figures by removing the outer wrap_elements() and plot_layout() functions
- copy the above code into the console and view it in a larger graphics device
Conclusions:
- the Q-Q plot looks reasonable (points broadly follow the angled line)
- there are no flagged issues with any of the tests:
- KS test: conformity to the nominated distribution (family)
- Dispersion test: would not normally expect this to be an issue for a Gaussian family unless there are other issues with the residuals
- Outlier test: the influence of each observation
- there does not appear to be any pattern in the residuals - each of the three quantile trends is approximately flat and centered around 0.25, 0.5 and 0.75
- the observed dispersion is well within the simulated range, indicating that there is no issue with dispersion (again, this is expected for a Gaussian model)
Conclusions:
- there is no evidence of a lack of fit
- the model is likely to be reliable
resid <- resid(dat5a.brm2)[, "Estimate"]
fit <- fitted(dat5a.brm2)[, "Estimate"]
ggplot() +
geom_point(data = NULL, aes(y = resid, x = fit))
We should also plot the residuals against each of the predictor variables (as well as any other important unmodelled predictors - particularly time and space, if relevant and available).
resid <- resid(dat5a.brm2)[, "Estimate"]
ggplot() +
geom_point(data = NULL, aes(y = resid, x = dat5$x))
Conclusions:
- the above plots are almost impossible to interpret for binary data.
- they will always feature two curved lines (one for the zeros, the other for the ones)
- it is virtually impossible to diagnose any issues from such plots.
Density overlay
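The density overlay itself can be produced via brms's pp_check() wrapper around bayesplot. A minimal sketch, assuming the fitted model object dat5a.brm2 from earlier:

```r
library(brms)
# overlay the density of the observed response on densities from
# (here) 100 posterior predictive draws
pp_check(dat5a.brm2, type = "dens_overlay", ndraws = 100)
```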
Conclusions:
- the model draws appear to be consistent with the observed data
- note that these density plots are going to be too crude to be completely useful
- all the mass should be at either 0 or 1
Error scatter
These are plots of the observed values against the average residuals. Similar to a residual plot, we do not want to see any patterns in this plot.
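This plot is also available through pp_check(); a sketch, again assuming the dat5a.brm2 model object:

```r
library(brms)
# observed values against the average posterior predictive error
pp_check(dat5a.brm2, type = "error_scatter_avg")
```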
Using all posterior draws for ppc type 'error_scatter_avg' by default.
Conclusions:
- there is no obvious pattern in the residuals.
- this sort of plot is of very little value for binary data
Intervals
These are plots of the observed data overlaid on top of the posterior predictions associated with each level of the predictor. Ideally, the observed data should all fall within the predictive intervals.
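A sketch of how such an intervals plot can be requested (assuming the dat5a.brm2 model object and that the predictor is named x):

```r
library(brms)
# observed data over posterior predictive medians and intervals, ordered by x
pp_check(dat5a.brm2, type = "intervals", x = "x")
```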
Using all posterior draws for ppc type 'intervals' by default.
Conclusions:
- the posterior predictions are not inconsistent with the observed data
- this sort of plot is of very little value for binary data
Ribbon
This is just an alternative way of expressing the interval plot.
Using all posterior draws for ppc type 'ribbon' by default.
Conclusions:
- the posterior predictions are not inconsistent with the observed data
- this sort of plot is of very little value for binary data
DHARMa residuals provide very useful diagnostics. Unfortunately, we cannot directly use the simulateResiduals() function to generate the simulated residuals. However, if we are willing to calculate some of the components ourselves, we can still obtain the simulated residuals from the fitted Stan model.
We need to supply:
- simulated (predicted) responses associated with each observation.
- observed values
- fitted (predicted) responses (averaged) associated with each observation
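The helper used below (make_brms_dharma_res(), sourced from helperfunctions.R) assembles these three components. An illustrative reconstruction of such a helper - a sketch, not the actual implementation - might look like:

```r
# sketch: build DHARMa residuals from a fitted brms model
# (illustrative only - not the actual make_brms_dharma_res() from helperfunctions.R)
make_dharma_res <- function(fit, integerResponse = FALSE) {
  preds <- brms::posterior_predict(fit)            # draws x observations
  DHARMa::createDHARMa(
    simulatedResponse = t(preds),                  # observations x draws
    observedResponse = brms::standata(fit)$Y,      # observed response values
    fittedPredictedResponse = colMeans(brms::posterior_epred(fit)),
    integerResponse = integerResponse
  )
}
```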
In the code below, I have instructed the residual plot not to apply quantile regression to the residuals due to a lack of unique data.
Only 2 levels detected so that family 'bernoulli' might be a more efficient choice.
# create the DHARMa residuals (binary data, so integerResponse = TRUE)
dat5a.resids <- make_brms_dharma_res(dat5a.brm2, integerResponse = TRUE)
wrap_elements(~ testUniformity(dat5a.resids)) +
wrap_elements(~ plotResiduals(dat5a.resids, quantreg = FALSE)) +
wrap_elements(~ testDispersion(dat5a.resids)) +
plot_layout(nrow = 1)

If you are using RStudio (particularly with a Quarto document), the above code may produce a graphic that is too large for the currently available device. The document should still render without problems; it is just the live display within RStudio that cannot accommodate the figure.
To address this, you can either:
- break the multi-panel figure up into four separate figures by removing the outer wrap_elements() and plot_layout() functions
- copy the above code into the console and view it in a larger graphics device
Conclusions:
- the Q-Q plot looks reasonable (points broadly follow the angled line)
- there are no flagged issues with the:
  - KS test: conformity to the nominated distribution (family)
  - Dispersion test: we would not normally expect this to be an issue here unless there are other issues with the residuals
  - Outlier test: the influence of each observation
- there does not appear to be any pattern in the residuals
- the observed dispersion is well within the simulated range, indicating that there is no issue with dispersion

Conclusions:
- there is no evidence of a lack of fit
- the model is likely to be reliable
resid <- resid(dat5b.brm2)[, "Estimate"]
fit <- fitted(dat5b.brm2)[, "Estimate"]
ggplot() +
geom_point(data = NULL, aes(y = resid, x = fit))

We should also plot the residuals against each of the predictor variables (as well as any other important unmodelled predictors - particularly time and space, if relevant and available).
resid <- resid(dat5b.brm2)[, "Estimate"]
ggplot() +
geom_point(data = NULL, aes(y = resid, x = dat5$x))

Conclusions:
- the above plots are almost impossible to interpret for binary data
- they will always feature two curved lines (one for the zeros, the other for the ones)
- it is virtually impossible to diagnose any issues from such plots
Density overlay
Conclusions:
- the model draws appear to be consistent with the observed data
- note that these density plots are going to be too crude to be completely useful
- all the mass should be at either 0 or 1
Error scatter
These are plots of the observed values against the average residuals. Similar to a residual plot, we do not want to see any patterns in this plot.
Using all posterior draws for ppc type 'error_scatter_avg' by default.
Conclusions:
- there is no obvious pattern in the residuals.
- this sort of plot is of very little value for binary data
Intervals
These are plots of the observed data overlaid on top of the posterior predictions associated with each level of the predictor. Ideally, the observed data should all fall within the predictive intervals.
Using all posterior draws for ppc type 'intervals' by default.
Conclusions:
- the posterior predictions are not inconsistent with the observed data
- this sort of plot is of very little value for binary data
Ribbon
This is just an alternative way of expressing the interval plot.
Using all posterior draws for ppc type 'ribbon' by default.
Conclusions:
- the posterior predictions are not inconsistent with the observed data
- this sort of plot is of very little value for binary data
DHARMa residuals provide very useful diagnostics. Unfortunately, we cannot directly use the simulateResiduals() function to generate the simulated residuals. However, if we are willing to calculate some of the components ourselves, we can still obtain the simulated residuals from the fitted Stan model.
We need to supply:
- simulated (predicted) responses associated with each observation.
- observed values
- fitted (predicted) responses (averaged) associated with each observation
In the code below, I have instructed the residual plot not to apply quantile regression to the residuals due to a lack of unique data.
Only 2 levels detected so that family 'bernoulli' might be a more efficient choice.
# create the DHARMa residuals (binary data, so integerResponse = TRUE)
dat5b.resids <- make_brms_dharma_res(dat5b.brm2, integerResponse = TRUE)
wrap_elements(~ testUniformity(dat5b.resids)) +
wrap_elements(~ plotResiduals(dat5b.resids, quantreg = FALSE)) +
wrap_elements(~ testDispersion(dat5b.resids)) +
plot_layout(nrow = 1)

If you are using RStudio (particularly with a Quarto document), the above code may produce a graphic that is too large for the currently available device. The document should still render without problems; it is just the live display within RStudio that cannot accommodate the figure.
To address this, you can either:
- break the multi-panel figure up into four separate figures by removing the outer wrap_elements() and plot_layout() functions
- copy the above code into the console and view it in a larger graphics device
Conclusions:
- the Q-Q plot looks reasonable (points broadly follow the angled line)
- there are no flagged issues with the:
  - KS test: conformity to the nominated distribution (family)
  - Dispersion test: we would not normally expect this to be an issue here unless there are other issues with the residuals
  - Outlier test: the influence of each observation
- there does not appear to be any pattern in the residuals
- the observed dispersion is well within the simulated range, indicating that there is no issue with dispersion

Conclusions:
- there is no evidence of a lack of fit
- the model is likely to be reliable
resid <- resid(dat5c.brm2)[, "Estimate"]
fit <- fitted(dat5c.brm2)[, "Estimate"]
ggplot() +
geom_point(data = NULL, aes(y = resid, x = fit))

We should also plot the residuals against each of the predictor variables (as well as any other important unmodelled predictors - particularly time and space, if relevant and available).
resid <- resid(dat5c.brm2)[, "Estimate"]
ggplot() +
geom_point(data = NULL, aes(y = resid, x = dat5$x))

Conclusions:
- the above plots are almost impossible to interpret for binary data
- they will always feature two curved lines (one for the zeros, the other for the ones)
- it is virtually impossible to diagnose any issues from such plots
Density overlay
Conclusions:
- the model draws appear to be consistent with the observed data
- note that these density plots are going to be too crude to be completely useful
- all the mass should be at either 0 or 1
Error scatter
These are plots of the observed values against the average residuals. Similar to a residual plot, we do not want to see any patterns in this plot.
Using all posterior draws for ppc type 'error_scatter_avg' by default.
Conclusions:
- there is no obvious pattern in the residuals.
- this sort of plot is of very little value for binary data
Intervals
These are plots of the observed data overlaid on top of the posterior predictions associated with each level of the predictor. Ideally, the observed data should all fall within the predictive intervals.
Using all posterior draws for ppc type 'intervals' by default.
Conclusions:
- the posterior predictions are not inconsistent with the observed data
- this sort of plot is of very little value for binary data
Ribbon
This is just an alternative way of expressing the interval plot.
Using all posterior draws for ppc type 'ribbon' by default.
Conclusions:
- the posterior predictions are not inconsistent with the observed data
- this sort of plot is of very little value for binary data
DHARMa residuals provide very useful diagnostics. Unfortunately, we cannot directly use the simulateResiduals() function to generate the simulated residuals. However, if we are willing to calculate some of the components ourselves, we can still obtain the simulated residuals from the fitted Stan model.
We need to supply:
- simulated (predicted) responses associated with each observation.
- observed values
- fitted (predicted) responses (averaged) associated with each observation
In the code below, I have instructed the residual plot not to apply quantile regression to the residuals due to a lack of unique data.
Only 2 levels detected so that family 'bernoulli' might be a more efficient choice.
# create the DHARMa residuals (binary data, so integerResponse = TRUE)
dat5c.resids <- make_brms_dharma_res(dat5c.brm2, integerResponse = TRUE)
wrap_elements(~ testUniformity(dat5c.resids)) +
wrap_elements(~ plotResiduals(dat5c.resids, quantreg = FALSE)) +
wrap_elements(~ testDispersion(dat5c.resids)) +
plot_layout(nrow = 1)

If you are using RStudio (particularly with a Quarto document), the above code may produce a graphic that is too large for the currently available device. The document should still render without problems; it is just the live display within RStudio that cannot accommodate the figure.
To address this, you can either:
- break the multi-panel figure up into four separate figures by removing the outer wrap_elements() and plot_layout() functions
- copy the above code into the console and view it in a larger graphics device
Conclusions:
- the Q-Q plot looks reasonable (points broadly follow the angled line)
- there are no flagged issues with the:
  - KS test: conformity to the nominated distribution (family)
  - Dispersion test: we would not normally expect this to be an issue here unless there are other issues with the residuals
  - Outlier test: the influence of each observation
- there does not appear to be any pattern in the residuals
- the observed dispersion is well within the simulated range, indicating that there is no issue with dispersion

Conclusions:
- there is no evidence of a lack of fit
- the model is likely to be reliable
resid <- resid(dat6a.brm2)[, "Estimate"]
fit <- fitted(dat6a.brm2)[, "Estimate"]
ggplot() +
geom_point(data = NULL, aes(y = resid, x = fit))

We should also plot the residuals against each of the predictor variables (as well as any other important unmodelled predictors - particularly time and space, if relevant and available).
resid <- resid(dat6a.brm2)[, "Estimate"]
ggplot() +
geom_point(data = NULL, aes(y = resid, x = dat6$x))

Conclusions:
- there does not appear to be any pattern in the residuals
Density overlay
Conclusions:
- the model draws appear to be consistent with the observed data
Error scatter
These are plots of the observed values against the average residuals. Similar to a residual plot, we do not want to see any patterns in this plot.
Using all posterior draws for ppc type 'error_scatter_avg' by default.
Conclusions:
- there is no obvious pattern in the residuals.
Intervals
These are plots of the observed data overlaid on top of the posterior predictions associated with each level of the predictor. Ideally, the observed data should all fall within the predictive intervals.
Using all posterior draws for ppc type 'intervals' by default.
Conclusions:
- the posterior predictions are not inconsistent with the observed data
Ribbon
This is just an alternative way of expressing the interval plot.
Using all posterior draws for ppc type 'ribbon' by default.
Conclusions:
- the posterior predictions are not inconsistent with the observed data
DHARMa residuals provide very useful diagnostics. Unfortunately, we cannot directly use the simulateResiduals() function to generate the simulated residuals. However, if we are willing to calculate some of the components ourselves, we can still obtain the simulated residuals from the fitted Stan model.
We need to supply:
- simulated (predicted) responses associated with each observation.
- observed values
- fitted (predicted) responses (averaged) associated with each observation
dat6a.resids <- make_brms_dharma_res(dat6a.brm2, integerResponse = FALSE)
wrap_elements(~ testUniformity(dat6a.resids)) +
wrap_elements(~ plotResiduals(dat6a.resids)) +
wrap_elements(~ testDispersion(dat6a.resids)) +
plot_layout(nrow = 1)

If you are using RStudio (particularly with a Quarto document), the above code may produce a graphic that is too large for the currently available device. The document should still render without problems; it is just the live display within RStudio that cannot accommodate the figure.
To address this, you can either:
- break the multi-panel figure up into four separate figures by removing the outer wrap_elements() and plot_layout() functions
- copy the above code into the console and view it in a larger graphics device
Conclusions:
- the Q-Q plot looks reasonable (points broadly follow the angled line)
- there are no flagged issues with the:
  - KS test: conformity to the nominated distribution (family)
  - Dispersion test: we would not normally expect this to be an issue here unless there are other issues with the residuals
  - Outlier test: the influence of each observation
- there does not appear to be any pattern in the residuals - each of the three quantile trends is approximately flat and centered around 0.25, 0.5 and 0.75
- the observed dispersion is well within the simulated range, indicating that there is no issue with dispersion

Conclusions:
- there is no evidence of a lack of fit
- the model is likely to be reliable
resid <- resid(dat6b.brm2)[, "Estimate"]
fit <- fitted(dat6b.brm2)[, "Estimate"]
ggplot() +
geom_point(data = NULL, aes(y = resid, x = fit))

We should also plot the residuals against each of the predictor variables (as well as any other important unmodelled predictors - particularly time and space, if relevant and available).
resid <- resid(dat6b.brm2)[, "Estimate"]
ggplot() +
geom_point(data = NULL, aes(y = resid, x = dat6$x))

Conclusions:
- there does not appear to be any pattern in the residuals
Density overlay
Conclusions:
- the model draws appear to be consistent with the observed data
Error scatter
These are plots of the observed values against the average residuals. Similar to a residual plot, we do not want to see any patterns in this plot.
Using all posterior draws for ppc type 'error_scatter_avg' by default.
Conclusions:
- there is no obvious pattern in the residuals.
Intervals
These are plots of the observed data overlaid on top of the posterior predictions associated with each level of the predictor. Ideally, the observed data should all fall within the predictive intervals.
Using all posterior draws for ppc type 'intervals' by default.
Conclusions:
- the posterior predictions are not inconsistent with the observed data
Ribbon
This is just an alternative way of expressing the interval plot.
Using all posterior draws for ppc type 'ribbon' by default.
Conclusions:
- the posterior predictions are not inconsistent with the observed data
DHARMa residuals provide very useful diagnostics. Unfortunately, we cannot directly use the simulateResiduals() function to generate the simulated residuals. However, if we are willing to calculate some of the components ourselves, we can still obtain the simulated residuals from the fitted Stan model.
We need to supply:
- simulated (predicted) responses associated with each observation.
- observed values
- fitted (predicted) responses (averaged) associated with each observation
dat6b.resids <- make_brms_dharma_res(dat6b.brm2, integerResponse = FALSE)
wrap_elements(~ testUniformity(dat6b.resids)) +
wrap_elements(~ plotResiduals(dat6b.resids)) +
wrap_elements(~ testDispersion(dat6b.resids)) +
plot_layout(nrow = 1)

If you are using RStudio (particularly with a Quarto document), the above code may produce a graphic that is too large for the currently available device. The document should still render without problems; it is just the live display within RStudio that cannot accommodate the figure.
To address this, you can either:
- break the multi-panel figure up into four separate figures by removing the outer wrap_elements() and plot_layout() functions
- copy the above code into the console and view it in a larger graphics device
Conclusions:
- the Q-Q plot looks reasonable (points broadly follow the angled line)
- there are no flagged issues with the:
  - KS test: conformity to the nominated distribution (family)
  - Dispersion test: we would not normally expect this to be an issue here unless there are other issues with the residuals
  - Outlier test: the influence of each observation
- there does not appear to be any pattern in the residuals - each of the three quantile trends is approximately flat and centered around 0.25, 0.5 and 0.75
- the observed dispersion is well within the simulated range, indicating that there is no issue with dispersion

Conclusions:
- there is no evidence of a lack of fit
- the model is likely to be reliable
resid <- resid(dat6c.brm2)[, "Estimate"]
fit <- fitted(dat6c.brm2)[, "Estimate"]
ggplot() +
geom_point(data = NULL, aes(y = resid, x = fit))

We should also plot the residuals against each of the predictor variables (as well as any other important unmodelled predictors - particularly time and space, if relevant and available).
resid <- resid(dat6c.brm2)[, "Estimate"]
ggplot() +
geom_point(data = NULL, aes(y = resid, x = dat6$x))

Conclusions:
- there does not appear to be any pattern in the residuals
Density overlay
Conclusions:
- the model draws appear to be consistent with the observed data
Error scatter
These are plots of the observed values against the average residuals. Similar to a residual plot, we do not want to see any patterns in this plot.
Using all posterior draws for ppc type 'error_scatter_avg' by default.
Conclusions:
- there is no obvious pattern in the residuals.
Intervals
These are plots of the observed data overlaid on top of the posterior predictions associated with each level of the predictor. Ideally, the observed data should all fall within the predictive intervals.
Using all posterior draws for ppc type 'intervals' by default.
Conclusions:
- the posterior predictions are not inconsistent with the observed data
Ribbon
This is just an alternative way of expressing the interval plot.
Using all posterior draws for ppc type 'ribbon' by default.
Conclusions:
- the posterior predictions are not inconsistent with the observed data
DHARMa residuals provide very useful diagnostics. Unfortunately, we cannot directly use the simulateResiduals() function to generate the simulated residuals. However, if we are willing to calculate some of the components ourselves, we can still obtain the simulated residuals from the fitted Stan model.
We need to supply:
- simulated (predicted) responses associated with each observation.
- observed values
- fitted (predicted) responses (averaged) associated with each observation
dat6c.resids <- make_brms_dharma_res(dat6c.brm2, integerResponse = FALSE)
wrap_elements(~ testUniformity(dat6c.resids)) +
wrap_elements(~ plotResiduals(dat6c.resids)) +
wrap_elements(~ testDispersion(dat6c.resids)) +
plot_layout(nrow = 1)

If you are using RStudio (particularly with a Quarto document), the above code may produce a graphic that is too large for the currently available device. The document should still render without problems; it is just the live display within RStudio that cannot accommodate the figure.
To address this, you can either:
- break the multi-panel figure up into four separate figures by removing the outer wrap_elements() and plot_layout() functions
- copy the above code into the console and view it in a larger graphics device
Conclusions:
- the Q-Q plot looks reasonable (points broadly follow the angled line)
- there are no flagged issues with the:
  - KS test: conformity to the nominated distribution (family)
  - Dispersion test: we would not normally expect this to be an issue here unless there are other issues with the residuals
  - Outlier test: the influence of each observation
- there does not appear to be any pattern in the residuals - each of the three quantile trends is approximately flat and centered around 0.25, 0.5 and 0.75
- the observed dispersion is well within the simulated range, indicating that there is no issue with dispersion

Conclusions:
- there is no evidence of a lack of fit
- the model is likely to be reliable
10 Partial effects plots
Prior to exploring the modelled numerical estimates, it is worth reviewing simple plots of the predicted trends associated with each predictor. Importantly, these typically express the trends on the scale of the response, although for some it is possible to force the trends to be expressed on the link scale. Such plots provide a final visual check of whether the model has yielded sensible outcomes. Furthermore, they usually assist in the interpretation of the major estimated parameters.
Notice that although we had centered the predictor, because we did so in the model formula, conditional_effects is able to backtransform \(x\) onto the original scale when producing the partial plot.
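For example, the centering can be embedded in the model formula itself; a sketch with assumed object names (not code from this tutorial):

```r
library(brms)
# hypothetical refit: center x inside the formula so that conditional_effects()
# can display the partial effect on the original x scale
mod <- brm(y ~ scale(x, scale = FALSE), data = dat, family = gaussian(),
  backend = "cmdstanr")
conditional_effects(mod) |> plot(points = TRUE)
```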
dat6b.brm2 |>
conditional_effects(conditions = data.frame(total = dat6$total)) |>
plot(points = TRUE)
# OR
dat6b.brm2 |>
conditional_effects(conditions = data.frame(total = dat6$total),
spaghetti = TRUE, ndraws = 200) |>
plot(points = TRUE)

Notice that although we had centered the predictor, because we did so in the model formula, conditional_effects is able to backtransform \(x\) onto the original scale when producing the partial plot.
dat6c.brm2 |>
conditional_effects(conditions = data.frame(total = dat6$total)) |>
plot(points = TRUE)
# OR
dat6c.brm2 |>
conditional_effects(conditions = data.frame(total = dat6$total),
spaghetti = TRUE, ndraws = 200) |>
plot(points = TRUE)

Notice that although we had centered and scaled the predictor, because we did so in the model formula, conditional_effects is able to backtransform \(x\) onto the original scale when producing the partial plot.
11 Model investigation
Rather than simply return point estimates of each of the model parameters, Bayesian analyses capture the full posterior of each parameter. These are typically stored within the list structure of the output object.
As with most statistical routines, the overloaded summary() function provides an overall summary of the model parameters. Typically, the summaries will include the means / medians along with credibility intervals and perhaps convergence diagnostics (such as R hat). However, more thorough investigation and analysis of the parameter posteriors requires access to the full posteriors.
There is currently a plethora of functions for extracting the full posteriors from models. In part, this reflects a rapidly evolving space in which numerous packages provide near-equivalent functionality (it should also be noted that, over time, many of the functions have been deprecated due to inconsistencies in their names). Broadly speaking, the functions focus on draws from the posterior of either the parameters (intercept, slope, standard deviation etc.), the linear predictor, expected values or predicted values. The distinction between the latter three is highlighted in the following table.
| Property | Description |
|---|---|
| linear predictors | values predicted on the link scale |
| expected values | predictions (on response scale) without residual error (predicting expected mean outcome(s)) |
| predicted values | predictions (on response scale) that incorporate residual error |
| fitted values | predictions on the response scale |
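In brms, these three types of draws correspond to three extractor functions; a sketch using the dat1a model fitted earlier:

```r
library(brms)
# each returns a draws x observations matrix
linpred <- posterior_linpred(dat1a.brm2)  # linear predictors (link scale)
epred   <- posterior_epred(dat1a.brm2)    # expected values (response scale, no residual error)
pred    <- posterior_predict(dat1a.brm2)  # predicted values (response scale, with residual error)
```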
Family: gaussian
Links: mu = identity; sigma = identity
Formula: y ~ x
Data: dat (Number of observations: 10)
Draws: 3 chains, each with iter = 5000; warmup = 1000; thin = 5;
total post-warmup draws = 2400
Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept 0.28 0.38 -0.46 1.05 1.00 2452 2248
x -0.07 0.47 -1.03 0.83 1.00 2438 2225
Family Specific Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma 1.12 0.36 0.65 1.97 1.00 2184 2093
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
Conclusions:
- in the initial block of information, we are reminded of the formula as well as the chain fitting characteristics. We are also informed that the total number of post-warmup MCMC samples is 2400
- the Rhat values for each parameter are all less than 1.01, consistent with convergence of the parameter estimates across the chains
- Bulk_ESS: the effective sample sizes for the bulk of the posterior are a high fraction (> 0.5) of the total number of draws. This suggests that the sampling was reasonably efficient and therefore likely to be reliable
- Tail_ESS: the effective sample sizes for the tails of the posterior. The tails (more extreme regions of the posterior) are where the sampler is most likely to get stuck (have divergent transitions etc). The fraction of effective samples from this region is relatively high, implying that the sampler did not get stuck in areas that are unsupported by data
- Intercept: when \(x=0\), the expected value of \(y\) is 0.276 and we are 95% confident that the true value is between -0.455 and 1.051. Since \(x=0\) is outside the observed range of \(x\), this y-intercept is of very limited value
- x (the slope): the rate of change in \(y\) per unit (=1) change in \(x\). So for every one unit change in \(x\), \(y\) changes by (on average) -0.074 units and we are 95% confident that this change is between -1.027 and 0.834
- sigma is estimated to be 1.12

Note, the estimates are means and quantiles.
As an alternative to the regular summary (which is designed to resemble that traditionally provided by frequentist analyses - and thus be instantly familiar), it is possible to define exactly how each of the parameters is summarised.
In the following, I am nominating that I want to summarise each parameter posterior by:
- the median
- the 95% highest probability density interval (credibility interval)
- Rhat
- total number of draws
- bulk and tail effective sample sizes
dat1a.brm2 |>
posterior::summarise_draws(
median,
HDInterval::hdi,
rhat,
length,
ess_bulk, ess_tail
)

# A tibble: 8 × 8
variable median lower upper rhat length ess_bulk ess_tail
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 b_Intercept 0.278 -0.453 1.05 1.00 2400 2452. 2248.
2 b_x -0.0611 -1.02 0.836 1.00 2400 2438. 2225.
3 sigma 1.05 0.584 1.81 1.00 2400 2184. 2093.
4 prior_Intercept 31.7 2.16 62.6 1.00 2400 2384. 2272.
5 prior_b -0.237 -14.8 11.9 1.00 2400 2400. 2185.
6 prior_sigma 11.1 0.00367 49.0 1.00 2400 2236. 2084.
7 lprior -11.2 -11.3 -11.1 1.00 2400 2467. 2264.
8 lp__ -25.1 -28.5 -23.7 1.00 2400 2160. 2393.
Conclusions:
- the length column confirms that the total number of post-warmup MCMC draws is 2400
- the rhat values for each parameter are all less than 1.01, consistent with convergence of the parameter estimates across the chains
- ess_bulk: the effective sample sizes for the bulk of the posterior are a high fraction (> 0.5) of the total number of draws. This suggests that the sampling was reasonably efficient and therefore likely to be reliable
- ess_tail: the effective sample sizes for the tails of the posterior. The tails (more extreme regions of the posterior) are where the sampler is most likely to get stuck (have divergent transitions etc). The fraction of effective samples from this region is relatively high, implying that the sampler did not get stuck in areas that are unsupported by data
- b_Intercept: when \(x=0\), the expected value of \(y\) is 0.278 and we are 95% confident that the true value is between -0.453 and 1.052. Since \(x=0\) is outside the observed range of \(x\), this y-intercept is of very limited value
- b_x (the slope): the rate of change in \(y\) per unit (=1) change in \(x\). So for every one unit change in \(x\), \(y\) changes by (on average) -0.061 units and we are 95% confident that this change is between -1.024 and 0.836
- sigma is estimated to be 1.05
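Because the full posterior is available, derived probability statements are also straightforward; for example, a sketch of the posterior probability that the slope is negative:

```r
library(brms)
library(dplyr)
# proportion of posterior draws in which the slope (b_x) falls below zero
dat1a.brm2 |>
  as_draws_df() |>
  summarise(Pr_negative = mean(b_x < 0))
```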
As a yet more flexible alternative, if we first extract the full posterior draws (with as_draws_df()), we can then use various tidyverse functions to focus on just the most ecologically interpretable parameters before summarising.
In the following, I will use select() with a regex (regular expression) to match only the columns that:
- start with (^) “b_” followed by any number (*) of any character (.), or
- start with (^) “sigma”
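To see what that regex matches, it can be tried against typical brms parameter names:

```r
# the anchored regex keeps the population-level terms and sigma, but not the priors
nms <- c("b_Intercept", "b_x", "sigma", "prior_Intercept", "prior_sigma", "lp__")
grep("^b_.*|^sigma", nms, value = TRUE)
# → "b_Intercept" "b_x" "sigma"
```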
dat1a.brm2 |>
brms::as_draws_df() |>
dplyr::select(matches("^b_.*|^sigma")) |>
posterior::summarise_draws(
median,
HDInterval::hdi,
rhat,
length,
ess_bulk, ess_tail
)

Warning: Dropping 'draws_df' class as required metadata was removed.
# A tibble: 3 × 8
variable median lower upper rhat length ess_bulk ess_tail
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 b_Intercept 0.278 -0.453 1.05 1.00 2400 2448. 2229.
2 b_x -0.0611 -1.02 0.836 1.00 2400 2435. 2128.
3 sigma 1.05 0.584 1.81 1.00 2400 2138. 2082.
Conclusions:
- in the initial block of information, we are reminded of the formula as well as the chain fitting characteristics. We are also informed that the total number of post-warmup MCMC samples is 2400.
- the Rhat values for each parameter are <1.01, confirming convergence of the parameter estimates across the chains
- Bulk_ESS: the effective sample sizes for the bulk of the posterior are a high fraction (>0.5) of the total number of draws. This suggests that the sampling was reasonably efficient and therefore likely to be reliable
- Tail_ESS: the effective sample sizes of estimates in the tails of the posterior. The tails (more extreme regions of the posterior) are where the sampler is most likely to get stuck (have divergent transitions etc). The fraction of effective samples from this region is relatively high, implying that the sampler did not get stuck in areas that are unsupported by data
- b_Intercept: when \(x=0\), the expected value of \(y\) is 0.278 and we are 95% confident that the true value is between -0.453 and 1.052. Since \(x=0\) is outside the observed range of \(x\), this y-intercept is of very limited value.
- b_x (the slope): the rate of change in \(y\) per unit (=1) change in \(x\). So for every one unit change in \(x\), \(y\) changes by (on average) -0.061 units and we are 95% confident that this change is between -1.024 and 0.836
- sigma is estimated to be 1.05
The gather_draws() function performs the equivalent of an as_draws_df() followed by a pivot_longer() in order to return the full posteriors in long format where they are more suitable for graphing.
dat1a.brm2 |>
gather_draws(b_Intercept, b_x, sigma) |>
ggplot() +
geom_histogram(aes(x = .value)) +
  facet_wrap(~.variable, scales = "free")
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Alternatively, there are various other representations supported by the ggdist package.
y ymin ymax .width .point .interval
1 0.06432815 8.807059e-08 0.3185489 0.95 median hdci
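The summary above is the posterior of the Bayesian \(R^2\). It could be produced with something like the following sketch, which assumes the fitted model object (dat1a.brm2) and uses brms::bayes_R2() with summary = FALSE to retain the full posterior of \(R^2\) before summarising it with ggdist::median_hdci():

```r
# Sketch: posterior of Bayesian R2, summarised by its median and 95% HDCI
dat1a.brm2 |>
  brms::bayes_R2(summary = FALSE) |>  # matrix of R2 draws (one column: R2)
  as.data.frame() |>
  dplyr::pull(R2) |>
  ggdist::median_hdci(.width = 0.95)
```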
Conclusions:
- 6.433% of the total variability in \(y\) can be explained by its relationship to \(x\)
- we are 95% confident that the strength of this relationship is between 0% and 31.855%
Family: gaussian
Links: mu = identity; sigma = identity
Formula: y ~ scale(x, scale = FALSE)
Data: dat (Number of observations: 10)
Draws: 3 chains, each with iter = 5000; warmup = 1000; thin = 5;
total post-warmup draws = 2400
Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept 0.28 0.36 -0.41 0.99 1.00 2160 2036
scalexscaleEQFALSE -0.09 0.45 -0.97 0.81 1.00 2368 2213
Family Specific Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma 1.12 0.36 0.66 1.98 1.00 2457 2298
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
Conclusions:
- in the initial block of information, we are reminded of the formula as well as the chain fitting characteristics. We are also informed that the total number of post-warmup MCMC samples is 2400.
- the Rhat values for each parameter are <1.01, confirming convergence of the parameter estimates across the chains
- Bulk_ESS: the effective sample sizes for the bulk of the posterior are a high fraction (>0.5) of the total number of draws. This suggests that the sampling was reasonably efficient and therefore likely to be reliable
- Tail_ESS: the effective sample sizes of estimates in the tails of the posterior. The tails (more extreme regions of the posterior) are where the sampler is most likely to get stuck (have divergent transitions etc). The fraction of effective samples from this region is relatively high, implying that the sampler did not get stuck in areas that are unsupported by data
- Intercept: when \(x=0\) (its average, since \(x\) is centered), the expected value of \(y\) is 0.283 and we are 95% confident that the true value is between -0.408 and 0.99. So \(y\) is expected to be 0.283 at the average \(x\).
- x (the slope): the rate of change in \(y\) per unit (=1) change in \(x\). So for every one unit change in \(x\), \(y\) changes by (on average) -0.085 units and we are 95% confident that this change is between -0.972 and 0.81
- sigma is estimated to be 1.12
Note, the estimates are means and quantiles.
As an alternative to the regular summary (which is designed to resemble that traditionally provided by frequentist analyses - and thus be instantly familiar), it is possible to define exactly how each of the parameters are summarised.
In the following, I am nominating that I want to summarise each parameter posterior by:
- the median
- the 95% highest probability density interval (credibility interval)
- Rhat
- total number of draws
- bulk and tail effective sample sizes
dat1b.brm2 |>
posterior::summarise_draws(
median,
HDInterval::hdi,
rhat,
length,
ess_bulk, ess_tail
  )
# A tibble: 8 × 8
variable median lower upper rhat length ess_bulk ess_tail
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 b_Intercept 0.279 -0.416 0.970 1.00 2400 2160. 2036.
2 b_scalexscaleEQFALSE -0.0808 -0.977 0.810 1.00 2400 2368. 2213.
3 sigma 1.05 0.591 1.83 1.00 2400 2457. 2298.
4 prior_Intercept 31.0 3.06 61.4 1.00 2400 2443. 2334.
5 prior_b -0.138 -13.4 13.1 1.00 2400 2042. 2308.
6 prior_sigma 11.1 0.00987 44.7 1.00 2400 2223. 2453.
7 lprior -11.2 -11.3 -11.1 1.00 2400 2202. 2375.
8 lp__ -25.0 -28.3 -23.7 1.00 2400 2133. 2191.
Conclusions:
- in the initial block of information, we are reminded of the formula as well as the chain fitting characteristics. We are also informed that the total number of post-warmup MCMC samples is 2400.
- the Rhat values for each parameter are <1.01, confirming convergence of the parameter estimates across the chains
- Bulk_ESS: the effective sample sizes for the bulk of the posterior are a high fraction (>0.5) of the total number of draws. This suggests that the sampling was reasonably efficient and therefore likely to be reliable
- Tail_ESS: the effective sample sizes of estimates in the tails of the posterior. The tails (more extreme regions of the posterior) are where the sampler is most likely to get stuck (have divergent transitions etc). The fraction of effective samples from this region is relatively high, implying that the sampler did not get stuck in areas that are unsupported by data
- b_Intercept: when \(x=0\) (its average, since \(x\) is centered), the expected value of \(y\) is 0.279 and we are 95% confident that the true value is between -0.416 and 0.97. So \(y\) is expected to be 0.279 at the average \(x\).
- b_x (the slope): the rate of change in \(y\) per unit (=1) change in \(x\). So for every one unit change in \(x\), \(y\) changes by (on average) -0.081 units and we are 95% confident that this change is between -0.977 and 0.81
- sigma is estimated to be 1.05
As a yet more flexible alternative, we can first extract the full posterior draws (with as_draws_df()) and then use the various tidyverse functions to focus on just the most ecologically interpretable parameters before summarising.
In the following, I will use select() with a regex (regular expression) to match only the columns that:
- start with (^) “b_” followed by any number (*) of any character (.)
- start with (^) “sigma”
dat1b.brm2 |>
brms::as_draws_df() |>
dplyr::select(matches("^b_.*|^sigma")) |>
posterior::summarise_draws(
median,
HDInterval::hdi,
rhat,
length,
ess_bulk, ess_tail
  )
Warning: Dropping 'draws_df' class as required metadata was removed.
# A tibble: 3 × 8
variable median lower upper rhat length ess_bulk ess_tail
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 b_Intercept 0.279 -0.416 0.970 1.00 2400 2134. 2021.
2 b_scalexscaleEQFALSE -0.0808 -0.977 0.810 1.00 2400 2344. 2196.
3 sigma 1.05 0.591 1.83 1.00 2400 2380. 2250.
Conclusions:
- in the initial block of information, we are reminded of the formula as well as the chain fitting characteristics. We are also informed that the total number of post-warmup MCMC samples is 2400.
- the Rhat values for each parameter are <1.01, confirming convergence of the parameter estimates across the chains
- Bulk_ESS: the effective sample sizes for the bulk of the posterior are a high fraction (>0.5) of the total number of draws. This suggests that the sampling was reasonably efficient and therefore likely to be reliable
- Tail_ESS: the effective sample sizes of estimates in the tails of the posterior. The tails (more extreme regions of the posterior) are where the sampler is most likely to get stuck (have divergent transitions etc). The fraction of effective samples from this region is relatively high, implying that the sampler did not get stuck in areas that are unsupported by data
- b_Intercept: when \(x=0\) (its average, since \(x\) is centered), the expected value of \(y\) is 0.279 and we are 95% confident that the true value is between -0.416 and 0.97. So \(y\) is expected to be 0.279 at the average \(x\).
- b_x (the slope): the rate of change in \(y\) per unit (=1) change in \(x\). So for every one unit change in \(x\), \(y\) changes by (on average) -0.081 units and we are 95% confident that this change is between -0.977 and 0.81
- sigma is estimated to be 1.05
The gather_draws() function performs the equivalent of an as_draws_df() followed by a pivot_longer() in order to return the full posteriors in long format where they are more suitable for graphing. Note that the name of the slope parameter gets very awkward when \(x\) is centered, so it is more convenient to refer to this parameter via a regular expression.
[1] "b_Intercept" "b_scalexscaleEQFALSE" "sigma"
[4] "prior_Intercept" "prior_b" "prior_sigma"
[7] "lprior" "lp__" "accept_stat__"
[10] "stepsize__" "treedepth__" "n_leapfrog__"
[13] "divergent__" "energy__"
dat1b.brm2 |>
gather_draws(`b_Intercept`,`b_.*x.*`, `sigma`, regex = TRUE) |>
ggplot() +
geom_histogram(aes(x = .value)) +
  facet_wrap(~.variable, scales = "free")
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Alternatively, there are various other representations supported by the ggdist package.
y ymin ymax .width .point .interval
1 0.06278833 8.380068e-08 0.3077008 0.95 median hdci
Conclusions:
- 6.279% of the total variability in \(y\) can be explained by its relationship to \(x\)
- we are 95% confident that the strength of this relationship is between 0% and 30.77%
Family: gaussian
Links: mu = identity; sigma = identity
Formula: y ~ scale(x)
Data: dat (Number of observations: 10)
Draws: 3 chains, each with iter = 5000; warmup = 1000; thin = 5;
total post-warmup draws = 2400
Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept 0.29 0.38 -0.46 1.06 1.00 2275 2166
scalex -0.08 0.39 -0.87 0.71 1.00 2304 2150
Family Specific Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma 1.12 0.37 0.67 2.00 1.00 2296 2251
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
Conclusions:
- in the initial block of information, we are reminded of the formula as well as the chain fitting characteristics. We are also informed that the total number of post-warmup MCMC samples is 2400.
- the Rhat values for each parameter are <1.01, confirming convergence of the parameter estimates across the chains
- Bulk_ESS: the effective sample sizes for the bulk of the posterior are a high fraction (>0.5) of the total number of draws. This suggests that the sampling was reasonably efficient and therefore likely to be reliable
- Tail_ESS: the effective sample sizes of estimates in the tails of the posterior. The tails (more extreme regions of the posterior) are where the sampler is most likely to get stuck (have divergent transitions etc). The fraction of effective samples from this region is relatively high, implying that the sampler did not get stuck in areas that are unsupported by data
- Intercept: when \(x=0\) (its average, since \(x\) is standardised), the expected value of \(y\) is 0.289 and we are 95% confident that the true value is between -0.455 and 1.058. So \(y\) is expected to be 0.289 at the average \(x\).
- x (the slope): the rate of change in \(y\) per unit (=1) change in \(x\). Recall that since \(x\) is standardised, 1 unit represents a span of 1 standard deviation of \(x\). So for every one standard deviation change in \(x\), \(y\) changes by (on average) -0.079 units and we are 95% confident that this change is between -0.868 and 0.711
- sigma is estimated to be 1.12
Note, the estimates are means and quantiles.
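If an effect per raw unit of \(x\) is preferred, the standardised slope can be back-transformed by dividing the posterior draws by the standard deviation of \(x\). The following is a sketch under the assumption that the original data frame is called dat and the slope parameter is named b_scalex (as in the summary above):

```r
# Sketch: back-transform the slope from standard-deviation units of x
# to raw units of x, then summarise the resulting posterior
dat1c.brm2 |>
  brms::as_draws_df() |>
  dplyr::mutate(b_x_raw = b_scalex / sd(dat$x)) |>
  dplyr::pull(b_x_raw) |>
  ggdist::median_hdci(.width = 0.95)
```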
As an alternative to the regular summary (which is designed to resemble that traditionally provided by frequentist analyses - and thus be instantly familiar), it is possible to define exactly how each of the parameters are summarised.
In the following, I am nominating that I want to summarise each parameter posterior by:
- the median
- the 95% highest probability density interval (credibility interval)
- Rhat
- total number of draws
- bulk and tail effective sample sizes
dat1c.brm2 |>
posterior::summarise_draws(
median,
HDInterval::hdi,
rhat,
length,
ess_bulk, ess_tail
  )
# A tibble: 8 × 8
variable median lower upper rhat length ess_bulk ess_tail
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 b_Intercept 0.289 -0.478 1.02 1.00 2400 2275. 2166.
2 b_scalex -0.0765 -0.938 0.634 1.00 2400 2304. 2150.
3 sigma 1.04 0.601 1.82 1.00 2400 2296. 2251.
4 prior_Intercept 30.8 0.610 59.6 1.00 2400 2307. 2394.
5 prior_b 0.378 -48.4 43.5 1.00 2400 2354. 2016.
6 prior_sigma 11.4 0.00248 49.6 1.00 2400 2506. 2552.
7 lprior -12.5 -12.6 -12.4 1.00 2400 2300. 2273.
8 lp__ -26.4 -29.7 -25.0 1.00 2400 2383. 2215.
Conclusions:
- in the initial block of information, we are reminded of the formula as well as the chain fitting characteristics. We are also informed that the total number of post-warmup MCMC samples is 2400.
- the Rhat values for each parameter are <1.01, confirming convergence of the parameter estimates across the chains
- Bulk_ESS: the effective sample sizes for the bulk of the posterior are a high fraction (>0.5) of the total number of draws. This suggests that the sampling was reasonably efficient and therefore likely to be reliable
- Tail_ESS: the effective sample sizes of estimates in the tails of the posterior. The tails (more extreme regions of the posterior) are where the sampler is most likely to get stuck (have divergent transitions etc). The fraction of effective samples from this region is relatively high, implying that the sampler did not get stuck in areas that are unsupported by data
- b_Intercept: when \(x=0\) (its average, since \(x\) is standardised), the expected value of \(y\) is 0.289 and we are 95% confident that the true value is between -0.478 and 1.023. So \(y\) is expected to be 0.289 at the average \(x\).
- b_x (the slope): the rate of change in \(y\) per unit (=1) change in \(x\). Recall that since \(x\) is standardised, 1 unit represents a span of 1 standard deviation of \(x\). So for every one standard deviation change in \(x\), \(y\) changes by (on average) -0.076 units and we are 95% confident that this change is between -0.938 and 0.634
- sigma is estimated to be 1.04
As a yet more flexible alternative, we can first extract the full posterior draws (with as_draws_df()) and then use the various tidyverse functions to focus on just the most ecologically interpretable parameters before summarising.
In the following, I will use select() with a regex (regular expression) to match only the columns that:
- start with (^) “b_” followed by any number (*) of any character (.)
- start with (^) “sigma”
dat1c.brm2 |>
brms::as_draws_df() |>
dplyr::select(matches("^b_.*|^sigma")) |>
posterior::summarise_draws(
median,
HDInterval::hdi,
rhat,
length,
ess_bulk, ess_tail
  )
Warning: Dropping 'draws_df' class as required metadata was removed.
# A tibble: 3 × 8
variable median lower upper rhat length ess_bulk ess_tail
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 b_Intercept 0.289 -0.478 1.02 1.00 2400 2215. 2145.
2 b_scalex -0.0765 -0.938 0.634 1.00 2400 2294. 2062.
3 sigma 1.04 0.601 1.82 1.00 2400 2289. 2246.
Conclusions:
- in the initial block of information, we are reminded of the formula as well as the chain fitting characteristics. We are also informed that the total number of post-warmup MCMC samples is 2400.
- the Rhat values for each parameter are <1.01, confirming convergence of the parameter estimates across the chains
- Bulk_ESS: the effective sample sizes for the bulk of the posterior are a high fraction (>0.5) of the total number of draws. This suggests that the sampling was reasonably efficient and therefore likely to be reliable
- Tail_ESS: the effective sample sizes of estimates in the tails of the posterior. The tails (more extreme regions of the posterior) are where the sampler is most likely to get stuck (have divergent transitions etc). The fraction of effective samples from this region is relatively high, implying that the sampler did not get stuck in areas that are unsupported by data
- b_Intercept: when \(x=0\) (its average, since \(x\) is standardised), the expected value of \(y\) is 0.289 and we are 95% confident that the true value is between -0.478 and 1.023. So \(y\) is expected to be 0.289 at the average \(x\).
- b_x (the slope): the rate of change in \(y\) per unit (=1) change in \(x\). Recall that since \(x\) is standardised, 1 unit represents a span of 1 standard deviation of \(x\). So for every one standard deviation change in \(x\), \(y\) changes by (on average) -0.076 units and we are 95% confident that this change is between -0.938 and 0.634
- sigma is estimated to be 1.04
The gather_draws() function performs the equivalent of an as_draws_df() followed by a pivot_longer() in order to return the full posteriors in long format where they are more suitable for graphing. Note that the name of the slope parameter gets very awkward when \(x\) is standardised, so it is more convenient to refer to this parameter via a regular expression.
[1] "b_Intercept" "b_scalex" "sigma" "prior_Intercept"
[5] "prior_b" "prior_sigma" "lprior" "lp__"
[9] "accept_stat__" "stepsize__" "treedepth__" "n_leapfrog__"
[13] "divergent__" "energy__"
dat1c.brm2 |>
gather_draws(`b_Intercept`,`b_.*x`, `sigma`, regex = TRUE) |>
ggplot() +
geom_histogram(aes(x = .value)) +
  facet_wrap(~.variable, scales = "free")
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Alternatively, there are various other representations supported by the ggdist package.
y ymin ymax .width .point .interval
1 0.06634722 6.783788e-08 0.3242866 0.95 median hdci
Conclusions:
- 6.635% of the total variability in \(y\) can be explained by its relationship to \(x\)
- we are 95% confident that the strength of this relationship is between 0% and 32.429%
Family: gaussian
Links: mu = identity; sigma = identity
Formula: y ~ x
Data: dat2 (Number of observations: 12)
Draws: 3 chains, each with iter = 5000; warmup = 1000; thin = 5;
total post-warmup draws = 2400
Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept 20.80 1.92 16.99 24.61 1.00 2468 2449
xmedium -0.77 2.77 -5.97 4.95 1.00 2444 2177
xhigh -8.65 3.09 -14.48 -1.95 1.00 2525 2361
Family Specific Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma 4.46 1.11 2.88 7.04 1.00 2331 2284
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
Conclusions:
- in the initial block of information, we are reminded of the formula as well as the chain fitting characteristics. We are also informed that the total number of post-warmup MCMC samples is 2400.
- the Rhat values for each parameter are <1.01, confirming convergence of the parameter estimates across the chains
- Bulk_ESS: the effective sample sizes for the bulk of the posterior are a high fraction (>0.5) of the total number of draws. This suggests that the sampling was reasonably efficient and therefore likely to be reliable
- Tail_ESS: the effective sample sizes of estimates in the tails of the posterior. The tails (more extreme regions of the posterior) are where the sampler is most likely to get stuck (have divergent transitions etc). The fraction of effective samples from this region is relatively high, implying that the sampler did not get stuck in areas that are unsupported by data
- Intercept: when \(x\) is “control”, the expected value of \(y\) is 20.799 and we are 95% confident that the true value is between 16.993 and 24.612.
- x* (the slopes): the change (effect) in \(y\) between the first (“control”) group and each other \(x\) level.
- xmedium: \(y\) is (on average) 0.767 units less in the “medium” group compared to the “control” group and we are 95% confident that this change is between -5.968 and 4.949.
- xhigh: \(y\) is (on average) 8.649 units less in the “high” group compared to the “control” group and we are 95% confident that this change is between -14.478 and -1.952.
- sigma is estimated to be 4.46
Note, the estimates are means and quantiles.
As an alternative to the regular summary (which is designed to resemble that traditionally provided by frequentist analyses - and thus be instantly familiar), it is possible to define exactly how each of the parameters are summarised.
In the following, I am nominating that I want to summarise each parameter posterior by:
- the median
- the 95% highest probability density interval (credibility interval)
- Rhat
- total number of draws
- bulk and tail effective sample sizes
dat2a.brm2 |>
posterior::summarise_draws(
median,
HDInterval::hdi,
rhat,
length,
ess_bulk, ess_tail
  )
# A tibble: 9 × 8
variable median lower upper rhat length ess_bulk ess_tail
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 b_Intercept 20.8 16.7 24.3 1.00 2400 2468. 2449.
2 b_xmedium -0.795 -6.03 4.69 1.00 2400 2444. 2177.
3 b_xhigh -8.73 -14.8 -2.56 1.00 2400 2525. 2361.
4 sigma 4.29 2.66 6.70 1.00 2400 2332. 2284.
5 prior_Intercept 19.8 16.3 23.5 1.00 2400 2340. 2253.
6 prior_b -0.0285 -19.4 19.8 1.00 2400 2268. 2164.
7 prior_sigma 3.54 0.00326 14.9 1.00 2400 2505. 2216.
8 lprior -11.4 -13.1 -10.1 0.999 2400 2276. 1875.
9 lp__ -44.3 -47.7 -42.5 1.00 2400 2269. 2188.
Conclusions:
- in the initial block of information, we are reminded of the formula as well as the chain fitting characteristics. We are also informed that the total number of post-warmup MCMC samples is 2400.
- the Rhat values for each parameter are <1.01, confirming convergence of the parameter estimates across the chains
- Bulk_ESS: the effective sample sizes for the bulk of the posterior are a high fraction (>0.5) of the total number of draws. This suggests that the sampling was reasonably efficient and therefore likely to be reliable
- Tail_ESS: the effective sample sizes of estimates in the tails of the posterior. The tails (more extreme regions of the posterior) are where the sampler is most likely to get stuck (have divergent transitions etc). The fraction of effective samples from this region is relatively high, implying that the sampler did not get stuck in areas that are unsupported by data
- Intercept: when \(x\) is “control”, the expected value of \(y\) is 20.787 and we are 95% confident that the true value is between 16.71 and 24.281.
- x* (the slopes): the change (effect) in \(y\) between the first (“control”) group and each other \(x\) level.
- xmedium: \(y\) is (on average) 0.795 units less in the “medium” group compared to the “control” group and we are 95% confident that this change is between -6.035 and 4.686.
- xhigh: \(y\) is (on average) 8.733 units less in the “high” group compared to the “control” group and we are 95% confident that this change is between -14.751 and -2.559.
- sigma is estimated to be 4.29
As a yet more flexible alternative, we can first extract the full posterior draws (with as_draws_df()) and then use the various tidyverse functions to focus on just the most ecologically interpretable parameters before summarising.
In the following, I will use select() with a regex (regular expression) to match only the columns that:
- start with (^) “b_” followed by any number (*) of any character (.)
- start with (^) “sigma”
dat2a.brm2 |>
brms::as_draws_df() |>
dplyr::select(matches("^b_.*|^sigma")) |>
posterior::summarise_draws(
median,
HDInterval::hdi,
rhat,
length,
ess_bulk, ess_tail
  )
Warning: Dropping 'draws_df' class as required metadata was removed.
# A tibble: 4 × 8
variable median lower upper rhat length ess_bulk ess_tail
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 b_Intercept 20.8 16.7 24.3 1.00 2400 2454. 2438.
2 b_xmedium -0.795 -6.03 4.69 1.00 2400 2422. 2169.
3 b_xhigh -8.73 -14.8 -2.56 1.00 2400 2491. 2354.
4 sigma 4.29 2.66 6.70 1.00 2400 2321. 2252.
Conclusions:
- in the initial block of information, we are reminded of the formula as well as the chain fitting characteristics. We are also informed that the total number of post-warmup MCMC samples is 2400.
- the Rhat values for each parameter are <1.01, confirming convergence of the parameter estimates across the chains
- Bulk_ESS: the effective sample sizes for the bulk of the posterior are a high fraction (>0.5) of the total number of draws. This suggests that the sampling was reasonably efficient and therefore likely to be reliable
- Tail_ESS: the effective sample sizes of estimates in the tails of the posterior. The tails (more extreme regions of the posterior) are where the sampler is most likely to get stuck (have divergent transitions etc). The fraction of effective samples from this region is relatively high, implying that the sampler did not get stuck in areas that are unsupported by data
- Intercept: when \(x\) is “control”, the expected value of \(y\) is 20.787 and we are 95% confident that the true value is between 16.71 and 24.281.
- x* (the slopes): the change (effect) in \(y\) between the first (“control”) group and each other \(x\) level.
- xmedium: \(y\) is (on average) 0.795 units less in the “medium” group compared to the “control” group and we are 95% confident that this change is between -6.035 and 4.686.
- xhigh: \(y\) is (on average) 8.733 units less in the “high” group compared to the “control” group and we are 95% confident that this change is between -14.751 and -2.559.
- sigma is estimated to be 4.29
The gather_draws() function performs the equivalent of an as_draws_df() followed by a pivot_longer() in order to return the full posteriors in long format where they are more suitable for graphing.
[1] "b_Intercept" "b_xmedium" "b_xhigh" "sigma"
[5] "prior_Intercept" "prior_b" "prior_sigma" "lprior"
[9] "lp__" "accept_stat__" "stepsize__" "treedepth__"
[13] "n_leapfrog__" "divergent__" "energy__"
dat2a.brm2 |>
gather_draws(`b_Intercept`, `b_x.*`, `sigma`, regex = TRUE) |>
ggplot() +
geom_histogram(aes(x = .value)) +
  facet_wrap(~.variable, scales = "free")
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Alternatively, there are various other representations supported by the ggdist package.
y ymin ymax .width .point .interval
1 0.5448169 0.1558967 0.7241144 0.95 median hdci
Conclusions:
- 54.482% of the total variability in \(y\) can be explained by its relationship to \(x\)
- we are 95% confident that the strength of this relationship is between 15.59% and 72.411%
Family: gaussian
Links: mu = identity; sigma = identity
Formula: y ~ -1 + x
Data: dat2 (Number of observations: 12)
Draws: 3 chains, each with iter = 5000; warmup = 1000; thin = 5;
total post-warmup draws = 2400
Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
xcontrol 20.29 2.19 15.73 24.52 1.00 2068 2191
xmedium 18.67 2.12 14.28 22.75 1.00 2330 1996
xhigh 10.96 2.17 6.82 15.54 1.00 2566 2368
Family Specific Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma 4.42 1.09 2.83 7.00 1.00 2100 2328
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
Conclusions:
- in the initial block of information, we are reminded of the formula as well as the chain fitting characteristics. We are also informed that the total number of post-warmup MCMC samples is 2400.
- the Rhat values for each parameter are <1.01, confirming convergence of the parameter estimates across the chains
- Bulk_ESS: the effective sample sizes for the bulk of the posterior are a high fraction (>0.5) of the total number of draws. This suggests that the sampling was reasonably efficient and therefore likely to be reliable
- Tail_ESS: the effective sample sizes of estimates in the tails of the posterior. The tails (more extreme regions of the posterior) are where the sampler is most likely to get stuck (have divergent transitions etc). The fraction of effective samples from this region is relatively high, implying that the sampler did not get stuck in areas that are unsupported by data
- x*: (the group means)
- xcontrol: the expected value of \(y\) in the “control” group is 20.29 (the 95% credibility interval is between 15.726 and 24.516)
- xmedium: the expected value of \(y\) in the “medium” group is 18.67 (the 95% credibility interval is between 14.285 and 22.753)
- xhigh: the expected value of \(y\) in the “high” group is 10.96 (the 95% credibility interval is between 6.817 and 15.544)
- sigma is estimated to be 4.42
Note, the estimates are means and quantiles.
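One advantage of this cell-means parameterisation is that pairwise differences between groups can be derived directly from the posterior draws of the group means. The following is a sketch under the assumption that the parameter names are as shown above (e.g. b_xcontrol, b_xhigh):

```r
# Sketch: posterior of the "high" vs "control" difference from the
# cell-means model, summarised by its median and 95% HDCI
dat2b.brm2 |>
  brms::as_draws_df() |>
  dplyr::mutate(high_vs_control = b_xhigh - b_xcontrol) |>
  dplyr::pull(high_vs_control) |>
  ggdist::median_hdci(.width = 0.95)
```

The same comparisons can also be obtained via emmeans (loaded above), which computes all pairwise contrasts from the fitted model.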
As an alternative to the regular summary (which is designed to resemble that traditionally provided by frequentist analyses - and thus be instantly familiar), it is possible to define exactly how each of the parameters are summarised.
In the following, I am nominating that I want to summarise each parameter posterior by:
- the median
- the 95% highest probability density interval (credibility interval)
- Rhat
- total number of draws
- bulk and tail effective sample sizes
dat2b.brm2 |>
posterior::summarise_draws(
median,
HDInterval::hdi,
rhat,
length,
ess_bulk, ess_tail
  )
# A tibble: 8 × 8
variable median lower upper rhat length ess_bulk ess_tail
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 b_xcontrol 20.3 15.5 24.3 1.00 2400 2069. 2191.
2 b_xmedium 18.7 14.4 22.8 1.00 2400 2330. 1996.
3 b_xhigh 10.9 6.55 15.2 1.00 2400 2566. 2368.
4 sigma 4.26 2.58 6.47 1.00 2400 2100. 2328.
5 prior_b 16.5 -2.63 38.8 1.00 2400 2264. 2112.
6 prior_sigma 3.55 0.00155 14.8 1.00 2400 2128. 2386.
7 lprior -11.8 -12.8 -11.2 1.00 2400 2419. 2278.
8 lp__ -44.6 -48.2 -42.7 1.00 2400 2245. 2352.
Conclusions:
- in the initial block of information, we are reminded of the formula as well as the chain fitting characteristics. We are also informed that the total number of post-warmup MCMC samples is 2400.
- the Rhat values for each parameter are less than 1.01, confirming convergence of the parameter estimates across the chains
- Bulk_ESS: the effective sample sizes for the bulk of the posterior range are a high fraction (> 0.5) of the total number of draws. This suggests that the sampling was reasonably efficient and therefore likely to be reliable
- Tail_ESS: the effective sample sizes of estimates in the tails of the posterior. The tails (more extreme regions of the posterior) are where the sampler is most likely to get stuck (have divergent transitions etc.). The fraction of effective samples from this region is relatively high, implying that the sampler did not get stuck in areas that are unsupported by data
- x*: (the means of each group)
  - xcontrol: the expected value of \(y\) in the “control” group is 20.335 (95% credibility interval between 15.507 and 24.295)
  - xmedium: the expected value of \(y\) in the “medium” group is 18.734 (95% credibility interval between 14.398 and 22.802)
  - xhigh: the expected value of \(y\) in the “high” group is 10.914 (95% credibility interval between 6.552 and 15.2)
- sigma is estimated to be 4.26
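The highest posterior density (HPD) intervals reported by HDInterval::hdi() are the shortest intervals containing the nominated proportion of the posterior draws, in contrast to equal-tailed quantile intervals. The idea can be sketched in base R (hdi_simple() is a toy illustration written for this example only; HDInterval::hdi() should be preferred in practice):

```r
# Minimal sketch of a 95% HPD interval: the shortest window containing 95% of draws
# (hdi_simple() is a toy helper for illustration; use HDInterval::hdi() in real analyses)
hdi_simple <- function(x, credMass = 0.95) {
  x <- sort(x)
  n <- length(x)
  k <- ceiling(credMass * n)            # number of draws the interval must contain
  widths <- x[k:n] - x[1:(n - k + 1)]   # widths of all candidate intervals
  i <- which.min(widths)                # index of the narrowest candidate
  c(lower = x[i], upper = x[i + k - 1])
}

set.seed(1)
draws <- rexp(2400)                     # a deliberately skewed stand-in posterior
quantile(draws, c(0.025, 0.975))        # equal-tailed interval
hdi_simple(draws)                       # HPD interval: shorter, and hugs the mode
```

For a symmetric posterior the two intervals coincide; for skewed posteriors (such as the one simulated above) the HPD interval is narrower.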
For a yet more flexible alternative, we can first extract the full posterior draws (with as_draws_df()) and then use the various tidyverse functions to focus on just the most ecologically interpretable parameters before summarising.
In the following, I will use select() with a regex (regular expression) to match only the columns that:
- start with (^) “b_” followed by any number (*) of any characters (.)
- start with (^) “sigma”
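To see which parameter names such a pattern captures, here is a small stand-alone sketch using the parameter names reported for this model (grepl() applies the same pattern that matches() uses):

```r
# Parameter names as reported for this model
vars <- c("b_xcontrol", "b_xmedium", "b_xhigh", "sigma",
          "prior_b", "prior_sigma", "lprior", "lp__")
# The same pattern used inside matches(): begins with "b_" or begins with "sigma"
keep <- grepl("^b_.*|^sigma", vars)
vars[keep]
# [1] "b_xcontrol" "b_xmedium"  "b_xhigh"    "sigma"
```

Note that "prior_sigma" is not matched because the anchor (^) requires "sigma" to occur at the very start of the name.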
dat2b.brm2 |>
brms::as_draws_df() |>
dplyr::select(matches("^b_.*|^sigma")) |>
posterior::summarise_draws(
median,
HDInterval::hdi,
rhat,
length,
ess_bulk, ess_tail
)
Warning: Dropping 'draws_df' class as required metadata was removed.
# A tibble: 4 × 8
variable median lower upper rhat length ess_bulk ess_tail
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 b_xcontrol 20.3 15.5 24.3 1.00 2400 2064. 2154.
2 b_xmedium 18.7 14.4 22.8 1.00 2400 2323. 1943.
3 b_xhigh 10.9 6.55 15.2 1.00 2400 2553. 2361.
4 sigma 4.26 2.58 6.47 1.00 2400 2026. 2320.
Conclusions:
- in the initial block of information, we are reminded of the formula as well as the chain fitting characteristics. We are also informed that the total number of post-warmup MCMC samples is 2400.
- the Rhat values for each parameter are less than 1.01, confirming convergence of the parameter estimates across the chains
- Bulk_ESS: the effective sample sizes for the bulk of the posterior range are a high fraction (> 0.5) of the total number of draws. This suggests that the sampling was reasonably efficient and therefore likely to be reliable
- Tail_ESS: the effective sample sizes of estimates in the tails of the posterior. The tails (more extreme regions of the posterior) are where the sampler is most likely to get stuck (have divergent transitions etc.). The fraction of effective samples from this region is relatively high, implying that the sampler did not get stuck in areas that are unsupported by data
- x*: (the means of each group)
  - xcontrol: the expected value of \(y\) in the “control” group is 20.335 (95% credibility interval between 15.507 and 24.295)
  - xmedium: the expected value of \(y\) in the “medium” group is 18.734 (95% credibility interval between 14.398 and 22.802)
  - xhigh: the expected value of \(y\) in the “high” group is 10.914 (95% credibility interval between 6.552 and 15.2)
- sigma is estimated to be 4.26
The gather_draws() function performs the equivalent of an as_draws_df() followed by a pivot_longer(), returning the full posteriors in long format, which is more suitable for graphing.
[1] "b_xcontrol" "b_xmedium" "b_xhigh" "sigma"
[5] "prior_b" "prior_sigma" "lprior" "lp__"
[9] "accept_stat__" "stepsize__" "treedepth__" "n_leapfrog__"
[13] "divergent__" "energy__"
dat2b.brm2 |>
gather_draws(`b_x.*`, `sigma`, regex = TRUE) |>
ggplot() +
geom_histogram(aes(x = .value)) +
facet_wrap(~.variable, scales = "free")
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Alternatively, there are various other representations supported by the ggdist package.
y ymin ymax .width .point .interval
1 0.5689382 0.2024998 0.7242002 0.95 median hdci
Conclusions:
- 56.894% of the total variability in \(y\) can be explained by its relationship to \(x\)
- we are 95% confident that the strength of this relationship is between 20.25% and 72.42%
Family: poisson
Links: mu = log
Formula: y ~ x
Data: dat3 (Number of observations: 10)
Draws: 3 chains, each with iter = 5000; warmup = 1000; thin = 5;
total post-warmup draws = 2400
Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept 0.02 0.35 -0.70 0.68 1.00 2167 2049
x 0.34 0.04 0.26 0.43 1.00 2153 2156
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
The results presented by the summary() function are on the link scale (in this case, the log scale). As such, they can be awkward to interpret and are particularly punishing for your audience.
Conclusions:
- in the initial block of information, we are reminded of the formula as well as the chain fitting characteristics. We are also informed that the total number of post-warmup MCMC samples is 2400.
- the Rhat values for each parameter are less than 1.01, confirming convergence of the parameter estimates across the chains
- Bulk_ESS: the effective sample sizes for the bulk of the posterior range are a high fraction (> 0.5) of the total number of draws. This suggests that the sampling was reasonably efficient and therefore likely to be reliable
- Tail_ESS: the effective sample sizes of estimates in the tails of the posterior. The tails (more extreme regions of the posterior) are where the sampler is most likely to get stuck (have divergent transitions etc.). The fraction of effective samples from this region is relatively high, implying that the sampler did not get stuck in areas that are unsupported by data
- Intercept: when \(x=0\), the expected value of \(y\) is 0.024 and we are 95% confident that the true value is between -0.699 and 0.677. Since \(x=0\) is outside the observed range of \(x\), this y-intercept is of very limited value.
- x: (the slope) - the rate of change in \(y\) per unit (=1) change in \(x\). So for every one unit change in \(x\), \(y\) increases by (on average) 0.345 units and we are 95% confident that this change is between 0.263 and 0.431
Note, the estimates are means and quantiles.
As an alternative to the regular summary (which is designed to resemble that traditionally provided by frequentist analyses - and thus be instantly familiar), it is possible to define exactly how each of the parameters is summarised.
In the following, I am nominating that I want to summarise each parameter posterior by:
- the median
- the 95% highest probability density interval (credibility interval)
- Rhat
- total number of draws
- bulk and tail effective sample sizes
dat3a.brm2 |>
posterior::summarise_draws(
median,
HDInterval::hdi,
rhat,
length,
ess_bulk, ess_tail
)
# A tibble: 6 × 8
variable median lower upper rhat length ess_bulk ess_tail
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 b_Intercept 0.0320 -0.705 0.657 1.00 2400 2166. 2049.
2 b_x 0.344 0.261 0.428 1.00 2400 2153. 2156.
3 prior_Intercept 2.03 -0.553 4.69 1.00 2400 2389. 2411.
4 prior_b -0.0116 -6.37 6.38 1.00 2400 2574. 2435.
5 lprior -2.92 -2.95 -2.91 1.00 2400 2184. 2213.
6 lp__ -26.8 -29.0 -26.1 1.00 2400 2197. 2411.
The results presented by the above function are on the link scale (in this case, the log scale). As such, they can be awkward to interpret and are particularly punishing for your audience.
Conclusions:
- in the initial block of information, we are reminded of the formula as well as the chain fitting characteristics. We are also informed that the total number of post-warmup MCMC samples is 2400.
- the Rhat values for each parameter are less than 1.01, confirming convergence of the parameter estimates across the chains
- Bulk_ESS: the effective sample sizes for the bulk of the posterior range are a high fraction (> 0.5) of the total number of draws. This suggests that the sampling was reasonably efficient and therefore likely to be reliable
- Tail_ESS: the effective sample sizes of estimates in the tails of the posterior. The tails (more extreme regions of the posterior) are where the sampler is most likely to get stuck (have divergent transitions etc.). The fraction of effective samples from this region is relatively high, implying that the sampler did not get stuck in areas that are unsupported by data
- b_Intercept: when \(x=0\), the expected value of \(y\) is 0.032 and we are 95% confident that the true value is between -0.705 and 0.657. Since \(x=0\) is outside the observed range of \(x\), this y-intercept is of very limited value.
- b_x: (the slope) - the rate of change in \(y\) per unit (=1) change in \(x\). So for every one unit change in \(x\), \(y\) increases by (on average) 0.344 units and we are 95% confident that this change is between 0.261 and 0.428
For a yet more flexible alternative, we can first extract the full posterior draws (with as_draws_df()) and then use the various tidyverse functions to focus on just the most ecologically interpretable parameters before summarising.
In the following, I will use select() with a regex (regular expression) to match only the columns that:
- start with (^) “b_” followed by any number (*) of any characters (.)
- start with (^) “sigma”
It will also use mutate() to transform the results back onto the scale of the response (by exponentiating the posteriors) before summarising.
dat3a.brm2 |>
brms::as_draws_df() |>
dplyr::select(matches("^b_.*|^sigma")) |>
mutate(across(everything(), exp)) |>
posterior::summarise_draws(
median,
HDInterval::hdi,
rhat,
length,
ess_bulk, ess_tail
)
Warning: Dropping 'draws_df' class as required metadata was removed.
# A tibble: 2 × 8
variable median lower upper rhat length ess_bulk ess_tail
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 b_Intercept 1.03 0.440 1.83 1.00 2400 2160. 2042.
2 b_x 1.41 1.29 1.53 1.00 2400 2147. 2143.
Conclusions:
- in the initial block of information, we are reminded of the formula as well as the chain fitting characteristics. We are also informed that the total number of post-warmup MCMC samples is 2400.
- the Rhat values for each parameter are less than 1.01, confirming convergence of the parameter estimates across the chains
- Bulk_ESS: the effective sample sizes for the bulk of the posterior range are a high fraction (> 0.5) of the total number of draws. This suggests that the sampling was reasonably efficient and therefore likely to be reliable
- Tail_ESS: the effective sample sizes of estimates in the tails of the posterior. The tails (more extreme regions of the posterior) are where the sampler is most likely to get stuck (have divergent transitions etc.). The fraction of effective samples from this region is relatively high, implying that the sampler did not get stuck in areas that are unsupported by data
- b_Intercept: when \(x=0\), the expected value of \(y\) is 1.033 and we are 95% confident that the true value is between 0.44 and 1.83. Since \(x=0\) is outside the observed range of \(x\), this y-intercept is of very limited value.
- b_x: (the slope) - the rate of change in \(y\) per unit (=1) change in \(x\). So for every one unit change in \(x\), \(y\) increases by (on average) a factor of 1.41 and we are 95% confident that this change is between 1.29 and 1.53. This represents a ((1.41 - 1) * 100 =) 41% increase in \(y\) per unit increase in \(x\).
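Percentage-change interpretations like the one above can be computed directly from any log-link coefficient. A minimal sketch (0.344 is the posterior median slope from the table above):

```r
# Converting a log-link slope into a multiplicative and percentage change in y
b_x <- 0.344                 # posterior median of the slope on the log scale
multiplier <- exp(b_x)       # multiplicative change in y per one-unit change in x
percent <- (multiplier - 1) * 100
round(multiplier, 2)         # ~1.41
round(percent, 0)            # ~41
```

The same arithmetic applies when the posterior draws are exponentiated in bulk (as in the mutate(across(everything(), exp)) step): summaries of the exponentiated draws are multipliers, and subtracting 1 and multiplying by 100 converts them to percentage changes.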
The gather_draws() function performs the equivalent of an as_draws_df() followed by a pivot_longer(), returning the full posteriors in long format, which is more suitable for graphing.
dat3a.brm2 |>
gather_draws(b_Intercept, b_x) |>
mutate(across(everything(), exp)) |>
ggplot() +
geom_histogram(aes(x = .value)) +
facet_wrap(~.variable, scales = "free")
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Alternatively, there are various other representations supported by the ggdist package.
y ymin ymax .width .point .interval
1 0.9218802 0.8143995 0.934374 0.95 median hdci
Conclusions:
- 92.188% of the total variability in \(y\) can be explained by its relationship to \(x\)
- we are 95% confident that the strength of this relationship is between 81.44% and 93.437%
Family: poisson
Links: mu = log
Formula: y ~ scale(x, scale = FALSE)
Data: dat3 (Number of observations: 10)
Draws: 3 chains, each with iter = 5000; warmup = 1000; thin = 5;
total post-warmup draws = 2400
Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept 1.92 0.14 1.64 2.18 1.00 2184 2130
scalexscaleEQFALSE 0.34 0.04 0.27 0.43 1.00 2189 2313
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
The results presented by the summary() function are on the link scale (in this case, the log scale). As such, they can be awkward to interpret and are particularly punishing for your audience.
Conclusions:
- in the initial block of information, we are reminded of the formula as well as the chain fitting characteristics. We are also informed that the total number of post-warmup MCMC samples is 2400.
- the Rhat values for each parameter are less than 1.01, confirming convergence of the parameter estimates across the chains
- Bulk_ESS: the effective sample sizes for the bulk of the posterior range are a high fraction (> 0.5) of the total number of draws. This suggests that the sampling was reasonably efficient and therefore likely to be reliable
- Tail_ESS: the effective sample sizes of estimates in the tails of the posterior. The tails (more extreme regions of the posterior) are where the sampler is most likely to get stuck (have divergent transitions etc.). The fraction of effective samples from this region is relatively high, implying that the sampler did not get stuck in areas that are unsupported by data
- Intercept: when \(x=0\), the expected value of \(y\) is 1.923 and we are 95% confident that the true value is between 1.643 and 2.183. Since \(x\) is centred, \(x=0\) corresponds to the mean of the observed \(x\) values, so this y-intercept is directly interpretable.
- x: (the slope) - the rate of change in \(y\) per unit (=1) change in \(x\). So for every one unit change in \(x\), \(y\) increases by (on average) 0.345 units and we are 95% confident that this change is between 0.265 and 0.43
Note, the estimates are means and quantiles.
As an alternative to the regular summary (which is designed to resemble that traditionally provided by frequentist analyses - and thus be instantly familiar), it is possible to define exactly how each of the parameters is summarised.
In the following, I am nominating that I want to summarise each parameter posterior by:
- the median
- the 95% highest probability density interval (credibility interval)
- Rhat
- total number of draws
- bulk and tail effective sample sizes
dat3b.brm2 |>
posterior::summarise_draws(
median,
HDInterval::hdi,
rhat,
length,
ess_bulk, ess_tail
)
# A tibble: 6 × 8
variable median lower upper rhat length ess_bulk ess_tail
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 b_Intercept 1.93 1.65 2.19 1.00 2400 2183. 2130.
2 b_scalexscaleEQFALSE 0.343 0.261 0.424 1.00 2400 2188. 2313.
3 prior_Intercept 1.98 -0.669 4.42 1.00 2400 2401. 2301.
4 prior_b 0.0452 -5.55 6.33 0.999 2400 2288. 2311.
5 lprior -2.92 -2.95 -2.91 1.00 2400 2064. 2330.
6 lp__ -26.8 -29.0 -26.1 1.00 2400 2376. 2275.
The results presented by the above function are on the link scale (in this case, the log scale). As such, they can be awkward to interpret and are particularly punishing for your audience.
Conclusions:
- in the initial block of information, we are reminded of the formula as well as the chain fitting characteristics. We are also informed that the total number of post-warmup MCMC samples is 2400.
- the Rhat values for each parameter are less than 1.01, confirming convergence of the parameter estimates across the chains
- Bulk_ESS: the effective sample sizes for the bulk of the posterior range are a high fraction (> 0.5) of the total number of draws. This suggests that the sampling was reasonably efficient and therefore likely to be reliable
- Tail_ESS: the effective sample sizes of estimates in the tails of the posterior. The tails (more extreme regions of the posterior) are where the sampler is most likely to get stuck (have divergent transitions etc.). The fraction of effective samples from this region is relatively high, implying that the sampler did not get stuck in areas that are unsupported by data
- b_Intercept: when \(x=0\), the expected value of \(y\) is 1.93 and we are 95% confident that the true value is between 1.65 and 2.19. Since \(x\) is centred, \(x=0\) corresponds to the mean of the observed \(x\) values, so this y-intercept is directly interpretable.
- b_x: (the slope) - the rate of change in \(y\) per unit (=1) change in \(x\). So for every one unit change in \(x\), \(y\) increases by (on average) 0.343 units and we are 95% confident that this change is between 0.261 and 0.424
For a yet more flexible alternative, we can first extract the full posterior draws (with as_draws_df()) and then use the various tidyverse functions to focus on just the most ecologically interpretable parameters before summarising.
In the following, I will use select() with a regex (regular expression) to match only the columns that:
- start with (^) “b_” followed by any number (*) of any characters (.)
- start with (^) “sigma”
It will also use mutate() to transform the results back onto the scale of the response (by exponentiating the posteriors) before summarising.
dat3b.brm2 |>
brms::as_draws_df() |>
dplyr::select(matches("^b_.*|^sigma")) |>
mutate(across(everything(), exp)) |>
posterior::summarise_draws(
median,
HDInterval::hdi,
rhat,
length,
ess_bulk, ess_tail
)
Warning: Dropping 'draws_df' class as required metadata was removed.
# A tibble: 2 × 8
variable median lower upper rhat length ess_bulk ess_tail
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 b_Intercept 6.86 5.15 8.84 1.00 2400 2176. 2093.
2 b_scalexscaleEQFALSE 1.41 1.30 1.53 1.00 2400 2165. 2303.
Conclusions:
- in the initial block of information, we are reminded of the formula as well as the chain fitting characteristics. We are also informed that the total number of post-warmup MCMC samples is 2400.
- the Rhat values for each parameter are less than 1.01, confirming convergence of the parameter estimates across the chains
- Bulk_ESS: the effective sample sizes for the bulk of the posterior range are a high fraction (> 0.5) of the total number of draws. This suggests that the sampling was reasonably efficient and therefore likely to be reliable
- Tail_ESS: the effective sample sizes of estimates in the tails of the posterior. The tails (more extreme regions of the posterior) are where the sampler is most likely to get stuck (have divergent transitions etc.). The fraction of effective samples from this region is relatively high, implying that the sampler did not get stuck in areas that are unsupported by data
- b_Intercept: when \(x=0\), the expected value of \(y\) is 6.864 and we are 95% confident that the true value is between 5.15 and 8.84. Since \(x\) is centred, \(x=0\) corresponds to the mean of the observed \(x\) values, so this y-intercept is directly interpretable.
- b_x: (the slope) - the rate of change in \(y\) per unit (=1) change in \(x\). So for every one unit change in \(x\), \(y\) increases by (on average) a factor of 1.41 and we are 95% confident that this change is between 1.30 and 1.53. This represents a ((1.41 - 1) * 100 =) 41% increase in \(y\) per unit increase in \(x\).
The gather_draws() function performs the equivalent of an as_draws_df() followed by a pivot_longer(), returning the full posteriors in long format, which is more suitable for graphing.
dat3b.brm2 |>
gather_draws(`b_Intercept`, `b_.*x.*`, regex = TRUE) |>
mutate(across(everything(), exp)) |>
ggplot() +
geom_histogram(aes(x = .value)) +
facet_wrap(~.variable, scales = "free")
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Alternatively, there are various other representations supported by the ggdist package.
y ymin ymax .width .point .interval
1 0.9223029 0.8208962 0.934376 0.95 median hdci
Conclusions:
- 92.23% of the total variability in \(y\) can be explained by its relationship to \(x\)
- we are 95% confident that the strength of this relationship is between 82.09% and 93.438%
Family: poisson
Links: mu = log
Formula: y ~ scale(x)
Data: dat3 (Number of observations: 10)
Draws: 3 chains, each with iter = 5000; warmup = 1000; thin = 5;
total post-warmup draws = 2400
Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept 1.93 0.14 1.65 2.20 1.00 2271 1957
scalex 1.03 0.13 0.78 1.30 1.00 2258 2145
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
The results presented by the summary() function are on the link scale (in this case, the log scale). As such, they can be awkward to interpret and are particularly punishing for your audience.
Conclusions:
- in the initial block of information, we are reminded of the formula as well as the chain fitting characteristics. We are also informed that the total number of post-warmup MCMC samples is 2400.
- the Rhat values for each parameter are less than 1.01, confirming convergence of the parameter estimates across the chains
- Bulk_ESS: the effective sample sizes for the bulk of the posterior range are a high fraction (> 0.5) of the total number of draws. This suggests that the sampling was reasonably efficient and therefore likely to be reliable
- Tail_ESS: the effective sample sizes of estimates in the tails of the posterior. The tails (more extreme regions of the posterior) are where the sampler is most likely to get stuck (have divergent transitions etc.). The fraction of effective samples from this region is relatively high, implying that the sampler did not get stuck in areas that are unsupported by data
- Intercept: when \(x=0\) (its average, since \(x\) is standardised), the expected value of \(y\) is 1.929 and we are 95% confident that the true value is between 1.646 and 2.203. Since \(x=0\) corresponds to the mean of the observed \(x\) values, this y-intercept is directly interpretable.
- x: (the slope) - the rate of change in \(y\) per unit (=1) change in \(x\). Recall that since \(x\) is standardised, one unit represents a span of one standard deviation of \(x\). So for every one standard deviation change in \(x\), \(y\) increases by (on average) 1.035 units and we are 95% confident that this change is between 0.781 and 1.295
Note, the estimates are means and quantiles.
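The difference between centring and standardising a predictor (and hence why one unit of scale(x) spans one standard deviation of \(x\)) can be illustrated with a small stand-alone sketch (the x values here are made up for illustration):

```r
# scale(x, scale = FALSE) centres x; scale(x) centres it and divides by its sd
x <- c(2, 4, 6, 8, 10)
centred  <- as.numeric(scale(x, scale = FALSE))  # x - mean(x)
standard <- as.numeric(scale(x))                 # (x - mean(x)) / sd(x)
centred                      # -4 -2  0  2  4
sd(x)                        # ~3.162
standard                     # centred / sd(x): one unit now equals one sd of x
```

Under both transformations \(x=0\) corresponds to the original mean of \(x\); standardising additionally rescales the slope so that it describes the change in \(y\) per standard deviation of \(x\).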
As an alternative to the regular summary (which is designed to resemble that traditionally provided by frequentist analyses - and thus be instantly familiar), it is possible to define exactly how each of the parameters is summarised.
In the following, I am nominating that I want to summarise each parameter posterior by:
- the median
- the 95% highest probability density interval (credibility interval)
- Rhat
- total number of draws
- bulk and tail effective sample sizes
dat3c.brm2 |>
posterior::summarise_draws(
median,
HDInterval::hdi,
rhat,
length,
ess_bulk, ess_tail
)
# A tibble: 6 × 8
variable median lower upper rhat length ess_bulk ess_tail
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 b_Intercept 1.93 1.65 2.20 1.00 2400 2270. 1957.
2 b_scalex 1.03 0.776 1.29 1.00 2400 2258. 2145.
3 prior_Intercept 2.01 -0.473 4.66 1.00 2400 2407. 2499.
4 prior_b -0.0503 -4.16 4.29 1.00 2400 2458. 2369.
5 lprior -2.86 -3.06 -2.71 1.00 2400 2253. 2236.
6 lp__ -26.8 -29.2 -26.0 1.00 2400 1935. 1941.
Conclusions:
- in the initial block of information, we are reminded of the formula as well as the chain fitting characteristics. We are also informed that the total number of post-warmup MCMC samples is 2400.
- the Rhat values for each parameter are less than 1.01, confirming convergence of the parameter estimates across the chains
- Bulk_ESS: the effective sample sizes for the bulk of the posterior range are a high fraction (> 0.5) of the total number of draws. This suggests that the sampling was reasonably efficient and therefore likely to be reliable
- Tail_ESS: the effective sample sizes of estimates in the tails of the posterior. The tails (more extreme regions of the posterior) are where the sampler is most likely to get stuck (have divergent transitions etc.). The fraction of effective samples from this region is relatively high, implying that the sampler did not get stuck in areas that are unsupported by data
- b_Intercept: when \(x=0\) (its average, since \(x\) is standardised), the expected value of \(y\) is 1.934 and we are 95% confident that the true value is between 1.647 and 2.204. Since \(x=0\) corresponds to the mean of the observed \(x\) values, this y-intercept is directly interpretable.
- b_x: (the slope) - the rate of change in \(y\) per unit (=1) change in \(x\). Recall that since \(x\) is standardised, one unit represents a span of one standard deviation of \(x\). So for every one standard deviation change in \(x\), \(y\) increases by (on average) 1.031 units and we are 95% confident that this change is between 0.776 and 1.289
For a yet more flexible alternative, we can first extract the full posterior draws (with as_draws_df()) and then use the various tidyverse functions to focus on just the most ecologically interpretable parameters before summarising.
In the following, I will use select() with a regex (regular expression) to match only the columns that:
- start with (^) “b_” followed by any number (*) of any characters (.)
- start with (^) “sigma”
^) “sigma”
It will also use mutate() to transform the results back onto the scale of the response (by exponentiating the posteriors) before summarising.
dat3c.brm2 |>
brms::as_draws_df() |>
dplyr::select(matches("^b_.*|^sigma")) |>
mutate(across(everything(), exp)) |>
posterior::summarise_draws(
median,
HDInterval::hdi,
rhat,
length,
ess_bulk, ess_tail
)
Warning: Dropping 'draws_df' class as required metadata was removed.
# A tibble: 2 × 8
variable median lower upper rhat length ess_bulk ess_tail
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 b_Intercept 6.92 5.04 8.86 1.00 2400 2265. 1919.
2 b_scalex 2.80 2.17 3.62 1.00 2400 2252. 2138.
Conclusions:
- in the initial block of information, we are reminded of the formula as well as the chain fitting characteristics. We are also informed that the total number of post-warmup MCMC samples is 2400.
- the Rhat values for each parameter are less than 1.01, confirming convergence of the parameter estimates across the chains
- Bulk_ESS: the effective sample sizes for the bulk of the posterior range are a high fraction (> 0.5) of the total number of draws. This suggests that the sampling was reasonably efficient and therefore likely to be reliable
- Tail_ESS: the effective sample sizes of estimates in the tails of the posterior. The tails (more extreme regions of the posterior) are where the sampler is most likely to get stuck (have divergent transitions etc.). The fraction of effective samples from this region is relatively high, implying that the sampler did not get stuck in areas that are unsupported by data
- b_Intercept: when \(x=0\) (its average, since \(x\) is standardised), the expected value of \(y\) is 6.918 and we are 95% confident that the true value is between 5.04 and 8.86. Since \(x=0\) corresponds to the mean of the observed \(x\) values, this y-intercept is directly interpretable.
- b_x: (the slope) - the rate of change in \(y\) per unit (=1) change in \(x\). Recall that since \(x\) is standardised, one unit represents a span of one standard deviation of \(x\). So for every one standard deviation change in \(x\), \(y\) increases by (on average) a factor of 2.804 and we are 95% confident that this change is between 2.174 and 3.628. This represents a ((2.804 - 1) * 100 =) 180.4% increase in \(y\) per standard deviation increase in \(x\).
The gather_draws() function performs the equivalent of an as_draws_df() followed by a pivot_longer(), returning the full posteriors in long format, which is more suitable for graphing. Note that the name of the slope parameter gets very awkward when \(x\) is standardised, so it is more convenient to refer to this parameter via a regular expression.
[1] "b_Intercept" "b_scalex" "prior_Intercept" "prior_b"
[5] "lprior" "lp__" "accept_stat__" "stepsize__"
[9] "treedepth__" "n_leapfrog__" "divergent__" "energy__"
dat3c.brm2 |>
gather_draws(`b_Intercept`,`b_.*x`, regex = TRUE) |>
mutate(across(everything(), exp)) |>
ggplot() +
geom_histogram(aes(x = .value)) +
facet_wrap(~.variable, scales = "free")
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Alternatively, there are various other representations supported by the ggdist package.
y ymin ymax .width .point .interval
1 0.9204325 0.8031199 0.9343758 0.95 median hdci
Conclusions:
- 92.043% of the total variability in \(y\) can be explained by its relationship to \(x\)
- we are 95% confident that the strength of this relationship is between 80.312% and 93.438%
Family: negbinomial
Links: mu = log; shape = identity
Formula: y ~ x
Data: dat4 (Number of observations: 10)
Draws: 3 chains, each with iter = 5000; warmup = 1000; thin = 5;
total post-warmup draws = 2400
Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept 0.37 0.39 -0.40 1.12 1.00 2493 2257
x 0.28 0.05 0.18 0.39 1.00 2568 2231
Family Specific Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
shape 52.94 61.13 3.06 233.36 1.00 2500 2383
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
The results presented by the summary() function are on the link scale (in this case, the log scale). As such, they can be awkward to interpret and are particularly punishing for your audience.
Conclusions:
- in the initial block of information, we are reminded of the formula as well as the chain fitting characteristics. We are also informed that the total number of post-warmup MCMC samples is 2400.
- the Rhat values for each parameter are less than 1.01, confirming convergence of the parameter estimates across the chains
- Bulk_ESS: the effective sample sizes for the bulk of the posterior range are a high fraction (> 0.5) of the total number of draws. This suggests that the sampling was reasonably efficient and therefore likely to be reliable
- Tail_ESS: the effective sample sizes of estimates in the tails of the posterior. The tails (more extreme regions of the posterior) are where the sampler is most likely to get stuck (have divergent transitions etc.). The fraction of effective samples from this region is relatively high, implying that the sampler did not get stuck in areas that are unsupported by data
- Intercept: when \(x=0\), the expected value of \(y\) is 0.372 and we are 95% confident that the true value is between -0.399 and 1.116. Since \(x=0\) is outside the observed range of \(x\), this y-intercept is of very limited value.
- x: (the slope) - the rate of change in \(y\) per unit (=1) change in \(x\). So for every one unit change in \(x\), \(y\) increases by (on average) 0.282 units and we are 95% confident that this change is between 0.181 and 0.387
Note, the estimates are means and quantiles.
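The shape parameter reported in the family-specific table above governs how much more variable the response is allowed to be than under a Poisson: for a negative binomial, variance = mu + mu^2/shape, so as shape grows the distribution approaches the Poisson (variance = mu). A quick numeric sketch (the mu value is arbitrary, chosen only for illustration):

```r
# Negative binomial mean-variance relationship: var = mu + mu^2 / shape
mu <- 10
shape <- c(1, 10, 100, 1000)
variance <- mu + mu^2 / shape
data.frame(shape, variance)   # variance shrinks towards mu (the Poisson case)
```

This is why a large, poorly constrained shape estimate (as here) suggests the data show little overdispersion beyond the Poisson.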
As an alternative to the regular summary (which is designed to resemble that traditionally provided by frequentist analyses - and thus be instantly familiar), it is possible to define exactly how each of the parameters is summarised.
In the following, I am nominating that I want to summarise each parameter posterior by:
- the median
- the 95% highest probability density interval (credibility interval)
- Rhat
- total number of draws
- bulk and tail effective sample sizes
dat4a.brm2 |>
posterior::summarise_draws(
median,
HDInterval::hdi,
rhat,
length,
ess_bulk, ess_tail
)
# A tibble: 8 × 8
variable median lower upper rhat length ess_bulk ess_tail
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 b_Intercept 3.78e- 1 -0.399 1.12 1.00 2400 2493. 2257.
2 b_x 2.81e- 1 0.176 0.383 1.00 2400 2568. 2231.
3 shape 3.11e+ 1 0.501 177. 1.00 2400 2500. 2383.
4 prior_Intercept 1.99e+ 0 -0.0160 3.90 1.00 2400 2200. 2449.
5 prior_b -5.65e- 3 -4.98 4.29 1.00 2400 2353. 2300.
6 prior_shape 4.42e-29 0 0.325 1.00 2400 2221. 2451.
7 lprior -1.07e+ 1 -14.4 -8.05 1.00 2400 2501. 2383.
8 lp__ -3.18e+ 1 -34.8 -30.6 1.00 2400 2110. 2035.
The results presented by the above function are on the link scale (in this case, the log scale). As such, they can be awkward to interpret and are particularly punishing for your audience.
Conclusions:
- in the initial block of information, we are reminded of the formula as well as the chain fitting characteristics. We are also informed that the total number of post-warmup MCMC samples is 2400.
- the Rhat values for each parameter are less than 1.01, confirming convergence of the parameter estimates across the chains
- Bulk_ESS: the effective sample sizes for the bulk of the posterior range are a high fraction (> 0.5) of the total number of draws. This suggests that the sampling was reasonably efficient and therefore likely to be reliable
- Tail_ESS: the effective sample sizes of estimates in the tails of the posterior. The tails (more extreme regions of the posterior) are where the sampler is most likely to get stuck (have divergent transitions etc.). The fraction of effective samples from this region is relatively high, implying that the sampler did not get stuck in areas that are unsupported by data
- b_Intercept: when \(x=0\), the expected value of \(y\) is 0.378 and we are 95% confident that the true value is between -0.399 and 1.117. Since \(x=0\) is outside the observed range of \(x\), this y-intercept is of very limited value.
- b_x: (the slope) - the rate of change in \(y\) per unit (=1) change in \(x\). So for every one unit change in \(x\), \(y\) increases by (on average) 0.281 units and we are 95% confident that this change is between 0.176 and 0.383
As a yet more flexible alternative, if we first extract the full posterior draws (with as_draws_df()), we can then use the various tidyverse functions to focus on just the most ecologically interpretable parameters before summarising.
In the following, I will use select() with a regex (regular expression) to match only the columns that:
- start with (^) “b_” followed by any amount (*) of any character (.)
- start with (^) “sigma”
It will also use mutate() to back-transform the results onto the scale of the response (by exponentiating the posteriors) before summarising.
dat4a.brm2 |>
brms::as_draws_df() |>
dplyr::select(matches("^b_.*|^sigma")) |>
mutate(across(everything(), exp)) |>
posterior::summarise_draws(
median,
HDInterval::hdi,
rhat,
length,
ess_bulk, ess_tail
)
Warning: Dropping 'draws_df' class as required metadata was removed.
# A tibble: 2 × 8
variable median lower upper rhat length ess_bulk ess_tail
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 b_Intercept 1.46 0.584 2.83 1.00 2400 2480. 2242.
2 b_x 1.32 1.19 1.47 1.00 2400 2559. 2219.
Conclusions:
- in the initial block of information, we are reminded of the formula as well as the chain fitting characteristics. We are also informed that the total number of post-warmup MCMC samples is 2400.
- the Rhat values for each parameter are < 1.01, confirming convergence of the parameter estimates across the chains
- Bulk_ESS: the effective sample sizes for the bulk of the posterior are a high fraction (> 0.5) of the total number of draws. This suggests that the sampling was reasonably efficient and therefore likely to be reliable
- Tail_ESS: the effective sample sizes for estimates in the tails of the posterior. The tails (the more extreme regions of the posterior) are where the sampler is most likely to get stuck (have divergent transitions etc.). The fraction of effective samples from this region is relatively high, implying that the sampler did not get stuck in areas that are unsupported by data
- b_Intercept: when \(x=0\), the expected value of \(y\) is 1.459 and we are 95% confident that the true value is between 0.671 and 3.056. Since \(x=0\) is outside the observed range of \(x\), this y-intercept is of very limited value
- b_x (the slope): the rate of change in \(y\) per unit (=1) change in \(x\). So for every one unit change in \(x\), \(y\) increases by (on average) a factor of 1.324 and we are 95% confident that this change is between 1.193 and 1.467. This represents a ((value - 1) * 100 =) 32.4% increase in \(y\) per unit increase in \(x\)
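The back-transformation arithmetic used above is worth making explicit. A minimal sketch (the value 0.281 is a hypothetical log-scale slope like the one in the summaries above):

```r
## Back-transform a log-link slope into a multiplicative factor and a
## percentage change in y per unit of x
b <- 0.281                        ## slope on the log (link) scale
factor_change <- exp(b)           ## multiplicative change in y per unit x
percent_change <- (factor_change - 1) * 100

round(factor_change, 3)   ## ~1.324
round(percent_change, 1)  ## ~32.4
```

The same arithmetic can be applied to every posterior draw (as the mutate(across(everything(), exp)) step does), so that the summaries themselves are on the response scale.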
The gather_draws() function performs the equivalent of an as_draws_df() followed by a pivot_longer() in order to return the full posteriors in long format where they are more suitable for graphing.
dat4a.brm2 |>
gather_draws(b_Intercept, b_x) |>
mutate(across(everything(), exp)) |>
ggplot() +
geom_histogram(aes(x = .value)) +
facet_wrap(~.variable, scales = "free")
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Alternatively, there are various other representations supported by the ggdist package.
y ymin ymax .width .point .interval
1 0.8965354 0.6728177 0.9214414 0.95 median hdci
Conclusions:
- 89.654% of the total variability in \(y\) can be explained by its relationship to \(x\)
- we are 95% confident that the strength of this relationship is between 67.282% and 92.144%
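A table like the one above can be produced by summarising the posterior draws of R² with a point estimate and highest posterior density interval. A hedged sketch, using simulated draws in place of brms::bayes_R2(dat4a.brm2, summary = FALSE) so that it runs stand-alone:

```r
library(ggdist)

## Stand-in draws for a posterior of R2; in practice something like:
## r2 <- brms::bayes_R2(fit, summary = FALSE)[, 1]
set.seed(1)
r2 <- rbeta(2000, 20, 3)

## Median and 95% highest posterior density (credibility) interval
median_hdci(r2, .width = 0.95)
```

The result is a small data frame with columns y, ymin, ymax, .width, .point and .interval, matching the layout of the output above.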
Family: negbinomial
Links: mu = log; shape = identity
Formula: y ~ scale(x, scale = FALSE)
Data: dat4 (Number of observations: 10)
Draws: 3 chains, each with iter = 5000; warmup = 1000; thin = 5;
total post-warmup draws = 2400
Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept 1.92 0.16 1.61 2.22 1.00 2278 2390
scalexscaleEQFALSE 0.28 0.05 0.18 0.39 1.00 2399 2127
Family Specific Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
shape 50.51 58.60 3.15 213.22 1.00 2362 2252
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
The results presented by the summary() function are on the link scale (in this case, the log scale). As such, they can be awkward to interpret and are particularly punishing for your audience.
Conclusions:
- in the initial block of information, we are reminded of the formula as well as the chain fitting characteristics. We are also informed that the total number of post-warmup MCMC samples is 2400.
- the Rhat values for each parameter are < 1.01, confirming convergence of the parameter estimates across the chains
- Bulk_ESS: the effective sample sizes for the bulk of the posterior are a high fraction (> 0.5) of the total number of draws. This suggests that the sampling was reasonably efficient and therefore likely to be reliable
- Tail_ESS: the effective sample sizes for estimates in the tails of the posterior. The tails (the more extreme regions of the posterior) are where the sampler is most likely to get stuck (have divergent transitions etc.). The fraction of effective samples from this region is relatively high, implying that the sampler did not get stuck in areas that are unsupported by data
- Intercept: when \(x=0\), the expected value of \(y\) is 1.922 and we are 95% confident that the true value is between 1.612 and 2.217. Since \(x\) has been centred, \(x=0\) corresponds to its mean, so this y-intercept is now meaningful
- x (the slope): the rate of change in \(y\) per unit (=1) change in \(x\). So for every one unit change in \(x\), \(y\) increases by (on average) 0.281 units and we are 95% confident that this change is between 0.18 and 0.39
Note, the estimates are means and quantiles.
As an alternative to the regular summary (which is designed to resemble that traditionally provided by frequentist analyses - and thus be instantly familiar), it is possible to define exactly how each of the parameters is summarised.
In the following, I am nominating that I want to summarise each parameter posterior by:
- the median
- the 95% highest probability density interval (credibility interval)
- Rhat
- total number of draws
- bulk and tail effective sample sizes
dat4b.brm2 |>
posterior::summarise_draws(
median,
HDInterval::hdi,
rhat,
length,
ess_bulk, ess_tail
)
# A tibble: 8 × 8
variable median lower upper rhat length ess_bulk ess_tail
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 b_Intercept 1.92e+ 0 1.62 2.22 1.00 2400 2277. 2390.
2 b_scalexscaleEQFALSE 2.80e- 1 0.181 0.392 0.999 2400 2399. 2127.
3 shape 2.97e+ 1 0.733 160. 1.00 2400 2362. 2252.
4 prior_Intercept 2.09e+ 0 0.216 3.88 1.00 2400 2531. 2375.
5 prior_b 7.68e- 2 -4.30 4.48 1.00 2400 2245. 2147.
6 prior_shape 6.52e-28 0 0.372 0.999 2400 2300. 2366.
7 lprior -1.06e+ 1 -13.9 -7.80 1.00 2400 2365. 2368.
8 lp__ -3.18e+ 1 -34.7 -30.5 0.999 2400 2473. 2308.
The results presented by the above function are on the link scale (in this case, the log scale). As such, they can be awkward to interpret and are particularly punishing for your audience.
Conclusions:
- in the initial block of information, we are reminded of the formula as well as the chain fitting characteristics. We are also informed that the total number of post-warmup MCMC samples is 2400.
- the Rhat values for each parameter are < 1.01, confirming convergence of the parameter estimates across the chains
- Bulk_ESS: the effective sample sizes for the bulk of the posterior are a high fraction (> 0.5) of the total number of draws. This suggests that the sampling was reasonably efficient and therefore likely to be reliable
- Tail_ESS: the effective sample sizes for estimates in the tails of the posterior. The tails (the more extreme regions of the posterior) are where the sampler is most likely to get stuck (have divergent transitions etc.). The fraction of effective samples from this region is relatively high, implying that the sampler did not get stuck in areas that are unsupported by data
- b_Intercept: when \(x=0\), the expected value of \(y\) is 1.92 and we are 95% confident that the true value is between 1.616 and 2.22. Since \(x\) has been centred, \(x=0\) corresponds to its mean, so this y-intercept is now meaningful
- b_x (the slope): the rate of change in \(y\) per unit (=1) change in \(x\). So for every one unit change in \(x\), \(y\) increases by (on average) 0.28 units and we are 95% confident that this change is between 0.181 and 0.392
As a yet more flexible alternative, if we first extract the full posterior draws (with as_draws_df()), we can then use the various tidyverse functions to focus on just the most ecologically interpretable parameters before summarising.
In the following, I will use select() with a regex (regular expression) to match only the columns that:
- start with (^) “b_” followed by any amount (*) of any character (.)
- start with (^) “sigma”
It will also use mutate() to back-transform the results onto the scale of the response (by exponentiating the posteriors) before summarising.
dat4b.brm2 |>
brms::as_draws_df() |>
dplyr::select(matches("^b_.*|^sigma")) |>
mutate(across(everything(), exp)) |>
posterior::summarise_draws(
median,
HDInterval::hdi,
rhat,
length,
ess_bulk, ess_tail
)
Warning: Dropping 'draws_df' class as required metadata was removed.
# A tibble: 2 × 8
variable median lower upper rhat length ess_bulk ess_tail
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 b_Intercept 6.85 4.90 9.04 1.00 2400 2199. 2364.
2 b_scalexscaleEQFALSE 1.32 1.19 1.46 1.00 2400 2389. 2108.
Conclusions:
- in the initial block of information, we are reminded of the formula as well as the chain fitting characteristics. We are also informed that the total number of post-warmup MCMC samples is 2400.
- the Rhat values for each parameter are < 1.01, confirming convergence of the parameter estimates across the chains
- Bulk_ESS: the effective sample sizes for the bulk of the posterior are a high fraction (> 0.5) of the total number of draws. This suggests that the sampling was reasonably efficient and therefore likely to be reliable
- Tail_ESS: the effective sample sizes for estimates in the tails of the posterior. The tails (the more extreme regions of the posterior) are where the sampler is most likely to get stuck (have divergent transitions etc.). The fraction of effective samples from this region is relatively high, implying that the sampler did not get stuck in areas that are unsupported by data
- b_Intercept: when \(x=0\), the expected value of \(y\) is 6.85 and we are 95% confident that the true value is between 5.031 and 9.207. Since \(x\) has been centred, \(x=0\) corresponds to its mean, so this y-intercept is now meaningful
- b_x (the slope): the rate of change in \(y\) per unit (=1) change in \(x\). So for every one unit change in \(x\), \(y\) increases by (on average) a factor of 1.323 and we are 95% confident that this change is between 1.199 and 1.479. This represents a ((value - 1) * 100 =) 32.3% increase in \(y\) per unit increase in \(x\)
The gather_draws() function performs the equivalent of an as_draws_df() followed by a pivot_longer() in order to return the full posteriors in long format where they are more suitable for graphing.
dat4b.brm2 |>
gather_draws(`b_Intercept`, `b_.*x.*`, regex = TRUE) |>
mutate(across(everything(), exp)) |>
ggplot() +
geom_histogram(aes(x = .value)) +
facet_wrap(~.variable, scales = "free")
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Alternatively, there are various other representations supported by the ggdist package.
y ymin ymax .width .point .interval
1 0.8960203 0.6336949 0.9214524 0.95 median hdci
Conclusions:
- 89.602% of the total variability in \(y\) can be explained by its relationship to \(x\)
- we are 95% confident that the strength of this relationship is between 63.369% and 92.145%
Family: negbinomial
Links: mu = log; shape = identity
Formula: y ~ scale(x)
Data: dat4 (Number of observations: 10)
Draws: 3 chains, each with iter = 5000; warmup = 1000; thin = 5;
total post-warmup draws = 2400
Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept 1.93 0.16 1.62 2.24 1.00 2267 2346
scalex 0.84 0.16 0.53 1.19 1.00 2500 2288
Family Specific Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
shape 49.75 58.60 3.05 202.46 1.00 2560 2322
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
The results presented by the summary() function are on the link scale (in this case, the log scale). As such, they can be awkward to interpret and are particularly punishing for your audience.
Conclusions:
- in the initial block of information, we are reminded of the formula as well as the chain fitting characteristics. We are also informed that the total number of post-warmup MCMC samples is 2400.
- the Rhat values for each parameter are < 1.01, confirming convergence of the parameter estimates across the chains
- Bulk_ESS: the effective sample sizes for the bulk of the posterior are a high fraction (> 0.5) of the total number of draws. This suggests that the sampling was reasonably efficient and therefore likely to be reliable
- Tail_ESS: the effective sample sizes for estimates in the tails of the posterior. The tails (the more extreme regions of the posterior) are where the sampler is most likely to get stuck (have divergent transitions etc.). The fraction of effective samples from this region is relatively high, implying that the sampler did not get stuck in areas that are unsupported by data
- Intercept: when \(x=0\) (its average, since \(x\) is standardised), the expected value of \(y\) is 1.926 and we are 95% confident that the true value is between 1.622 and 2.238. Since \(x=0\) corresponds to the mean of the observed \(x\), this y-intercept is now meaningful
- x (the slope): the rate of change in \(y\) per unit (=1) change in \(x\). Recall that since \(x\) is standardised, 1 unit represents a span of 1 standard deviation of \(x\). So for every one standard deviation change in \(x\), \(y\) increases by (on average) 0.842 units and we are 95% confident that this change is between 0.526 and 1.193
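The relationship between the slope for raw \(x\) and the slope for standardised \(x\) is a simple rescaling: the standardised slope equals the raw slope multiplied by the standard deviation of \(x\). A quick sketch with toy data and ordinary least squares (the same arithmetic applies to each posterior draw):

```r
## Toy data illustrating how slopes rescale under standardisation of x
set.seed(1)
x <- runif(10, 2, 12)
y <- 2 + 0.8 * x + rnorm(10, sd = 0.5)

b_raw    <- coef(lm(y ~ x))[["x"]]                ## slope per raw unit of x
b_scaled <- coef(lm(y ~ scale(x)))[["scale(x)"]]  ## slope per sd of x

all.equal(b_scaled, b_raw * sd(x))  ## TRUE
```

This is why the standardised slope here (0.842) is substantially larger than the raw-unit slope reported for the earlier fits: one standard deviation of \(x\) spans several raw units.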
Note, the estimates are means and quantiles.
As an alternative to the regular summary (which is designed to resemble that traditionally provided by frequentist analyses - and thus be instantly familiar), it is possible to define exactly how each of the parameters is summarised.
In the following, I am nominating that I want to summarise each parameter posterior by:
- the median
- the 95% highest probability density interval (credibility interval)
- Rhat
- total number of draws
- bulk and tail effective sample sizes
dat4c.brm2 |>
posterior::summarise_draws(
median,
HDInterval::hdi,
rhat,
length,
ess_bulk, ess_tail
)
# A tibble: 8 × 8
variable median lower upper rhat length ess_bulk ess_tail
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 b_Intercept 1.92e+ 0 1.61e+ 0 2.23 0.999 2400 2267. 2346.
2 b_scalex 8.38e- 1 5.09e- 1 1.15 1.00 2400 2500. 2288.
3 shape 3.00e+ 1 8.68e- 1 164. 1.00 2400 2561. 2322.
4 prior_Intercept 2.06e+ 0 1.67e- 1 3.86 1.00 2400 2509. 2456.
5 prior_b -5.53e- 2 -4.52e+ 0 4.72 1.00 2400 2406. 2234.
6 prior_shape 1.39e-29 2.99e-300 0.448 1.00 2400 2263. 2369.
7 lprior -1.08e+ 1 -1.40e+ 1 -7.98 1.00 2400 2555. 2328.
8 lp__ -3.19e+ 1 -3.49e+ 1 -30.7 1.00 2400 2368. 2278.
Conclusions:
- in the initial block of information, we are reminded of the formula as well as the chain fitting characteristics. We are also informed that the total number of post-warmup MCMC samples is 2400.
- the Rhat values for each parameter are < 1.01, confirming convergence of the parameter estimates across the chains
- Bulk_ESS: the effective sample sizes for the bulk of the posterior are a high fraction (> 0.5) of the total number of draws. This suggests that the sampling was reasonably efficient and therefore likely to be reliable
- Tail_ESS: the effective sample sizes for estimates in the tails of the posterior. The tails (the more extreme regions of the posterior) are where the sampler is most likely to get stuck (have divergent transitions etc.). The fraction of effective samples from this region is relatively high, implying that the sampler did not get stuck in areas that are unsupported by data
- b_Intercept: when \(x=0\) (its average, since \(x\) is standardised), the expected value of \(y\) is 1.925 and we are 95% confident that the true value is between 1.614 and 2.227. Since \(x=0\) corresponds to the mean of the observed \(x\), this y-intercept is now meaningful
- b_x (the slope): the rate of change in \(y\) per unit (=1) change in \(x\). Recall that since \(x\) is standardised, 1 unit represents a span of 1 standard deviation of \(x\). So for every one standard deviation change in \(x\), \(y\) increases by (on average) 0.838 units and we are 95% confident that this change is between 0.509 and 1.155
As a yet more flexible alternative, if we first extract the full posterior draws (with as_draws_df()), we can then use the various tidyverse functions to focus on just the most ecologically interpretable parameters before summarising.
In the following, I will use select() with a regex (regular expression) to match only the columns that:
- start with (^) “b_” followed by any amount (*) of any character (.)
- start with (^) “sigma”
It will also use mutate() to back-transform the results onto the scale of the response (by exponentiating the posteriors) before summarising.
dat4c.brm2 |>
brms::as_draws_df() |>
dplyr::select(matches("^b_.*|^sigma")) |>
mutate(across(everything(), exp)) |>
posterior::summarise_draws(
median,
HDInterval::hdi,
rhat,
length,
ess_bulk, ess_tail
)
Warning: Dropping 'draws_df' class as required metadata was removed.
# A tibble: 2 × 8
variable median lower upper rhat length ess_bulk ess_tail
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 b_Intercept 6.85 4.84 9.06 1.00 2400 2274. 2344.
2 b_scalex 2.31 1.65 3.16 1.00 2400 2501. 2255.
Conclusions:
- in the initial block of information, we are reminded of the formula as well as the chain fitting characteristics. We are also informed that the total number of post-warmup MCMC samples is 2400.
- the Rhat values for each parameter are < 1.01, confirming convergence of the parameter estimates across the chains
- Bulk_ESS: the effective sample sizes for the bulk of the posterior are a high fraction (> 0.5) of the total number of draws. This suggests that the sampling was reasonably efficient and therefore likely to be reliable
- Tail_ESS: the effective sample sizes for estimates in the tails of the posterior. The tails (the more extreme regions of the posterior) are where the sampler is most likely to get stuck (have divergent transitions etc.). The fraction of effective samples from this region is relatively high, implying that the sampler did not get stuck in areas that are unsupported by data
- b_Intercept: when \(x=0\) (its average, since \(x\) is standardised), the expected value of \(y\) is 6.855 and we are 95% confident that the true value is between 5.022 and 9.272. Since \(x=0\) corresponds to the mean of the observed \(x\), this y-intercept is now meaningful
- b_x (the slope): the rate of change in \(y\) per unit (=1) change in \(x\). Recall that since \(x\) is standardised, 1 unit represents a span of 1 standard deviation of \(x\). So for every one standard deviation change in \(x\), \(y\) increases by (on average) a factor of 2.311 and we are 95% confident that this change is between 1.663 and 3.173. This represents a ((value - 1) * 100 =) 131.1% increase in \(y\) per one standard deviation increase in \(x\)
The gather_draws() function performs the equivalent of an as_draws_df() followed by a pivot_longer() in order to return the full posteriors in long format, where they are more suitable for graphing. Note that the name of the slope parameter gets very awkward when \(x\) is centered, so it is more convenient to refer to this parameter via a regular expression.
[1] "b_Intercept" "b_scalex" "shape" "prior_Intercept"
[5] "prior_b" "prior_shape" "lprior" "lp__"
[9] "accept_stat__" "stepsize__" "treedepth__" "n_leapfrog__"
[13] "divergent__" "energy__"
dat4c.brm2 |>
gather_draws(`b_Intercept`,`b_.*x`, regex = TRUE) |>
mutate(across(everything(), exp)) |>
ggplot() +
geom_histogram(aes(x = .value)) +
facet_wrap(~.variable, scales = "free")
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Alternatively, there are various other representations supported by the ggdist package.
y ymin ymax .width .point .interval
1 0.8959637 0.6455751 0.9214557 0.95 median hdci
Conclusions:
- 89.596% of the total variability in \(y\) can be explained by its relationship to \(x\)
- we are 95% confident that the strength of this relationship is between 64.558% and 92.146%
Family: binomial
Links: mu = logit
Formula: y | trials(1) ~ x
Data: dat5 (Number of observations: 10)
Draws: 3 chains, each with iter = 5000; warmup = 1000; thin = 5;
total post-warmup draws = 2400
Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept -5.72 3.24 -13.80 -0.92 1.00 2311 2184
x 1.13 0.59 0.28 2.57 1.00 2289 2216
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
The results presented by the summary() function are on the link scale (in this case, the logit, or log-odds scale). As such, they can be awkward to interpret and are particularly punishing for your audience.
Conclusions:
- in the initial block of information, we are reminded of the formula as well as the chain fitting characteristics. We are also informed that the total number of post-warmup MCMC samples is 2400.
- the Rhat values for each parameter are < 1.01, confirming convergence of the parameter estimates across the chains
- Bulk_ESS: the effective sample sizes for the bulk of the posterior are a high fraction (> 0.5) of the total number of draws. This suggests that the sampling was reasonably efficient and therefore likely to be reliable
- Tail_ESS: the effective sample sizes for estimates in the tails of the posterior. The tails (the more extreme regions of the posterior) are where the sampler is most likely to get stuck (have divergent transitions etc.). The fraction of effective samples from this region is relatively high, implying that the sampler did not get stuck in areas that are unsupported by data
- Intercept: when \(x=0\), the expected value of \(y\) is -5.724 and we are 95% confident that the true value is between -13.797 and -0.916. Since \(x=0\) is outside the observed range of \(x\), this y-intercept is of very limited value
- x (the slope): the rate of change in \(y\) per unit (=1) change in \(x\). So for every one unit change in \(x\), \(y\) increases by (on average) 1.13 units and we are 95% confident that this change is between 0.284 and 2.569
Note, the estimates are means and quantiles.
As an alternative to the regular summary (which is designed to resemble that traditionally provided by frequentist analyses - and thus be instantly familiar), it is possible to define exactly how each of the parameters is summarised.
In the following, I am nominating that I want to summarise each parameter posterior by:
- the median
- the 95% highest probability density interval (credibility interval)
- Rhat
- total number of draws
- bulk and tail effective sample sizes
dat5a.brm2 |>
posterior::summarise_draws(
median,
HDInterval::hdi,
rhat,
length,
ess_bulk, ess_tail
)
# A tibble: 6 × 8
variable median lower upper rhat length ess_bulk ess_tail
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 b_Intercept -5.14 -12.2 -0.0456 1.00 2400 2311. 2184.
2 b_x 1.02 0.253 2.45 0.999 2400 2289. 2216.
3 prior_Intercept 0.0111 -2.00 1.86 1.00 2400 2223. 2289.
4 prior_b 0.0158 -3.02 3.20 1.00 2400 2373. 1947.
5 lprior -2.85 -4.74 -1.92 1.00 2400 2361. 2388.
6 lp__ -6.07 -8.56 -5.26 1.00 2400 2253. 2188.
The results presented by the above function are on the link scale (in this case, the logit, or log-odds scale). As such, they can be awkward to interpret and are particularly punishing for your audience.
Conclusions:
- in the initial block of information, we are reminded of the formula as well as the chain fitting characteristics. We are also informed that the total number of post-warmup MCMC samples is 2400.
- the Rhat values for each parameter are < 1.01, confirming convergence of the parameter estimates across the chains
- Bulk_ESS: the effective sample sizes for the bulk of the posterior are a high fraction (> 0.5) of the total number of draws. This suggests that the sampling was reasonably efficient and therefore likely to be reliable
- Tail_ESS: the effective sample sizes for estimates in the tails of the posterior. The tails (the more extreme regions of the posterior) are where the sampler is most likely to get stuck (have divergent transitions etc.). The fraction of effective samples from this region is relatively high, implying that the sampler did not get stuck in areas that are unsupported by data
- b_Intercept: when \(x=0\), the expected value of \(y\) is -5.14 and we are 95% confident that the true value is between -12.204 and -0.046. Since \(x=0\) is outside the observed range of \(x\), this y-intercept is of very limited value
- b_x (the slope): the rate of change in \(y\) per unit (=1) change in \(x\). So for every one unit change in \(x\), \(y\) increases by (on average) 1.02 units and we are 95% confident that this change is between 0.253 and 2.451
As a yet more flexible alternative, if we first extract the full posterior draws (with as_draws_df()), we can then use the various tidyverse functions to focus on just the most ecologically interpretable parameters before summarising.
In the following, I will use select() with a regex (regular expression) to match only the columns that:
- start with (^) “b_” followed by any amount (*) of any character (.)
- start with (^) “sigma”
It will also use mutate() to back-transform the results onto the scale of the response (by exponentiating the posteriors) before summarising.
dat5a.brm2 |>
brms::as_draws_df() |>
dplyr::select(matches("^b_.*|^sigma")) |>
mutate(across(everything(), exp)) |>
posterior::summarise_draws(
median,
HDInterval::hdi,
rhat,
length,
ess_bulk, ess_tail
)
Warning: Dropping 'draws_df' class as required metadata was removed.
# A tibble: 2 × 8
variable median lower upper rhat length ess_bulk ess_tail
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 b_Intercept 0.00583 1.70e-10 0.223 1.00 2400 2307. 2177.
2 b_x 2.77 9.45e- 1 9.60 1.00 2400 2283. 2209.
Conclusions:
- in the initial block of information, we are reminded of the formula as well as the chain fitting characteristics. We are also informed that the total number of post-warmup MCMC samples is 2400.
- the Rhat values for each parameter are < 1.01, confirming convergence of the parameter estimates across the chains
- Bulk_ESS: the effective sample sizes for the bulk of the posterior are a high fraction (> 0.5) of the total number of draws. This suggests that the sampling was reasonably efficient and therefore likely to be reliable
- Tail_ESS: the effective sample sizes for estimates in the tails of the posterior. The tails (the more extreme regions of the posterior) are where the sampler is most likely to get stuck (have divergent transitions etc.). The fraction of effective samples from this region is relatively high, implying that the sampler did not get stuck in areas that are unsupported by data
- b_Intercept: when \(x=0\), the expected odds of \(y\) are 0.006 and we are 95% confident that the true value is between 0 and 0.955. Since \(x=0\) is outside the observed range of \(x\), this y-intercept is of very limited value
- b_x (the slope): the rate of change in the odds of \(y\) per unit (=1) change in \(x\). So for every one unit change in \(x\), the odds of \(y\) increase by (on average) a factor of 2.773 and we are 95% confident that this change is between 1.288 and 11.595. This represents a ((value - 1) * 100 =) 177.3% increase in the odds of \(y\) per unit increase in \(x\)
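For the logit link, exponentiating the slope yields an odds ratio, and predictions can be pushed all the way back to the probability scale with the inverse-logit function (plogis()). A sketch using hypothetical point estimates similar to those above:

```r
## Back-transforming logit-scale (log-odds) estimates
b0 <- -5.14   ## intercept: log-odds of y = 1 when x = 0
b1 <-  1.02   ## slope: change in log-odds per unit of x

exp(b1)               ## odds ratio: multiplicative change in odds per unit x
(exp(b1) - 1) * 100   ## percentage change in the odds per unit x

## Predicted probability of y = 1 at a chosen value of x (say x = 6)
plogis(b0 + b1 * 6)
```

Note that exponentiating a logit-scale coefficient yields a change in odds, not in probability; probabilities require the full linear predictor and the inverse link.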
The gather_draws() function performs the equivalent of an as_draws_df() followed by a pivot_longer() in order to return the full posteriors in long format where they are more suitable for graphing.
dat5a.brm2 |>
gather_draws(b_Intercept, b_x) |>
mutate(across(everything(), exp)) |>
ggplot() +
geom_histogram(aes(x = .value)) +
facet_wrap(~.variable, scales = "free")
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Alternatively, there are various other representations supported by the ggdist package.
y ymin ymax .width .point .interval
1 0.6247095 0.2347775 0.6971989 0.95 median hdci
Conclusions:
- 62.471% of the total variability in \(y\) can be explained by its relationship to \(x\)
- we are 95% confident that the strength of this relationship is between 23.478% and 69.72%
Family: binomial
Links: mu = logit
Formula: y | trials(1) ~ scale(x, scale = FALSE)
Data: dat5 (Number of observations: 10)
Draws: 3 chains, each with iter = 5000; warmup = 1000; thin = 5;
total post-warmup draws = 2400
Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept 0.52 0.72 -0.85 1.93 1.00 2374 2178
scalexscaleEQFALSE 1.13 0.57 0.30 2.51 1.00 2500 2232
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
The results presented by the summary() function are on the link scale (in this case, the logit, or log-odds scale). As such, they can be awkward to interpret and are particularly punishing for your audience.
Conclusions:
- in the initial block of information, we are reminded of the formula as well as the chain fitting characteristics. We are also informed that the total number of post-warmup MCMC samples is 2400.
- the Rhat values for each parameter are < 1.01, confirming convergence of the parameter estimates across the chains
- Bulk_ESS: the effective sample sizes for the bulk of the posterior are a high fraction (> 0.5) of the total number of draws. This suggests that the sampling was reasonably efficient and therefore likely to be reliable
- Tail_ESS: the effective sample sizes for estimates in the tails of the posterior. The tails (the more extreme regions of the posterior) are where the sampler is most likely to get stuck (have divergent transitions etc.). The fraction of effective samples from this region is relatively high, implying that the sampler did not get stuck in areas that are unsupported by data
- Intercept: when \(x=0\), the expected value of \(y\) is 0.516 and we are 95% confident that the true value is between -0.854 and 1.93. Since \(x\) has been centred, \(x=0\) corresponds to its mean, so this y-intercept is now meaningful
- x (the slope): the rate of change in \(y\) per unit (=1) change in \(x\). So for every one unit change in \(x\), \(y\) increases by (on average) 1.131 units and we are 95% confident that this change is between 0.301 and 2.51
Note, the point estimates are posterior means and the intervals are quantile-based.
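If needed, those link-scale estimates can be back-transformed by hand with base R: plogis() (the inverse logit) turns log-odds into a probability, and exp() turns a log-odds slope into an odds ratio. A minimal sketch using the rounded posterior means from the summary above:

```r
# Back-transforming the link-scale estimates above (a base R sketch).
# plogis() is the inverse logit; exp() of a log-odds slope gives an odds ratio.
b_intercept <- 0.52   # posterior mean of the intercept (log-odds scale)
b_slope     <- 1.13   # posterior mean of the slope (log-odds scale)

p0 <- plogis(b_intercept)   # probability that y = 1 when x is at its mean
or <- exp(b_slope)          # multiplicative change in the odds per unit of x
round(c(probability = p0, odds_ratio = or), 3)
```

This is only a point-estimate shortcut; the full-posterior approach used later in this tutorial (exponentiating every draw before summarising) is preferable.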
As an alternative to the regular summary (which is designed to resemble that traditionally provided by frequentist analyses - and thus be instantly familiar), it is possible to define exactly how each of the parameters are summarised.
In the following, I am nominating that I want to summarise each parameter posterior by:
- the median
- the 95% highest probability density interval (credibility interval)
- Rhat
- total number of draws
- bulk and tail effective sample sizes
dat5b.brm2 |>
posterior::summarise_draws(
median,
HDInterval::hdi,
rhat,
length,
ess_bulk, ess_tail
)
# A tibble: 6 × 8
variable median lower upper rhat length ess_bulk ess_tail
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 b_Intercept 0.519 -0.863 1.90 0.999 2400 2374. 2178.
2 b_scalexscaleEQFALSE 1.04 0.249 2.33 1.00 2400 2499. 2232.
3 prior_Intercept -0.0224 -1.99 1.84 1.00 2400 2306. 2257.
4 prior_b 0.0387 -2.97 3.32 1.00 2400 2171. 2205.
5 lprior -2.86 -4.68 -1.95 1.00 2400 2505. 2042.
6 lp__ -6.04 -8.50 -5.26 1.00 2400 2428. 2327.
The results presented by the above function are on the link scale (in this case, the logit, or log-odds scale). As such, they can be awkward to interpret and are particularly punishing for your audience.
Conclusions:
- the length column confirms that the total number of post-warmup MCMC draws is 2400.
- the rhat values for each parameter are less than 1.01, confirming convergence of the parameter estimates across the chains
- ess_bulk: the effective sample sizes for the bulk of the posterior are a high fraction (> 0.5) of the total number of draws. This suggests that the sampling was reasonably efficient and therefore likely to be reliable
- ess_tail: the effective sample sizes for estimates in the tails of the posterior. The tails (the more extreme regions of the posterior) are where the sampler is most likely to get stuck (have divergent transitions etc). The fraction of effective samples from this region is relatively high, implying that the sampler did not get stuck in areas that are poorly supported by the data
- b_Intercept: when \(x=0\) (the mean of \(x\), since \(x\) has been centred), the expected log-odds of \(y\) is 0.519 and we are 95% confident that the true value is between -0.863 and 1.898
- b_x (the slope): the rate of change in the log-odds of \(y\) per unit (=1) change in \(x\). So for every one unit change in \(x\), the log-odds of \(y\) increases by (on average) 1.041 units and we are 95% confident that this change is between 0.249 and 2.332
As a yet more flexible alternative, if we first extract the full posterior draws (with as_draws_df()), we can then use the various tidyverse functions to focus on just the most ecologically interpretable parameters before summarising.
In the following, I will use select() with a regex (regular expression) to match only the columns that:
- start with (^) “b_” followed by any number (*) of any characters (.)
- start with (^) “sigma”
It will also use mutate() to back-transform the results (by exponentiating the posteriors, which for a logit link yields odds and odds ratios) before summarising.
dat5b.brm2 |>
brms::as_draws_df() |>
dplyr::select(matches("^b_.*|^sigma")) |>
mutate(across(everything(), exp)) |>
posterior::summarise_draws(
median,
HDInterval::hdi,
rhat,
length,
ess_bulk, ess_tail
)
Warning: Dropping 'draws_df' class as required metadata was removed.
# A tibble: 2 × 8
variable median lower upper rhat length ess_bulk ess_tail
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 b_Intercept 1.68 0.215 5.54 1.00 2400 2370. 2171.
2 b_scalexscaleEQFALSE 2.83 1.06 9.25 1.00 2400 2489. 2183.
Conclusions:
- the length column confirms that the total number of post-warmup MCMC draws is 2400.
- the rhat values for each parameter are less than 1.01, confirming convergence of the parameter estimates across the chains
- ess_bulk: the effective sample sizes for the bulk of the posterior are a high fraction (> 0.5) of the total number of draws. This suggests that the sampling was reasonably efficient and therefore likely to be reliable
- ess_tail: the effective sample sizes for estimates in the tails of the posterior. The tails (the more extreme regions of the posterior) are where the sampler is most likely to get stuck (have divergent transitions etc). The fraction of effective samples from this region is relatively high, implying that the sampler did not get stuck in areas that are poorly supported by the data
- b_Intercept: when \(x=0\) (the mean of \(x\), since \(x\) has been centred), the expected odds that \(y=1\) are 1.681 and we are 95% confident that the true value is between 0.215 and 5.54
- b_x (the slope): for every one unit change in \(x\), the odds that \(y=1\) increase by (on average) a factor of 2.832 (the odds ratio) and we are 95% confident that this factor is between 1.06 and 9.25. This represents a ((value - 1) * 100 =) 183.2% increase in the odds per unit increase in \(x\)
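The "(value - 1) * 100" conversion above is plain arithmetic on the odds ratio. As a quick base R check (2.83 is the median exponentiated slope from the table above):

```r
# Converting an exponentiated slope (an odds ratio) into a percentage change
# in the odds, per the "(value - 1) * 100" arithmetic used above.
odds_ratio <- 2.83                     # median exponentiated slope from the table above
pct_change <- (odds_ratio - 1) * 100   # percent increase in the odds per unit of x
round(pct_change)
# [1] 183
```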
The gather_draws() function performs the equivalent of an as_draws_df() followed by a pivot_longer() in order to return the full posteriors in long format where they are more suitable for graphing.
dat5b.brm2 |>
gather_draws(`b_Intercept`, `b_.*x.*`, regex = TRUE) |>
mutate(across(everything(), exp)) |>
ggplot() +
geom_histogram(aes(x = .value)) +
facet_wrap(~.variable, scales = "free")
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
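The wide-to-long reshape that gather_draws() performs can be illustrated on a toy data frame of fabricated draws with base R's stack() (column names mirror the model's, but the numbers are made up):

```r
# A toy illustration of the wide-to-long reshape that gather_draws() performs,
# using base R's stack() on fabricated draws (not the model's actual posterior).
draws <- data.frame(
  b_Intercept = c(0.5, 0.6, 0.4),
  b_x         = c(1.1, 1.2, 1.0)
)
long <- stack(draws)                    # two columns: values, ind
names(long) <- c(".value", ".variable") # mirror tidybayes' naming
head(long)
```

In long format, every draw of every parameter is a row, which is exactly what ggplot() needs for the faceted histograms above.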
Alternatively, there are various other representations supported by the ggdist package.
y ymin ymax .width .point .interval
1 0.6256043 0.2779363 0.6936146 0.95 median hdci
Conclusions:
- 62.56% of the total variability in \(y\) can be explained by its relationship to \(x\)
- we are 95% confident that the strength of this relationship is between 27.794% and 69.361%
Family: binomial
Links: mu = logit
Formula: y | trials(1) ~ scale(x)
Data: dat5 (Number of observations: 10)
Draws: 3 chains, each with iter = 5000; warmup = 1000; thin = 5;
total post-warmup draws = 2400
Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept 0.43 0.65 -0.77 1.76 1.00 2276 2122
scalex 2.06 1.27 0.22 5.09 1.00 2319 2114
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
The results presented by the summary() function are on the link scale (in this case, the logit, or log-odds scale). As such, they can be awkward to interpret and are particularly punishing for your audience.
Conclusions:
- in the initial block of information, we are reminded of the formula as well as the chain fitting characteristics. We are also informed that the total number of post-warmup MCMC samples is 2400.
- the Rhat values for each parameter are less than 1.01, confirming convergence of the parameter estimates across the chains
- Bulk_ESS: the effective sample sizes for the bulk of the posterior are a high fraction (> 0.5) of the total number of draws. This suggests that the sampling was reasonably efficient and therefore likely to be reliable
- Tail_ESS: the effective sample sizes for estimates in the tails of the posterior. The tails (the more extreme regions of the posterior) are where the sampler is most likely to get stuck (have divergent transitions etc). The fraction of effective samples from this region is relatively high, implying that the sampler did not get stuck in areas that are poorly supported by the data
- Intercept: when \(x=0\) (the average of \(x\), since it is standardised), the expected log-odds of \(y\) is 0.435 and we are 95% confident that the true value is between -0.772 and 1.76
- x (the slope): the rate of change in the log-odds of \(y\) per unit change in \(x\). Recall that since \(x\) is standardised, 1 unit represents a span of 1 standard deviation of \(x\). So for every one standard deviation change in \(x\), the log-odds of \(y\) increases by (on average) 2.063 units and we are 95% confident that this change is between 0.216 and 5.088
Note, the point estimates are posterior means and the intervals are quantile-based.
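Because \(x\) was standardised, the slope above is per standard deviation of \(x\). To express it per raw unit of \(x\), divide by sd(x). A base R sketch (sd_x = 2 is an assumed value for illustration; the real value comes from the data, which are not reproduced here):

```r
# Converting a slope estimated per standard deviation of x back to the raw
# scale of x (a sketch; sd_x = 2 is an assumed value, not from the data).
b_per_sd <- 2.06   # slope per 1 SD of x (from the summary above)
sd_x     <- 2      # hypothetical standard deviation of x
b_per_unit <- b_per_sd / sd_x
b_per_unit
# [1] 1.03
```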
As an alternative to the regular summary (which is designed to resemble that traditionally provided by frequentist analyses - and thus be instantly familiar), it is possible to define exactly how each of the parameters are summarised.
In the following, I am nominating that I want to summarise each parameter posterior by:
- the median
- the 95% highest probability density interval (credibility interval)
- Rhat
- total number of draws
- bulk and tail effective sample sizes
dat5c.brm2 |>
posterior::summarise_draws(
median,
HDInterval::hdi,
rhat,
length,
ess_bulk, ess_tail
)
# A tibble: 6 × 8
variable median lower upper rhat length ess_bulk ess_tail
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 b_Intercept 0.427 -0.737 1.80 1.00 2400 2276. 2122.
2 b_scalex 1.84 -0.127 4.64 1.00 2400 2319. 2114.
3 prior_Intercept -0.0229 -1.92 2.04 1.00 2400 2480. 2417.
4 prior_b 0.00783 -3.59 2.79 1.00 2400 2184. 2214.
5 lprior -3.70 -6.56 -1.93 0.999 2400 2345. 2120.
6 lp__ -7.38 -10.0 -6.56 1.00 2400 2221. 2135.
Conclusions:
- the length column confirms that the total number of post-warmup MCMC draws is 2400.
- the rhat values for each parameter are less than 1.01, confirming convergence of the parameter estimates across the chains
- ess_bulk: the effective sample sizes for the bulk of the posterior are a high fraction (> 0.5) of the total number of draws. This suggests that the sampling was reasonably efficient and therefore likely to be reliable
- ess_tail: the effective sample sizes for estimates in the tails of the posterior. The tails (the more extreme regions of the posterior) are where the sampler is most likely to get stuck (have divergent transitions etc). The fraction of effective samples from this region is relatively high, implying that the sampler did not get stuck in areas that are poorly supported by the data
- b_Intercept: when \(x=0\) (the average of \(x\), since it is standardised), the expected log-odds of \(y\) is 0.427 and we are 95% confident that the true value is between -0.737 and 1.795
- b_x (the slope): the rate of change in the log-odds of \(y\) per unit change in \(x\). Recall that since \(x\) is standardised, 1 unit represents a span of 1 standard deviation of \(x\). So for every one standard deviation change in \(x\), the log-odds of \(y\) increases by (on average) 1.839 units and we are 95% confident that this change is between -0.127 and 4.639
As a yet more flexible alternative, if we first extract the full posterior draws (with as_draws_df()), we can then use the various tidyverse functions to focus on just the most ecologically interpretable parameters before summarising.
In the following, I will use select() with a regex (regular expression) to match only the columns that:
- start with (^) “b_” followed by any number (*) of any characters (.)
- start with (^) “sigma”
It will also use mutate() to back-transform the results (by exponentiating the posteriors, which for a logit link yields odds and odds ratios) before summarising.
dat5c.brm2 |>
brms::as_draws_df() |>
dplyr::select(matches("^b_.*|^sigma")) |>
mutate(across(everything(), exp)) |>
posterior::summarise_draws(
median,
HDInterval::hdi,
rhat,
length,
ess_bulk, ess_tail
)
Warning: Dropping 'draws_df' class as required metadata was removed.
# A tibble: 2 × 8
variable median lower upper rhat length ess_bulk ess_tail
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 b_Intercept 1.53 0.212 4.69 1.00 2400 2252. 2089.
2 b_scalex 6.29 0.332 82.1 1.00 2400 2305. 2108.
Conclusions:
- the length column confirms that the total number of post-warmup MCMC draws is 2400.
- the rhat values for each parameter are less than 1.01, confirming convergence of the parameter estimates across the chains
- ess_bulk: the effective sample sizes for the bulk of the posterior are a high fraction (> 0.5) of the total number of draws. This suggests that the sampling was reasonably efficient and therefore likely to be reliable
- ess_tail: the effective sample sizes for estimates in the tails of the posterior. The tails (the more extreme regions of the posterior) are where the sampler is most likely to get stuck (have divergent transitions etc). The fraction of effective samples from this region is relatively high, implying that the sampler did not get stuck in areas that are poorly supported by the data
- b_Intercept: when \(x=0\) (the average of \(x\), since it is standardised), the expected odds that \(y=1\) are 1.532 and we are 95% confident that the true value is between 0.212 and 4.69
- b_x (the slope): recall that since \(x\) is standardised, 1 unit represents a span of 1 standard deviation of \(x\). So for every one standard deviation change in \(x\), the odds that \(y=1\) increase by (on average) a factor of 6.289 (the odds ratio) and we are 95% confident that this factor is between 0.332 and 82.1. This represents a ((value - 1) * 100 =) 528.9% increase in the odds per standard deviation increase in \(x\)
The gather_draws() function performs the equivalent of an as_draws_df() followed by a pivot_longer() in order to return the full posteriors in long format where they are more suitable for graphing. Note that the name of the slope parameter gets very awkward when \(x\) is centred, so it is more convenient to refer to this parameter via a regular expression.
[1] "b_Intercept" "b_scalex" "prior_Intercept" "prior_b"
[5] "lprior" "lp__" "accept_stat__" "stepsize__"
[9] "treedepth__" "n_leapfrog__" "divergent__" "energy__"
dat5c.brm2 |>
gather_draws(`b_Intercept`,`b_.*x`, regex = TRUE) |>
mutate(across(everything(), exp)) |>
ggplot() +
geom_histogram(aes(x = .value)) +
facet_wrap(~.variable, scales = "free")
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Alternatively, there are various other representations supported by the ggdist package.
y ymin ymax .width .point .interval
1 0.4896412 0.03609659 0.6705246 0.95 median hdci
Conclusions:
- 48.964% of the total variability in \(y\) can be explained by its relationship to \(x\)
- we are 95% confident that the strength of this relationship is between 3.61% and 67.052%
Family: binomial
Links: mu = logit
Formula: count | trials(total) ~ x
Data: dat6 (Number of observations: 10)
Draws: 3 chains, each with iter = 5000; warmup = 1000; thin = 5;
total post-warmup draws = 2400
Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept -3.29 0.95 -5.27 -1.55 1.00 2463 2287
x 0.65 0.17 0.35 1.01 1.00 2431 2452
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
The results presented by the summary() function are on the link scale (in this case, the logit, or log-odds scale). As such, they can be awkward to interpret and are particularly punishing for your audience.
Conclusions:
- in the initial block of information, we are reminded of the formula as well as the chain fitting characteristics. We are also informed that the total number of post-warmup MCMC samples is 2400.
- the Rhat values for each parameter are less than 1.01, confirming convergence of the parameter estimates across the chains
- Bulk_ESS: the effective sample sizes for the bulk of the posterior are a high fraction (> 0.5) of the total number of draws. This suggests that the sampling was reasonably efficient and therefore likely to be reliable
- Tail_ESS: the effective sample sizes for estimates in the tails of the posterior. The tails (the more extreme regions of the posterior) are where the sampler is most likely to get stuck (have divergent transitions etc). The fraction of effective samples from this region is relatively high, implying that the sampler did not get stuck in areas that are poorly supported by the data
- Intercept: when \(x=0\), the expected log-odds of success is -3.29 and we are 95% confident that the true value is between -5.271 and -1.554. Since \(x=0\) is outside the observed range of \(x\), this y-intercept is of very limited value
- x (the slope): the rate of change in the log-odds of success per unit (=1) change in \(x\). So for every one unit change in \(x\), the log-odds increases by (on average) 0.653 units and we are 95% confident that this change is between 0.35 and 1.015
Note, the point estimates are posterior means and the intervals are quantile-based.
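The summaries above use equal-tailed quantile intervals, whereas the alternative summaries in this tutorial use highest posterior density intervals (via HDInterval::hdi). For a symmetric posterior the two are nearly identical, but for a skewed posterior they differ. A hand-rolled base R sketch of the HDI idea (HDInterval::hdi() implements this properly):

```r
# A minimal sketch of a 95% highest-density interval (HDI): the narrowest
# window that contains 95% of the draws. HDInterval::hdi() implements this
# properly; this hand-rolled version just illustrates the idea.
hdi_95 <- function(x, width = 0.95) {
  x <- sort(x)
  n <- ceiling(width * length(x))      # draws each candidate window must cover
  starts <- seq_len(length(x) - n + 1)
  w <- x[starts + n - 1] - x[starts]   # width of every candidate window
  i <- which.min(w)                    # the narrowest window is the HDI
  c(lower = x[i], upper = x[i + n - 1])
}
set.seed(1)
draws <- rexp(10000)              # a deliberately skewed "posterior"
hdi_95(draws)                     # hugs zero (the mode)
quantile(draws, c(0.025, 0.975))  # equal-tailed interval sits away from zero
```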
As an alternative to the regular summary (which is designed to resemble that traditionally provided by frequentist analyses - and thus be instantly familiar), it is possible to define exactly how each of the parameters are summarised.
In the following, I am nominating that I want to summarise each parameter posterior by:
- the median
- the 95% highest probability density interval (credibility interval)
- Rhat
- total number of draws
- bulk and tail effective sample sizes
dat6a.brm2 |>
posterior::summarise_draws(
median,
HDInterval::hdi,
rhat,
length,
ess_bulk, ess_tail
)
# A tibble: 6 × 8
variable median lower upper rhat length ess_bulk ess_tail
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 b_Intercept -3.24 -5.10 -1.42 1.00 2400 2463. 2287.
2 b_x 0.641 0.330 0.989 1.00 2400 2430. 2452.
3 prior_Intercept -0.228 -0.872 0.362 1.00 2400 2392. 2370.
4 prior_b 0.0107 -1.51 1.54 1.00 2400 2466. 2293.
5 lprior -2.25 -5.12 -0.535 1.00 2400 2123. 1984.
6 lp__ -14.4 -16.9 -13.7 1.00 2400 2250. 2287.
The results presented by the above function are on the link scale (in this case, the logit, or log-odds scale). As such, they can be awkward to interpret and are particularly punishing for your audience.
Conclusions:
- the length column confirms that the total number of post-warmup MCMC draws is 2400.
- the rhat values for each parameter are less than 1.01, confirming convergence of the parameter estimates across the chains
- ess_bulk: the effective sample sizes for the bulk of the posterior are a high fraction (> 0.5) of the total number of draws. This suggests that the sampling was reasonably efficient and therefore likely to be reliable
- ess_tail: the effective sample sizes for estimates in the tails of the posterior. The tails (the more extreme regions of the posterior) are where the sampler is most likely to get stuck (have divergent transitions etc). The fraction of effective samples from this region is relatively high, implying that the sampler did not get stuck in areas that are poorly supported by the data
- b_Intercept: when \(x=0\), the expected log-odds of success is -3.24 and we are 95% confident that the true value is between -5.103 and -1.424. Since \(x=0\) is outside the observed range of \(x\), this y-intercept is of very limited value
- b_x (the slope): the rate of change in the log-odds of success per unit (=1) change in \(x\). So for every one unit change in \(x\), the log-odds increases by (on average) 0.641 units and we are 95% confident that this change is between 0.33 and 0.989
As a yet more flexible alternative, if we first extract the full posterior draws (with as_draws_df()), we can then use the various tidyverse functions to focus on just the most ecologically interpretable parameters before summarising.
In the following, I will use select() with a regex (regular expression) to match only the columns that:
- start with (^) “b_” followed by any number (*) of any characters (.)
- start with (^) “sigma”
It will also use mutate() to back-transform the results (by exponentiating the posteriors, which for a logit link yields odds and odds ratios) before summarising.
dat6a.brm2 |>
brms::as_draws_df() |>
dplyr::select(matches("^b_.*|^sigma")) |>
mutate(across(everything(), exp)) |>
posterior::summarise_draws(
median,
HDInterval::hdi,
rhat,
length,
ess_bulk, ess_tail
)
Warning: Dropping 'draws_df' class as required metadata was removed.
# A tibble: 2 × 8
variable median lower upper rhat length ess_bulk ess_tail
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 b_Intercept 0.0391 0.000904 0.156 1.00 2400 2451. 2279.
2 b_x 1.90 1.35 2.64 1.00 2400 2421. 2445.
Conclusions:
- the length column confirms that the total number of post-warmup MCMC draws is 2400.
- the rhat values for each parameter are less than 1.01, confirming convergence of the parameter estimates across the chains
- ess_bulk: the effective sample sizes for the bulk of the posterior are a high fraction (> 0.5) of the total number of draws. This suggests that the sampling was reasonably efficient and therefore likely to be reliable
- ess_tail: the effective sample sizes for estimates in the tails of the posterior. The tails (the more extreme regions of the posterior) are where the sampler is most likely to get stuck (have divergent transitions etc). The fraction of effective samples from this region is relatively high, implying that the sampler did not get stuck in areas that are poorly supported by the data
- b_Intercept: when \(x=0\), the expected odds of success are 0.039 and we are 95% confident that the true value is between 0.001 and 0.156. Since \(x=0\) is outside the observed range of \(x\), this y-intercept is of very limited value
- b_x (the slope): for every one unit change in \(x\), the odds of success increase by (on average) a factor of 1.898 (the odds ratio) and we are 95% confident that this factor is between 1.35 and 2.64. This represents a ((value - 1) * 100 =) 89.8% increase in the odds per unit increase in \(x\)
The gather_draws() function performs the equivalent of an as_draws_df() followed by a pivot_longer() in order to return the full posteriors in long format where they are more suitable for graphing.
dat6a.brm2 |>
gather_draws(b_Intercept, b_x) |>
mutate(across(everything(), exp)) |>
ggplot() +
geom_histogram(aes(x = .value)) +
facet_wrap(~.variable, scales = "free")
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Alternatively, there are various other representations supported by the ggdist package.
y ymin ymax .width .point .interval
1 0.7601021 0.5664887 0.8230451 0.95 median hdci
Conclusions:
- 76.01% of the total variability in \(y\) can be explained by its relationship to \(x\)
- we are 95% confident that the strength of this relationship is between 56.649% and 82.305%
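The variance-explained figures in these conclusions follow the Bayesian R² idea: for each posterior draw, R² = var(fitted) / (var(fitted) + var(residuals)) (this is what brms::bayes_R2() computes from the model, draw by draw). A toy base R sketch with fabricated fitted values and residuals:

```r
# The Bayesian R2 idea behind the variance-explained summaries:
# per posterior draw, R2 = var(fitted) / (var(fitted) + var(residuals)).
# Fitted values and residuals below are fabricated for illustration only;
# brms::bayes_R2() computes this from the model's actual posterior.
fitted_draw    <- c(0.2, 0.8, 0.6, 0.9, 0.3)
residuals_draw <- c(0.1, -0.2, 0.05, -0.1, 0.15)
r2 <- var(fitted_draw) / (var(fitted_draw) + var(residuals_draw))
round(r2, 3)
```

Doing this for every draw yields a full posterior for R², which is then summarised by its median and HPD interval as above.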
Family: binomial
Links: mu = logit
Formula: count | trials(total) ~ scale(x, scale = FALSE)
Data: dat6 (Number of observations: 10)
Draws: 3 chains, each with iter = 5000; warmup = 1000; thin = 5;
total post-warmup draws = 2400
Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept 0.31 0.25 -0.16 0.80 1.00 2332 2304
scalexscaleEQFALSE 0.65 0.16 0.36 1.01 1.00 1869 2325
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
The results presented by the summary() function are on the link scale (in this case, the logit, or log-odds scale). As such, they can be awkward to interpret and are particularly punishing for your audience.
Conclusions:
- in the initial block of information, we are reminded of the formula as well as the chain fitting characteristics. We are also informed that the total number of post-warmup MCMC samples is 2400.
- the Rhat values for each parameter are less than 1.01, confirming convergence of the parameter estimates across the chains
- Bulk_ESS: the effective sample sizes for the bulk of the posterior are a high fraction (> 0.5) of the total number of draws. This suggests that the sampling was reasonably efficient and therefore likely to be reliable
- Tail_ESS: the effective sample sizes for estimates in the tails of the posterior. The tails (the more extreme regions of the posterior) are where the sampler is most likely to get stuck (have divergent transitions etc). The fraction of effective samples from this region is relatively high, implying that the sampler did not get stuck in areas that are poorly supported by the data
- Intercept: when \(x=0\) (the mean of \(x\), since \(x\) has been centred), the expected log-odds of success is 0.311 and we are 95% confident that the true value is between -0.164 and 0.796
- x (the slope): the rate of change in the log-odds of success per unit (=1) change in \(x\). So for every one unit change in \(x\), the log-odds increases by (on average) 0.654 units and we are 95% confident that this change is between 0.364 and 1.011
Note, the point estimates are posterior means and the intervals are quantile-based.
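The 2400 post-warmup draws reported throughout these summaries follow directly from the sampler settings (3 chains, iter = 5000, warmup = 1000, thin = 5):

```r
# Post-warmup draws = chains * (iter - warmup) / thin, per the settings above.
chains <- 3; iter <- 5000; warmup <- 1000; thin <- 5
chains * (iter - warmup) / thin
# [1] 2400
```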
As an alternative to the regular summary (which is designed to resemble that traditionally provided by frequentist analyses - and thus be instantly familiar), it is possible to define exactly how each of the parameters are summarised.
In the following, I am nominating that I want to summarise each parameter posterior by:
- the median
- the 95% highest probability density interval (credibility interval)
- Rhat
- total number of draws
- bulk and tail effective sample sizes
dat6b.brm2 |>
posterior::summarise_draws(
median,
HDInterval::hdi,
rhat,
length,
ess_bulk, ess_tail
)
# A tibble: 6 × 8
variable median lower upper rhat length ess_bulk ess_tail
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 b_Intercept 0.305 -0.171 0.786 1.00 2400 2334. 2304.
2 b_scalexscaleEQFALSE 0.642 0.363 1.01 1.00 2400 1870. 2325.
3 prior_Intercept -0.221 -0.898 0.412 1.00 2400 2463. 1931.
4 prior_b 0.0186 -1.74 1.65 1.00 2400 2314. 2331.
5 lprior -2.32 -5.36 -0.538 1.00 2400 2221. 2234.
6 lp__ -14.4 -16.6 -13.7 1.00 2400 2330. 1926.
The results presented by the above function are on the link scale (in this case, the logit, or log-odds scale). As such, they can be awkward to interpret and are particularly punishing for your audience.
Conclusions:
- the length column confirms that the total number of post-warmup MCMC draws is 2400.
- the rhat values for each parameter are less than 1.01, confirming convergence of the parameter estimates across the chains
- ess_bulk: the effective sample sizes for the bulk of the posterior are a high fraction (> 0.5) of the total number of draws. This suggests that the sampling was reasonably efficient and therefore likely to be reliable
- ess_tail: the effective sample sizes for estimates in the tails of the posterior. The tails (the more extreme regions of the posterior) are where the sampler is most likely to get stuck (have divergent transitions etc). The fraction of effective samples from this region is relatively high, implying that the sampler did not get stuck in areas that are poorly supported by the data
- b_Intercept: when \(x=0\) (the mean of \(x\), since \(x\) has been centred), the expected log-odds of success is 0.305 and we are 95% confident that the true value is between -0.171 and 0.786
- b_x (the slope): the rate of change in the log-odds of success per unit (=1) change in \(x\). So for every one unit change in \(x\), the log-odds increases by (on average) 0.642 units and we are 95% confident that this change is between 0.363 and 1.006
As a yet more flexible alternative, if we first extract the full posterior draws (with as_draws_df()), we can then use the various tidyverse functions to focus on just the most ecologically interpretable parameters before summarising.
In the following, I will use select() with a regex (regular expression) to match only the columns that:
- start with (^) “b_” followed by any number (*) of any characters (.)
- start with (^) “sigma”
It will also use mutate() to back-transform the results (by exponentiating the posteriors, which for a logit link yields odds and odds ratios) before summarising.
dat6b.brm2 |>
brms::as_draws_df() |>
dplyr::select(matches("^b_.*|^sigma")) |>
mutate(across(everything(), exp)) |>
posterior::summarise_draws(
median,
HDInterval::hdi,
rhat,
length,
ess_bulk, ess_tail
)
Warning: Dropping 'draws_df' class as required metadata was removed.
# A tibble: 2 × 8
variable median lower upper rhat length ess_bulk ess_tail
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 b_Intercept 1.36 0.809 2.14 1.00 2400 2261. 2263.
2 b_scalexscaleEQFALSE 1.90 1.35 2.62 1.00 2400 1801. 2323.
Conclusions:
- the length column confirms that the total number of post-warmup MCMC draws is 2400.
- the rhat values for each parameter are less than 1.01, confirming convergence of the parameter estimates across the chains
- ess_bulk: the effective sample sizes for the bulk of the posterior are a high fraction (> 0.5) of the total number of draws. This suggests that the sampling was reasonably efficient and therefore likely to be reliable
- ess_tail: the effective sample sizes for estimates in the tails of the posterior. The tails (the more extreme regions of the posterior) are where the sampler is most likely to get stuck (have divergent transitions etc). The fraction of effective samples from this region is relatively high, implying that the sampler did not get stuck in areas that are poorly supported by the data
- b_Intercept: when \(x=0\) (the mean of \(x\), since \(x\) has been centred), the expected odds of success are 1.357 and we are 95% confident that the true value is between 0.809 and 2.14
- b_x (the slope): for every one unit change in \(x\), the odds of success increase by (on average) a factor of 1.901 (the odds ratio) and we are 95% confident that this factor is between 1.35 and 2.62. This represents a ((value - 1) * 100 =) 90.1% increase in the odds per unit increase in \(x\)
The gather_draws() function performs the equivalent of an as_draws_df() followed by a pivot_longer() in order to return the full posteriors in long format where they are more suitable for graphing.
dat6b.brm2 |>
gather_draws(`b_Intercept`, `b_.*x.*`, regex = TRUE) |>
mutate(across(everything(), exp)) |>
ggplot() +
geom_histogram(aes(x = .value)) +
facet_wrap(~.variable, scales = "free")
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Alternatively, there are various other representations supported by the ggdist package.
y ymin ymax .width .point .interval
1 0.76149 0.5781493 0.8210215 0.95 median hdci
Conclusions:
- 76.149% of the total variability in \(y\) can be explained by its relationship to \(x\)
- we are 95% confident that the strength of this relationship is between 57.815% and 82.102%
Family: binomial
Links: mu = logit
Formula: count | trials(total) ~ scale(x)
Data: dat6 (Number of observations: 10)
Draws: 3 chains, each with iter = 5000; warmup = 1000; thin = 5;
total post-warmup draws = 2400
Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept 0.28 0.25 -0.20 0.77 1.00 2456 2500
scalex 1.62 0.50 0.75 2.67 1.00 2502 2145
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
The results presented by the summary() function are on the link scale (in this case, the logit, or log-odds scale). As such, they can be awkward to interpret and are particularly punishing for your audience.
Conclusions:
- in the initial block of information, we are reminded of the formula as well as the chain fitting characteristics. We are also informed that the total number of post-warmup MCMC samples is 2400.
- the Rhat values for each parameter are all less than 1.01, confirming convergence of the parameter estimates across the chains
- Bulk_ESS: the effective sample sizes for the bulk of the posterior are a high fraction (> 0.5) of the total number of draws. This suggests that the sampling was reasonably efficient and therefore likely to be reliable
- Tail_ESS: the effective sample sizes of estimates in the tails of the posterior. The tails (more extreme regions of the posterior) are where the sampler is most likely to get stuck (have divergent transitions etc). The fraction of effective samples from this region is relatively high, implying that the sampler did not get stuck in areas that are poorly supported by data
- Intercept: when \(x=0\) (the average of \(x\), since \(x\) is standardised), the expected value of \(y\) (on the logit scale) is 0.283 and we are 95% confident that the true value is between -0.204 and 0.774. Since \(x\) is standardised, \(x=0\) lies at the centre of the observed data, so this intercept is readily interpretable
- x (the slope): the rate of change in \(y\) per unit (=1) change in \(x\). Recall that since \(x\) is standardised, one unit represents a span of one standard deviation of \(x\). So for every one standard deviation change in \(x\), \(y\) (on the logit scale) increases by (on average) 1.624 units and we are 95% confident that this change is between 0.754 and 2.674
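To make the logit-scale estimates more digestible, they can be back-transformed by hand. The point estimates below are taken from the summary above; the `plogis()`/`exp()` arithmetic is standard for a logit link:

```r
# Sketch: back-transforming logit (log-odds) scale estimates by hand.
# Point estimates are taken from the model summary above.
intercept <- 0.28   # log-odds of success at the mean of x (x is standardised)
slope     <- 1.62   # change in log-odds per one SD of x
plogis(intercept)   # expected probability of success at the mean of x (~0.57)
exp(slope)          # odds ratio: odds multiply by ~5.05 per SD of x
```

Note that for proper credible intervals you would apply these transformations to the full posterior draws, not just to the summary statistics.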
Note, the point estimates are posterior means and the intervals are quantile-based.
As an alternative to the regular summary (which is designed to resemble that traditionally provided by frequentist analyses - and thus be instantly familiar), it is possible to define exactly how each of the parameters is summarised.
In the following, I am nominating that I want to summarise each parameter posterior by:
- the median
- the 95% highest probability density interval (credibility interval)
- Rhat
- total number of draws
- bulk and tail effective sample sizes
dat6c.brm2 |>
posterior::summarise_draws(
median,
HDInterval::hdi,
rhat,
length,
ess_bulk, ess_tail
)
# A tibble: 6 × 8
variable median lower upper rhat length ess_bulk ess_tail
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 b_Intercept 0.281 -0.184 0.784 0.999 2400 2456. 2500.
2 b_scalex 1.60 0.673 2.55 1.00 2400 2503. 2145.
3 prior_Intercept -0.220 -0.814 0.439 1.00 2400 2399. 2364.
4 prior_b -0.0201 -0.871 1.04 1.00 2400 2391. 2454.
5 lprior -5.30 -8.92 -2.10 1.00 2400 2629. 2395.
6 lp__ -17.9 -20.2 -17.1 1.00 2400 2330. 2313.
Conclusions:
- the length column confirms that the total number of post-warmup MCMC draws is 2400
- the rhat values for each parameter are all less than 1.01, confirming convergence of the parameter estimates across the chains
- ess_bulk: the effective sample sizes for the bulk of the posterior are a high fraction (> 0.5) of the total number of draws. This suggests that the sampling was reasonably efficient and therefore likely to be reliable
- ess_tail: the effective sample sizes of estimates in the tails of the posterior. The tails (more extreme regions of the posterior) are where the sampler is most likely to get stuck (have divergent transitions etc). The fraction of effective samples from this region is relatively high, implying that the sampler did not get stuck in areas that are poorly supported by data
- b_Intercept: when \(x=0\) (the average of \(x\), since \(x\) is standardised), the expected value of \(y\) (on the logit scale) is 0.281 and we are 95% confident that the true value is between -0.184 and 0.784
- b_scalex (the slope): the rate of change in \(y\) per unit (=1) change in \(x\). Recall that since \(x\) is standardised, one unit represents a span of one standard deviation of \(x\). So for every one standard deviation change in \(x\), \(y\) (on the logit scale) increases by (on average) 1.605 units and we are 95% confident that this change is between 0.673 and 2.548
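If a slope expressed per standard deviation of \(x\) is hard to communicate, it can be converted back to raw units of \(x\) by dividing by the standard deviation used in the scaling. The sketch below uses an illustrative value for that standard deviation; in practice you would use sd() of the observed predictor:

```r
# Sketch: converting a slope on standardised x back to raw units of x.
# sd_x is a hypothetical illustrative value; in practice use sd() of the raw predictor.
b_scalex <- 1.605   # slope per one SD of x (median from the summary above)
sd_x     <- 3.0     # hypothetical standard deviation of the raw x
b_scalex / sd_x     # change in the linear predictor per raw unit of x (0.535)
```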
As a yet more flexible alternative, if we first extract the full posterior draws (with as_draws_df()), we can then use various tidyverse functions to focus on just the most ecologically interpretable parameters before summarising.
In the following, I will use select() with a regex (regular expression) to match only the columns that:
- start with (^) “b_” followed by any amount (*) of any character (.)
- start with (^) “sigma”
It will also use mutate() to transform the results back onto the scale of the response (by exponentiating the posteriors) before summarising.
dat6c.brm2 |>
brms::as_draws_df() |>
dplyr::select(matches("^b_.*|^sigma")) |>
mutate(across(everything(), exp)) |>
posterior::summarise_draws(
median,
HDInterval::hdi,
rhat,
length,
ess_bulk, ess_tail
)
Warning: Dropping 'draws_df' class as required metadata was removed.
# A tibble: 2 × 8
variable median lower upper rhat length ess_bulk ess_tail
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 b_Intercept 1.32 0.732 2.03 1.00 2400 2461. 2491.
2 b_scalex 4.98 1.82 12.2 1.00 2400 2459. 2121.
Conclusions:
- the summary now contains only the ecologically interpretable parameters, back-transformed (exponentiated) onto the odds scale, based on the full 2400 post-warmup MCMC draws
- the rhat values for each parameter are all less than 1.01, confirming convergence of the parameter estimates across the chains
- ess_bulk and ess_tail: the effective sample sizes are a high fraction of the total number of draws, suggesting that the sampling was efficient both in the bulk and in the tails of the posterior
- b_Intercept: when \(x=0\) (the average of \(x\), since \(x\) is standardised), the expected odds of success are 1.32 and we are 95% confident that the true value is between 0.732 and 2.03
- b_scalex (the slope): for every one standard deviation change in \(x\), the odds of success increase by (on average) a factor of 4.98 (an odds ratio) and we are 95% confident that this factor is between 1.82 and 12.2. This represents a ((value - 1) * 100 =) 398% increase in the odds per standard deviation increase in \(x\)
The gather_draws() function performs the equivalent of an as_draws_df() followed by a pivot_longer() in order to return the full posteriors in long format where they are more suitable for graphing. Note that the name of the slope parameter gets very awkward when \(x\) is centered, so it is more convenient to refer to this parameter via a regular expression.
[1] "b_Intercept" "b_scalex" "prior_Intercept" "prior_b"
[5] "lprior" "lp__" "accept_stat__" "stepsize__"
[9] "treedepth__" "n_leapfrog__" "divergent__" "energy__"
dat6c.brm2 |>
gather_draws(`b_Intercept`,`b_.*x`, regex = TRUE) |>
mutate(across(everything(), exp)) |>
ggplot() +
geom_histogram(aes(x = .value)) +
facet_wrap(~.variable, scales = "free")
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Alternatively, there are various other representations supported by the ggdist package.
y ymin ymax .width .point .interval
1 0.708533 0.3843623 0.8229809 0.95 median hdci
Conclusions:
- 70.853% of the total variability in \(y\) can be explained by its relationship to \(x\)
- we are 95% confident that the strength of this relationship is between 38.436% and 82.298%
12 Predictions
Whilst linear models are useful for estimating effects (relative differences), because they are low dimensional (only focus on a small number of covariates) they are not good at absolute predictions. Nevertheless, predicting values from linear models provides the basis for investigating/estimating additional effects and generating various graphics to visualise the estimates.
There are a large number of candidate routines for performing prediction. We will go through some of these. It is worth noting that in this context prediction is technically the act of estimating what we expect to get if we were to collect a single new observation from a particular population (e.g. a specific level of fertilizer concentration). Often this is not what we want. Often we want the fitted values - estimates of what we expect to get if we were to collect multiple new observations and average them.
So while fitted values represent the expected underlying processes occurring in the system, predicted values represent our expectations from sampling from such processes.
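This distinction can be sketched with simulated draws (all quantities below are illustrative, not taken from any fitted model): the posterior predictive adds observation-level noise (sigma) on top of the uncertainty in the mean, so its intervals are necessarily wider than those of the fitted values.

```r
# Sketch: why prediction intervals are wider than fitted-value intervals.
# All quantities are simulated for illustration, not taken from a fitted model.
set.seed(1)
mu_draws    <- rnorm(4000, mean = 0.1, sd = 1.3)  # posterior of the mean at some x
sigma_draws <- rep(2, 4000)                       # residual sd (fixed for simplicity)
pred_draws  <- rnorm(4000, mean = mu_draws, sd = sigma_draws)  # posterior predictive
c(fitted_sd = sd(mu_draws), predictive_sd = sd(pred_draws))
# predictive sd ~ sqrt(1.3^2 + 2^2) = 2.39, larger than the fitted sd of ~1.3
```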
| Package | Function | Description | Summarise with |
|---|---|---|---|
| emmeans | emmeans | Estimated marginal means from which posteriors can be drawn (via tidy_draws() or gather_emmeans_draws()) | median_hdci() |
| rstantools | posterior_predict | Draw from the posterior of a prediction (includes sigma) - predicts single observations | summarise_draws() |
| rstantools | posterior_linpred | Draw from the posterior of the fitted values (on the link scale) - predicts average observations | summarise_draws() |
| rstantools | posterior_epred | Draw from the posterior of the fitted values (on the response scale) - predicts average observations | summarise_draws() |
| tidybayes | predicted_draws | Extract the posterior of prediction values | median_hdci() |
| tidybayes | epred_draws | Extract the posterior of expected values | median_hdci() |
| tidybayes | fitted_draws | Extract the posterior of fitted values (deprecated in favour of epred_draws) | median_hdci() |
| tidybayes | add_predicted_draws | Adds draws from the posterior of predictions to a data frame (of prediction data) | median_hdci() |
| tidybayes | add_fitted_draws | Adds draws from the posterior of fitted values to a data frame (of prediction data) | median_hdci() |
For simple models, prediction is essentially taking the model formula, complete with parameter (coefficient) estimates, and solving it for new values of the predictor. To explore this, we will use each fitted model to predict \(y\) at new values of its predictor.
We will therefore start by establishing this prediction domain as a data frame to use across all of the prediction routines.
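A minimal sketch of both steps - the prediction grid and the manual "solving" of the linear predictor - assuming hypothetical coefficient draws (the real draws would come from as_draws_df() on the fitted model):

```r
# Sketch: a prediction grid plus manual solving of the linear predictor.
# The coefficient draws below are hypothetical, for illustration only.
newdata <- data.frame(x = c(2.5, 5))
b0 <- c(0.48, 0.52, 0.50)    # hypothetical posterior draws of the intercept
b1 <- c(-0.15, -0.16, -0.14) # hypothetical posterior draws of the slope
# one fitted value per draw, for each new value of x
fitted <- sapply(newdata$x, function(x) b0 + b1 * x)
apply(fitted, 2, median)     # medians at x = 2.5 and x = 5 -> 0.12 and -0.27
```

The routines below automate exactly this: they evaluate the model at each posterior draw and then summarise the resulting posteriors.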
In each case, we will predict \(y\) when \(x\) is 2.5 and 5
Note, for a Gaussian model emmeans, posterior_epred and posterior_linpred will all yield the same outputs. posterior_predict will yield wider credible intervals as it is predicting for individual observations rather than predicting means of large samples.
x emmean lower.HPD upper.HPD
2.5 0.1109 -2.35 2.85
5.0 -0.0474 -4.82 5.08
Point estimate displayed: median
HPD interval probability: 0.95
# OR with more control over the way posteriors are summarised
dat1a.brm2 |>
emmeans(~x, at = list(x = c(2.5, 5))) |>
gather_emmeans_draws() |>
median_hdci()
# A tibble: 2 × 7
x .value .lower .upper .width .point .interval
<dbl> <dbl> <dbl> <dbl> <dbl> <chr> <chr>
1 2.5 0.111 -2.35 2.85 0.95 median hdci
2 5 -0.0474 -4.82 5.08 0.95 median hdci
Conclusions:
- the predicted (estimated) mean \(y\) associated with an \(x\) of 2.5 is 0.111
- the predicted (estimated) mean \(y\) associated with an \(x\) of 5 is -0.047
- 95% HPD intervals also given
dat1a.brm2 |>
posterior_predict(newdata = data.frame(x = c(2.5, 5))) |>
summarise_draws(median, HDInterval::hdi)
# A tibble: 2 × 4
variable median lower upper
<chr> <dbl> <dbl> <dbl>
1 ...1 0.0853 -3.48 3.52
2 ...2 -0.0277 -5.71 5.57
# Or for even more control and ability to add other summaries
dat1a.brm2 |>
add_predicted_draws(newdata = data.frame(x = c(2.5, 5))) |>
dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
summarise_draws(
median,
HDInterval::hdi
)
# A tibble: 2 × 6
# Groups: x, .row [2]
x .row variable median lower upper
<dbl> <int> <chr> <dbl> <dbl> <dbl>
1 2.5 1 .prediction 0.110 -3.55 3.67
2 5 2 .prediction -0.0849 -5.53 5.42
Conclusions:
- the predicted \(y\) associated with an \(x\) of 2.5 is approximately 0.09
- the predicted \(y\) associated with an \(x\) of 5 is approximately -0.05
- 95% HPD intervals also given (posterior predictions involve fresh random draws, so the two routines above differ slightly)
dat1a.brm2 |>
posterior_epred(newdata = data.frame(x = c(2.5, 5))) |>
summarise_draws(median, HDInterval::hdi)
# A tibble: 2 × 4
variable median lower upper
<chr> <dbl> <dbl> <dbl>
1 ...1 0.111 -2.35 2.85
2 ...2 -0.0474 -4.82 5.08
# Or for even more control and ability to add other summaries
dat1a.brm2 |>
add_epred_draws(newdata = data.frame(x = c(2.5, 5))) |>
dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
summarise_draws(
median,
HDInterval::hdi
)
# A tibble: 2 × 6
# Groups: x, .row [2]
x .row variable median lower upper
<dbl> <int> <chr> <dbl> <dbl> <dbl>
1 2.5 1 .epred 0.111 -2.35 2.85
2 5 2 .epred -0.0474 -4.82 5.08
Conclusions:
- the fitted mean \(y\) associated with an \(x\) of 2.5 is 0.111
- the fitted mean \(y\) associated with an \(x\) of 5 is -0.047
- 95% HPD intervals also given
dat1a.brm2 |>
posterior_linpred(newdata = data.frame(x = c(2.5, 5))) |>
summarise_draws(median, HDInterval::hdi)
# A tibble: 2 × 4
variable median lower upper
<chr> <dbl> <dbl> <dbl>
1 ...1 0.111 -2.35 2.85
2 ...2 -0.0474 -4.82 5.08
# Or for even more control and ability to add other summaries
dat1a.brm2 |>
add_linpred_draws(newdata = data.frame(x = c(2.5, 5))) |>
dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
summarise_draws(
median,
HDInterval::hdi
)
# A tibble: 2 × 6
# Groups: x, .row [2]
x .row variable median lower upper
<dbl> <int> <chr> <dbl> <dbl> <dbl>
1 2.5 1 .linpred 0.111 -2.35 2.85
2 5 2 .linpred -0.0474 -4.82 5.08
Conclusions:
- the fitted mean \(y\) associated with an \(x\) of 2.5 is 0.111
- the fitted mean \(y\) associated with an \(x\) of 5 is -0.047
- 95% HPD intervals also given
In each case, we will predict \(y\) when \(x\) is 2.5 and 5
Note, for a Gaussian model emmeans, posterior_epred and posterior_linpred will all yield the same outputs. posterior_predict will yield wider credible intervals as it is predicting for individual observations rather than predicting means of large samples.
x emmean lower.HPD upper.HPD
2.5 0.0698 -2.29 2.72
5.0 -0.1310 -4.46 5.02
Point estimate displayed: median
HPD interval probability: 0.95
# OR with more control over the way posteriors are summarised
dat1b.brm2 |>
emmeans(~x, at = list(x = c(2.5, 5))) |>
gather_emmeans_draws() |>
median_hdci()
# A tibble: 2 × 7
x .value .lower .upper .width .point .interval
<dbl> <dbl> <dbl> <dbl> <dbl> <chr> <chr>
1 2.5 0.0698 -2.29 2.72 0.95 median hdci
2 5 -0.131 -4.46 5.02 0.95 median hdci
Conclusions:
- the predicted (estimated) mean \(y\) associated with an \(x\) of 2.5 is 0.07
- the predicted (estimated) mean \(y\) associated with an \(x\) of 5 is -0.131
- 95% HPD intervals also given
dat1b.brm2 |>
posterior_predict(newdata = data.frame(x = c(2.5, 5))) |>
summarise_draws(median, HDInterval::hdi)
# A tibble: 2 × 4
variable median lower upper
<chr> <dbl> <dbl> <dbl>
1 ...1 0.0727 -3.44 3.31
2 ...2 -0.132 -5.52 5.25
# Or for even more control and ability to add other summaries
dat1b.brm2 |>
add_predicted_draws(newdata = data.frame(x = c(2.5, 5))) |>
dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
summarise_draws(
median,
HDInterval::hdi
)
# A tibble: 2 × 6
# Groups: x, .row [2]
x .row variable median lower upper
<dbl> <int> <chr> <dbl> <dbl> <dbl>
1 2.5 1 .prediction 0.0527 -3.35 3.36
2 5 2 .prediction -0.124 -5.39 5.31
Conclusions:
- the predicted \(y\) associated with an \(x\) of 2.5 is approximately 0.06
- the predicted \(y\) associated with an \(x\) of 5 is approximately -0.13
- 95% HPD intervals also given (posterior predictions involve fresh random draws, so the two routines above differ slightly)
dat1b.brm2 |>
posterior_epred(newdata = data.frame(x = c(2.5, 5))) |>
summarise_draws(median, HDInterval::hdi)
# A tibble: 2 × 4
variable median lower upper
<chr> <dbl> <dbl> <dbl>
1 ...1 0.0698 -2.29 2.72
2 ...2 -0.131 -4.46 5.02
# Or for even more control and ability to add other summaries
dat1b.brm2 |>
add_epred_draws(newdata = data.frame(x = c(2.5, 5))) |>
dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
summarise_draws(
median,
HDInterval::hdi
)
# A tibble: 2 × 6
# Groups: x, .row [2]
x .row variable median lower upper
<dbl> <int> <chr> <dbl> <dbl> <dbl>
1 2.5 1 .epred 0.0698 -2.29 2.72
2 5 2 .epred -0.131 -4.46 5.02
Conclusions:
- the fitted mean \(y\) associated with an \(x\) of 2.5 is 0.07
- the fitted mean \(y\) associated with an \(x\) of 5 is -0.131
- 95% HPD intervals also given
y ymin ymax .width .point .interval
1 -0.003159382 -4.074285 3.68445 0.95 median hdci
# Or for even more control and ability to add other summaries
dat1b.brm2 |>
add_linpred_draws(newdata = data.frame(x = c(2.5, 5))) |>
dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
summarise_draws(
median,
HDInterval::hdi
)
# A tibble: 2 × 6
# Groups: x, .row [2]
x .row variable median lower upper
<dbl> <int> <chr> <dbl> <dbl> <dbl>
1 2.5 1 .linpred 0.0698 -2.29 2.72
2 5 2 .linpred -0.131 -4.46 5.02
Conclusions:
- the fitted mean \(y\) associated with an \(x\) of 2.5 is 0.07
- the fitted mean \(y\) associated with an \(x\) of 5 is -0.131
- 95% HPD intervals also given
- note that piping the posterior_linpred() matrix directly into median_hdci() collapses both predictions into the single summary row shown first; add_linpred_draws() keeps the two predictions separate
In each case, we will predict \(y\) when \(x\) is 2.5 and 5
Note, for a Gaussian model emmeans, posterior_epred and posterior_linpred will all yield the same outputs. posterior_predict will yield wider credible intervals as it is predicting for individual observations rather than predicting means of large samples.
x emmean lower.HPD upper.HPD
2.5 0.0295 -2.67 2.60
5.0 -0.2027 -5.10 4.65
Point estimate displayed: median
HPD interval probability: 0.95
# OR with more control over the way posteriors are summarised
dat1c.brm2 |>
emmeans(~x, at = list(x = c(2.5, 5))) |>
gather_emmeans_draws() |>
median_hdci()
# A tibble: 2 × 7
x .value .lower .upper .width .point .interval
<dbl> <dbl> <dbl> <dbl> <dbl> <chr> <chr>
1 2.5 0.0295 -2.62 2.66 0.95 median hdci
2 5 -0.203 -5.10 4.65 0.95 median hdci
Conclusions:
- the predicted (estimated) mean \(y\) associated with an \(x\) of 2.5 is 0.029
- the predicted (estimated) mean \(y\) associated with an \(x\) of 5 is -0.203
- 95% HPD intervals also given
dat1c.brm2 |>
posterior_predict(newdata = data.frame(x = c(2.5, 5))) |>
summarise_draws(median, HDInterval::hdi)
# A tibble: 2 × 4
variable median lower upper
<chr> <dbl> <dbl> <dbl>
1 ...1 0.0134 -3.24 3.76
2 ...2 -0.196 -6.17 4.91
# Or for even more control and ability to add other summaries
dat1c.brm2 |>
add_predicted_draws(newdata = data.frame(x = c(2.5, 5))) |>
dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
summarise_draws(
median,
HDInterval::hdi
)
# A tibble: 2 × 6
# Groups: x, .row [2]
x .row variable median lower upper
<dbl> <int> <chr> <dbl> <dbl> <dbl>
1 2.5 1 .prediction 0.0459 -3.51 3.49
2 5 2 .prediction -0.218 -5.36 5.31
Conclusions:
- the predicted \(y\) associated with an \(x\) of 2.5 is approximately 0.03
- the predicted \(y\) associated with an \(x\) of 5 is approximately -0.21
- 95% HPD intervals also given (posterior predictions involve fresh random draws, so the two routines above differ slightly)
dat1c.brm2 |>
posterior_epred(newdata = data.frame(x = c(2.5, 5))) |>
summarise_draws(median, HDInterval::hdi)
# A tibble: 2 × 4
variable median lower upper
<chr> <dbl> <dbl> <dbl>
1 ...1 0.0295 -2.67 2.60
2 ...2 -0.203 -5.10 4.65
# Or for even more control and ability to add other summaries
dat1c.brm2 |>
add_epred_draws(newdata = data.frame(x = c(2.5, 5))) |>
dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
summarise_draws(
median,
HDInterval::hdi
)
# A tibble: 2 × 6
# Groups: x, .row [2]
x .row variable median lower upper
<dbl> <int> <chr> <dbl> <dbl> <dbl>
1 2.5 1 .epred 0.0295 -2.67 2.60
2 5 2 .epred -0.203 -5.10 4.65
Conclusions:
- the fitted mean \(y\) associated with an \(x\) of 2.5 is 0.029
- the fitted mean \(y\) associated with an \(x\) of 5 is -0.203
- 95% HPD intervals also given
y ymin ymax .width .point .interval
1 -0.06446552 -4.31072 3.768188 0.95 median hdci
# Or for even more control and ability to add other summaries
dat1c.brm2 |>
add_linpred_draws(newdata = data.frame(x = c(2.5, 5))) |>
dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
summarise_draws(
median,
HDInterval::hdi
)
# A tibble: 2 × 6
# Groups: x, .row [2]
x .row variable median lower upper
<dbl> <int> <chr> <dbl> <dbl> <dbl>
1 2.5 1 .linpred 0.0295 -2.67 2.60
2 5 2 .linpred -0.203 -5.10 4.65
Conclusions:
- the fitted mean \(y\) associated with an \(x\) of 2.5 is 0.029
- the fitted mean \(y\) associated with an \(x\) of 5 is -0.203
- 95% HPD intervals also given
- note that piping the posterior_linpred() matrix directly into median_hdci() collapses both predictions into the single summary row shown first; add_linpred_draws() keeps the two predictions separate
In each case, we will predict \(y\) when \(x\) is “control”, “medium” and “high”
Note, for a Gaussian model emmeans, posterior_epred and posterior_linpred will all yield the same outputs. posterior_predict will yield wider credible intervals as it is predicting for individual observations rather than predicting means of large samples.
x emmean lower.HPD upper.HPD
control 20.8 16.71 24.3
medium 20.0 16.21 24.0
high 12.1 7.74 16.3
Point estimate displayed: median
HPD interval probability: 0.95
# OR with more control over the way posteriors are summarised
dat2a.brm2 |>
emmeans(~x) |>
gather_emmeans_draws() |>
median_hdci()
# A tibble: 3 × 7
x .value .lower .upper .width .point .interval
<fct> <dbl> <dbl> <dbl> <dbl> <chr> <chr>
1 control 20.8 16.7 24.3 0.95 median hdci
2 medium 20.0 16.2 24.0 0.95 median hdci
3 high 12.1 8.14 16.7 0.95 median hdci
Conclusions:
- the predicted (estimated) mean \(y\) associated with an \(x\) of “control” is 20.787
- the predicted (estimated) mean \(y\) associated with an \(x\) of “medium” is 19.982
- the predicted (estimated) mean \(y\) associated with an \(x\) of “high” is 12.055
- 95% HPD intervals also given
dat2a.brm2 |>
posterior_predict(newdata = data.frame(x =
c("control", "medium", "high"))) |>
summarise_draws(median, HDInterval::hdi)
# A tibble: 3 × 4
variable median lower upper
<chr> <dbl> <dbl> <dbl>
1 ...1 20.8 10.6 30.3
2 ...2 20.1 10.3 30.4
3 ...3 12.0 2.28 23.0
# Or for even more control and ability to add other summaries
dat2a.brm2 |>
add_predicted_draws(newdata = data.frame(x = c("control", "medium", "high"))) |>
dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
summarise_draws(
median,
HDInterval::hdi
)
# A tibble: 3 × 6
# Groups: x, .row [3]
x .row variable median lower upper
<chr> <int> <chr> <dbl> <dbl> <dbl>
1 control 1 .prediction 20.7 11.2 31.0
2 high 3 .prediction 11.9 2.00 21.9
3 medium 2 .prediction 20.0 10.0 30.1
Conclusions:
- the predicted \(y\) associated with an \(x\) of “control” is approximately 20.7
- the predicted \(y\) associated with an \(x\) of “medium” is approximately 20.0
- the predicted \(y\) associated with an \(x\) of “high” is approximately 12.0
- 95% HPD intervals also given (posterior predictions involve fresh random draws, so the two routines above differ slightly)
dat2a.brm2 |>
posterior_epred(newdata = data.frame(x = c("control", "medium", "high"))) |>
summarise_draws(median, HDInterval::hdi)
# A tibble: 3 × 4
variable median lower upper
<chr> <dbl> <dbl> <dbl>
1 ...1 20.8 16.7 24.3
2 ...2 20.0 16.2 24.0
3 ...3 12.1 7.74 16.3
# Or for even more control and ability to add other summaries
dat2a.brm2 |>
add_epred_draws(newdata = data.frame(x = c("control", "medium", "high"))) |>
dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
summarise_draws(
median,
HDInterval::hdi
)
# A tibble: 3 × 6
# Groups: x, .row [3]
x .row variable median lower upper
<chr> <int> <chr> <dbl> <dbl> <dbl>
1 control 1 .epred 20.8 16.7 24.3
2 high 3 .epred 12.1 7.74 16.3
3 medium 2 .epred 20.0 16.2 24.0
Conclusions:
- the predicted (estimated) mean \(y\) associated with an \(x\) of “control” is 20.787
- the predicted (estimated) mean \(y\) associated with an \(x\) of “medium” is 19.982
- the predicted (estimated) mean \(y\) associated with an \(x\) of “high” is 12.055
- 95% HPD intervals also given
dat2a.brm2 |>
posterior_linpred(newdata = data.frame(x = c("control", "medium", "high"))) |>
median_hdci()
y ymin ymax .width .point .interval
1 19.13185 9.432886 24.2459 0.95 median hdci
# Or for even more control and ability to add other summaries
dat2a.brm2 |>
add_linpred_draws(newdata = data.frame(x = c("control", "medium", "high"))) |>
dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
summarise_draws(
median,
HDInterval::hdi
)
# A tibble: 3 × 6
# Groups: x, .row [3]
x .row variable median lower upper
<chr> <int> <chr> <dbl> <dbl> <dbl>
1 control 1 .linpred 20.8 16.7 24.3
2 high 3 .linpred 12.1 7.74 16.3
3 medium 2 .linpred 20.0 16.2 24.0
Conclusions:
- the predicted (estimated) mean \(y\) associated with an \(x\) of “control” is 20.787
- the predicted (estimated) mean \(y\) associated with an \(x\) of “medium” is 19.982
- the predicted (estimated) mean \(y\) associated with an \(x\) of “high” is 12.055
- 95% HPD intervals also given
- note that piping the posterior_linpred() matrix directly into median_hdci() collapses all three groups into the single summary row shown first; add_linpred_draws() keeps the groups separate
In each case, we will predict \(y\) when \(x\) is “control”, “medium” and “high”
Note, for a Gaussian model emmeans, posterior_epred and posterior_linpred will all yield the same outputs. posterior_predict will yield wider credible intervals as it is predicting for individual observations rather than predicting means of large samples.
x emmean lower.HPD upper.HPD
control 20.3 15.51 24.3
medium 18.7 14.40 22.8
high 10.9 6.55 15.2
Point estimate displayed: median
HPD interval probability: 0.95
# OR with more control over the way posteriors are summarised
dat2b.brm2 |>
emmeans(~x) |>
gather_emmeans_draws() |>
median_hdci()
# A tibble: 3 × 7
x .value .lower .upper .width .point .interval
<fct> <dbl> <dbl> <dbl> <dbl> <chr> <chr>
1 control 20.3 15.5 24.4 0.95 median hdci
2 medium 18.7 14.4 22.8 0.95 median hdci
3 high 10.9 6.39 15.1 0.95 median hdci
Conclusions:
- the predicted (estimated) mean \(y\) associated with an \(x\) of “control” is 20.335
- the predicted (estimated) mean \(y\) associated with an \(x\) of “medium” is 18.734
- the predicted (estimated) mean \(y\) associated with an \(x\) of “high” is 10.914
- 95% HPD intervals also given
dat2b.brm2 |>
posterior_predict(newdata = data.frame(x =
c("control", "medium", "high"))) |>
summarise_draws(median, HDInterval::hdi)
# A tibble: 3 × 4
variable median lower upper
<chr> <dbl> <dbl> <dbl>
1 ...1 20.4 10.5 30.6
2 ...2 18.6 8.49 28.7
3 ...3 10.9 0.967 21.4
# Or for even more control and ability to add other summaries
dat2b.brm2 |>
add_predicted_draws(newdata = data.frame(x = c("control", "medium", "high"))) |>
dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
summarise_draws(
median,
HDInterval::hdi
)
# A tibble: 3 × 6
# Groups: x, .row [3]
x .row variable median lower upper
<chr> <int> <chr> <dbl> <dbl> <dbl>
1 control 1 .prediction 20.1 9.93 29.8
2 high 3 .prediction 11.0 0.958 21.2
3 medium 2 .prediction 18.9 9.11 29.7
Conclusions:
- the predicted \(y\) associated with an \(x\) of “control” is approximately 20.2
- the predicted \(y\) associated with an \(x\) of “medium” is approximately 18.7
- the predicted \(y\) associated with an \(x\) of “high” is approximately 11.0
- 95% HPD intervals also given (posterior predictions involve fresh random draws, so the two routines above differ slightly)
dat2b.brm2 |>
posterior_epred(newdata = data.frame(x = c("control", "medium", "high"))) |>
summarise_draws(median, HDInterval::hdi)
# A tibble: 3 × 4
variable median lower upper
<chr> <dbl> <dbl> <dbl>
1 ...1 20.3 15.5 24.3
2 ...2 18.7 14.4 22.8
3 ...3 10.9 6.55 15.2
# Or for even more control and ability to add other summaries
dat2b.brm2 |>
add_epred_draws(newdata = data.frame(x = c("control", "medium", "high"))) |>
dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
summarise_draws(
median,
HDInterval::hdi
)
# A tibble: 3 × 6
# Groups: x, .row [3]
x .row variable median lower upper
<chr> <int> <chr> <dbl> <dbl> <dbl>
1 control 1 .epred 20.3 15.5 24.3
2 high 3 .epred 10.9 6.55 15.2
3 medium 2 .epred 18.7 14.4 22.8
Conclusions:
- the predicted (estimated) mean \(y\) associated with an \(x\) of “control” is 20.335
- the predicted (estimated) mean \(y\) associated with an \(x\) of “medium” is 18.734
- the predicted (estimated) mean \(y\) associated with an \(x\) of “high” is 10.914
- 95% HPD intervals also given
dat2b.brm2 |>
posterior_linpred(newdata = data.frame(x = c("control", "medium", "high"))) |>
median_hdci()
y ymin ymax .width .point .interval
1 17.99514 7.956261 23.53212 0.95 median hdci
# Or for even more control and ability to add other summaries
dat2b.brm2 |>
add_linpred_draws(newdata = data.frame(x = c("control", "medium", "high"))) |>
dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
summarise_draws(
median,
HDInterval::hdi
)
# A tibble: 3 × 6
# Groups: x, .row [3]
x .row variable median lower upper
<chr> <int> <chr> <dbl> <dbl> <dbl>
1 control 1 .linpred 20.3 15.5 24.3
2 high 3 .linpred 10.9 6.55 15.2
3 medium 2 .linpred 18.7 14.4 22.8
Conclusions:
- the predicted (estimated) mean \(y\) associated with an \(x\) of “control” is 20.335
- the predicted (estimated) mean \(y\) associated with an \(x\) of “medium” is 18.734
- the predicted (estimated) mean \(y\) associated with an \(x\) of “high” is 10.914
- 95% HPD intervals also given
- note that piping the posterior_linpred() matrix directly into median_hdci() collapses all three groups into the single summary row shown first; add_linpred_draws() keeps the groups separate
In each case, we will predict \(y\) when \(x\) is 2.5 and 5
Note, for this model (log link), emmeans (which back-transforms by default) and posterior_epred yield results on the response scale, whereas posterior_linpred returns values on the link (log) scale unless exponentiated. posterior_predict will yield wider credible intervals as it predicts individual observations rather than means of large samples.
x rate lower.HPD upper.HPD
2.5 2.45 1.40 3.72
5.0 5.79 4.12 7.52
Point estimate displayed: median
Results are back-transformed from the log scale
HPD interval probability: 0.95
# OR with more control over the way posteriors are summarised
dat3a.brm2 |>
emmeans(~x, at = list(x = c(2.5, 5))) |>
gather_emmeans_draws() |>
mutate(.value = exp(.value)) |>
median_hdci()
# A tibble: 2 × 7
x .value .lower .upper .width .point .interval
<dbl> <dbl> <dbl> <dbl> <dbl> <chr> <chr>
1 2.5 2.45 1.40 3.72 0.95 median hdci
2 5 5.79 4.15 7.57 0.95 median hdci
# OR with yet more control over the way posteriors are summarised
dat3a.brm2 |>
emmeans(~x, at = list(x = c(2.5, 5))) |>
gather_emmeans_draws() |>
dplyr::select(-.chain) |>
summarise_draws(
median,
HDInterval::hdi
)
# A tibble: 2 × 5
# Groups: x [2]
x variable median lower upper
<dbl> <chr> <dbl> <dbl> <dbl>
1 2.5 .value 0.896 0.366 1.33
2 5 .value 1.76 1.45 2.05
Conclusions:
- the predicted (estimated) mean \(y\) associated with an \(x\) of 2.5 is 2.45
- the predicted (estimated) mean \(y\) associated with an \(x\) of 5 is 5.792
- note that the final summarise_draws() table above is on the link (log) scale (0.896 and 1.76), since those draws were not exponentiated
- 95% HPD intervals also given
dat3a.brm2 |>
posterior_predict(newdata = data.frame(x = c(2.5, 5))) |>
summarise_draws(median, HDInterval::hdi)
# A tibble: 2 × 4
variable median lower upper
<chr> <dbl> <dbl> <dbl>
1 ...1 2 0 6
2 ...2 6 1 10
# Or for even more control and ability to add other summaries
dat3a.brm2 |>
add_predicted_draws(newdata = data.frame(x = c(2.5, 5))) |>
dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
summarise_draws(
median,
HDInterval::hdi
)
# A tibble: 2 × 6
# Groups: x, .row [2]
x .row variable median lower upper
<dbl> <int> <chr> <dbl> <dbl> <dbl>
1 2.5 1 .prediction 2 0 6
2 5 2 .prediction 6 1 10
Conclusions:
- the predicted (estimated) mean \(y\) associated with an \(x\) of 2.5 is 2
- the predicted (estimated) mean \(y\) associated with an \(x\) of 5 is 6
- 95% HPD intervals also given
dat3a.brm2 |>
posterior_epred(newdata = data.frame(x = c(2.5, 5))) |>
summarise_draws(median, HDInterval::hdi)
# A tibble: 2 × 4
variable median lower upper
<chr> <dbl> <dbl> <dbl>
1 ...1 2.45 1.40 3.72
2 ...2 5.79 4.12 7.52
# Or for even more control and ability to add other summaries
dat3a.brm2 |>
add_epred_draws(newdata = data.frame(x = c(2.5, 5))) |>
dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
summarise_draws(
median,
HDInterval::hdi
)
# A tibble: 2 × 6
# Groups: x, .row [2]
x .row variable median lower upper
<dbl> <int> <chr> <dbl> <dbl> <dbl>
1 2.5 1 .epred 2.45 1.40 3.72
2 5 2 .epred 5.79 4.12 7.52
Conclusions:
- the fitted mean \(y\) associated with an \(x\) of 2.5 is 2.45
- the fitted mean \(y\) associated with an \(x\) of 5 is 5.792
- 95% HPD intervals also given
dat3a.brm2 |>
posterior_linpred(newdata = data.frame(x = c(2.5, 5))) |>
exp() |>
summarise_draws(median, HDInterval::hdi)
# A tibble: 2 × 4
variable median lower upper
<chr> <dbl> <dbl> <dbl>
1 ...1 2.45 1.40 3.72
2 ...2 5.79 4.12 7.52
# Or for even more control and ability to add other summaries
dat3a.brm2 |>
add_linpred_draws(newdata = data.frame(x = c(2.5, 5))) |>
mutate(.linpred = exp(.linpred)) |>
dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
summarise_draws(
median,
HDInterval::hdi
)
# A tibble: 2 × 6
# Groups: x, .row [2]
x .row variable median lower upper
<dbl> <int> <chr> <dbl> <dbl> <dbl>
1 2.5 1 .linpred 2.45 1.40 3.72
2 5 2 .linpred 5.79 4.12 7.52
Conclusions:
- the fitted mean \(y\) associated with an \(x\) of 2.5 is 2.45
- the fitted mean \(y\) associated with an \(x\) of 5 is 5.792
- 95% HPD intervals also given
In each case, we will predict \(y\) when \(x\) is 2.5 and 5
Note, once back-transformed onto the response scale, emmeans, posterior_epred and posterior_linpred all yield the same outputs. posterior_predict will yield wider credible intervals because it predicts individual observations rather than the mean response.
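The reason the posterior_predict intervals are wider can be sketched with a toy simulation (illustrative numbers only, not taken from the fitted models): draws of a Poisson mean stand in for the parameter uncertainty captured by posterior_epred, and simulating a count from each draw adds the observation-level noise that posterior_predict includes on top.

```r
set.seed(1)
# stand-in for posterior draws of the mean (epred-style uncertainty only)
mu_draws <- exp(rnorm(4000, mean = log(6), sd = 0.1))
# stand-in for posterior predictive draws (mean uncertainty + Poisson noise)
y_draws <- rpois(length(mu_draws), lambda = mu_draws)

diff(quantile(mu_draws, c(0.025, 0.975)))  # narrow interval for the mean
diff(quantile(y_draws, c(0.025, 0.975)))   # much wider predictive interval
```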
x emmean lower.HPD upper.HPD
2.5 0.902 0.376 1.35
5.0 1.755 1.437 2.05
Point estimate displayed: median
Results are given on the log (not the response) scale.
HPD interval probability: 0.95
# OR with more control over the way posteriors are summarised
dat3b.brm2 |>
emmeans(~x, at = list(x = c(2.5, 5))) |>
gather_emmeans_draws() |>
median_hdci()
# A tibble: 2 × 7
x .value .lower .upper .width .point .interval
<dbl> <dbl> <dbl> <dbl> <dbl> <chr> <chr>
1 2.5 0.902 0.376 1.35 0.95 median hdci
2 5 1.76 1.44 2.05 0.95 median hdci
Conclusions:
- the predicted (estimated) mean \(y\) associated with an \(x\) of 2.5 is 0.902 (on the log scale)
- the predicted (estimated) mean \(y\) associated with an \(x\) of 5 is 1.755 (on the log scale)
- 95% HPD intervals also given
dat3b.brm2 |>
posterior_predict(newdata = data.frame(x = c(2.5, 5))) |>
summarise_draws(median, HDInterval::hdi)
# A tibble: 2 × 4
variable median lower upper
<chr> <dbl> <dbl> <dbl>
1 ...1 2 0 5
2 ...2 6 0 10
# Or for even more control and ability to add other summaries
dat3b.brm2 |>
add_predicted_draws(newdata = data.frame(x = c(2.5, 5))) |>
dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
summarise_draws(
median,
HDInterval::hdi
)
# A tibble: 2 × 6
# Groups: x, .row [2]
x .row variable median lower upper
<dbl> <int> <chr> <dbl> <dbl> <dbl>
1 2.5 1 .prediction 2 0 6
2 5 2 .prediction 6 1 10
Conclusions:
- the predicted value of a new \(y\) observation associated with an \(x\) of 2.5 is 2
- the predicted value of a new \(y\) observation associated with an \(x\) of 5 is 6
- 95% HPD intervals also given
dat3b.brm2 |>
posterior_epred(newdata = data.frame(x = c(2.5, 5))) |>
summarise_draws(median, HDInterval::hdi)
# A tibble: 2 × 4
variable median lower upper
<chr> <dbl> <dbl> <dbl>
1 ...1 2.46 1.29 3.65
2 ...2 5.78 4.12 7.61
# Or for even more control and ability to add other summaries
dat3b.brm2 |>
add_epred_draws(newdata = data.frame(x = c(2.5, 5))) |>
dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
summarise_draws(
median,
HDInterval::hdi
)
# A tibble: 2 × 6
# Groups: x, .row [2]
x .row variable median lower upper
<dbl> <int> <chr> <dbl> <dbl> <dbl>
1 2.5 1 .epred 2.46 1.29 3.65
2 5 2 .epred 5.78 4.12 7.61
Conclusions:
- the fitted mean \(y\) associated with an \(x\) of 2.5 is 2.465
- the fitted mean \(y\) associated with an \(x\) of 5 is 5.785
- 95% HPD intervals also given
dat3b.brm2 |>
posterior_linpred(newdata = data.frame(x = c(2.5, 5))) |>
exp() |>
summarise_draws(median, HDInterval::hdi)
# A tibble: 2 × 4
variable median lower upper
<chr> <dbl> <dbl> <dbl>
1 ...1 2.46 1.29 3.65
2 ...2 5.78 4.12 7.61
# Or for even more control and ability to add other summaries
dat3b.brm2 |>
add_linpred_draws(newdata = data.frame(x = c(2.5, 5))) |>
mutate(.linpred = exp(.linpred)) |>
dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
summarise_draws(
median,
HDInterval::hdi
)
# A tibble: 2 × 6
# Groups: x, .row [2]
x .row variable median lower upper
<dbl> <int> <chr> <dbl> <dbl> <dbl>
1 2.5 1 .linpred 2.46 1.29 3.65
2 5 2 .linpred 5.78 4.12 7.61
Conclusions:
- the fitted mean \(y\) associated with an \(x\) of 2.5 is 2.465
- the fitted mean \(y\) associated with an \(x\) of 5 is 5.785
- 95% HPD intervals also given
In each case, we will predict \(y\) when \(x\) is 2.5 and 5
Note, once back-transformed onto the response scale, emmeans, posterior_epred and posterior_linpred all yield the same outputs. posterior_predict will yield wider credible intervals because it predicts individual observations rather than the mean response.
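A related point worth noting: summarising on the link scale and then back-transforming gives the same point estimate as back-transforming each draw first, because the median commutes with monotone transformations such as exp and plogis (the same is not true of HPD intervals, nor of the mean). A small self-contained illustration, using an odd sample size so the median is an exact order statistic:

```r
set.seed(42)
x <- rnorm(1001)  # stand-in for link-scale posterior draws

# exp() is monotone increasing, so the median passes straight through it
stopifnot(all.equal(exp(median(x)), median(exp(x))))

# the same does not hold for the mean (Jensen's inequality)
mean(exp(x)) > exp(mean(x))
```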
x emmean lower.HPD upper.HPD
2.5 0.91 0.394 1.40
5.0 1.76 1.445 2.06
Point estimate displayed: median
Results are given on the log (not the response) scale.
HPD interval probability: 0.95
# OR with more control over the way posteriors are summarised
dat3c.brm2 |>
emmeans(~x, at = list(x = c(2.5, 5))) |>
gather_emmeans_draws() |>
median_hdci()
# A tibble: 2 × 7
x .value .lower .upper .width .point .interval
<dbl> <dbl> <dbl> <dbl> <dbl> <chr> <chr>
1 2.5 0.910 0.436 1.44 0.95 median hdci
2 5 1.76 1.44 2.06 0.95 median hdci
Conclusions:
- the predicted (estimated) mean \(y\) associated with an \(x\) of 2.5 is 0.91 (on the log scale)
- the predicted (estimated) mean \(y\) associated with an \(x\) of 5 is 1.763 (on the log scale)
- 95% HPD intervals also given
dat3c.brm2 |>
posterior_predict(newdata = data.frame(x = c(2.5, 5))) |>
summarise_draws(median, HDInterval::hdi)
# A tibble: 2 × 4
variable median lower upper
<chr> <dbl> <dbl> <dbl>
1 ...1 2 0 6
2 ...2 6 1 11
# Or for even more control and ability to add other summaries
dat3c.brm2 |>
add_predicted_draws(newdata = data.frame(x = c(2.5, 5))) |>
dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
summarise_draws(
median,
HDInterval::hdi
)
# A tibble: 2 × 6
# Groups: x, .row [2]
x .row variable median lower upper
<dbl> <int> <chr> <dbl> <dbl> <dbl>
1 2.5 1 .prediction 2 0 6
2 5 2 .prediction 6 1 11
Conclusions:
- the predicted value of a new \(y\) observation associated with an \(x\) of 2.5 is 2
- the predicted value of a new \(y\) observation associated with an \(x\) of 5 is 6
- 95% HPD intervals also given
dat3c.brm2 |>
posterior_epred(newdata = data.frame(x = c(2.5, 5))) |>
summarise_draws(median, HDInterval::hdi)
# A tibble: 2 × 4
variable median lower upper
<chr> <dbl> <dbl> <dbl>
1 ...1 2.48 1.36 3.83
2 ...2 5.83 4.07 7.69
# Or for even more control and ability to add other summaries
dat3c.brm2 |>
add_epred_draws(newdata = data.frame(x = c(2.5, 5))) |>
dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
summarise_draws(
median,
HDInterval::hdi
)
# A tibble: 2 × 6
# Groups: x, .row [2]
x .row variable median lower upper
<dbl> <int> <chr> <dbl> <dbl> <dbl>
1 2.5 1 .epred 2.48 1.36 3.83
2 5 2 .epred 5.83 4.07 7.69
Conclusions:
- the fitted mean \(y\) associated with an \(x\) of 2.5 is 2.485
- the fitted mean \(y\) associated with an \(x\) of 5 is 5.83
- 95% HPD intervals also given
dat3c.brm2 |>
posterior_linpred(newdata = data.frame(x = c(2.5, 5))) |>
exp() |>
summarise_draws(median, HDInterval::hdi)
# A tibble: 2 × 4
variable median lower upper
<chr> <dbl> <dbl> <dbl>
1 ...1 2.48 1.36 3.83
2 ...2 5.83 4.07 7.69
# Or for even more control and ability to add other summaries
dat3c.brm2 |>
add_linpred_draws(newdata = data.frame(x = c(2.5, 5))) |>
mutate(.linpred = exp(.linpred)) |>
dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
summarise_draws(
median,
HDInterval::hdi
)
# A tibble: 2 × 6
# Groups: x, .row [2]
x .row variable median lower upper
<dbl> <int> <chr> <dbl> <dbl> <dbl>
1 2.5 1 .linpred 2.48 1.36 3.83
2 5 2 .linpred 5.83 4.07 7.69
Conclusions:
- the fitted mean \(y\) associated with an \(x\) of 2.5 is 2.485
- the fitted mean \(y\) associated with an \(x\) of 5 is 5.83
- 95% HPD intervals also given
In each case, we will predict \(y\) when \(x\) is 2.5 and 5
Note, once back-transformed onto the response scale, emmeans, posterior_epred and posterior_linpred all yield the same outputs. posterior_predict will yield wider credible intervals because it predicts individual observations rather than the mean response.
x prob lower.HPD upper.HPD
2.5 2.95 1.59 4.65
5.0 5.97 4.11 8.03
Point estimate displayed: median
Results are back-transformed from the log scale
HPD interval probability: 0.95
# OR with more control over the way posteriors are summarised
dat4a.brm2 |>
emmeans(~x, at = list(x = c(2.5, 5))) |>
gather_emmeans_draws() |>
mutate(.value = exp(.value)) |>
median_hdci()
# A tibble: 2 × 7
x .value .lower .upper .width .point .interval
<dbl> <dbl> <dbl> <dbl> <dbl> <chr> <chr>
1 2.5 2.95 1.61 4.67 0.95 median hdci
2 5 5.97 4.02 7.94 0.95 median hdci
# OR with yet more control over the way posteriors are summarised
dat4a.brm2 |>
emmeans(~x, at = list(x = c(2.5, 5))) |>
gather_emmeans_draws() |>
dplyr::select(-.chain) |>
summarise_draws(
median,
HDInterval::hdi
)
# A tibble: 2 × 5
# Groups: x [2]
x variable median lower upper
<dbl> <chr> <dbl> <dbl> <dbl>
1 2.5 .value 1.08 0.524 1.58
2 5 .value 1.79 1.41 2.08
Conclusions:
- the predicted (estimated) mean \(y\) associated with an \(x\) of 2.5 is 2.953
- the predicted (estimated) mean \(y\) associated with an \(x\) of 5 is 5.97
- 95% HPD intervals also given
dat4a.brm2 |>
posterior_predict(newdata = data.frame(x = c(2.5, 5))) |>
summarise_draws(median, HDInterval::hdi)
# A tibble: 2 × 4
variable median lower upper
<chr> <dbl> <dbl> <dbl>
1 ...1 3 0 7
2 ...2 6 0 11
# Or for even more control and ability to add other summaries
dat4a.brm2 |>
add_predicted_draws(newdata = data.frame(x = c(2.5, 5))) |>
dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
summarise_draws(
median,
HDInterval::hdi
)
# A tibble: 2 × 6
# Groups: x, .row [2]
x .row variable median lower upper
<dbl> <int> <chr> <dbl> <dbl> <dbl>
1 2.5 1 .prediction 3 0 7
2 5 2 .prediction 6 0 11
Conclusions:
- the predicted value of a new \(y\) observation associated with an \(x\) of 2.5 is 3
- the predicted value of a new \(y\) observation associated with an \(x\) of 5 is 6
- 95% HPD intervals also given
dat4a.brm2 |>
posterior_epred(newdata = data.frame(x = c(2.5, 5))) |>
summarise_draws(median, HDInterval::hdi)
# A tibble: 2 × 4
variable median lower upper
<chr> <dbl> <dbl> <dbl>
1 ...1 2.95 1.59 4.65
2 ...2 5.97 4.11 8.03
# Or for even more control and ability to add other summaries
dat4a.brm2 |>
add_epred_draws(newdata = data.frame(x = c(2.5, 5))) |>
dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
summarise_draws(
median,
HDInterval::hdi
)
# A tibble: 2 × 6
# Groups: x, .row [2]
x .row variable median lower upper
<dbl> <int> <chr> <dbl> <dbl> <dbl>
1 2.5 1 .epred 2.95 1.59 4.65
2 5 2 .epred 5.97 4.11 8.03
Conclusions:
- the fitted mean \(y\) associated with an \(x\) of 2.5 is 2.953
- the fitted mean \(y\) associated with an \(x\) of 5 is 5.97
- 95% HPD intervals also given
dat4a.brm2 |>
posterior_linpred(newdata = data.frame(x = c(2.5, 5))) |>
exp() |>
summarise_draws(median, HDInterval::hdi)
# A tibble: 2 × 4
variable median lower upper
<chr> <dbl> <dbl> <dbl>
1 ...1 2.95 1.59 4.65
2 ...2 5.97 4.11 8.03
# Or for even more control and ability to add other summaries
dat4a.brm2 |>
add_linpred_draws(newdata = data.frame(x = c(2.5, 5))) |>
mutate(.linpred = exp(.linpred)) |>
dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
summarise_draws(
median,
HDInterval::hdi
)
# A tibble: 2 × 6
# Groups: x, .row [2]
x .row variable median lower upper
<dbl> <int> <chr> <dbl> <dbl> <dbl>
1 2.5 1 .linpred 2.95 1.59 4.65
2 5 2 .linpred 5.97 4.11 8.03
Conclusions:
- the fitted mean \(y\) associated with an \(x\) of 2.5 is 2.953
- the fitted mean \(y\) associated with an \(x\) of 5 is 5.97
- 95% HPD intervals also given
In each case, we will predict \(y\) when \(x\) is 2.5 and 5
Note, once back-transformed onto the response scale, emmeans, posterior_epred and posterior_linpred all yield the same outputs. posterior_predict will yield wider credible intervals because it predicts individual observations rather than the mean response.
x emmean lower.HPD upper.HPD
2.5 1.09 0.56 1.58
5.0 1.79 1.46 2.10
Point estimate displayed: median
Results are given on the log (not the response) scale.
HPD interval probability: 0.95
# OR with more control over the way posteriors are summarised
dat4b.brm2 |>
emmeans(~x, at = list(x = c(2.5, 5))) |>
gather_emmeans_draws() |>
median_hdci()
# A tibble: 2 × 7
x .value .lower .upper .width .point .interval
<dbl> <dbl> <dbl> <dbl> <dbl> <chr> <chr>
1 2.5 1.09 0.566 1.59 0.95 median hdci
2 5 1.79 1.46 2.10 0.95 median hdci
Conclusions:
- the predicted (estimated) mean \(y\) associated with an \(x\) of 2.5 is 1.089 (on the log scale)
- the predicted (estimated) mean \(y\) associated with an \(x\) of 5 is 1.787 (on the log scale)
- 95% HPD intervals also given
dat4b.brm2 |>
posterior_predict(newdata = data.frame(x = c(2.5, 5))) |>
summarise_draws(median, HDInterval::hdi)
# A tibble: 2 × 4
variable median lower upper
<chr> <dbl> <dbl> <dbl>
1 ...1 3 0 7
2 ...2 6 1 12
# Or for even more control and ability to add other summaries
dat4b.brm2 |>
add_predicted_draws(newdata = data.frame(x = c(2.5, 5))) |>
dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
summarise_draws(
median,
HDInterval::hdi
)
# A tibble: 2 × 6
# Groups: x, .row [2]
x .row variable median lower upper
<dbl> <int> <chr> <dbl> <dbl> <dbl>
1 2.5 1 .prediction 3 0 7
2 5 2 .prediction 6 0 11
Conclusions:
- the predicted value of a new \(y\) observation associated with an \(x\) of 2.5 is 3
- the predicted value of a new \(y\) observation associated with an \(x\) of 5 is 6
- 95% HPD intervals also given
dat4b.brm2 |>
posterior_epred(newdata = data.frame(x = c(2.5, 5))) |>
summarise_draws(median, HDInterval::hdi)
# A tibble: 2 × 4
variable median lower upper
<chr> <dbl> <dbl> <dbl>
1 ...1 2.97 1.55 4.58
2 ...2 5.97 4.20 8.04
# Or for even more control and ability to add other summaries
dat4b.brm2 |>
add_epred_draws(newdata = data.frame(x = c(2.5, 5))) |>
dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
summarise_draws(
median,
HDInterval::hdi
)
# A tibble: 2 × 6
# Groups: x, .row [2]
x .row variable median lower upper
<dbl> <int> <chr> <dbl> <dbl> <dbl>
1 2.5 1 .epred 2.97 1.55 4.58
2 5 2 .epred 5.97 4.20 8.04
Conclusions:
- the fitted mean \(y\) associated with an \(x\) of 2.5 is 2.971
- the fitted mean \(y\) associated with an \(x\) of 5 is 5.97
- 95% HPD intervals also given
dat4b.brm2 |>
posterior_linpred(newdata = data.frame(x = c(2.5, 5))) |>
exp() |>
summarise_draws(median, HDInterval::hdi)
# A tibble: 2 × 4
variable median lower upper
<chr> <dbl> <dbl> <dbl>
1 ...1 2.97 1.55 4.58
2 ...2 5.97 4.20 8.04
# Or for even more control and ability to add other summaries
dat4b.brm2 |>
add_linpred_draws(newdata = data.frame(x = c(2.5, 5))) |>
mutate(.linpred = exp(.linpred)) |>
dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
summarise_draws(
median,
HDInterval::hdi
)
# A tibble: 2 × 6
# Groups: x, .row [2]
x .row variable median lower upper
<dbl> <int> <chr> <dbl> <dbl> <dbl>
1 2.5 1 .linpred 2.97 1.55 4.58
2 5 2 .linpred 5.97 4.20 8.04
Conclusions:
- the fitted mean \(y\) associated with an \(x\) of 2.5 is 2.971
- the fitted mean \(y\) associated with an \(x\) of 5 is 5.97
- 95% HPD intervals also given
In each case, we will predict \(y\) when \(x\) is 2.5 and 5
Note, once back-transformed onto the response scale, emmeans, posterior_epred and posterior_linpred all yield the same outputs. posterior_predict will yield wider credible intervals because it predicts individual observations rather than the mean response.
x emmean lower.HPD upper.HPD
2.5 1.10 0.54 1.64
5.0 1.79 1.44 2.11
Point estimate displayed: median
Results are given on the log (not the response) scale.
HPD interval probability: 0.95
# OR with more control over the way posteriors are summarised
dat4c.brm2 |>
emmeans(~x, at = list(x = c(2.5, 5))) |>
gather_emmeans_draws() |>
median_hdci()
# A tibble: 2 × 7
x .value .lower .upper .width .point .interval
<dbl> <dbl> <dbl> <dbl> <dbl> <chr> <chr>
1 2.5 1.10 0.540 1.64 0.95 median hdci
2 5 1.79 1.44 2.11 0.95 median hdci
Conclusions:
- the predicted (estimated) mean \(y\) associated with an \(x\) of 2.5 is 1.097 (on the log scale)
- the predicted (estimated) mean \(y\) associated with an \(x\) of 5 is 1.789 (on the log scale)
- 95% HPD intervals also given
dat4c.brm2 |>
posterior_predict(newdata = data.frame(x = c(2.5, 5))) |>
summarise_draws(median, HDInterval::hdi)
# A tibble: 2 × 4
variable median lower upper
<chr> <dbl> <dbl> <dbl>
1 ...1 3 0 7
2 ...2 6 1 12
# Or for even more control and ability to add other summaries
dat4c.brm2 |>
add_predicted_draws(newdata = data.frame(x = c(2.5, 5))) |>
dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
summarise_draws(
median,
HDInterval::hdi
)
# A tibble: 2 × 6
# Groups: x, .row [2]
x .row variable median lower upper
<dbl> <int> <chr> <dbl> <dbl> <dbl>
1 2.5 1 .prediction 3 0 7
2 5 2 .prediction 6 1 12
Conclusions:
- the predicted value of a new \(y\) observation associated with an \(x\) of 2.5 is 3
- the predicted value of a new \(y\) observation associated with an \(x\) of 5 is 6
- 95% HPD intervals also given
dat4c.brm2 |>
posterior_epred(newdata = data.frame(x = c(2.5, 5))) |>
summarise_draws(median, HDInterval::hdi)
# A tibble: 2 × 4
variable median lower upper
<chr> <dbl> <dbl> <dbl>
1 ...1 2.99 1.57 4.76
2 ...2 5.98 4.18 8.20
# Or for even more control and ability to add other summaries
dat4c.brm2 |>
add_epred_draws(newdata = data.frame(x = c(2.5, 5))) |>
dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
summarise_draws(
median,
HDInterval::hdi
)
# A tibble: 2 × 6
# Groups: x, .row [2]
x .row variable median lower upper
<dbl> <int> <chr> <dbl> <dbl> <dbl>
1 2.5 1 .epred 2.99 1.57 4.76
2 5 2 .epred 5.98 4.18 8.20
Conclusions:
- the fitted mean \(y\) associated with an \(x\) of 2.5 is 2.994
- the fitted mean \(y\) associated with an \(x\) of 5 is 5.985
- 95% HPD intervals also given
dat4c.brm2 |>
posterior_linpred(newdata = data.frame(x = c(2.5, 5))) |>
exp() |>
summarise_draws(median, HDInterval::hdi)
# A tibble: 2 × 4
variable median lower upper
<chr> <dbl> <dbl> <dbl>
1 ...1 2.99 1.57 4.76
2 ...2 5.98 4.18 8.20
# Or for even more control and ability to add other summaries
dat4c.brm2 |>
add_linpred_draws(newdata = data.frame(x = c(2.5, 5))) |>
mutate(.linpred = exp(.linpred)) |>
dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
summarise_draws(
median,
HDInterval::hdi
)
# A tibble: 2 × 6
# Groups: x, .row [2]
x .row variable median lower upper
<dbl> <int> <chr> <dbl> <dbl> <dbl>
1 2.5 1 .linpred 2.99 1.57 4.76
2 5 2 .linpred 5.98 4.18 8.20
Conclusions:
- the fitted mean \(y\) associated with an \(x\) of 2.5 is 2.994
- the fitted mean \(y\) associated with an \(x\) of 5 is 5.985
- 95% HPD intervals also given
In each case, we will predict \(y\) when \(x\) is 2.5 and 5
Note, once back-transformed onto the response scale, emmeans, posterior_epred and posterior_linpred all yield the same outputs. posterior_predict will yield wider credible intervals because it predicts individual observations rather than the mean response.
x prob lower.HPD upper.HPD
2.5 0.0667 4.30e-06 0.387
5.0 0.4888 1.71e-01 0.778
Point estimate displayed: median
Results are back-transformed from the logit scale
HPD interval probability: 0.95
# OR with more control over the way posteriors are summarised
dat5a.brm2 |>
emmeans(~x, at = list(x = c(2.5, 5))) |>
gather_emmeans_draws() |>
mutate(.value = plogis(.value)) |>
median_hdci()
# A tibble: 2 × 7
x .value .lower .upper .width .point .interval
<dbl> <dbl> <dbl> <dbl> <dbl> <chr> <chr>
1 2.5 0.0667 0.00000430 0.386 0.95 median hdci
2 5 0.489 0.171 0.778 0.95 median hdci
# OR with yet more control over the way posteriors are summarised
dat5a.brm2 |>
emmeans(~x, at = list(x = c(2.5, 5))) |>
gather_emmeans_draws() |>
dplyr::select(-.chain) |>
summarise_draws(
median,
HDInterval::hdi
)
# A tibble: 2 × 5
# Groups: x [2]
x variable median lower upper
<dbl> <chr> <dbl> <dbl> <dbl>
1 2.5 .value -2.64 -6.72 0.180
2 5 .value -0.0448 -1.58 1.25
Conclusions:
- the predicted (estimated) mean \(y\) associated with an \(x\) of 2.5 is 0.067
- the predicted (estimated) mean \(y\) associated with an \(x\) of 5 is 0.489
- 95% HPD intervals also given
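As with the log-link models, these probabilities can be recovered by hand from the raw draws. A minimal sketch, assuming dat5a.brm2 is the logit-link model fitted above and that its coefficients carry the default brms names b_Intercept and b_x:

```r
library(posterior)   # for as_draws_df()
library(HDInterval)  # for hdi()

# extract the posterior draws as a data frame (assumes dat5a.brm2 exists)
draws <- as_draws_df(dat5a.brm2)

# inverse-logit the linear predictor at x = 5 draw by draw
p_5 <- plogis(draws$b_Intercept + draws$b_x * 5)

median(p_5)  # should agree with the emmeans/posterior_epred median
hdi(p_5)     # and with the 95% HPD interval
```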
dat5a.brm2 |>
posterior_predict(newdata = data.frame(x = c(2.5, 5))) |>
summarise_draws(median, HDInterval::hdi)
# A tibble: 2 × 4
variable median lower upper
<chr> <dbl> <dbl> <dbl>
1 ...1 0 0 1
2 ...2 0 0 1
# Or for even more control and ability to add other summaries
dat5a.brm2 |>
add_predicted_draws(newdata = data.frame(x = c(2.5, 5))) |>
dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
summarise_draws(
median,
HDInterval::hdi
)
# A tibble: 2 × 6
# Groups: x, .row [2]
x .row variable median lower upper
<dbl> <int> <chr> <dbl> <dbl> <dbl>
1 2.5 1 .prediction 0 0 1
2 5 2 .prediction 0 0 1
Conclusions:
- the predicted value of a new \(y\) observation associated with an \(x\) of 2.5 is 0
- the predicted value of a new \(y\) observation associated with an \(x\) of 5 is 0
- 95% HPD intervals also given
dat5a.brm2 |>
posterior_epred(newdata = data.frame(x = c(2.5, 5))) |>
summarise_draws(median, HDInterval::hdi)
# A tibble: 2 × 4
variable median lower upper
<chr> <dbl> <dbl> <dbl>
1 ...1 0.0667 0.00000430 0.387
2 ...2 0.489 0.171 0.778
# Or for even more control and ability to add other summaries
dat5a.brm2 |>
add_epred_draws(newdata = data.frame(x = c(2.5, 5))) |>
dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
summarise_draws(
median,
HDInterval::hdi
)
# A tibble: 2 × 6
# Groups: x, .row [2]
x .row variable median lower upper
<dbl> <int> <chr> <dbl> <dbl> <dbl>
1 2.5 1 .epred 0.0667 0.00000430 0.387
2 5 2 .epred 0.489 0.171 0.778
Conclusions:
- the fitted mean \(y\) associated with an \(x\) of 2.5 is 0.067
- the fitted mean \(y\) associated with an \(x\) of 5 is 0.489
- 95% HPD intervals also given
dat5a.brm2 |>
posterior_linpred(newdata = data.frame(x = c(2.5, 5))) |>
plogis() |>
summarise_draws(median, HDInterval::hdi)
# A tibble: 2 × 4
variable median lower upper
<chr> <dbl> <dbl> <dbl>
1 ...1 0.0667 0.00000430 0.387
2 ...2 0.489 0.171 0.778
# Or for even more control and ability to add other summaries
dat5a.brm2 |>
add_linpred_draws(newdata = data.frame(x = c(2.5, 5))) |>
mutate(.linpred = plogis(.linpred)) |>
dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
summarise_draws(
median,
HDInterval::hdi
)
# A tibble: 2 × 6
# Groups: x, .row [2]
x .row variable median lower upper
<dbl> <int> <chr> <dbl> <dbl> <dbl>
1 2.5 1 .linpred 0.0667 0.00000430 0.387
2 5 2 .linpred 0.489 0.171 0.778
Conclusions:
- the fitted mean \(y\) associated with an \(x\) of 2.5 is 0.067
- the fitted mean \(y\) associated with an \(x\) of 5 is 0.489
- 95% HPD intervals also given
In each case, we will predict \(y\) when \(x\) is 2.5 and 5
Note, once back-transformed onto the response scale, emmeans, posterior_epred and posterior_linpred all yield the same outputs. posterior_predict will yield wider credible intervals because it predicts individual observations rather than the mean response.
x emmean lower.HPD upper.HPD
2.5 -2.6388 -6.56 0.0827
5.0 -0.0342 -1.50 1.3946
Point estimate displayed: median
Results are given on the logit (not the response) scale.
HPD interval probability: 0.95
# OR with more control over the way posteriors are summarised
dat5b.brm2 |>
emmeans(~x, at = list(x = c(2.5, 5))) |>
gather_emmeans_draws() |>
median_hdci()
# A tibble: 2 × 7
x .value .lower .upper .width .point .interval
<dbl> <dbl> <dbl> <dbl> <dbl> <chr> <chr>
1 2.5 -2.64 -6.44 0.223 0.95 median hdci
2 5 -0.0342 -1.47 1.44 0.95 median hdci
Conclusions:
- the predicted (estimated) mean \(y\) associated with an \(x\) of 2.5 is -2.639 (on the logit scale)
- the predicted (estimated) mean \(y\) associated with an \(x\) of 5 is -0.034 (on the logit scale)
- 95% HPD intervals also given
dat5b.brm2 |>
posterior_predict(newdata = data.frame(x = c(2.5, 5))) |>
summarise_draws(median, HDInterval::hdi)
# A tibble: 2 × 4
variable median lower upper
<chr> <dbl> <dbl> <dbl>
1 ...1 0 0 1
2 ...2 1 0 1
# Or for even more control and ability to add other summaries
dat5b.brm2 |>
add_predicted_draws(newdata = data.frame(x = c(2.5, 5))) |>
dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
summarise_draws(
median,
HDInterval::hdi
)
# A tibble: 2 × 6
# Groups: x, .row [2]
x .row variable median lower upper
<dbl> <int> <chr> <dbl> <dbl> <dbl>
1 2.5 1 .prediction 0 0 1
2 5 2 .prediction 0 0 1
Conclusions:
- the predicted value of a new \(y\) observation associated with an \(x\) of 2.5 is 0
- the predicted value of a new \(y\) observation associated with an \(x\) of 5 is 0
- 95% HPD intervals also given
dat5b.brm2 |>
posterior_epred(newdata = data.frame(x = c(2.5, 5))) |>
summarise_draws(median, HDInterval::hdi)
# A tibble: 2 × 4
variable median lower upper
<chr> <dbl> <dbl> <dbl>
1 ...1 0.0667 0.00000122 0.372
2 ...2 0.491 0.168 0.786
# Or for even more control and ability to add other summaries
dat5b.brm2 |>
add_epred_draws(newdata = data.frame(x = c(2.5, 5))) |>
dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
summarise_draws(
median,
HDInterval::hdi
)
# A tibble: 2 × 6
# Groups: x, .row [2]
x .row variable median lower upper
<dbl> <int> <chr> <dbl> <dbl> <dbl>
1 2.5 1 .epred 0.0667 0.00000122 0.372
2 5 2 .epred 0.491 0.168 0.786
Conclusions:
- the fitted mean \(y\) associated with an \(x\) of 2.5 is 0.067
- the fitted mean \(y\) associated with an \(x\) of 5 is 0.491
- 95% HPD intervals also given
dat5b.brm2 |>
posterior_linpred(newdata = data.frame(x = c(2.5, 5))) |>
plogis() |>
summarise_draws(median, HDInterval::hdi)
# A tibble: 2 × 4
variable median lower upper
<chr> <dbl> <dbl> <dbl>
1 ...1 0.0667 0.00000122 0.372
2 ...2 0.491 0.168 0.786
# Or for even more control and ability to add other summaries
dat5b.brm2 |>
add_linpred_draws(newdata = data.frame(x = c(2.5, 5))) |>
mutate(.linpred = plogis(.linpred)) |>
dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
summarise_draws(
median,
HDInterval::hdi
)
# A tibble: 2 × 6
# Groups: x, .row [2]
x .row variable median lower upper
<dbl> <int> <chr> <dbl> <dbl> <dbl>
1 2.5 1 .linpred 0.0667 0.00000122 0.372
2 5 2 .linpred 0.491 0.168 0.786
Conclusions:
- the fitted mean \(y\) associated with an \(x\) of 2.5 is 0.067
- the fitted mean \(y\) associated with an \(x\) of 5 is 0.491
- 95% HPD intervals also given
In each case, we will predict \(y\) when \(x\) is 2.5 and 5
Note, once back-transformed onto the response scale, emmeans, posterior_epred and posterior_linpred all yield the same outputs. posterior_predict will yield wider credible intervals because it predicts individual observations rather than the mean response.
x emmean lower.HPD upper.HPD
2.5 -1.448 -4.12 0.675
5.0 0.114 -1.09 1.471
Point estimate displayed: median
Results are given on the logit (not the response) scale.
HPD interval probability: 0.95
# OR with more control over the way posteriors are summarised
dat5c.brm2 |>
emmeans(~x, at = list(x = c(2.5, 5))) |>
gather_emmeans_draws() |>
median_hdci()
# A tibble: 2 × 7
x .value .lower .upper .width .point .interval
<dbl> <dbl> <dbl> <dbl> <dbl> <chr> <chr>
1 2.5 -1.45 -4.12 0.675 0.95 median hdci
2 5 0.114 -1.09 1.47 0.95 median hdci
Conclusions:
- the predicted (estimated) mean \(y\) associated with an \(x\) of 2.5 is -1.448 (on the logit scale)
- the predicted (estimated) mean \(y\) associated with an \(x\) of 5 is 0.114 (on the logit scale)
- 95% HPD intervals also given
dat5c.brm2 |>
posterior_predict(newdata = data.frame(x = c(2.5, 5))) |>
summarise_draws(median, HDInterval::hdi)
# A tibble: 2 × 4
variable median lower upper
<chr> <dbl> <dbl> <dbl>
1 ...1 0 0 1
2 ...2 1 0 1
# Or for even more control and ability to add other summaries
dat5c.brm2 |>
add_predicted_draws(newdata = data.frame(x = c(2.5, 5))) |>
dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
summarise_draws(
median,
HDInterval::hdi
)
# A tibble: 2 × 6
# Groups: x, .row [2]
x .row variable median lower upper
<dbl> <int> <chr> <dbl> <dbl> <dbl>
1 2.5 1 .prediction 0 0 1
2 5 2 .prediction 1 0 1
Conclusions:
- the predicted value of a new \(y\) observation associated with an \(x\) of 2.5 is 0
- the predicted value of a new \(y\) observation associated with an \(x\) of 5 is 1
- 95% HPD intervals also given
dat5c.brm2 |>
posterior_epred(newdata = data.frame(x = c(2.5, 5))) |>
summarise_draws(median, HDInterval::hdi)
# A tibble: 2 × 4
variable median lower upper
<chr> <dbl> <dbl> <dbl>
1 ...1 0.190 0.0000295 0.549
2 ...2 0.529 0.252 0.813
# Or for even more control and ability to add other summaries
dat5c.brm2 |>
add_epred_draws(newdata = data.frame(x = c(2.5, 5))) |>
dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
summarise_draws(
median,
HDInterval::hdi
)
# A tibble: 2 × 6
# Groups: x, .row [2]
x .row variable median lower upper
<dbl> <int> <chr> <dbl> <dbl> <dbl>
1 2.5 1 .epred 0.190 0.0000295 0.549
2 5 2 .epred 0.529 0.252 0.813
Conclusions:
- the fitted mean \(y\) associated with an \(x\) of 2.5 is 0.19
- the fitted mean \(y\) associated with an \(x\) of 5 is 0.529
- 95% HPD intervals also given
dat5c.brm2 |>
posterior_linpred(newdata = data.frame(x = c(2.5, 5))) |>
plogis() |>
summarise_draws(median, HDInterval::hdi)
# A tibble: 2 × 4
variable median lower upper
<chr> <dbl> <dbl> <dbl>
1 ...1 0.190 0.0000295 0.549
2 ...2 0.529 0.252 0.813
# Or for even more control and ability to add other summaries
dat5c.brm2 |>
add_linpred_draws(newdata = data.frame(x = c(2.5, 5))) |>
mutate(.linpred = plogis(.linpred)) |>
dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
summarise_draws(
median,
HDInterval::hdi
)
# A tibble: 2 × 6
# Groups: x, .row [2]
x .row variable median lower upper
<dbl> <int> <chr> <dbl> <dbl> <dbl>
1 2.5 1 .linpred 0.190 0.0000295 0.549
2 5 2 .linpred 0.529 0.252 0.813
Conclusions:
- the fitted mean \(y\) associated with an \(x\) of 2.5 is 0.19
- the fitted mean \(y\) associated with an \(x\) of 5 is 0.529
- 95% HPD intervals also given
In each case, we will predict \(y\) when \(x\) is 2.5 and 5
Note, once back-transformed onto the response scale, emmeans, posterior_epred and posterior_linpred all yield the same outputs. posterior_predict will yield wider credible intervals because it predicts individual observations rather than the mean response.
x prob lower.HPD upper.HPD
2.5 0.163 0.0409 0.313
5.0 0.494 0.3729 0.617
Point estimate displayed: median
Results are back-transformed from the logit scale
HPD interval probability: 0.95
# OR with more control over the way posteriors are summarised
dat6a.brm2 |>
emmeans(~x, at = list(x = c(2.5, 5))) |>
gather_emmeans_draws() |>
mutate(.value = plogis(.value)) |>
median_hdci()
# A tibble: 2 × 7
x .value .lower .upper .width .point .interval
<dbl> <dbl> <dbl> <dbl> <dbl> <chr> <chr>
1 2.5 0.163 0.0478 0.321 0.95 median hdci
2 5 0.494 0.373 0.617 0.95 median hdci
# OR with yet more control over the way posteriors are summarised
dat6a.brm2 |>
emmeans(~x, at = list(x = c(2.5, 5))) |>
gather_emmeans_draws() |>
dplyr::select(-.chain) |>
summarise_draws(
median,
HDInterval::hdi
)# A tibble: 2 × 5
# Groups: x [2]
x variable median lower upper
<dbl> <chr> <dbl> <dbl> <dbl>
1 2.5 .value -1.64 -2.73 -0.604
2 5 .value -0.0247 -0.520 0.477
Conclusions:
- the predicted (estimated) mean \(y\) associated with an \(x\) of 2.5 is 0.163
- the predicted (estimated) mean \(y\) associated with an \(x\) of 5 is 0.494
- 95% HPD intervals also given
dat6a.brm2 |>
posterior_predict(newdata = data.frame(x = c(2.5, 5), total = 1)) |>
summarise_draws(median, HDInterval::hdi)# A tibble: 2 × 4
variable median lower upper
<chr> <dbl> <dbl> <dbl>
1 ...1 0 0 1
2 ...2 0 0 1
# Or for even more control and ability to add other summaries
dat6a.brm2 |>
add_predicted_draws(newdata = data.frame(x = c(2.5, 5), total = 1)) |>
dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
summarise_draws(
median,
HDInterval::hdi
)# A tibble: 2 × 7
# Groups: x, total, .row [2]
x total .row variable median lower upper
<dbl> <dbl> <int> <chr> <dbl> <dbl> <dbl>
1 2.5 1 1 .prediction 0 0 1
2 5 1 2 .prediction 0 0 1
Conclusions:
- the median posterior prediction (an individual observation) of \(y\) at an \(x\) of 2.5 is 0
- the median posterior prediction (an individual observation) of \(y\) at an \(x\) of 5 is 0
- 95% HPD intervals also given
dat6a.brm2 |>
posterior_epred(newdata = data.frame(x = c(2.5, 5), total = 1)) |>
summarise_draws(median, HDInterval::hdi)# A tibble: 2 × 4
variable median lower upper
<chr> <dbl> <dbl> <dbl>
1 ...1 0.163 0.0409 0.313
2 ...2 0.494 0.373 0.617
# Or for even more control and ability to add other summaries
dat6a.brm2 |>
add_epred_draws(newdata = data.frame(x = c(2.5, 5), total = 1)) |>
dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
summarise_draws(
median,
HDInterval::hdi
)# A tibble: 2 × 7
# Groups: x, total, .row [2]
x total .row variable median lower upper
<dbl> <dbl> <int> <chr> <dbl> <dbl> <dbl>
1 2.5 1 1 .epred 0.163 0.0409 0.313
2 5 1 2 .epred 0.494 0.373 0.617
Conclusions:
- the fitted mean \(y\) associated with an \(x\) of 2.5 is 0.163
- the fitted mean \(y\) associated with an \(x\) of 5 is 0.494
- 95% HPD intervals also given
dat6a.brm2 |>
posterior_linpred(newdata = data.frame(x = c(2.5, 5), total = 1)) |>
plogis() |>
summarise_draws(median, HDInterval::hdi)# A tibble: 2 × 4
variable median lower upper
<chr> <dbl> <dbl> <dbl>
1 ...1 0.163 0.0409 0.313
2 ...2 0.494 0.373 0.617
# Or for even more control and ability to add other summaries
dat6a.brm2 |>
add_linpred_draws(newdata = data.frame(x = c(2.5, 5), total = 1)) |>
mutate(.linpred = plogis(.linpred)) |>
dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
summarise_draws(
median,
HDInterval::hdi
)# A tibble: 2 × 7
# Groups: x, total, .row [2]
x total .row variable median lower upper
<dbl> <dbl> <int> <chr> <dbl> <dbl> <dbl>
1 2.5 1 1 .linpred 0.163 0.0409 0.313
2 5 1 2 .linpred 0.494 0.373 0.617
Conclusions:
- the fitted mean \(y\) associated with an \(x\) of 2.5 is 0.163
- the fitted mean \(y\) associated with an \(x\) of 5 is 0.494
- 95% HPD intervals also given
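As a quick check on the note at the start of this panel, the agreement between posterior_epred and the back-transformed linear predictor can be verified directly. The following is a sketch only, reusing the dat6a.brm2 model fitted above; values should match up to numerical precision because this is a logit-link model.

```r
## Sketch: posterior_epred should equal the inverse-logit of the
## linear predictor draws for this logit-link model
nd <- data.frame(x = c(2.5, 5), total = 1)
ep <- posterior_epred(dat6a.brm2, newdata = nd)
lp <- plogis(posterior_linpred(dat6a.brm2, newdata = nd))
all.equal(ep, lp) # should be TRUE (same draws, different scales internally)
```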
In each case, we will predict \(y\) when \(x\) is 2.5 and 5
Note, once back-transformed to the response scale, emmeans, posterior_epred and posterior_linpred will all yield the same outputs. posterior_predict will yield wider credible intervals, as it predicts individual observations rather than expected means.
x emmean lower.HPD upper.HPD
2.5 -1.621 -2.685 -0.654
5.0 -0.018 -0.489 0.500
Point estimate displayed: median
Results are given on the logit (not the response) scale.
HPD interval probability: 0.95
# OR with more control over the way posteriors are summarised
dat6b.brm2 |>
emmeans(~x, at = list(x = c(2.5, 5))) |>
gather_emmeans_draws() |>
median_hdci()# A tibble: 2 × 7
x .value .lower .upper .width .point .interval
<dbl> <dbl> <dbl> <dbl> <dbl> <chr> <chr>
1 2.5 -1.62 -2.69 -0.654 0.95 median hdci
2 5 -0.0180 -0.497 0.499 0.95 median hdci
Conclusions:
- the predicted (estimated) mean \(y\), on the logit scale, associated with an \(x\) of 2.5 is -1.621
- the predicted (estimated) mean \(y\), on the logit scale, associated with an \(x\) of 5 is -0.018
- 95% HPD intervals also given
dat6b.brm2 |>
posterior_predict(newdata = data.frame(x = c(2.5, 5), total = 1)) |>
summarise_draws(median, HDInterval::hdi)# A tibble: 2 × 4
variable median lower upper
<chr> <dbl> <dbl> <dbl>
1 ...1 0 0 1
2 ...2 0 0 1
# Or for even more control and ability to add other summaries
dat6b.brm2 |>
add_predicted_draws(newdata = data.frame(x = c(2.5, 5), total = 1)) |>
dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
summarise_draws(
median,
HDInterval::hdi
)# A tibble: 2 × 7
# Groups: x, total, .row [2]
x total .row variable median lower upper
<dbl> <dbl> <int> <chr> <dbl> <dbl> <dbl>
1 2.5 1 1 .prediction 0 0 1
2 5 1 2 .prediction 1 0 1
Conclusions:
- the median posterior prediction (an individual observation) of \(y\) at an \(x\) of 2.5 is 0
- the median posterior prediction (an individual observation) of \(y\) at an \(x\) of 5 is 1
- 95% HPD intervals also given
dat6b.brm2 |>
posterior_epred(newdata = data.frame(x = c(2.5, 5), total = 1)) |>
summarise_draws(median, HDInterval::hdi)# A tibble: 2 × 4
variable median lower upper
<chr> <dbl> <dbl> <dbl>
1 ...1 0.165 0.0485 0.319
2 ...2 0.495 0.380 0.622
# Or for even more control and ability to add other summaries
dat6b.brm2 |>
add_epred_draws(newdata = data.frame(x = c(2.5, 5), total = 1)) |>
dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
summarise_draws(
median,
HDInterval::hdi
)# A tibble: 2 × 7
# Groups: x, total, .row [2]
x total .row variable median lower upper
<dbl> <dbl> <int> <chr> <dbl> <dbl> <dbl>
1 2.5 1 1 .epred 0.165 0.0485 0.319
2 5 1 2 .epred 0.495 0.380 0.622
Conclusions:
- the fitted mean \(y\) associated with an \(x\) of 2.5 is 0.165
- the fitted mean \(y\) associated with an \(x\) of 5 is 0.495
- 95% HPD intervals also given
dat6b.brm2 |>
posterior_linpred(newdata = data.frame(x = c(2.5, 5), total = 1)) |>
plogis() |>
summarise_draws(median, HDInterval::hdi)# A tibble: 2 × 4
variable median lower upper
<chr> <dbl> <dbl> <dbl>
1 ...1 0.165 0.0485 0.319
2 ...2 0.495 0.380 0.622
# Or for even more control and ability to add other summaries
dat6b.brm2 |>
add_linpred_draws(newdata = data.frame(x = c(2.5, 5), total = 1)) |>
mutate(.linpred = plogis(.linpred)) |>
dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
summarise_draws(
median,
HDInterval::hdi
)# A tibble: 2 × 7
# Groups: x, total, .row [2]
x total .row variable median lower upper
<dbl> <dbl> <int> <chr> <dbl> <dbl> <dbl>
1 2.5 1 1 .linpred 0.165 0.0485 0.319
2 5 1 2 .linpred 0.495 0.380 0.622
Conclusions:
- the fitted mean \(y\) associated with an \(x\) of 2.5 is 0.165
- the fitted mean \(y\) associated with an \(x\) of 5 is 0.495
- 95% HPD intervals also given
In each case, we will predict \(y\) when \(x\) is 2.5 and 5
Note, once back-transformed to the response scale, emmeans, posterior_epred and posterior_linpred will all yield the same outputs. posterior_predict will yield wider credible intervals, as it predicts individual observations rather than expected means.
x emmean lower.HPD upper.HPD
2.5 -1.3094 -2.34 -0.334
5.0 0.0134 -0.48 0.485
Point estimate displayed: median
Results are given on the logit (not the response) scale.
HPD interval probability: 0.95
# OR with more control over the way posteriors are summarised
dat6c.brm2 |>
emmeans(~x, at = list(x = c(2.5, 5))) |>
gather_emmeans_draws() |>
median_hdci()# A tibble: 2 × 7
x .value .lower .upper .width .point .interval
<dbl> <dbl> <dbl> <dbl> <dbl> <chr> <chr>
1 2.5 -1.31 -2.34 -0.334 0.95 median hdci
2 5 0.0134 -0.480 0.485 0.95 median hdci
Conclusions:
- the predicted (estimated) mean \(y\), on the logit scale, associated with an \(x\) of 2.5 is -1.309
- the predicted (estimated) mean \(y\), on the logit scale, associated with an \(x\) of 5 is 0.013
- 95% HPD intervals also given
dat6c.brm2 |>
posterior_predict(newdata = data.frame(x = c(2.5, 5), total = 1)) |>
summarise_draws(median, HDInterval::hdi)# A tibble: 2 × 4
variable median lower upper
<chr> <dbl> <dbl> <dbl>
1 ...1 0 0 1
2 ...2 1 0 1
# Or for even more control and ability to add other summaries
dat6c.brm2 |>
add_predicted_draws(newdata = data.frame(x = c(2.5, 5), total = 1)) |>
dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
summarise_draws(
median,
HDInterval::hdi
)# A tibble: 2 × 7
# Groups: x, total, .row [2]
x total .row variable median lower upper
<dbl> <dbl> <int> <chr> <dbl> <dbl> <dbl>
1 2.5 1 1 .prediction 0 0 1
2 5 1 2 .prediction 1 0 1
Conclusions:
- the median posterior prediction (an individual observation) of \(y\) at an \(x\) of 2.5 is 0
- the median posterior prediction (an individual observation) of \(y\) at an \(x\) of 5 is 1
- 95% HPD intervals also given
dat6c.brm2 |>
posterior_epred(newdata = data.frame(x = c(2.5, 5), total = 1)) |>
summarise_draws(median, HDInterval::hdi)# A tibble: 2 × 4
variable median lower upper
<chr> <dbl> <dbl> <dbl>
1 ...1 0.213 0.0709 0.392
2 ...2 0.503 0.382 0.619
# Or for even more control and ability to add other summaries
dat6c.brm2 |>
add_epred_draws(newdata = data.frame(x = c(2.5, 5), total = 1)) |>
dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
summarise_draws(
median,
HDInterval::hdi
)# A tibble: 2 × 7
# Groups: x, total, .row [2]
x total .row variable median lower upper
<dbl> <dbl> <int> <chr> <dbl> <dbl> <dbl>
1 2.5 1 1 .epred 0.213 0.0709 0.392
2 5 1 2 .epred 0.503 0.382 0.619
Conclusions:
- the fitted mean \(y\) associated with an \(x\) of 2.5 is 0.213
- the fitted mean \(y\) associated with an \(x\) of 5 is 0.503
- 95% HPD intervals also given
dat6c.brm2 |>
posterior_linpred(newdata = data.frame(x = c(2.5, 5), total = 1)) |>
plogis() |>
summarise_draws(median, HDInterval::hdi)# A tibble: 2 × 4
variable median lower upper
<chr> <dbl> <dbl> <dbl>
1 ...1 0.213 0.0709 0.392
2 ...2 0.503 0.382 0.619
# Or for even more control and ability to add other summaries
dat6c.brm2 |>
add_linpred_draws(newdata = data.frame(x = c(2.5, 5), total = 1)) |>
mutate(.linpred = plogis(.linpred)) |>
dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
summarise_draws(
median,
HDInterval::hdi
)# A tibble: 2 × 7
# Groups: x, total, .row [2]
x total .row variable median lower upper
<dbl> <dbl> <int> <chr> <dbl> <dbl> <dbl>
1 2.5 1 1 .linpred 0.213 0.0709 0.392
2 5 1 2 .linpred 0.503 0.382 0.619
Conclusions:
- the fitted mean \(y\) associated with an \(x\) of 2.5 is 0.213
- the fitted mean \(y\) associated with an \(x\) of 5 is 0.503
- 95% HPD intervals also given
13 Further investigations
Since we have the entire posterior, we are able to make probability statements. We simply count up the number of MCMC draws that satisfy a condition (e.g. represent a slope greater than 0) and then divide by the total number of MCMC samples.
Now that we have the full posteriors, we are free to use these to garner evidence on a range of hypotheses. To demonstrate, we will consider the following hypotheses:
- a change in \(x\) is associated with an increase in \(y\)
- a doubling of \(x\) (from 2.5 to 5) is associated with an increase in \(y\) of > 50%
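The first of these hypotheses can be assessed with brms's hypothesis() function, which generates output of the form shown below. The following is a sketch only; the model object name dat1a.brm2 follows the naming used elsewhere in this panel.

```r
## Sketch: evidence that the slope of y on x (b_x) is greater than 0
dat1a.brm2 |> hypothesis("x > 0")
```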
Hypothesis Tests for class b:
Hypothesis Estimate Est.Error CI.Lower CI.Upper Evid.Ratio Post.Prob Star
1 (x) > 0 -0.07 0.47 -0.84 0.68 0.77 0.43
---
'CI': 90%-CI for one-sided and 95%-CI for two-sided hypotheses.
'*': For one-sided hypotheses, the posterior probability exceeds 95%;
for two-sided hypotheses, the value tested against lies outside the 95%-CI.
Posterior probabilities of point hypotheses assume equal prior probabilities.
Conclusions:
- the parameter (b_x) minus 0 is -0.074
- Evid.Ratio: the ratio of evidence for the hypothesis vs the evidence against it; in this case, the evidence ratio is 0.766
- Post.Prob: the posterior probability of the hypothesis is 0.434
- there is little evidence for this hypothesis
Alternatively, we could use gather_draws to achieve a similar outcome.
In the following, in addition to the median and HPD intervals, we will calculate the probability that the slope (b_x) is greater than 0. To calculate such a probability, we could simply count up the number of posterior b_x values that are greater than zero and then divide by the total number of posterior b_x values. In R, we could do this as sum(b_x > 0)/length(b_x) (where b_x > 0 returns a 1 for each case where it is true and a 0 where it is false, so summing is equivalent to counting). Since dividing a sum by its length yields a mean, we can obtain the probability by calculating the mean of b_x > 0.
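The output below can be generated with a pipeline along the following lines (a sketch, mirroring the dat1b and dat1c versions shown later in this section):

```r
dat1a.brm2 |>
  gather_draws(`b_.*x.*`, regex = TRUE) |>
  summarise_draws(
    median,
    HDInterval::hdi,
    P = ~ mean(. > 0) # proportion of draws with slope > 0
  )
```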
# A tibble: 1 × 6
# Groups: .variable [1]
.variable variable median lower upper P
<chr> <chr> <dbl> <dbl> <dbl> <dbl>
1 b_x .value -0.0611 -1.02 0.836 0.434
The summarise_draws() function expects a set of one or more summary or diagnostic functions (such as median). These can be supplied either as the bare name of the function (as for median in the example above) or, if the function requires additional arguments or information, written out in full. In the latter case, the function must be preceded by a ~ and the variable denoted by a . (as in P = ~mean(. > 0) above).
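To make the two calling conventions concrete, here is a minimal, model-free sketch using fabricated draws (the values are arbitrary and chosen only for illustration):

```r
library(posterior)
set.seed(1)
## fabricate a draws object with a single parameter, b_x
draws <- as_draws_df(data.frame(b_x = rnorm(4000, -0.06, 0.47)))
draws |>
  summarise_draws(
    median,           # bare function name
    P = ~ mean(. > 0) # full (formula) form: . stands for the draws
  )
```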
Conclusions:
- the parameter (b_x) minus 0 is -0.061
- P: the probability of the hypothesis is 0.434
- there is little evidence for this hypothesis
dat1a.brm2 |>
emmeans::emmeans(~x, at = list(x = c(2.5, 5))) |>
gather_emmeans_draws() |>
ungroup() |>
group_by(.draw) |>
summarise(ES = 100 * diff(.value) / .value[1]) |>
hypothesis("ES > 50")Hypothesis Tests for class :
Hypothesis Estimate Est.Error CI.Lower CI.Upper Evid.Ratio Post.Prob Star
1 (ES)-(50) > 0 48.31 3054.91 -181.34 233.85 2.7 0.73
---
'CI': 90%-CI for one-sided and 95%-CI for two-sided hypotheses.
'*': For one-sided hypotheses, the posterior probability exceeds 95%;
for two-sided hypotheses, the value tested against lies outside the 95%-CI.
Posterior probabilities of point hypotheses assume equal prior probabilities.
Conclusions:
- the difference between the percentage change in estimated \(y\) as \(x\) increases from 2.5 to 5 and 50% is 48.308
- the evidence ratio in support of the hypothesis that the percentage change exceeds 50% is 2.704
- the probability that the change in \(y\) exceeds 50% is 0.73
- the evidence for such a change is very weak
dat1a.brm2 |>
emmeans::emmeans(~x, at = list(x = c(2.5, 5))) |>
gather_emmeans_draws() |>
ungroup() |>
group_by(.draw) |>
summarise(ES = 100 * diff(.value) / .value[1]) |>
summarise_draws(
mean, median,
HDInterval::hdi,
P = ~ mean(. > 50)
)# A tibble: 1 × 6
variable mean median lower upper P
<chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 ES 98.3 84.1 -344. 429. 0.73
Conclusions:
- the percentage change in estimated \(y\) as \(x\) increases from 2.5 to 5 is 98.308
- the probability that the change in \(y\) exceeds 50% is 0.73
- the evidence for such a change is very weak
The procedure highlighted above for calculating exceedance probabilities evaluates the degree of evidence for an effect in a particular direction. However, there are other instances where there is a desire to evaluate the evidence that something has changed (either increased or decreased). Such pursuits are similar to the Frequentist practice of testing a null hypothesis (e.g. effect = 0).
The Region of Practical Equivalence (ROPE) evaluates evidence that an effect is “practically equivalent” to a value (e.g. 0) by calculating the proportion of effects that are within a nominated range. Kruschke (2018) argued that for standardized parameters, the range of -0.1 to 0.1 would envelop a negligible effect based on Cohen (1988). Kruschke (2018) also suggested that this range could be extended to non-standardized parameters by multiplying by the standard deviation of the response. Accordingly, calculating the proportion of posterior density within this ROPE could act as a form of “null-hypothesis” testing in a Bayesian framework.
- if the HDI of the focal parameter falls completely outside the ROPE, there is strong evidence that there is an effect
- if the HDI of the focal parameter falls completely inside the ROPE, there is strong evidence that there is not an effect
- otherwise there is not clear evidence either way
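These decision rules can also be applied directly to the raw draws. The following sketch computes the proportion of slope draws inside a Kruschke-style ROPE, assuming the dat1a.brm2 model and the dat data frame used elsewhere in this tutorial. Note that this raw-draw proportion will differ slightly from bayestestR's default, which restricts attention to the 95% HDI.

```r
draws <- as_draws_df(dat1a.brm2)
ROPE <- c(-0.1, 0.1) * sd(dat$y) # scale the standardized range by sd of the response
mean(draws$b_x > ROPE[1] & draws$b_x < ROPE[2]) # proportion of draws inside the ROPE
```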
ROPE and equivalence tests are most useful when you conclude that there is not enough evidence to support a hypothesis that there is an effect. Such a “non-significant” result may arise because there genuinely is no effect OR because you do not have enough power to detect the effect. Performing an equivalence test provides a mechanism to tease these two apart.
I provide the following example purely to illustrate how such a test would be performed. In this case, as the earlier tests found little evidence of an effect, the equivalence test helps indicate whether this reflects a genuinely negligible effect or simply insufficient power.
## Calculate ROPE range manually
ROPE <- c(-0.1, 0.1) * with(dat, sd(y))
## OR calculate ROPE range via rope_range function
ROPE <- bayestestR::rope_range(dat1a.brm2)
dat1a.brm2 |> bayestestR::equivalence_test(parameters = "b_x", range = ROPE, ci_method = "HDI")# Test for Practical Equivalence
ROPE: [-0.09 0.09]
Parameter | H0 | inside ROPE | 95% HDI
--------------------------------------------------
x | Undecided | 18.95 % | [-1.03 0.83]
dat1a.brm2 |>
bayestestR::equivalence_test(parameters = "b_x", range = ROPE, ci_method = "HDI") |>
plot()Picking joint bandwidth of 0.0735
Conclusions:
- the percentage of the HPD interval for the slope that falls inside the ROPE is 18.95%
- as the HDI is neither completely inside nor completely outside the ROPE, the test is undecided; there is no clear evidence either way
OR using the rope function.
## Proportion of posterior samples inside the ROPE
dat1a.brm2 |> bayestestR::rope(parameters = "x", range = ROPE, ci_method = "HDI")# Proportion of samples inside the ROPE [-0.09, 0.09]:
Parameter | inside ROPE
-----------------------
x | 18.94 %
The above demonstration was applied to the simple comparison of the slope against 0; however, it can similarly be applied to any other hypothesis (although this is typically only useful when there is no evidence of an effect).
Now that we have the full posteriors, we are free to use these to garner evidence on a range of hypotheses. To demonstrate, we will consider the following hypotheses:
- a change in \(x\) is associated with an increase in \(y\)
- a doubling of \(x\) (from 2.5 to 5) is associated with an increase in \(y\) of > 50%
[1] "b_Intercept" "b_scalexscaleEQFALSE" "sigma"
[4] "prior_Intercept" "prior_b" "prior_sigma"
[7] "lprior" "lp__" "accept_stat__"
[10] "stepsize__" "treedepth__" "n_leapfrog__"
[13] "divergent__" "energy__"
Hypothesis Tests for class b:
Hypothesis Estimate Est.Error CI.Lower CI.Upper Evid.Ratio
1 (scalexscaleEQFALSE) > 0 -0.09 0.45 -0.81 0.61 0.74
Post.Prob Star
1 0.43
---
'CI': 90%-CI for one-sided and 95%-CI for two-sided hypotheses.
'*': For one-sided hypotheses, the posterior probability exceeds 95%;
for two-sided hypotheses, the value tested against lies outside the 95%-CI.
Posterior probabilities of point hypotheses assume equal prior probabilities.
Conclusions:
- the parameter (b_scalexscaleEQFALSE) minus 0 is -0.085
- Evid.Ratio: the ratio of evidence for the hypothesis vs the evidence against it; in this case, the evidence ratio is 0.743
- Post.Prob: the posterior probability of the hypothesis is 0.426
- there is little evidence for this hypothesis
Alternatively, we could use gather_draws to achieve a similar outcome.
In the following, in addition to the median and HPD intervals, we will calculate the probability that the slope (b_x) is greater than 0. To calculate such a probability, we could simply count up the number of posterior b_x values that are greater than zero and then divide by the total number of posterior b_x values. In R, we could do this as sum(b_x > 0)/length(b_x) (where b_x > 0 returns a 1 for each case where it is true and a 0 where it is false, so summing is equivalent to counting). Since dividing a sum by its length yields a mean, we can obtain the probability by calculating the mean of b_x > 0.
dat1b.brm2 |>
gather_draws(`b_.*x.*`, regex = TRUE) |>
summarise_draws(median,
HDInterval::hdi,
P = ~mean(. > 0)
)# A tibble: 1 × 6
# Groups: .variable [1]
.variable variable median lower upper P
<chr> <chr> <dbl> <dbl> <dbl> <dbl>
1 b_scalexscaleEQFALSE .value -0.0808 -0.977 0.810 0.426
The summarise_draws() function expects a set of one or more summary or diagnostic functions (such as median). These can be supplied either as the bare name of the function (as for median in the example above) or, if the function requires additional arguments or information, written out in full. In the latter case, the function must be preceded by a ~ and the variable denoted by a . (as in P = ~mean(. > 0) above).
Conclusions:
- the parameter (b_scalexscaleEQFALSE) minus 0 is -0.081
- P: the probability of the hypothesis is 0.426
- there is little evidence for this hypothesis
dat1b.brm2 |>
emmeans::emmeans(~x, at = list(x = c(2.5, 5))) |>
gather_emmeans_draws() |>
ungroup() |>
group_by(.draw) |>
summarise(ES = 100 * diff(.value) / .value[1]) |>
hypothesis("ES > 50")Hypothesis Tests for class :
Hypothesis Estimate Est.Error CI.Lower CI.Upper Evid.Ratio Post.Prob Star
1 (ES)-(50) > 0 -7.34 975.03 -186.4 249.78 2.82 0.74
---
'CI': 90%-CI for one-sided and 95%-CI for two-sided hypotheses.
'*': For one-sided hypotheses, the posterior probability exceeds 95%;
for two-sided hypotheses, the value tested against lies outside the 95%-CI.
Posterior probabilities of point hypotheses assume equal prior probabilities.
Conclusions:
- the difference between the percentage change in estimated \(y\) as \(x\) increases from 2.5 to 5 and 50% is -7.339
- the evidence ratio in support of the hypothesis that the percentage change exceeds 50% is 2.822
- the probability that the change in \(y\) exceeds 50% is 0.738
- the evidence for such a change is very weak
dat1b.brm2 |>
emmeans::emmeans(~x, at = list(x = c(2.5, 5))) |>
gather_emmeans_draws() |>
ungroup() |>
group_by(.draw) |>
summarise(ES = 100 * diff(.value) / .value[1]) |>
summarise_draws(
mean, median,
HDInterval::hdi,
P = ~ mean(. > 50)
)# A tibble: 1 × 6
variable mean median lower upper P
<chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 ES 42.7 82.4 -315. 456. 0.738
Conclusions:
- the percentage change in estimated \(y\) as \(x\) increases from 2.5 to 5 is 42.661
- the probability that the change in \(y\) exceeds 50% is 0.738
- the evidence for such a change is very weak
The procedure highlighted above for calculating exceedance probabilities evaluates the degree of evidence for an effect in a particular direction. However, there are other instances where there is a desire to evaluate the evidence that something has changed (either increased or decreased). Such pursuits are similar to the Frequentist practice of testing a null hypothesis (e.g. effect = 0).
The Region of Practical Equivalence (ROPE) evaluates evidence that an effect is “practically equivalent” to a value (e.g. 0) by calculating the proportion of effects that are within a nominated range. Kruschke (2018) argued that for standardized parameters, the range of -0.1 to 0.1 would envelop a negligible effect based on Cohen (1988). Kruschke (2018) also suggested that this range could be extended to non-standardized parameters by multiplying by the standard deviation of the response. Accordingly, calculating the proportion of posterior density within this ROPE could act as a form of “null-hypothesis” testing in a Bayesian framework.
- if the HDI of the focal parameter falls completely outside the ROPE, there is strong evidence that there is an effect
- if the HDI of the focal parameter falls completely inside the ROPE, there is strong evidence that there is not an effect
- otherwise there is not clear evidence either way
ROPE and equivalence tests are most useful when you conclude that there is not enough evidence to support a hypothesis that there is an effect. Such a “non-significant” result may arise because there genuinely is no effect OR because you do not have enough power to detect the effect. Performing an equivalence test provides a mechanism to tease these two apart.
I provide the following example purely to illustrate how such a test would be performed. In this case, as the earlier tests found little evidence of an effect, the equivalence test helps indicate whether this reflects a genuinely negligible effect or simply insufficient power.
## Calculate ROPE range manually
ROPE <- c(-0.1, 0.1) * with(dat, sd(y))
## OR calculate ROPE range via rope_range function
ROPE <- bayestestR::rope_range(dat1b.brm2)
dat1b.brm2 |> bayestestR::equivalence_test(parameters = "b_.*x.*", range = ROPE, ci_method = "HDI")# Test for Practical Equivalence
ROPE: [-0.09 0.09]
Parameter | H0 | inside ROPE | 95% HDI
-----------------------------------------------------------
scalexscaleEQFALSE | Undecided | 17.81 % | [-0.97 0.81]
dat1b.brm2 |>
bayestestR::equivalence_test(parameters = "b_.*x.*", range = ROPE, ci_method = "HDI") |>
plot()Picking joint bandwidth of 0.0711
Conclusions:
- the percentage of the HPD interval for the slope that falls inside the ROPE is 17.81%
- as the HDI is neither completely inside nor completely outside the ROPE, the test is undecided; there is no clear evidence either way
OR using the rope function.
## Proportion of posterior samples inside the ROPE
dat1b.brm2 |> bayestestR::rope(parameters = "x", range = ROPE, ci_method = "HDI")# Proportion of samples inside the ROPE [-0.09, 0.09]:
Parameter | inside ROPE
--------------------------------
scalexscaleEQFALSE | 17.80 %
The above demonstration was applied to the simple comparison of the slope against 0; however, it can similarly be applied to any other hypothesis (although this is typically only useful when there is no evidence of an effect).
Now that we have the full posteriors, we are free to use these to garner evidence on a range of hypotheses. To demonstrate, we will consider the following hypotheses:
- a change in \(x\) is associated with an increase in \(y\)
- a doubling of \(x\) (from 2.5 to 5) is associated with an increase in \(y\) of > 50%
[1] "b_Intercept" "b_scalex" "sigma" "prior_Intercept"
[5] "prior_b" "prior_sigma" "lprior" "lp__"
[9] "accept_stat__" "stepsize__" "treedepth__" "n_leapfrog__"
[13] "divergent__" "energy__"
Hypothesis Tests for class b:
Hypothesis Estimate Est.Error CI.Lower CI.Upper Evid.Ratio Post.Prob Star
1 (scalex) > 0 -0.08 0.39 -0.72 0.53 0.7 0.41
---
'CI': 90%-CI for one-sided and 95%-CI for two-sided hypotheses.
'*': For one-sided hypotheses, the posterior probability exceeds 95%;
for two-sided hypotheses, the value tested against lies outside the 95%-CI.
Posterior probabilities of point hypotheses assume equal prior probabilities.
Conclusions:
- the parameter (b_scalex) minus 0 is -0.079
- Evid.Ratio: the ratio of evidence for the hypothesis vs the evidence against it; in this case, the evidence ratio is 0.697
- Post.Prob: the posterior probability of the hypothesis is 0.411
- there is little evidence for this hypothesis
Alternatively, we could use gather_draws to achieve a similar outcome.
In the following, in addition to the median and HPD intervals, we will calculate the probability that the slope (b_x) is greater than 0. To calculate such a probability, we could simply count up the number of posterior b_x values that are greater than zero and then divide by the total number of posterior b_x values. In R, we could do this as sum(b_x > 0)/length(b_x) (where b_x > 0 returns a 1 for each case where it is true and a 0 where it is false, so summing is equivalent to counting). Since dividing a sum by its length yields a mean, we can obtain the probability by calculating the mean of b_x > 0.
dat1c.brm2 |>
gather_draws(`b_.*x`, regex = TRUE) |>
summarise_draws(median,
HDInterval::hdi,
P = ~mean(. > 0)
)# A tibble: 1 × 6
# Groups: .variable [1]
.variable variable median lower upper P
<chr> <chr> <dbl> <dbl> <dbl> <dbl>
1 b_scalex .value -0.0765 -0.938 0.634 0.411
The summarise_draws() function expects a set of one or more summary or diagnostic functions (such as median). These can be supplied either as the bare name of the function (as for median in the example above) or, if the function requires additional arguments or information, written out in full. In the latter case, the function must be preceded by a ~ and the variable denoted by a . (as in P = ~mean(. > 0) above).
Conclusions:
- the parameter (b_scalex) minus 0 is -0.076
- P: the probability of the hypothesis is 0.411
- there is little evidence for this hypothesis
dat1c.brm2 |>
emmeans::emmeans(~x, at = list(x = c(2.5, 5))) |>
gather_emmeans_draws() |>
ungroup() |>
group_by(.draw) |>
summarise(ES = 100 * diff(.value) / .value[1]) |>
hypothesis("ES > 50")Hypothesis Tests for class :
Hypothesis Estimate Est.Error CI.Lower CI.Upper Evid.Ratio Post.Prob Star
1 (ES)-(50) > 0 -183.83 9672.3 -196.97 233.86 2.9 0.74
---
'CI': 90%-CI for one-sided and 95%-CI for two-sided hypotheses.
'*': For one-sided hypotheses, the posterior probability exceeds 95%;
for two-sided hypotheses, the value tested against lies outside the 95%-CI.
Posterior probabilities of point hypotheses assume equal prior probabilities.
Conclusions:
- the difference between the percentage change in estimated \(y\) as \(x\) increases from 2.5 to 5 and 50% is -183.827
- the evidence ratio in support of the hypothesis that the percentage change exceeds 50% is 2.902
- the probability that the change in \(y\) exceeds 50% is 0.744
- the evidence for such a change is very weak
dat1c.brm2 |>
emmeans::emmeans(~x, at = list(x = c(2.5, 5))) |>
gather_emmeans_draws() |>
ungroup() |>
group_by(.draw) |>
summarise(ES = 100 * diff(.value) / .value[1]) |>
summarise_draws(
mean, median,
HDInterval::hdi,
P = ~ mean(. > 50)
)# A tibble: 1 × 6
variable mean median lower upper P
<chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 ES -134. 84.8 -292. 542. 0.744
Conclusions:
- the percentage change in estimated \(y\) as \(x\) increases from 2.5 to 5 is -133.827
- the probability that the change in \(y\) exceeds 50% is 0.744
- the evidence for such a change is very weak
The procedure highlighted above for calculating exceedance probabilities evaluates the degree of evidence for an effect in a particular direction. However, there are other instances where there is a desire to evaluate the evidence that something has changed (either increased or decreased). Such pursuits are similar to the Frequentist practice of testing a null hypothesis (e.g. effect = 0).
The Region of Practical Equivalence (ROPE) evaluates evidence that an effect is “practically equivalent” to a value (e.g. 0) by calculating the proportion of effects that are within a nominated range. Kruschke (2018) argued that for standardized parameters, the range of -0.1 to 0.1 would envelop a negligible effect based on Cohen (1988). Kruschke (2018) also suggested that this range could be extended to non-standardized parameters by multiplying by the standard deviation of the response. Accordingly, calculating the proportion of posterior density within this ROPE could act as a form of “null-hypothesis” testing in a Bayesian framework.
- if the HDI of the focal parameter falls completely outside the ROPE, there is strong evidence that there is an effect
- if the HDI of the focal parameter falls completely inside the ROPE, there is strong evidence that there is not an effect
- otherwise there is not clear evidence either way
ROPE and equivalence tests are most useful when you conclude that there is not enough evidence to support a hypothesis that there is an effect. Such a “non-significant” result may arise because there genuinely is no effect OR because you do not have enough power to detect the effect. Performing an equivalence test provides a mechanism to tease these two apart.
I provide the following example purely to illustrate how such a test would be performed. In this case, as the earlier tests found little evidence of an effect, the equivalence test helps indicate whether this reflects a genuinely negligible effect or simply insufficient power.
## Calculate ROPE range manually
ROPE <- c(-0.1, 0.1) * with(dat, sd(y))
## OR calculate ROPE range via rope_range function
ROPE <- bayestestR::rope_range(dat1c.brm2)
dat1c.brm2 |> bayestestR::equivalence_test(parameters = "b_.*x.*", range = ROPE, ci_method = "HDI")# Test for Practical Equivalence
ROPE: [-0.09 0.09]
Parameter | H0 | inside ROPE | 95% HDI
--------------------------------------------------
scalex | Undecided | 21.40 % | [-0.87 0.71]
dat1c.brm2 |>
bayestestR::equivalence_test(parameters = "b_.*x.*", range = ROPE, ci_method = "HDI") |>
plot()Picking joint bandwidth of 0.0616
Conclusions:
- the percentage of the HPD interval for the slope that falls inside the ROPE is 21.4%
- since the HPD interval is neither completely inside nor completely outside the ROPE, the equivalence test is undecided
OR using the rope function.
## Proportion of the posterior inside the ROPE
dat1c.brm2 |> bayestestR::rope(parameters = "x", range = ROPE, ci_method = "HDI")# Proportion of samples inside the ROPE [-0.09, 0.09]:
Parameter | inside ROPE
-----------------------
scalex | 21.39 %
The above demonstration was applied to the simple hypothesis that the slope is practically equivalent to 0; however, it can similarly be applied to any hypothesis (although typically only when there is no evidence of an effect).
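Under the hood, `rope()` is essentially just reporting a proportion of posterior draws. With mock draws (no model objects assumed), the equivalent base R calculation is:

```r
set.seed(1)
draws <- rnorm(4000, mean = 0.3, sd = 0.5)   # mock posterior for a slope
rope  <- c(-0.09, 0.09)                      # the range used above
## proportion of draws falling inside the ROPE
p_rope <- mean(draws > rope[1] & draws < rope[2])
p_rope
```

(By default, `bayestestR::rope()` additionally restricts attention to the draws within the 95% HDI, so its reported percentage can differ slightly from this raw proportion.)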
Now that we have the full posteriors, we are free to use these to garner evidence on a range of hypotheses. To demonstrate, we will consider the following hypotheses:
- all pairwise comparisons (compare each level of \(x\) to each other level)
- define a specific set of contrasts that include comparing the average of medium and high treatments to the control treatment.
contrast estimate lower.HPD upper.HPD
control - medium 0.795 -4.69 6.03
control - high 8.733 2.56 14.75
medium - high 7.959 1.18 13.64
Point estimate displayed: median
HPD interval probability: 0.95
Or if we want the full posteriors… This option allows us to calculate exceedance probabilities. That is, we can calculate the proportion of each contrast’s posterior draws that exceed a specific value (as a hypothesis). In this case, we will calculate two exceedance probabilities:
- the probability that the effect is negative (the proportion of posterior draws that are less than 0)
- the probability that the effect is positive (the proportion of posterior draws that are greater than 0)
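In plain terms, each of these is just the proportion of MCMC draws on one side of zero. A minimal mock-data sketch (no model objects assumed):

```r
set.seed(1)
draws <- rnorm(4000, mean = 8, sd = 3)  # mock posterior for a contrast
c(Pl = mean(draws < 0),                 # P(effect is negative)
  Pg = mean(draws > 0))                 # P(effect is positive)
```

The `Pl` and `Pg` columns computed from the model’s contrast draws below follow exactly this pattern.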
dat2a.brm2 |>
emmeans(~x) |>
pairs() |>
gather_emmeans_draws() |>
dplyr::select(-.chain) |>
summarise_draws(median,
HDInterval::hdi,
Pl = ~ mean(.x < 0),
Pg = ~ mean(.x > 0)
)# A tibble: 3 × 7
# Groups: contrast [3]
contrast variable median lower upper Pl Pg
<chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 control - high .value 8.73 2.56 14.8 0.00625 0.994
2 control - medium .value 0.795 -4.69 6.03 0.378 0.622
3 medium - high .value 7.96 1.18 13.6 0.0108 0.989
Conclusions:
- the difference in \(y\) between “control” and “medium” is 0.8; however, there is no evidence of this effect (exceedance probability of 0.622)
- the difference in \(y\) between “control” and “high” is 8.73, and there is very strong evidence for this effect
- the difference in \(y\) between “medium” and “high” is 7.96, and there is very strong evidence for this effect
It is also possible to express the magnitude of effect as a percentage change. The trick is to put the emmeans parameters onto a logarithmic scale so that the pairwise comparisons (which are subtractions) are effectively treated as divisions (due to log laws).
contrast ratio lower.HPD upper.HPD
control/medium 1.04 0.783 1.34
control/high 1.73 1.054 2.56
medium/high 1.66 1.047 2.51
Point estimate displayed: median
HPD interval probability: 0.95
The estimates are expressed as fractional changes. A “ratio” of 1 indicates parity, since multiplying something by 1 does not change it. A value of 1.5 indicates a 50% increase and a value of 0.5 indicates a 50% decline.
To calculate percentage change from a fractional value, subtract 1 and multiply the result by 100.
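For example, applying this to the ratios in the table above (values re-typed here by hand):

```r
ratios <- c("control/medium" = 1.04, "control/high" = 1.73, "medium/high" = 1.66)
(ratios - 1) * 100      # percentage change: ~4%, ~73% and ~66%
## the log trick itself: subtraction on the log scale is division
exp(log(10) - log(4))   # = 10/4 = 2.5
```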
If we get the full posteriors, we can also explore whether the change exceeds some ecologically important change (such as 20%)
dat2a.brm2 |>
emmeans(~x) |>
regrid(transform = "log") |>
pairs() |>
regrid() |>
gather_emmeans_draws() |>
dplyr::select(-.chain) |>
summarise_draws(median,
HDInterval::hdi,
Pl = ~ mean(.x < 1),
Pg = ~ mean(.x > 1),
Pl20 = ~ mean(.x < 0.8),
Pg20 = ~ mean(.x > 1.2)
)# A tibble: 3 × 9
# Groups: contrast [3]
contrast variable median lower upper Pl Pg Pl20 Pg20
<chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 control/high .value 1.73 1.05 2.56 0.00625 0.994 0 0.961
2 control/medium .value 1.04 0.783 1.34 0.378 0.622 0.0325 0.139
3 medium/high .value 1.66 1.05 2.51 0.0108 0.989 0.00125 0.937
Conclusions:
- the response \(y\) is 72.843% higher in the control group than in the high group
- there is strong evidence (P = 0.994) for the above
- there is also strong evidence (P = 0.961) that \(y\) is at least 20% higher in the control group than in the high group
- the response \(y\) is 3.959% higher in the control group than in the medium group
- there is no evidence (P = 0.622) for the above
- there is no evidence (P = 0.139) that \(y\) is at least 20% higher in the control group than in the medium group
- the response \(y\) is 66.107% higher in the medium group than in the high group
- there is strong evidence (P = 0.989) for the above
- there is also strong evidence (P = 0.937) that \(y\) is at least 20% higher in the medium group than in the high group
cmat <- cbind(
"Control vs Medium/High" = c(1, -0.5, -0.5),
"Medium vs High" = c(0, 1, -1)
)
dat2a.brm2 |>
emmeans(~x) |>
contrast(method = list(x = cmat)) contrast estimate lower.HPD upper.HPD
x.Control vs Medium/High 4.75 0.128 9.89
x.Medium vs High 7.96 1.177 13.64
Point estimate displayed: median
HPD interval probability: 0.95
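To see what the contrast matrix is doing, apply the same weights by hand to a vector of made-up group means (the means below are purely hypothetical, for illustration only):

```r
cmat <- cbind(c(1, -0.5, -0.5),   # control vs the average of medium and high
              c(0, 1, -1))        # medium vs high
means <- c(10, 6, 2)              # hypothetical means: control, medium, high
drop(t(cmat) %*% means)           # 10 - (6 + 2)/2 = 6 and 6 - 2 = 4
```

Each column of the matrix supplies the weights for one contrast; emmeans performs this weighted sum on every posterior draw of the cell means.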
Or with full posteriors and exceedance probabilities…
dat2a.brm2 |>
emmeans(~x) |>
contrast(method = list(x = cmat)) |>
gather_emmeans_draws() |>
dplyr::select(-.chain) |>
summarise_draws(median,
HDInterval::hdi,
Pl = ~ mean(.x < 0),
Pg = ~ mean(.x > 0)
)# A tibble: 2 × 7
# Groups: contrast [2]
contrast variable median lower upper Pl Pg
<chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 x.Control vs Medium/High .value 4.75 0.128 9.89 0.0338 0.966
2 x.Medium vs High .value 7.96 1.18 13.6 0.0108 0.989
Conclusions:
- on average, \(y\) is 4.749 higher in the “control” group than the average of the “medium” and “high” groups
- the evidence for this effect is very strong (P = 0.966)
We have already seen that there is no evidence of a difference in \(y\) between the “control” and “medium” groups. This could be either because there is not enough power to detect the difference or because the populations genuinely are not different. It would be nice to be able to gain some insights into which of these is most likely. And we can. If we establish the range of values that represents an insubstantial effect, we can then quantify the proportion of the posterior that falls inside this Region of Practical Equivalence (ROPE).
Conventionally, the ROPE spans ±0.1 standard deviations of the response - that is, if the effect is smaller than this range, we might consider it insubstantial.
## Calculate ROPE range manually
ROPE <- c(-0.1, 0.1) * with(dat2, sd(y))
## OR calculate ROPE range via rope_range function
ROPE <- bayestestR::rope_range(dat2a.brm2)
dat2a.brm2 |> bayestestR::equivalence_test(parameters = "b_x", range = ROPE, ci_method = "HDI")# Test for Practical Equivalence
ROPE: [-0.60 0.60]
Parameter | H0 | inside ROPE | 95% HDI
----------------------------------------------------
xmedium | Undecided | 18.46 % | [ -5.97 4.95]
xhigh | Rejected | 0.00 % | [-14.48 -1.95]
dat2a.brm2 |>
bayestestR::equivalence_test(parameters = "b_x", range = ROPE, ci_method = "HDI") |>
plot()Picking joint bandwidth of 0.48
Conclusions:
- there is insufficient evidence to conclude that there is a difference in \(y\) between “control” and “medium” groups
- we cannot conclude that there is evidence of no effect
14 Summary plots
dat1a.grid <- list(x = modelr::seq_range(dat$x, n = 100))
dat1a.brm2 |>
emmeans(~x, at = dat1a.grid) |>
as.data.frame() |>
ggplot(aes(y = emmean, x = x)) +
geom_ribbon(aes(ymin = lower.HPD, ymax = upper.HPD), fill = "orange", alpha = 0.3) +
geom_point(data = dat, aes(y = y)) +
geom_line() +
theme_classic()
As a spaghetti plot
dat1b.grid <- list(x = modelr::seq_range(dat$x, n = 100))
dat1b.brm2 |>
emmeans(~x, at = dat1b.grid) |>
as.data.frame() |>
ggplot(aes(y = emmean, x = x)) +
geom_ribbon(aes(ymin = lower.HPD, ymax = upper.HPD), fill = "orange", alpha = 0.3) +
geom_point(data = dat, aes(y = y)) +
geom_line() +
theme_classic()
As a spaghetti plot
dat1c.grid <- list(x = modelr::seq_range(dat$x, n = 100))
dat1c.brm2 |>
emmeans(~x, at = dat1c.grid) |>
as.data.frame() |>
ggplot(aes(y = emmean, x = x)) +
geom_ribbon(aes(ymin = lower.HPD, ymax = upper.HPD), fill = "orange", alpha = 0.3) +
geom_point(data = dat, aes(y = y)) +
geom_line() +
theme_classic()
As a spaghetti plot
The end