library(tidyverse) #for data wrangling and plotting
library(DHARMa) #for simulated residuals
library(performance) #for model diagnostics
library(see) #for model diagnostics
library(brms) #for Bayesian models
library(tidybayes) #for exploring Bayesian models
library(rstan) #for diagnostics plots
library(bayesplot) #for diagnostic plots
library(patchwork) #for arranging multiple plots
library(gridGraphics)#for arranging multiple plots - needed for some patchwork plots
library(HDInterval) #for HPD intervals
library(bayestestR) #for ROPE
library(emmeans) #for estimated marginal means
library(standist) #for plotting distributions
library(cmdstanr) #for the backend
source("helperfunctions.R")
Bayesian generalised linear models
1 Preparations
Load the necessary libraries
Many biologists and ecologists get a little twitchy and nervous around mathematical and statistical formulae and nomenclature. Whilst it is possible to perform basic statistics without too much regard for the actual equation (model) being employed, as the complexity of the analysis increases, the need to understand the underlying model becomes increasingly important. Moreover, model specification in BUGS (the language used to program Bayesian modelling) aligns very closely to the underlying formulae. Hence a good understanding of the underlying model is vital to be able to create a sensible Bayesian model. Consequently, I will always present the linear model formulae along with the analysis. If you start to feel some form of disorder starting to develop, you might like to run through the Tutorials and Workshops twice (the first time ignoring the formulae).
This tutorial will introduce the concept of Bayesian (generalised) linear models and demonstrate how to fit simple models to a set of simple fabricated data sets, each representing major data types encountered in ecological research. Subsequent tutorials will build on these fundamentals with increasingly more complex data and models.
2 A philosophical note
To introduce the philosophical and mathematical differences between classical (frequentist) and Bayesian statistics, Wade (2000) presented a provocative yet compelling trend analysis of two hypothetical populations. The temporal trend of one of the populations shows very little variability from a very subtle linear decline. By contrast, the second population appears to decline more dramatically, yet has substantially more variability.
Wade (2000) neatly illustrates the contrasting conclusions (particularly with respect to interpreting probability) that would be drawn by the frequentist and Bayesian approaches and in so doing highlights how and why the Bayesian approach provides outcomes that are more aligned with management requirements.
This tutorial will start by replicating the demonstration of Wade (2000).
| Population | n | Slope | t | p-value |
|---|---|---|---|---|
| A | 10 | -0.1022 | -2.3252 | 0.0485 |
| B | 10 | -10.2318 | -2.2115 | 0.0579 |
| C | 100 | -10.4713 | -6.6457 | 0 |
From a traditional frequentist perspective, we would conclude that there is a ‘significant’ relationship in Populations A and C (\(p<0.05\)), yet not in Population B (\(p>0.05\)). Note that Populations B and C were both generated from the same random distribution; Population C simply has a substantially higher number of observations.
The above illustrates a couple of things:
- statistical significance does not necessarily translate into biological importance. The percentage decline for Population A is 0.46, whereas the percentage decline for Population B is 45.26. That is, Population B is declining at nearly 100 times the rate of Population A. That sounds rather important, yet on the basis of the hypothesis test, we would dismiss the decline in Population B.
- a p-value depends heavily on sample size. To a large extent, it reflects whether the sample is large enough to detect an effect, rather than how large or important that effect is.
Let us now look at it from a Bayesian perspective. I will just provide the posterior distributions (densities scaled to 0-1 so that they can be plotted together) for the slope for each population.
Focusing on Populations A and B, we would conclude:
- the mean (with CI) slopes for Populations A and B are -0.1 (-0.21, 0) and -10.14 (-20.12, 0.37) respectively.
- the Bayesian approach allows us to query the posterior distribution in many other ways in order to ask sensible biological questions. For example, we might consider that a rate of change of 5% or greater represents an important biological impact. For Populations A and B, the probability that the rate is 5% or greater is 0 and 0.86 respectively.
3 Review of (generalised) linear models
I would highly recommend reviewing the information in the tutorial on generalised linear models, particularly the sections describing linear models, assumption checking and generalised linear models (GLM). Whilst there are philosophical differences between frequentist and Bayesian statistics that have implications for how models are fit and interpreted, model choice and assumption checking principles are common between the two approaches. Hence, many of these topics will be assumed, and not fully described in the current tutorial.
Recall from the tutorial on generalised linear models that simple linear regression is a linear modelling process that models a single response variable against one or more predictors with a linear combination of coefficients and (in the case of a Gaussian model) can be expressed as:
\[y_i = \beta_0+ \beta_1 x_i+\epsilon_i \hspace{1cm}\epsilon\sim{}N(0,\sigma^2)\]
where:
\(y_i\) is the response value for each of the \(i\) observations
\(\beta_0\) is the y-intercept (value of \(y\) when \(x=0\))
\(\beta_1\) is the slope (rate of change in \(y\) per unit change in \(x\))
\(x_i\) is the predictor value for each of the \(i\) observations
\(\epsilon_i\) is the residual value of each of the \(i\) observations. A residual is the difference between the observed value and the value expected by the model.
\(\epsilon\sim{}N(0,\sigma^2)\) indicates that the residuals are normally distributed with a constant amount of variance
The above can be re-expressed and generalised as:
\[ \begin{align} y_i&\sim{}Dist(\mu_i, ...) \\ g(\mu_i) &= \beta_0+ \beta_1 x_i \end{align} \]
where:
- \(Dist\) represents a distribution from the exponential family (such as Gaussian, Poisson, Binomial, etc)
- \(...\) represents additional parameters relevant to the nominated distribution (such as \(\sigma^2\): Gaussian, \(n\): Binomial and \(\phi\): Negative Binomial, etc)
- \(g()\) represents the link function (e.g. log: Poisson, logit: Binomial, etc)
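As a small illustration of the link function \(g()\), the family objects in base R expose both the link and its inverse. The sketch below assumes nothing beyond the stats package that ships with R; the object names and the input values (5 and 0.25) are arbitrary choices of mine.

```r
# Family objects in base R carry the link function g() and its inverse
pois <- poisson(link = "log")
binom <- binomial(link = "logit")
gaus <- gaussian(link = "identity")

# Link: maps the expected value (mu) onto the scale of the linear predictor
pois$linkfun(5)      # log(5)
binom$linkfun(0.25)  # log(0.25 / (1 - 0.25))
gaus$linkfun(5)      # 5, the identity link leaves mu unchanged

# Inverse link: maps the linear predictor back onto the scale of the data
pois$linkinv(log(5))  # 5
binom$linkinv(0)      # 0.5
```

The coefficients (\(\beta_0\), \(\beta_1\)) always live on the link scale, so the inverse link is what we need whenever we want to express predictions on the scale of the observed data.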
The reliability of any model depends on the degree to which the data adheres to the model assumptions. Hence, as with frequentist models, exploratory data analysis (EDA) is a vital component of Bayesian modelling and since the model structures are similar between frequentist and Bayesian approaches, so too is EDA.
4 Bayesian (generalised) linear models
For the purpose of introduction, we will start by exploring a Gaussian model with a very simple fabricated data set representing the relationship between a response (\(y\)) and a continuous predictor (\(x = [1,2,3,4,5,6,7,8,9,10]\)). The fabricated data set will comprise 10 observations each drawn from normal distributions with a set standard deviation of 4. The means of the 10 populations will be determined by the following equation:
\[ \mu_i = 2 + 5\times x_i \]
Let's generate these data.
set.seed(234)
dat <- data.frame(x = 1:10) |>
mutate(y = round(rnorm(n = 10, mean = 2 + (5 * x), sd = 4), digits = 2))
dat
x y
1 1 9.64
2 2 3.79
3 3 11.00
4 4 27.88
5 5 32.84
6 6 32.56
7 7 37.84
8 8 29.86
9 9 45.05
10 10 47.65
The model we will be fitting is:
\[ \begin{align} y_i&\sim{}N(\mu_i, \sigma^2)\\ \mu_i &= \beta_0+ \beta_1 x_i \end{align} \]
The parameters that we are going to attempt to estimate are the y-intercept (\(\beta_0\)), the slope (\(\beta_1\)) and the underlying variance (\(\sigma^2\)). Recall (from the tutorials on statistical philosophies and estimation) that Bayesian models calculate posterior probabilities (\(P(H|D)\)) from the likelihood (\(P(D|H)\)) and prior expectations (\(P(H)\)). Therefore, in preparation for fitting a Bayesian model, we must consider what our prior expectations are for all parameters.
The individual responses (\(y_i\), observed yields) are each expected to have been independently drawn from normal (Gaussian) distributions (\(\mathcal{N}\)). These distributions represent all the possible values of \(y\) we could have obtained at the specific (\(i^{th}\)) level of \(x\). Hence the \(i^{th}\) \(y\) observation is expected to have been drawn from a normal distribution with a mean of \(\mu_i\).
Although each distribution is expected to come from populations that differ in their means, we assume that all of these distributions have the same variance (\(\sigma^2\)).
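To make the likelihood component (\(P(D|H)\)) a little more concrete, the following is a minimal sketch of how the Gaussian log-likelihood of the fabricated data could be evaluated for any candidate set of parameter values. The function name log_lik is my own (hypothetical) and the y values are simply those printed above.

```r
# Observed data (values as printed above)
x <- 1:10
y <- c(9.64, 3.79, 11.00, 27.88, 32.84, 32.56, 37.84, 29.86, 45.05, 47.65)

# Log-likelihood of the model y_i ~ N(beta_0 + beta_1 * x_i, sigma^2)
log_lik <- function(beta_0, beta_1, sigma) {
  mu <- beta_0 + beta_1 * x                         # expected value per observation
  sum(dnorm(y, mean = mu, sd = sigma, log = TRUE))  # independent observations
}

log_lik(beta_0 = 2, beta_1 = 5, sigma = 4)  # the generating parameter values
log_lik(beta_0 = 2, beta_1 = 0, sigma = 4)  # no slope: a far lower log-likelihood
```

Parameter combinations that are more consistent with the data yield higher log-likelihoods. The posterior combines this information with the priors.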
4.1 Priors
We need to supply priors for each of the parameters to be estimated (\(\beta_0\), \(\beta_1\) and \(\sigma\)). Whilst we want these priors to be sufficiently vague as to not influence the outcomes of the analysis (and thus be equivalent to the frequentist analysis), we do not want the priors to be so vague (wide) that they permit the MCMC sampler to drift off into parameter space that is both illogical as well as numerically awkward.
Proffering sensible priors is one of the most difficult aspects of performing Bayesian analyses. For instances where there is some previous knowledge available and a desire to incorporate it, the difficulty is in how to ensure that the information is incorporated correctly. However, for instances where there is no relevant previous information and so a desire to have the posteriors driven entirely by the new data, the difficulty is in how to define priors that are both vague enough (so as not to bias results in their direction) and yet not so vague as to allow the MCMC sampler to drift off into unsupported regions (and thus get stuck and yield spurious estimates).
For early implementations of MCMC sampling routines (such as Metropolis-Hastings and Gibbs), it was fairly common to see very vague priors being defined. For example, the priors on effects were typically normal priors with a mean of 0 and a variance of 1e+06 (1,000,000). These are very vague priors. Yet for some samplers (e.g. NUTS), such vague priors can encourage poor behaviour of the sampler - particularly if the posterior is complex. It is now generally advised that priors should (where possible) be somewhat weakly informative and to some extent, represent the bounds of what are feasible and sensible estimates.
The degree to which priors influence an outcome (whether by having a pulling effect on the estimates or by encouraging the sampler to drift off into unsupported regions of the posterior) is dependent on:
- the relative sparsity of the data - the larger the data, the less weight the priors have and thus less influence they exert.
- the complexity of the model (and thus posterior) - the more parameters, the more sensitive the sampler is to the priors.
The sampled posterior is the product of both the likelihood and the prior - all of which are multidimensional. For most applications, it would be virtually impossible to define a sensible multidimensional prior. Hence, our only option is to define priors on individual parameters (e.g. the intercept, slope(s), variance etc) and to hope that if they are individually sensible, they will remain collectively sensible.
So having (hopefully) impressed upon you the notion that priors are an important consideration, I will now attempt to synthesise some of the approaches that can be employed to arrive at weakly informative priors that have been gleaned from various sources. Largely, this advice has come from the following resources:
- https://github.com/stan-dev/stan/wiki/Prior-Choice-Recommendations
- http://svmiller.com/blog/2021/02/thinking-about-your-priors-bayesian-analysis/
I will outline some of the current main recommendations before summarising some approaches in a table.
- weakly informative priors should contain enough information so as to regularise (discourage unreasonable parameter estimates whilst allowing all reasonable estimates).
- for effects parameters on scaled (standardised) data, an argument could be made for a normal distribution with a standard deviation of 1 (e.g. normal(0,1)), although some prefer a t distribution with 3 degrees of freedom and standard deviation of 1 (e.g. student_t(3,0,1)) - apparently a flatter t is a more robust prior than a normal as an uninformative prior…
- for un-scaled data, the above priors can be scaled by using the standard deviation of the data as the prior standard deviation (e.g. student_t(3,0,sd(y)), or student_t(3,0,sd(y)/sd(x)))
- for priors of hierarchical standard deviations, priors should encourage shrinkage towards 0 (particularly if the number of groups is small, since otherwise, the sampler will tend to be more responsive to “noise”).
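To give a feel for what "flatter" means when comparing normal(0,1) with student_t(3,0,1), the following sketch (base R only) compares how much prior mass each places on values more than three units from zero. The cut-off of 3 is an arbitrary choice of mine.

```r
# Two-tailed prior probability of a value more than 3 units from 0
tail_normal <- 2 * pnorm(-3, mean = 0, sd = 1)  # normal(0,1)
tail_t3 <- 2 * pt(-3, df = 3)                   # student_t(3,0,1)

c(normal = tail_normal, student_t3 = tail_t3)

# How many times more mass the t prior places in the tails
tail_t3 / tail_normal
```

The t prior concentrates most of its mass in the same central region as the normal, yet leaves considerably more room for the occasional large effect.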
In this tutorial series, we will perform Bayesian analysis in the STAN language via an R interface. Two popular interfaces that greatly simplify the specification of Bayesian models are brms and rstanarm. We will exclusively focus on the former as it is far more flexible.
| Family | Parameter | brms | rstanarm |
|---|---|---|---|
| Gaussian | Intercept | student_t(3,median(y),mad(y)) | normal(mean(y),2.5*sd(y)) |
| | ‘Population effects’ (slopes, betas) | flat, improper priors | normal(0,2.5*sd(y)/sd(x)) |
| | Sigma | student_t(3,0,mad(y)) | exponential(1/sd(y)) |
| | ‘Group-level effects’ | student_t(3,0,mad(y)) | decov(1,1,1,1) |
| | Correlation on group-level effects | lkj_corr_cholesky(1) | |
| Poisson | Intercept | student_t(3,median(y),mad(y)) | normal(mean(y),2.5*sd(y)) |
| | ‘Population effects’ (slopes, betas) | flat, improper priors | normal(0,2.5*sd(y)/sd(x)) |
| | ‘Group-level effects’ | student_t(3,0,mad(y)) | decov(1,1,1,1) |
| | Correlation on group-level effects | lkj_corr_cholesky(1) | |
| Negative binomial | Intercept | student_t(3,median(y),mad(y)) | normal(mean(y),2.5*sd(y)) |
| | ‘Population effects’ (slopes, betas) | flat, improper priors | normal(0,2.5*sd(y)/sd(x)) |
| | Shape | gamma(0.01, 0.01) | exponential(1/sd(y)) |
| | ‘Group-level effects’ | student_t(3,0,mad(y)) | decov(1,1,1,1) |
| | Correlation on group-level effects | lkj_corr_cholesky(1) | |
Notes:
brms
https://github.com/paul-buerkner/brms/blob/c2b24475d727c8afd8bfc95947c18793b8ce2892/R/priors.R
- In the above, for non-Gaussian families, y is first transformed according to the family link. If the family link is log, then 0.1 is first added to 0 values.
- in brms the minimum standard deviation for the Intercept prior is 2.5
- in brms the minimum standard deviation for group-level priors is 10.
rstanarm
http://mc-stan.org/rstanarm/articles/priors.html
- in rstanarm priors on standard deviation and correlation associated with group-level effects are packaged up into a single prior (decov, which is a decomposition of the variance and covariance matrix).
In my experience, I find that the above priors tend to be a little bit too wide for many ecological applications and I often prefer to use 1.5 rather than 2.5 as the multiplier.
In Bayesian models, centering of predictors offers huge numerical advantages. So important is it to center that brms automatically centers any continuous predictors for you. However, since the user has not necessarily centered the predictors, the user might misinterpret the outputs from a brms model. Consequently, when fitting a model, brms also generates y-intercept values that are consistent with un-centered values and these are the estimates returned to the user.
Nevertheless, I would recommend that you always explicitly center continuous predictors to provide more meaningful interpretations of the y-intercept. I would also highly recommend standardising continuous predictors - this will not only help speed up and stabilise the model, it will simplify the specification of priors - see the specific examples later in this tutorial.
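As a minimal sketch of what centering and standardising involve (using base R only and the fabricated data from earlier in this tutorial), consider the following. The names x_centred and x_std are my own, and the simple least squares fit via lm() is used purely to illustrate how centering changes the meaning of the intercept.

```r
x <- 1:10
y <- c(9.64, 3.79, 11.00, 27.88, 32.84, 32.56, 37.84, 29.86, 45.05, 47.65)

# Centering: subtract the mean so that 0 corresponds to the average x
x_centred <- x - mean(x)

# Standardising: center and then divide by the standard deviation
x_std <- as.numeric(scale(x))

# With a centered predictor, the least squares intercept is the mean of y
coef(lm(y ~ x))[1]          # expected y when x = 0 (outside the observed range)
coef(lm(y ~ x_centred))[1]  # expected y at the average x
mean(y)
```

After centering, the intercept describes the expected response at the average predictor value, which is usually far easier to reason about (and to put a prior on) than the expected response at \(x=0\).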
Based on the above, for our fabricated data, let's assign the following priors:
- \(\beta_0\): Normal prior centred at 31.21 with a standard deviation of 15.17
  - mean: median(y) = 31.21
  - standard deviation: mad(y) = 15.17
- \(\beta_1\): t distribution (3 degrees of freedom) prior centred at 0 with a scale of 4.09
  - mean: since effects are differences and we may not want to pre-determine the direction (polarity) of any trends, it is typical to define effects priors with means of 0
  - scale: mad(y)/mad(x) = 4.09
- \(\sigma\): (half) t distribution (3 degrees of freedom) centred at 0 with a scale of 15.17
  - scale: mad(y) = 15.17
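The numbers quoted in these priors can be recovered from the fabricated data with a few lines of base R. This is a sketch on my part: the choice of median(), mad() and the ratio mad(y)/mad(x) is inferred from the fact that they reproduce the values 31.21, 15.17 and 4.09.

```r
# Fabricated data (values as printed earlier in this tutorial)
x <- 1:10
y <- c(9.64, 3.79, 11.00, 27.88, 32.84, 32.56, 37.84, 29.86, 45.05, 47.65)

round(median(y), 2)        # location for the intercept prior: 31.21
round(mad(y), 2)           # scale for the intercept and sigma priors: 15.17
round(mad(y) / mad(x), 2)  # scale for the slope prior: 4.09
```

Note that mad() is a robust measure of spread that is scaled (by default) to be comparable to a standard deviation, so these values act as prior scales rather than variances.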
Note, again, when fitting models through either rstanarm or brms, the priors assume that the predictor(s) have been centred and are to be applied on the link scale. In this case the link scale is an identity.
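One way these priors might be expressed for brms is via its prior() function, as sketched below. The object name priors is my own, and the class names ("Intercept", "b" and "sigma") are those that brms uses for the intercept, population-level effects and residual standard deviation of a Gaussian model. The sketch only defines the priors; it does not fit a model.

```r
library(brms)

# Priors for a Gaussian model of the form y ~ x
priors <- prior(normal(31.21, 15.17), class = "Intercept") +
  prior(student_t(3, 0, 4.09), class = "b") +
  prior(student_t(3, 0, 15.17), class = "sigma")

priors
```

Such an object would then typically be supplied to the prior argument of brm(). Since sigma cannot be negative, brms bounds it at zero, which is what makes the t prior on sigma a half-t.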
Similar logic can be applied for models that employ different distributions. In the following sections, we will define numerous sets of data (each of which represents different major forms of ecological data) and see how we can set appropriate priors in each case. In working through these examples, it is worth reflecting on how much simpler prior specification is if we use standardised predictors.
5 Example data
This tutorial will blend theoretical discussions with actual calculations and model fits. I believe that by bridging the divide between theory and application, we all gain better understanding. The applied components of this tutorial will be motivated by numerous fabricated data sets. The advantage of simulated data over real data is that with simulated data, we know the ‘truth’ and can therefore gauge the accuracy of estimates.
The motivating examples are:
- Example 1 - simulated samples drawn from a Gaussian (normal) distribution reminiscent of data collected on measurements (such as body mass)
- Example 2 - simulated Gaussian samples drawn from three different populations representing three different treatment levels (e.g. body masses of three different species)
- Example 3 - simulated samples drawn from a Poisson distribution reminiscent of count data (such as number of individuals of a species within quadrats)
- Example 4 - simulated samples drawn from a Negative Binomial distribution reminiscent of over-dispersed count data (such as number of individuals of a species that tends to aggregate in groups)
- Example 5 - simulated samples drawn from a Bernoulli (binomial with \(n = 1\)) distribution reminiscent of binary data (such as the presence/absence of a species within sites)
- Example 6 - simulated samples drawn from a Binomial distribution reminiscent of proportional data (such as counts of a particular taxon out of a total number of individuals)
Let's formally simulate the data illustrated above. The underlying process dictates that on average a one unit change in the predictor (x) will be associated with a five unit change in response (y) and when the predictor has a value of 0, the response will typically be 2. Hence, the response (y) will be related to the predictor (x) via the following:
\[ y = 2 + 5x \]
This is a deterministic model; it has no uncertainty. In order to simulate actual data, we need to add some random noise. We will assume that the residuals are drawn from a Gaussian distribution with a mean of zero and standard deviation of 4. The predictor will comprise 10 evenly spaced integer values between 1 and 10. We will round the response to two decimal places.
For repeatability, a seed will be employed on the random number generator. Note, the smaller the dataset, the less it is likely to represent the underlying deterministic equation, so we should keep this in mind when we look at how closely our estimated parameters approximate the ‘true’ values. Hence, the seed has been chosen to yield data that maintain a general trend that is consistent with the defining parameters.
set.seed(234)
dat <- data.frame(x = 1:10) |>
mutate(y = round(2 + 5*x + rnorm(n = 10, mean = 0, sd = 4), digits = 2))
dat
x y
1 1 9.64
2 2 3.79
3 3 11.00
4 4 27.88
5 5 32.84
6 6 32.56
7 7 37.84
8 8 29.86
9 9 45.05
10 10 47.65
We will use these data in two ways. Firstly, to estimate the mean and variance of the response (y) ignoring the predictor (x) and secondly to estimate the relationship between the response and predictor.
For the former, we know that the mean and variance of the response (y) can be calculated as:
\[ \begin{align} \bar{y} &= \frac{1}{n}\sum^n_{i=1}y_i\\ var(y) &= \frac{1}{n}\sum^n_{i=1}(y_i-\bar{y})^2\\ sd(y) &= \sqrt{var(y)} \end{align} \]
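The following sketch applies these formulae to the fabricated response values by hand (base R only). The names y_bar, var_n and var_n1 are my own. One thing to be aware of is that R's var() and sd() functions divide by \(n-1\) rather than \(n\), so they will return slightly larger values than the formulae as written.

```r
y <- c(9.64, 3.79, 11.00, 27.88, 32.84, 32.56, 37.84, 29.86, 45.05, 47.65)
n <- length(y)

y_bar <- sum(y) / n                      # mean
var_n <- sum((y - y_bar)^2) / n          # variance with a divisor of n
var_n1 <- sum((y - y_bar)^2) / (n - 1)   # variance with a divisor of n - 1

c(mean = y_bar, var = var_n, sd = sqrt(var_n))

# The n - 1 version is what var() reports
all.equal(var_n1, var(y))
```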
As previously described, categorical predictors are transformed into dummy codes prior to the fitting of the linear model. We will simulate a small data set with a single categorical predictor comprising a control and two treatment levels (‘medium’, ‘high’). To simplify things we will assume a Gaussian distribution, however most of the modelling steps would be the same regardless of the chosen distribution.
The data will be drawn from three Gaussian distributions with a standard deviation of 4 and means of 20, 18 and 10. We will draw a total of 12 observations, four from each of the three populations.
set.seed(123)
beta_0 <- 20
beta <- c(-2, -10)
sigma <- 4
n <- 12
x <- gl(3, 4, 12, labels = c('control', 'medium', 'high'))
y <- (model.matrix(~x) %*% c(beta_0, beta)) + rnorm(12, 0, sigma)
dat2 <- data.frame(x = x, y = y)
dat2
x y
1 control 17.758097
2 control 19.079290
3 control 26.234833
4 control 20.282034
5 medium 18.517151
6 medium 24.860260
7 medium 19.843665
8 medium 12.939755
9 high 7.252589
10 high 8.217352
11 high 14.896327
12 high 11.439255
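To see the dummy coding that underlies the simulation above, the following sketch (base R only) displays the unique rows of the model matrix and the group means implied by the parameters. The names X and mu are my own.

```r
x <- gl(3, 4, 12, labels = c("control", "medium", "high"))

# Treatment (dummy) coding: an intercept plus one indicator per non-reference level
X <- model.matrix(~x)
unique(X)

# Expected values implied by beta_0 = 20 and effects of -2 and -10
mu <- as.vector(X %*% c(20, -2, -10))
tapply(mu, x, unique)
```

The intercept represents the mean of the first (control) group and each effect parameter represents the difference between a treatment group and the control.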
The Poisson distribution is parameterized by a single parameter (\(\lambda\)) which represents both the mean and variance. Furthermore, Poisson data can only be non-negative integers.
Unlike a simple trend between two Gaussian or uniform distributions, modelling against a Poisson distribution alters the scale to logarithms. This needs to be taken into account when we simulate the data. The parameters that we use to simulate the underlying processes need to either be on a logarithmic scale, or else converted to a logarithmic scale prior to using them for generating the random data.
Moreover, for any model that involves a non-identity link function (such as a logarithmic link function for Poisson models), ‘slope’ is only constant on the scale of the link function. When it is back transformed onto the natural scale (scale of the data), it takes on a different meaning and interpretation.
We will choose \(\beta_0\) to represent a value of 1 when x=0. As for the ‘effect’ of the predictor on the response, let's say that for every one unit increase in the predictor the response increases by 40% (on the natural scale). Hence, on the log scale, the slope will be \(log(1.4)=\) 0.3364722.
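A minimal sketch of how such Poisson data might be simulated is given below. It follows the same pattern as the other simulations in this tutorial, but the seed and the name dat3 are assumptions on my part, and the slope is taken to be \(log(1.4)\) (a 40% increase per unit of x, which matches the quoted value of 0.3364722).

```r
set.seed(123)  # arbitrary seed (an assumption)

# Parameters specified on the natural scale, then converted to the log scale
beta <- log(c(1, 1.40))
n <- 10

dat3 <- data.frame(x = seq(from = 1, to = 10, length.out = n))
dat3$mu <- exp(beta[1] + beta[2] * dat3$x)  # expected counts
dat3$y <- rpois(n, lambda = dat3$mu)        # observed counts
dat3

# On the natural scale the 'slope' is multiplicative:
# each expected value is 1.4 times the previous one
dat3$mu[-1] / dat3$mu[-n]
```

The final line illustrates that while the slope is constant on the log (link) scale, on the scale of the data it represents a constant proportional change rather than a constant difference.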
In theory, count data should follow a Poisson distribution and therefore have properties like mean equal to variance (e.g. \(\textnormal{Dispersion}=\frac{\sigma^2}{\mu}=1\)). However, as simple linear models are low dimensional representations of a system, it is often unlikely that such a simple model can capture all the variability in the response (counts). For example, if we were modelling the abundance of a species of intertidal snail within quadrats in relation to water depth, it is highly unlikely that water depth alone drives snail abundance. There are countless other influences that the model has not accounted for. As a result, the observed data might be more variable than a Poisson (of a particular mean) would expect and in such cases, the model is over-dispersed (more variance than expected).
Over-dispersed models under-estimate the variability and thus over-estimate the precision of estimates, resulting in inflated confidence in outcomes (elevated Type I errors).
There are numerous causes of over-dispersed count data (one of which is alluded to above). These are:
- additional sources of variability not being accounted for in the model (see above)
- when the items being counted aggregate together. Although the underlying items may have been generated by a Poisson process, the items clump together. When the items are counted, they are more likely to be in either relatively low or relatively high numbers - hence the data are more varied than would be expected from their overall mean.
- imperfect detection resulting in excessive zeros. Again the underlying items may have been generated by a Poisson process, however detecting and counting the items might not be completely straight forward (particularly for more cryptic items). Hence, the researcher may have recorded no individuals in a quadrat and yet there was one or more present, they were just not obvious and were not detected. That is, layered over the Poisson process is another process that determines the detectability. So while the Poisson might expect a certain proportion of zeros, the observed data might have a substantially higher proportion of zeros - and thus higher variance.
This example will generate data that is drawn from a negative binomial distribution so as to broadly represent any one of the above causes.
We will choose \(\beta_0\) to represent a value of 1 when x=0. As for the ‘effect’ of the predictor on the response, let's say that for every one unit increase in the predictor the response increases by 40% (on the natural scale). Hence, on the log scale, the slope will be \(log(1.4)=\) 0.3364722. Finally, the dispersion (size) parameter will be 10, which implies a variance of \(\mu + \mu^2/10\).
set.seed(234)
beta <- c(1, 1.40)
beta <- log(beta)
n <- 10
size <- 10
dat4 <- data.frame(x = seq(from = 1, to = 10, len = n)) |>
mutate(
mu = exp(beta[1] + beta[2] * x),
y = rnbinom(n, size = size, mu = mu)
)
dat4
x mu y
1 1 1.400000 0
2 2 1.960000 3
3 3 2.744000 7
4 4 3.841600 3
5 5 5.378240 5
6 6 7.529536 9
7 7 10.541350 13
8 8 14.757891 10
9 9 20.661047 17
10 10 28.925465 26
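To get a sense of how much over-dispersion these settings imply, recall that under the mu and size parameterisation used by rnbinom(), the variance is \(\mu + \mu^2/size\). The following sketch (base R only) calculates the implied variance for each expected value and then checks the relationship empirically against a large sample. The seed, the sample size of 100,000 and the mean of 5 used in the check are arbitrary choices of mine.

```r
size <- 10
mu <- exp(log(1) + log(1.40) * (1:10))  # expected values, as in the mu column above

# Implied variance and variance to mean ratio (a Poisson would have a ratio of 1)
v <- mu + mu^2 / size
round(data.frame(mu = mu, variance = v, ratio = v / mu), 2)

# Empirical check: the variance of a large sample with mu = 5
set.seed(1)
draws <- rnbinom(1e5, size = size, mu = 5)
c(expected = 5 + 5^2 / size, observed = var(draws))
```

Note that the variance to mean ratio is not constant. It grows with the mean, so the larger counts are considerably more over-dispersed than the smaller counts.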
Binary data (presence/absence, dead/alive, yes/no, heads/tails, etc) pose unique challenges for linear modeling. Linear regression, designed for continuous outcomes, may not be directly applicable to binary responses. The nature of binary data violates assumptions of normality and homoscedasticity, which are fundamental to linear regression. Furthermore, linear models may predict probabilities outside the [0, 1] range, leading to unrealistic predictions.
This example will generate data that is drawn from a Bernoulli distribution so as to broadly represent presence/absence data.
We will choose \(\beta_0\) to represent the odds of a value of 1 when \(x=0\), and set these odds to \(0.02\). This is equivalent to a probability of \(y\) being one when \(x=0\) of \(\frac{0.02}{1+0.02}=0.0196\). That is, at low \(x\), the response is likely to be 0. For every one unit increase in \(x\), we will stipulate a 2 times increase in the odds that the response is equal to 1.
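A minimal sketch of how such binary data might be simulated is given below. The seed and the name dat5 are assumptions on my part. The parameters are specified as odds (0.02) and an odds ratio (2) and are then converted onto the log odds (logit) scale.

```r
set.seed(123)  # arbitrary seed (an assumption)

# Odds when x = 0 and the odds ratio per unit of x, converted to log odds
beta <- log(c(0.02, 2))
n <- 10

dat5 <- data.frame(x = seq(from = 1, to = 10, length.out = n))
dat5$prob <- plogis(beta[1] + beta[2] * dat5$x)  # inverse logit gives probabilities
dat5$y <- rbinom(n, size = 1, prob = dat5$prob)  # Bernoulli draws (0 or 1)
dat5

# The probability of a 1 when x = 0
plogis(beta[1])  # 0.02 / (1 + 0.02)

# The odds double with each unit increase in x
odds <- dat5$prob / (1 - dat5$prob)
odds[-1] / odds[-n]
```

As with the log link, the ‘slope’ is only constant on the link (log odds) scale. On the probability scale, the change per unit of x depends on where along x we are.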
Similar to binary data, proportional (binomial) data tend to violate normality and homogeneity of variance (particularly as mean proportions approach either 0% or 100%).
This example will generate data that is drawn from a binomial distribution so as to broadly represent proportion data.
We will choose \(\beta_0\) to represent the odds of a particular trial (e.g. an individual) being of a particular type (e.g. species 1) when \(x=0\), and set these odds to \(0.02\). This is equivalent to a probability of \(y\) being of the focal type when \(x=0\) of \(\frac{0.02}{1+0.02}=0.0196\). That is, at low \(x\), the probability that an individual is taxon 1 is likely to be close to 0. For every one unit increase in \(x\), we will stipulate a 2.5 times increase in the odds that an individual is of the focal type.
For this example, we will also convert the counts into proportions (\(y\)) by division with the number of trials (\(5\)).
set.seed(123)
beta <- c(0.02, 2.5)
beta <- log(beta)
n <- 10
trials <- 5
dat6 <- data.frame(x = seq(from = 1, to = 10, len = n)) |>
mutate(
count = as.numeric(rbinom(n, size = trials, prob = plogis(beta[1] + beta[2] * x))),
total = trials,
y = count/total
)
dat6
x count total y
1 1 0 5 0.0
2 2 1 5 0.2
3 3 1 5 0.2
4 4 4 5 0.8
5 5 2 5 0.4
6 6 5 5 1.0
7 7 5 5 1.0
8 8 4 5 0.8
9 9 5 5 1.0
10 10 5 5 1.0
6 Exploratory data analysis
Statistical models utilize data and the inherent statistical properties of distributions to discern patterns, relationships, and trends, enabling the extraction of meaningful insights, predictions, or inferences about the phenomena under investigation. To do so, statistical models make assumptions about the likely distributions from which the data were collected. Consequently, the reliability and validity of any statistical model depend upon adherence to these underlying assumptions.
Exploratory Data Analysis (EDA) and assumption checking therefore play pivotal roles in the process of statistical analysis, offering essential tools to glean insights, assess the reliability of statistical methods, and ensure the validity of conclusions drawn from data. EDA involves visually and statistically examining datasets to understand their underlying patterns, distributions, and potential outliers. These initial steps provide an intuitive understanding of the data’s structure and guide subsequent analyses. By scrutinizing assumptions, such as normality, homoscedasticity, and independence, researchers can identify potential limitations or violations that may impact the accuracy and reliability of their findings.
Exploratory Data Analysis within the context of ecological statistical models usually comprises a set of targeted graphical summaries. They are not to be considered definitive diagnostics of the model assumptions, but rather a first pass to assess the obvious violations prior to the fitting of models. More definitive diagnostics can only be achieved after a model has been fit.
In addition to graphical summaries, there are numerous statistical tests to help explore possible violations of various statistical assumptions. These tests are less commonly used in ecology since they are often more sensitive to deviations from the ideal than are the models whose assumptions we are seeking to check.
Simple classic regression models are often the easiest models to fit and interpret and as such often represent a standard by which other alternate models are gauged. As you will see later in this tutorial, such models can actually be fit using closed form (exact solution) matrix algebra that can be performed by hand. Nevertheless, and perhaps as a result, they also impose some of the strictest assumptions. Although these collective assumptions are specific to Gaussian models, they do provide a good introduction to model assumptions in general, so we will use them to motivate the wider discussion.
Simple (Gaussian) linear models (represented below) make the following assumptions:
The data depicted above were simulated in R such that the observations represent:
- single observations drawn from 10 normal populations
- each population had a standard deviation of 4
- the mean of each population varied linearly according to the value of x (\(2 + 5x\))
- normality: the residuals (and thus observations) must be drawn from populations that are normally distributed. The right hand figure underlays the fictitious normally distributed populations from which the observed values have been sampled.
Estimation and inference testing in linear regression assumes that the response is normally distributed in each of the populations. In this case, the populations are all possible measurements that could be collected at each level of \(x\) - hence there are 16 populations. Typically however, we only collect a single observation from each population (as is also the case here). How then can we evaluate whether each of these populations is likely to have been normal?
For a given response, the population distributions should follow much the same distribution shapes. Therefore provided the single samples from each population are unbiased representations of those populations, a boxplot of all observations should reflect the population distributions.
The two figures above show the relationships between the individual population distributions and the overall distribution. The left hand figure shows a distribution drawn from single representatives of each of the 16 populations. Since the 16 individual populations were normally distributed, the distribution of the 16 observations is also normal.
By contrast, the right hand figure shows 16 log-normally distributed populations and the resulting distribution of 16 single observations drawn from these populations. The overall boxplot mirrors each of the individual population distributions.
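To see why a boxplot of pooled observations can hint at the shape of the underlying populations, the sketch below draws samples from a normal and a log-normal population and compares their skewness. The sample size, the parameter values and the skewness helper are assumptions chosen for illustration; a much larger sample than 16 is used purely to make the contrast unambiguous.

```r
# Sketch only: sample size and distribution parameters are assumptions
set.seed(1)
n <- 1000

# Simple moment-based skewness (hypothetical helper, not from the tutorial)
skewness <- function(v) mean((v - mean(v))^3) / sd(v)^3

y_normal <- rnorm(n, mean = 10, sd = 2)
y_lognormal <- rlnorm(n, meanlog = 2, sdlog = 1)

skew_normal <- skewness(y_normal)        # expected to be close to 0
skew_lognormal <- skewness(y_lognormal)  # expected to be clearly positive

# The log-normal boxplot should show a long upper tail
boxplot(list(normal = y_normal, lognormal = y_lognormal), horizontal = TRUE)
```

A symmetric box with whiskers of similar length is consistent with normality, whereas a box squeezed towards one end with a long opposing whisker suggests a skewed distribution.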
Whilst traditionally, non-normal data would typically be normalised via a scale transformation (such as a logarithmic transformation), these days it is arguably more appropriate to attempt to match the data to a more suitable distribution (see later in this tutorial).
You may have noticed that we have only explored the distribution of the response (y-axis). What about the distribution of the predictor (independent, x-axis) variable, does it matter? The distribution assumption applies to the residuals (which are purely in the direction of the y-axis). Indeed technically, it is assumed that there is no uncertainty associated with the predictor variable. Its values are assumed to be set and thus there is no error associated with the values observed. Whilst this might not always be reasonable, it is an assumption.
Given that the predictor values are expected to be set rather than measured, we actually assume that they are uniformly distributed. In practice, the exact distribution of predictor values is not that important provided it is reasonably symmetrical and no outliers (unusually small or large values) are created as a result of the distribution.
As with exploring the distribution of the response variable, boxplots, histograms and density plots can be useful means of exploring the distribution of predictor variable(s). When such diagnostics reveal distributional issues, scale transformations (such as logarithmic transformations) are appropriate.
- homogeneity of variance: the residuals (and thus observations) must be drawn from populations that are equally varied. The model as shown only estimates a single variance (\(\sigma^2\)) parameter - it is assumed that this is a good overall representation of all underlying populations. The right hand figure underlays the fictitious normally distributed and equally varied populations from which the observations have been sampled.
Moreover, since the expected values (obtained by solving the deterministic component of the model) and the variance must be estimated from the same data, they need to be independent (not related to one another).
Simple linear regression also assumes that each of the populations is equally varied. Actually, it is the prospect of a relationship between the mean and variance of y-values across x-values that is of the greatest concern. Strictly, the assumption is that the distributions of y values at each x value are equally varied and that there is no relationship between mean and variance.
However, as we only have a single y-value for each x-value, it is difficult to directly determine whether the assumption of homogeneity of variance is likely to have been violated (mean of one value is meaningless and variability can’t be assessed from a single value). The figure below depicts the ideal (and almost never realistic) situation in which (left hand figure) the populations are all equally varied. The middle figure simulates drawing a single observation from each of the populations. When the populations are equally varied, the spread of observed values around the trend line is fairly even - that is, there is no trend in the spread of values along the line.
If we then plot the residuals (difference between observed values and those predicted by the trendline) against the predicted values, there is a definite lack of pattern. This lack of pattern is indicative of a lack of issues with homogeneity of variance.
If we now contrast the above to a situation where the population variance is related to the mean (unequal variance), we see that the observations drawn from these populations are not evenly distributed along the trendline (they get more spread out as the mean predicted value increases). This pattern is emphasized in the residual plot which displays a characteristic “wedge”-shape pattern.
Hence looking at the spread of values around a trendline on a scatterplot of \(y\) against \(x\) is a useful way of identifying gross violations of homogeneity of variance. Residual plots provide an even better diagnostic. The presence of a wedge shape is indicative that the population mean and variance are related.
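The contrast between equally varied populations and populations whose variance increases with the mean can be sketched as follows. All values (sample size, coefficients, the rule that the standard deviation is half the mean) are assumptions for illustration, and the rank correlation between fitted values and absolute residuals is used here only as a rough numerical stand-in for the visual wedge.

```r
# Sketch only: all simulation settings are assumptions
set.seed(42)
n <- 500
x <- runif(n, 1, 10)
mu <- 2 + 5 * x

y_equal <- rnorm(n, mean = mu, sd = 4)         # constant spread
y_wedge <- rnorm(n, mean = mu, sd = 0.5 * mu)  # spread grows with the mean

fit_equal <- lm(y_equal ~ x)
fit_wedge <- lm(y_wedge ~ x)

# Rank correlation between fitted values and the size of the residuals
spread_trend <- function(fit) {
  cor(fitted(fit), abs(resid(fit)), method = "spearman")
}
trend_equal <- spread_trend(fit_equal)  # expected to be near 0
trend_wedge <- spread_trend(fit_wedge)  # expected to be clearly positive

# Residual plots: an even band versus a wedge
op <- par(mfrow = c(1, 2))
plot(fitted(fit_equal), resid(fit_equal), main = "Equal variance",
     xlab = "Predicted", ylab = "Residual")
plot(fitted(fit_wedge), resid(fit_wedge), main = "Variance tied to mean",
     xlab = "Predicted", ylab = "Residual")
par(op)
```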
- linearity: the underlying relationships must be simple linear trends, since the line of best fit through the data (of which the slope is estimated) is linear. The right hand figure depicts a linear trend through the underlying populations.
It is important to clarify the meaning of the word “linear” in the term “linear regression”. Technically, it refers to a linear combination of regression coefficients. For example, the following are all linear models:
- \(y_i = \beta_0 + \beta_1 x_i\)
- \(y_i = \beta_0 + \beta_1 x_i + \beta_2 z_i\)
- \(y_i = \beta_0 + \beta_1 x_i + \beta_2 x^2_i\)
All the coefficients (\(\beta_0\), \(\beta_1\), \(\beta_2\)) enter as linear terms. Note that the last of the above examples is a linear model, even though it describes a non-linear trend. Contrast the above models with the following non-linear model:
- \(y_i = \beta_0 + x_i^{\beta_1}\)
In that case, the model is not a linear combination of its coefficients (one of them, \(\beta_1\), is an exponent).
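As a small illustration of the distinction, a quadratic trend can be fitted with ordinary linear model machinery because the model remains linear in its coefficients. The simulated data and coefficient values below are assumptions.

```r
# Sketch only: simulated data with assumed coefficients 1, 2 and 0.5
set.seed(7)
x <- seq(0, 10, length.out = 200)
y <- 1 + 2 * x + 0.5 * x^2 + rnorm(length(x), sd = 1)

# Curved trend, yet still a linear model: y = b0 + b1*x + b2*x^2
fit_quad <- lm(y ~ x + I(x^2))
coef(fit_quad)
```

A model such as \(y_i = \beta_0 + x_i^{\beta_1}\) could not be expressed this way and would instead need a non-linear fitting routine (for example nls() in base R).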
That said, a simple linear regression usually fits a straight (linear) line through the data. Therefore, prior to fitting such a model, it is necessary to establish whether this really is the most sensible way of describing the relationship. That is, does the relationship appear to be linearly related or could some other non-linear function describe the relationship better. Scatterplots and residual plots are useful diagnostics.
To see how a residual plot could be useful, consider the following. The first row of figures illustrates the residuals resulting from data drawn from a linear trend. The residuals are effectively random noise. By contrast, the second row shows the residuals resulting from data drawn from a non-linear relationship that have nevertheless been modelled as a linear trend. There is still a clear pattern remaining in the residuals.
The above might be an obvious and somewhat overly contrived example, yet it does illustrate the point - that a pattern in the residuals could point to a mis-specified model.
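The same idea can be sketched numerically. Below, a straight line is fitted both to data that truly follow a linear trend and to data that follow a curved trend; the simulation settings are assumptions, and the correlation between the residuals and a squared (centred) predictor is just one convenient way of quantifying leftover curvature.

```r
# Sketch only: simulation settings are assumptions
set.seed(11)
x <- seq(0, 10, length.out = 400)
y_linear <- 2 + 5 * x + rnorm(length(x), sd = 1)
y_curved <- 2 + 0.5 * x^2 + rnorm(length(x), sd = 1)

# Fit a straight line to both data sets
res_linear <- resid(lm(y_linear ~ x))
res_curved <- resid(lm(y_curved ~ x))

# Leftover curvature shows up as a correlation with (x - mean(x))^2
curvature <- (x - mean(x))^2
pattern_linear <- cor(res_linear, curvature)  # expected to be near 0
pattern_curved <- cor(res_curved, curvature)  # expected to be strongly positive

# Residual plots with a smoother to highlight any remaining pattern
op <- par(mfrow = c(1, 2))
plot(x, res_linear, main = "Linear data", ylab = "Residual")
lines(lowess(x, res_linear), col = "red")
plot(x, res_curved, main = "Curved data", ylab = "Residual")
lines(lowess(x, res_curved), col = "red")
par(op)
```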
If non-linearity does exist (as in the second case above), then fitting a straight line through what is obviously not a straight relationship is likely to poorly represent the true nature of the relationship. There are numerous causes of non-linearity:
- underlying distributional issues can result in non-linearity. For example, if we are assuming a gaussian distribution and the data are non-normal, often the relationships will appear non-linear. Addressing the distributional issues can therefore resolve the linearity issues
- the underlying relationship might truly be non-linear in which case this should be reflected in some way by the model formula. If the model formula fails to describe the non-linear trend, then problems will persist.
- the model proposed is missing an important covariate that might help standardise the data in a way that results in linearity
- independence: the residuals (and thus observations) must be independently drawn from the populations. That is, the correlation between all the observations is assumed to be 0 (off-diagonals in the covariance matrix). More practically, there should be no pattern to the correlations between observations.
Random sampling and random treatment assignment are experimental design elements that are intended to mitigate many types of sampling biases that cause dependencies between observations. Nevertheless, there are aspects of sampling designs that are either logistically difficult to randomise or in some cases not logically possible. For example, the residuals from observations sampled closer together in space and time will likely be more similar to one another than those of observations more spaced apart. Since neither space nor time can be randomised, data collected from sampling designs that involve sampling over space and/or time need to be assessed for spatial and temporal dependencies. These concepts will be explored in the context of introducing susceptible designs in a later tutorial.
The above is only a very brief overview of the model assumptions that apply to just one specific model (simple linear gaussian regression). For the remainder of this section, we will graphically explore the two motivating example data sets so as to gain insights into what distributional assumptions might be most valid, and thus help guide modelling choices. Similarly, for subsequent tutorials in this series (that introduce progressively more complex models), all associated assumptions will be explored and detailed.
Conclusions
- there are no obvious violations of the linear regression model assumptions
- we can now fit the suggested model
- full confirmation about the model’s goodness of fit should be reserved until after exploring the additional diagnostics that are only available after fitting the model.
Conclusions
- the spread of noise in each group seems reasonably similar
- more importantly, there does not seem to be a relationship between the mean (as approximated by the position of the boxplots along the y-axis) and the variance (as approximated by the spread of the boxplots).
- that is, the size of the boxplots does not vary with the elevation of the boxplots.
Linearity is not an issue for categorical predictors since it is effectively fitting separate lines between pairs of points (and a line between two points can only ever be linear)….
Conclusions
- no evidence of non-normality
- no evidence of non-homogeneity of variance
Conclusions
- the spread of noise does not look random along the line of best fit.
- homogeneity of variance is difficult to assess in the presence of distributional issues (such as non-normality in this case) as they can result in non-linearity (apparent here)
Conclusions
- the data do not appear to be linear
- the red line is a loess smoother and it is clear that the data are not linear
Conclusions
- there are obvious violations of the linear regression model assumptions
- we should consider a different model that does not assume normality
Conclusions
- the spread of noise does not look random along the line of best fit.
- homogeneity of variance is difficult to assess in the presence of distributional issues (such as non-normality in this case) as they can result in non-linearity (apparent here)
Conclusions
- the data do not appear to be linear
- the red line is a loess smoother and it is clear that the data are not linear
Conclusions
- there are obvious violations of the linear regression model assumptions
- we should consider a different model that does not assume normality
Conclusions
- the data are clearly not linear
- the red line is a loess smoother and it is clear that the data are not linear
Conclusions
- there are obvious violations of the linear regression model assumptions
- we should consider a different model that does not assume normality
Conclusions
- although there is no evidence of non-linearity from this small data set, it is worth noting that the line of best fit does extend outside the logical response range [0, 1] within the range of observed \(x\) values. That is, a simple linear model would predict proportions higher than 100% at high values of \(x\)
- this is a common issue with binomial data and is often addressed by fitting a logistic regression model
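The problem of predictions escaping the [0, 1] range, and how a logistic regression avoids it, can be sketched with simulated data. The data below are binary (a special case of binomial data) and all settings are assumptions; they are not the tutorial's data.

```r
# Sketch only: simulated binary data with assumed parameters
set.seed(3)
n <- 500
x <- seq(0, 10, length.out = n)
y <- rbinom(n, size = 1, prob = plogis(2 * (x - 5)))

# A gaussian linear model ignores the bounds on the response
fit_lm <- lm(y ~ x)
# A logistic regression models the probability via a logit link
fit_glm <- glm(y ~ x, family = binomial(link = "logit"))

pred_lm <- predict(fit_lm)
pred_glm <- predict(fit_glm, type = "response")

range(pred_lm)   # extends below 0 and above 1
range(pred_glm)  # stays within [0, 1]
```

Because the logit link maps any value of the linear predictor back into the interval between 0 and 1, the fitted probabilities cannot leave the logical range.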
Conclusions
- there are obvious violations of the linear regression model assumptions
- we should consider a different model that does not assume normality
7 Fitting models
One way to assess the priors is to have the MCMC sampler sample purely from the prior predictive distribution without conditioning on the observed data. Doing so provides a glimpse at the range of predictions possible under the priors. On the one hand, wide ranging predictions would ensure that the priors are unlikely to influence the actual predictions once they are conditioned on the data. On the other hand, if they are too wide, the sampler is being permitted to traverse into regions of parameter space that are not logically possible in the context of the actual underlying ecological context. Not only could this mean that illogical parameter estimates are possible, but when the sampler is traversing regions of parameter space that are not supported by the actual data, it can also become unstable and have difficulty sampling.
In brms, we can inform the sampler to draw from the prior predictive distribution instead of conditioning on the response, by running the model with the sample_prior = 'only' argument. Unfortunately, this cannot be applied when there are flat priors (since flat priors are improper and the resulting draws would necessarily extend to negative and positive infinity). Therefore, in order to use this routine, we need to make sure that we have defined a proper prior for all parameters.
Earlier we suggested the following priors might be useful:
- \(\beta_0\): Normal prior centred at 31.21 with a variance of 15.17
mean:
variance:
- \(\beta_1\): t distribution (3 degrees of freedom) prior centred at 0 with a variance of 4.09
mean: since effects are differences and we may not want to pre-determine the direction (polarity) of any trends, it is typical to define effects priors with means of 0
variance:
- \(\sigma\): (half) t distribution (3 degrees of freedom) centred at 0 with a variance of 15.17
variance:
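The mean and variance entries above are blank in this rendering. One common heuristic for choosing weakly informative values (an assumption here, and not necessarily how the quoted numbers were obtained) is to base them on robust summaries of the data: the median and MAD of the response for the intercept and \(\sigma\), and the ratio of the response MAD to the predictor MAD for the slope. The data below are simulated stand-ins.

```r
# Sketch only: simulated stand-in data; the heuristic itself is an assumption
set.seed(123)
x <- 1:10
y <- 2 + 5 * x + rnorm(10, sd = 4)

prior_values <- c(
  intercept_location = median(y),       # centre of the intercept prior
  intercept_scale    = mad(y),          # spread of the intercept prior
  slope_scale        = mad(y) / mad(x), # spread of the slope prior
  sigma_scale        = mad(y)           # spread of the sigma prior
)
round(prior_values, 2)
```

Under this heuristic the intercept and \(\sigma\) priors share the same scale, which is at least consistent with the matching values (15.17) quoted above.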
It might be useful to understand what some of these distributions look like. For example, we have used both a normal (Gaussian) distribution and a flatter t distribution for y-intercept and slope respectively. This was a somewhat arbitrary choice. We could easily have gone with either normal or t distributions for all of the above parameters. To visualise prior distributions for the slope based on both normal and t distributions:
Evidently, the t distribution (with 3 degrees of freedom) is wider than the normal distribution. The former should be more robust to data with values that are less concentrated around the mean.
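The plotting code is not shown in this rendering, so the following base R sketch is one possible way to make the comparison. It assumes that the quoted value of 4.09 acts as the scale of both distributions, and it compares the central 95% intervals as a simple numerical measure of width.

```r
# Sketch only: 4.09 is treated as the scale parameter (an assumption)
prior_scale <- 4.09
q <- c(0.025, 0.975)

# Central 95% intervals of each candidate slope prior
interval_normal <- qnorm(q, mean = 0, sd = prior_scale)
interval_t <- qt(q, df = 3) * prior_scale  # scaled t with 3 df

# Overlay the two densities
curve(dnorm(x, mean = 0, sd = prior_scale), from = -25, to = 25,
      xlab = "Slope", ylab = "Density")
curve(dt(x / prior_scale, df = 3) / prior_scale, add = TRUE, lty = 2)
legend("topright", legend = c("normal", "t (3 df)"), lty = c(1, 2))
```

The t distribution has a lower peak and heavier tails, so its central interval is wider for the same scale.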
\[ \begin{align} y_i \sim{}& N(\mu_i, \sigma^2)\\ \mu_i =& \beta_0 + \beta_1 x_i\\ \end{align} \]
- start by fitting the model and sampling from the priors only
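The model-fitting code is not shown in this rendering. A minimal sketch of a prior-only fit in brms might look like the following. The stand-in data, the wrapper function fit_prior_only() and the choice to pass the quoted "variance" values as scale parameters are all assumptions; the fit itself is wrapped in a function so that nothing is compiled until it is called.

```r
library(brms)

# Hypothetical stand-in data (the tutorial data are not reproduced here)
set.seed(123)
dat <- data.frame(x = 1:10)
dat$y <- 2 + 5 * dat$x + rnorm(10, sd = 4)

# Proper priors for every parameter; values as quoted in the text,
# used here as location and scale (an assumption)
priors <- c(
  prior(normal(31.21, 15.17), class = "Intercept"),
  prior(student_t(3, 0, 4.09), class = "b"),
  prior(student_t(3, 0, 15.17), class = "sigma")
)

# Hypothetical wrapper: sample from the priors only, ignoring the likelihood
fit_prior_only <- function(data) {
  brm(bf(y ~ x),
      data = data,
      prior = priors,
      sample_prior = "only",
      iter = 2000, warmup = 1000, chains = 4,
      refresh = 0)
}
# m_prior <- fit_prior_only(dat)  # compiles and runs Stan; uncomment to fit
```

Calling fit_prior_only(dat) compiles the model and returns a brmsfit object whose draws reflect the priors alone.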
Compiling Stan program...
Start sampling
SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 1).
Chain 1:
Chain 1: Gradient evaluation took 7e-06 seconds
Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 0.07 seconds.
Chain 1: Adjust your expectations accordingly!
Chain 1:
Chain 1:
Chain 1: Iteration: 1 / 2000 [ 0%] (Warmup)
Chain 1: Iteration: 200 / 2000 [ 10%] (Warmup)
Chain 1: Iteration: 400 / 2000 [ 20%] (Warmup)
Chain 1: Iteration: 600 / 2000 [ 30%] (Warmup)
Chain 1: Iteration: 800 / 2000 [ 40%] (Warmup)
Chain 1: Iteration: 1000 / 2000 [ 50%] (Warmup)
Chain 1: Iteration: 1001 / 2000 [ 50%] (Sampling)
Chain 1: Iteration: 1200 / 2000 [ 60%] (Sampling)
Chain 1: Iteration: 1400 / 2000 [ 70%] (Sampling)
Chain 1: Iteration: 1600 / 2000 [ 80%] (Sampling)
Chain 1: Iteration: 1800 / 2000 [ 90%] (Sampling)
Chain 1: Iteration: 2000 / 2000 [100%] (Sampling)
Chain 1:
Chain 1: Elapsed Time: 0.013 seconds (Warm-up)
Chain 1: 0.013 seconds (Sampling)
Chain 1: 0.026 seconds (Total)
Chain 1:
SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 2).
Chain 2:
Chain 2: Gradient evaluation took 3e-06 seconds
Chain 2: 1000 transitions using 10 leapfrog steps per transition would take 0.03 seconds.
Chain 2: Adjust your expectations accordingly!
Chain 2:
Chain 2:
Chain 2: Iteration: 1 / 2000 [ 0%] (Warmup)
Chain 2: Iteration: 200 / 2000 [ 10%] (Warmup)
Chain 2: Iteration: 400 / 2000 [ 20%] (Warmup)
Chain 2: Iteration: 600 / 2000 [ 30%] (Warmup)
Chain 2: Iteration: 800 / 2000 [ 40%] (Warmup)
Chain 2: Iteration: 1000 / 2000 [ 50%] (Warmup)
Chain 2: Iteration: 1001 / 2000 [ 50%] (Sampling)
Chain 2: Iteration: 1200 / 2000 [ 60%] (Sampling)
Chain 2: Iteration: 1400 / 2000 [ 70%] (Sampling)
Chain 2: Iteration: 1600 / 2000 [ 80%] (Sampling)
Chain 2: Iteration: 1800 / 2000 [ 90%] (Sampling)
Chain 2: Iteration: 2000 / 2000 [100%] (Sampling)
Chain 2:
Chain 2: Elapsed Time: 0.013 seconds (Warm-up)
Chain 2: 0.013 seconds (Sampling)
Chain 2: 0.026 seconds (Total)
Chain 2:
SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 3).
Chain 3:
Chain 3: Gradient evaluation took 3e-06 seconds
Chain 3: 1000 transitions using 10 leapfrog steps per transition would take 0.03 seconds.
Chain 3: Adjust your expectations accordingly!
Chain 3:
Chain 3:
Chain 3: Iteration: 1 / 2000 [ 0%] (Warmup)
Chain 3: Iteration: 200 / 2000 [ 10%] (Warmup)
Chain 3: Iteration: 400 / 2000 [ 20%] (Warmup)
Chain 3: Iteration: 600 / 2000 [ 30%] (Warmup)
Chain 3: Iteration: 800 / 2000 [ 40%] (Warmup)
Chain 3: Iteration: 1000 / 2000 [ 50%] (Warmup)
Chain 3: Iteration: 1001 / 2000 [ 50%] (Sampling)
Chain 3: Iteration: 1200 / 2000 [ 60%] (Sampling)
Chain 3: Iteration: 1400 / 2000 [ 70%] (Sampling)
Chain 3: Iteration: 1600 / 2000 [ 80%] (Sampling)
Chain 3: Iteration: 1800 / 2000 [ 90%] (Sampling)
Chain 3: Iteration: 2000 / 2000 [100%] (Sampling)
Chain 3:
Chain 3: Elapsed Time: 0.012 seconds (Warm-up)
Chain 3: 0.013 seconds (Sampling)
Chain 3: 0.025 seconds (Total)
Chain 3:
SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 4).
Chain 4:
Chain 4: Gradient evaluation took 3e-06 seconds
Chain 4: 1000 transitions using 10 leapfrog steps per transition would take 0.03 seconds.
Chain 4: Adjust your expectations accordingly!
Chain 4:
Chain 4:
Chain 4: Iteration: 1 / 2000 [ 0%] (Warmup)
Chain 4: Iteration: 200 / 2000 [ 10%] (Warmup)
Chain 4: Iteration: 400 / 2000 [ 20%] (Warmup)
Chain 4: Iteration: 600 / 2000 [ 30%] (Warmup)
Chain 4: Iteration: 800 / 2000 [ 40%] (Warmup)
Chain 4: Iteration: 1000 / 2000 [ 50%] (Warmup)
Chain 4: Iteration: 1001 / 2000 [ 50%] (Sampling)
Chain 4: Iteration: 1200 / 2000 [ 60%] (Sampling)
Chain 4: Iteration: 1400 / 2000 [ 70%] (Sampling)
Chain 4: Iteration: 1600 / 2000 [ 80%] (Sampling)
Chain 4: Iteration: 1800 / 2000 [ 90%] (Sampling)
Chain 4: Iteration: 2000 / 2000 [100%] (Sampling)
Chain 4:
Chain 4: Elapsed Time: 0.013 seconds (Warm-up)
Chain 4: 0.013 seconds (Sampling)
Chain 4: 0.026 seconds (Total)
Chain 4:
 Family: gaussian
  Links: mu = identity; sigma = identity
Formula: y ~ x
   Data: dat (Number of observations: 10)
  Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup draws = 4000

Regression Coefficients:
          Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept     0.27      0.37    -0.45     1.01 1.00     2264     1483
x            -0.10      0.44    -0.98     0.82 1.00     2351     1997

Further Distributional Parameters:
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma     1.09      0.33     0.66     1.92 1.00     2182     2260

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
- explore the range of posterior predictions resulting from the priors alone
Conclusions:
- the grey ribbon above represents the credible range of the posterior predictions
- this ribbon is substantially wider than the observed data (black points), suggesting that the priors are not overly narrow
- the range of posterior predictions is not wildly unreasonable (although we could argue that negative predictions might be illogical) since the maximums are not multiple orders of magnitude above the observed data.
- now refit the model such that it samples from both the priors and likelihood (that is, allow the data to have an impact on the estimates)
- re-explore the range of posterior predictions resulting from the fitted model
Conclusions:
- the range of the posteriors is now substantially reduced (now that the model includes both data and priors)
- this suggests that the patterns are being driven predominantly by the data
- compare the priors and posteriors to further confirm that the priors are not overly influential
When we have indicated that the posterior should be informed by both the prior and the likelihood, both prior (governed by priors alone) and posterior (governed by both priors and data/likelihood) draws are returned. These can be compared by exploring the probabilities associated with specific hypotheses - the most obvious of which is that of no effect (that the parameter = 0).
When doing so, ideally the posterior should be distinct from the prior. By distinct, I mean that the posteriors should be clearly distinguishable from the priors. Ideally, the posteriors should be less variable than the priors. If this is not the case, it might suggest that the posteriors are being too strongly driven by the priors.
[1] "b_Intercept" "b_x" "sigma" "Intercept"
[5] "prior_Intercept" "prior_b" "prior_sigma" "lprior"
[9] "lp__" "accept_stat__" "stepsize__" "treedepth__"
[13] "n_leapfrog__" "divergent__" "energy__"
Unfortunately, it is not possible to do this comparison sensibly for the intercept. The reason for this is that the prior for intercept was applied to an intercept that is associated with centred continuous predictors (predictors are centred behind the scenes). Since we did not centre the predictor, the intercept returned is as if uncentred. Hence, the prior and posterior are on different scales.
Conclusions:
- each of the priors is substantially wider than its posterior
- the posteriors are definitely distinct from their respective priors and thus we can conclude that the priors are not driving the posteriors (i.e. they are not influencing the outcomes)
- the priors are simply regularising the parameters such that they are only sampled from plausible regions
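A comparison of this kind can be sketched in brms as below. The wrapper compare_prior_posterior() is hypothetical, as is the choice to summarise each distribution by its standard deviation; the draw names prior_b and b_x follow the variable listing shown above. The function is defined but not called, so nothing is compiled until it is used.

```r
library(brms)

# Hypothetical helper: fit with both prior and posterior draws retained,
# then compare the spread of the slope under each
compare_prior_posterior <- function(data, priors) {
  fit <- brm(bf(y ~ x),
             data = data,
             prior = priors,
             sample_prior = "yes",  # keep prior draws alongside the posterior
             refresh = 0)
  draws <- as_draws_df(fit)
  c(prior_sd = sd(draws$prior_b),   # spread under the prior alone
    posterior_sd = sd(draws$b_x))   # spread after conditioning on the data
}
# Usage (compiles and runs Stan):
# compare_prior_posterior(dat, priors)
```

A posterior standard deviation that is much smaller than the prior standard deviation is in line with the conclusion that the data, rather than the priors, are driving the estimates. In brms, plotting the result of hypothesis(fit, "x = 0") offers a graphical version of the same comparison.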
Unless you explicitly direct brm to include a user-defined intercept, the priors on the default intercept should assume that the predictor(s) are centered (because brm will automatically center all continuous predictors).
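To see what centring does to the intercept, the following base R sketch fits the same simulated data with and without a centred predictor using least squares. The data are assumptions; the point is only that, with a centred predictor, the intercept becomes the expected response at the average value of x, which is why an intercept prior centred on a typical response value is sensible.

```r
# Sketch only: simulated stand-in data
set.seed(123)
x <- 1:10
y <- 2 + 5 * x + rnorm(10, sd = 4)

b_raw <- coef(lm(y ~ x))                  # intercept = expected y at x = 0
b_centred <- coef(lm(y ~ I(x - mean(x)))) # intercept = expected y at mean(x)

# With a centred predictor the least squares intercept equals mean(y)
c(raw = b_raw[[1]], centred = b_centred[[1]], mean_y = mean(y))
# The slope is unaffected by centring
c(raw = b_raw[[2]], centred = b_centred[[2]])
```

If a prior on the uncentred intercept is wanted instead, brms documents the formula syntax y ~ 0 + Intercept + x for that purpose.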
Let's try the following priors:
- \(\beta_0\): Normal prior centred at 0.61 with a variance of 0.48
mean:
variance:
- \(\beta_1\): t distribution (3 degrees of freedom) prior centered at 0 with a variance of 0.75
mean: since effects are differences and we may not want to pre-determine the direction (polarity) of any trends, it is typical to define effects priors with means of 0
variance:
- \(\sigma\): (half) t distribution (3 degrees of freedom) centred at 0 with a variance of 0.48
variance:
\[ \begin{align} y_i \sim{}& N(\mu_i, \sigma^2)\\ \mu_i =& \beta_0 + \beta_1 (x_i - \bar{x})\\ \end{align} \]
- start by fitting the model and sampling from the priors only
- explore the range of posterior predictions resulting from the priors alone
Conclusions:
- the grey ribbon above represents the credible range of the posterior predictions
- this ribbon is substantially wider than the observed data (black points), suggesting that the priors are not overly narrow
- the range of posterior predictions is not wildly unreasonable (although we could argue that negative predictions might be illogical) since the maximums are not multiple orders of magnitude above the observed data.
- now refit the model such that it samples from both the priors and likelihood (that is, allow the data to have an impact on the estimates)
- re-explore the range of posterior predictions resulting from the fitted model
Conclusions:
- the range of the posteriors is now substantially reduced (now that the model includes both data and priors)
- this suggests that the patterns are being driven predominantly by the data
- compare the priors and posteriors to further confirm that the priors are not overly influential
When we have indicated that the posterior should be informed by both the prior and the likelihood, both prior (governed by priors alone) and posterior (governed by both priors and data/likelihood) draws are returned. These can be compared by exploring the probabilities associated with specific hypotheses - the most obvious of which is that of no effect (that the parameter = 0).
When doing so, ideally the posterior should be distinct from the prior. By distinct, I mean that the posteriors should be clearly distinguishable from the priors. Ideally, the posteriors should be less variable than the priors. If this is not the case, it might suggest that the posteriors are being too strongly driven by the priors.
[1] "b_Intercept" "b_scalexscaleEQFALSE" "sigma"
[4] "Intercept" "prior_Intercept" "prior_b"
[7] "prior_sigma" "lprior" "lp__"
[10] "accept_stat__" "stepsize__" "treedepth__"
[13] "n_leapfrog__" "divergent__" "energy__"
Unfortunately, it is not possible to do this comparison sensibly for the intercept. The reason for this is that the prior for intercept was applied to an intercept that is associated with centred continuous predictors (predictors are centred behind the scenes). Since we did not centre the predictor, the intercept returned is as if uncentred. Hence, the prior and posterior are on different scales.
Conclusions:
- each of the priors is substantially wider than its posterior
- the posteriors are definitely distinct from their respective priors and thus we can conclude that the priors are not driving the posteriors (i.e. they are not influencing the outcomes)
- the priors are simply regularising the parameters such that they are only sampled from plausible regions
When the predictor is standardised, it simplifies prior definition because we no longer need to consider the scale of the predictor.
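The effect of standardising can be checked with a small least squares sketch on simulated data (all values are assumptions). After standardising, the slope is expressed per standard deviation of x rather than per unit of x, so its magnitude no longer depends on the units in which x was measured.

```r
# Sketch only: simulated stand-in data
set.seed(123)
x <- 1:10
y <- 2 + 5 * x + rnorm(10, sd = 4)

slope_raw <- coef(lm(y ~ x))[["x"]]       # change in y per unit of x
slope_std <- coef(lm(y ~ scale(x)))[[2]]  # change in y per 1 SD of x

# The two slopes differ only by a factor of sd(x)
c(raw = slope_raw, standardised = slope_std, raw_times_sd = slope_raw * sd(x))
```

Since the standardised predictor has a standard deviation of 1, a slope prior scaled to the spread of the response alone is a reasonable starting point.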
Let's try the following priors:
- \(\beta_0\): Normal prior centred at 0.61 with a variance of 0.48
mean:
variance:
- \(\beta_1\): t distribution (3 degrees of freedom) prior centered at 0 with a variance of 0.48
mean: since effects are differences and we may not want to pre-determine the direction (polarity) of any trends, it is typical to define effects priors with means of 0
variance:
- \(\sigma\): (half) t distribution (3 degrees of freedom) centred at 0 with a variance of 0.48
variance:
\[ \begin{align} y_i \sim{}& N(\mu_i, \sigma^2)\\ \mu_i =& \beta_0 + \beta_1 (x_i - \bar{x})/\sigma_{x}\\ \end{align} \]
- start by fitting the model and sampling from the priors only
- explore the range of posterior predictions resulting from the priors alone
Conclusions:
- the grey ribbon above represents the credible range of the posterior predictions
- this ribbon is substantially wider than the observed data (black points), suggesting that the priors are not overly narrow
- the range of posterior predictions is not wildly unreasonable (although we could argue that negative predictions might be illogical) since the maximums are not multiple orders of magnitude above the observed data.
- now refit the model such that it samples from both the priors and likelihood (that is, allow the data to have an impact on the estimates)
- re-explore the range of posterior predictions resulting from the fitted model
Conclusions:
- the range of the posteriors is now substantially reduced (now that the model includes both data and priors)
- this suggests that the patterns are being driven predominantly by the data
- compare the priors and posteriors to further confirm that the priors are not overly influential
When we have indicated that the posterior should be informed by both the prior and the likelihood, both prior (governed by priors alone) and posterior (governed by both priors and data/likelihood) draws are returned. These can be compared by exploring the probabilities associated with specific hypotheses - the most obvious of which is that of no effect (that the parameter = 0).
When doing so, ideally the posterior should be distinct from the prior. By distinct, I mean that the posteriors should be clearly distinguishable from the priors. Ideally, the posteriors should be less variable than the priors. If this is not the case, it might suggest that the posteriors are being too strongly driven by the priors.
[1] "b_Intercept" "b_scalex" "sigma" "Intercept"
[5] "prior_Intercept" "prior_b" "prior_sigma" "lprior"
[9] "lp__" "accept_stat__" "stepsize__" "treedepth__"
[13] "n_leapfrog__" "divergent__" "energy__"
Unfortunately, it is not possible to do this comparison sensibly for the intercept. The reason for this is that the prior for intercept was applied to an intercept that is associated with centred continuous predictors (predictors are centred behind the scenes). Since we did not centre the predictor, the intercept returned is as if uncentred. Hence, the prior and posterior are on different scales.
Conclusions:
- each of the priors is substantially wider than its posterior
- the posteriors are definitely distinct from their respective priors and thus we can conclude that the priors are not driving the posteriors (i.e. they are not influencing the outcomes)
- the priors are simply regularising the parameters such that they are only sampled from plausible regions
Let's try the following priors:
- \(\beta_0\): Normal prior centred at 19.68 with a variance of 1.87
mean:
variance:
- \(\beta_1\): t distribution (3 degrees of freedom) prior centered at 0 with a variance of 9.35
mean: since effects are differences and we may not want to pre-determine the direction (polarity) of any trends, it is typical to define effects priors with means of 0
variance:
- \(\sigma\): (half) t distribution (3 degrees of freedom) centred at 0 with a variance of 4.7
variance:
\[ \begin{align} y_i \sim{}& N(\mu_i, \sigma^2)\\ \mu_i =& \beta_0 + \sum{\beta_j x_{ij}}\\ \end{align} \]
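For a categorical predictor, the \(x_{ij}\) terms in the model above are indicator (dummy) variables. The sketch below shows how R builds them. The level names low, medium and high are an assumption (medium and high are suggested by the coefficient names b_xmedium and b_xhigh reported in this tutorial; low is inferred as the reference level).

```r
# Sketch only: level names are assumptions
x <- factor(c("low", "medium", "high"), levels = c("low", "medium", "high"))

# Treatment contrasts: an intercept (first level) plus differences from it
X_treatment <- model.matrix(~ x)
X_treatment

# Cell-means coding: one column (and hence one mean) per group, no intercept
X_means <- model.matrix(~ 0 + x)
X_means
```

Under treatment contrasts, \(\beta_0\) is the mean of the first group and each remaining \(\beta_j\) is a difference from it, which is why effects priors centred at 0 are a natural choice there. Under cell-means coding each coefficient is itself a group mean.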
- start by fitting the model and sampling from the priors only
- explore the range of posterior predictions resulting from the priors alone
Conclusions:
- the grey ribbon above represents the credible range of the posterior predictions
- this ribbon is substantially wider than the observed data (black points), suggesting that the priors are not overly narrow
- the range of posterior predictions is not wildly unreasonable (although we could argue that negative predictions might be illogical) since the maximums are not multiple orders of magnitude above the observed data.
- now refit the model such that it samples from both the priors and likelihood (that is, allow the data to have an impact on the estimates)
- re-explore the range of posterior predictions resulting from the fitted model
Conclusions:
- the range of the posteriors is now substantially reduced (now that the model includes both data and priors)
- this suggests that the patterns are being driven predominantly by the data
- compare the priors and posteriors to further confirm that the priors are not overly influential
When we have indicated that the posterior should be informed by both the prior and the likelihood, both prior (governed by priors alone) and posterior (governed by both priors and data/likelihood) draws are returned. These can be compared by exploring the probabilities associated with specific hypotheses - the most obvious of which is that of no effect (that the parameter = 0).
When doing so, ideally the posterior should be distinct from the prior. By distinct, I mean that the posteriors should be clearly distinguishable from the priors. Ideally, the posteriors should be less variable than the priors. If this is not the case, it might suggest that the posteriors are being too strongly driven by the priors.
[1] "b_Intercept" "b_xmedium" "b_xhigh" "sigma"
[5] "Intercept" "prior_Intercept" "prior_b" "prior_sigma"
[9] "lprior" "lp__" "accept_stat__" "stepsize__"
[13] "treedepth__" "n_leapfrog__" "divergent__" "energy__"
Unfortunately, it is not possible to do this comparison sensibly for the intercept. The reason for this is that the prior for intercept was applied to an intercept that is associated with centred continuous predictors (predictors are centred behind the scenes). Since we did not centre the predictor, the intercept returned is as if uncentred. Hence, the prior and posterior are on different scales.
[1] "b_Intercept" "b_xmedium" "b_xhigh" "sigma"
[5] "Intercept" "prior_Intercept" "prior_b" "prior_sigma"
[9] "lprior" "lp__" "accept_stat__" "stepsize__"
[13] "treedepth__" "n_leapfrog__" "divergent__" "energy__"
Conclusions:
- each of the priors is substantially wider than its posterior
- the posteriors are clearly distinct from their respective priors and thus we can conclude that the priors are not driving the posteriors (i.e. they are not unduly influencing the outcomes)
- the priors are simply regularising the parameters such that they are only sampled from plausible regions
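The comparison of prior and posterior draws described above can be sketched in base R. This is only an illustration under stated assumptions: the helper `prior_posterior_ratio()` is hypothetical (it is not part of brms or tidybayes), and the draws are simulated stand-ins rather than output from the fitted model. Only the column names `b_xmedium` and `prior_b` are taken from the variable listing above.

```r
# Ratio of prior spread to posterior spread for one parameter.
# `draws` is any data frame of draws with one column per variable.
prior_posterior_ratio <- function(draws, posterior_col, prior_col) {
  sd(draws[[prior_col]]) / sd(draws[[posterior_col]])
}

set.seed(1)
n_draws <- 2400

# Stand-in draws (NOT output from the fitted model): a wide t prior
# and a much narrower posterior, named as in the variable listing
draws <- data.frame(
  prior_b   = 10 * rt(n_draws, df = 3),
  b_xmedium = rnorm(n_draws, mean = 5, sd = 1)
)

ratio <- prior_posterior_ratio(draws, "b_xmedium", "prior_b")
ratio
```

A ratio well above 1 indicates that the posterior is considerably narrower than the prior, which is the pattern the conclusions above describe.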
Let's try the following priors:
- \(\beta\): t distribution (3 degrees of freedom) prior centred at 18.14 with a variance of 6.26
mean: since each group's mean is being estimated separately, the groups could either all have different priors or, more commonly, share the same prior.
variance:
- \(\sigma\): (half) t distribution (3 degrees of freedom) centred at 0 with a variance of 4.7
variance:
\[ \begin{align} y_i \sim{}& N(\mu_i, \sigma^2)\\ \mu_i =& \beta_0 + \sum{\beta_j x_{ij}}\\ \end{align} \]
- start by fitting the model and sampling from the priors only
- explore the range of posterior predictions resulting from the priors alone
Conclusions:
- the grey ribbon above represents the credible range of the posterior predictions
- this ribbon is substantially wider than the observed data (black points), suggesting that the priors are not overly narrow
- the range of posterior predictions is not wildly unreasonable (although we could argue that negative predictions might be illogical), since the maxima are not multiple orders of magnitude above the observed data.
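To get a feel for what these priors imply before any Stan model is compiled, the prior predictive distribution can be approximated directly in base R. The sketch below assumes that the quoted values (6.26 and 4.7) are used as the scale of the t distributions, and it simulates a single observation per prior draw for one group. It is a rough approximation, not a replacement for sampling from the priors with brms.

```r
set.seed(123)
n_sims <- 1000

# Location-scale t draws: location + scale * standard t (3 df)
beta  <- 18.14 + 6.26 * rt(n_sims, df = 3)   # prior on a group mean
sigma <- abs(4.7 * rt(n_sims, df = 3))       # half-t prior on sigma

# One simulated observation per prior draw
y_sim <- rnorm(n_sims, mean = beta, sd = sigma)

# Central 95% range implied by the priors alone
quantile(y_sim, probs = c(0.025, 0.975))

# Share of simulated observations that are negative
mean(y_sim < 0)
```

If the central range is far wider than the observed data, or a large share of the simulated values is negative, that supports the points made in the conclusions above.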
- now refit the model such that it samples from both the priors and likelihood (that is allow the data to have an impact on the estimates)
- re-explore the range of posterior predictions resulting from the fitted model
Conclusions:
- the range of the posterior predictions is now substantially reduced (now that the model includes both data and priors)
- this suggests that the patterns are being driven predominantly by the data
- compare the priors and posteriors to further confirm that the priors are not overly influential
When we indicate that the posterior should be informed by both the prior and the likelihood, both prior draws (governed by the priors alone) and posterior draws (governed by both priors and data/likelihood) are returned. These can be compared by exploring the probabilities associated with specific hypotheses, the most obvious of which is that of no effect (that the parameter = 0).
When doing so, ideally the posterior should be distinct from the prior; that is, the posteriors should be clearly distinguishable from the priors. Ideally, the posteriors should be less variable than the priors. If this is not the case, it might suggest that the posteriors are being too strongly driven by the priors.
[1] "b_xcontrol" "b_xmedium" "b_xhigh" "sigma"
[5] "prior_b" "prior_sigma" "lprior" "lp__"
[9] "accept_stat__" "stepsize__" "treedepth__" "n_leapfrog__"
[13] "divergent__" "energy__"
Unfortunately, it is not possible to do this comparison sensibly for the intercept. The reason for this is that the prior for intercept was applied to an intercept that is associated with centred continuous predictors (predictors are centred behind the scenes). Since we did not centre the predictor, the intercept returned is as if uncentred. Hence, the prior and posterior are on different scales.
[1] "b_xcontrol" "b_xmedium" "b_xhigh" "sigma"
[5] "prior_b" "prior_sigma" "lprior" "lp__"
[9] "accept_stat__" "stepsize__" "treedepth__" "n_leapfrog__"
[13] "divergent__" "energy__"
Conclusions:
- each of the priors is substantially wider than its posterior
- the posteriors are clearly distinct from their respective priors and thus we can conclude that the priors are not driving the posteriors (i.e. they are not unduly influencing the outcomes)
- the priors are simply regularising the parameters such that they are only sampled from plausible regions
\[ \begin{align} y_i \sim{}& Pois(\lambda_i)\\ log(\lambda_i) =& \beta_0 + \beta_1 x_i\\ \end{align} \]
When considering priors, it is important to remember that they apply to parameters on the link scale. For Poisson models, the link scale is log. So the following priors might be appropriate:
- \(\beta_0\): Normal prior centred at 1.99 with a variance of 1.33
mean:
variance:
- \(\beta_1\): t distribution (3 degrees of freedom) prior centred at 0 with a variance of 2.05
mean: since effects are differences and we may not want to pre-determine the direction (polarity) of any trends, it is typical to define effects priors with means of 0
variance:
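Because these priors live on the log scale, it can help to back-transform them to the scale of the counts. The sketch below does this for the intercept prior, assuming the quoted 1.33 is used as the standard deviation of the Normal prior. The two standard deviation band is a rough rule of thumb chosen for illustration.

```r
# Intercept prior on the log (link) scale: Normal(1.99, 1.33)
b0_mean <- 1.99
b0_sd   <- 1.33

# Back-transform the prior centre and a rough +/- 2 SD band to counts
link_vals <- c(lower  = b0_mean - 2 * b0_sd,
               centre = b0_mean,
               upper  = b0_mean + 2 * b0_sd)
count_vals <- exp(link_vals)
round(count_vals, 2)
```

Note that a band that is symmetric on the log scale becomes multiplicative on the count scale: the upper limit is as many times larger than the centre as the centre is larger than the lower limit.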
- start by fitting the model and sampling from the priors only
- explore the range of posterior predictions resulting from the priors alone
Conclusions:
- the grey ribbon above represents the credible range of the posterior predictions
- this ribbon is substantially wider than the observed data (black points), suggesting that the priors are not overly narrow
- the range of posterior predictions is not wildly unreasonable (although we could argue that negative predictions might be illogical), since the maxima are not multiple orders of magnitude above the observed data.
- now refit the model such that it samples from both the priors and likelihood (that is allow the data to have an impact on the estimates)
- re-explore the range of posterior predictions resulting from the fitted model
Conclusions:
- the range of the posterior predictions is now substantially reduced (now that the model includes both data and priors)
- this suggests that the patterns are being driven predominantly by the data
- compare the priors and posteriors to further confirm that the priors are not overly influential
When we indicate that the posterior should be informed by both the prior and the likelihood, both prior draws (governed by the priors alone) and posterior draws (governed by both priors and data/likelihood) are returned. These can be compared by exploring the probabilities associated with specific hypotheses, the most obvious of which is that of no effect (that the parameter = 0).
When doing so, ideally the posterior should be distinct from the prior; that is, the posteriors should be clearly distinguishable from the priors. Ideally, the posteriors should be less variable than the priors. If this is not the case, it might suggest that the posteriors are being too strongly driven by the priors.
[1] "b_Intercept" "b_x" "Intercept" "prior_Intercept"
[5] "prior_b" "lprior" "lp__" "accept_stat__"
[9] "stepsize__" "treedepth__" "n_leapfrog__" "divergent__"
[13] "energy__"
Unfortunately, it is not possible to do this comparison sensibly for the intercept. The reason for this is that the prior for intercept was applied to an intercept that is associated with centred continuous predictors (predictors are centred behind the scenes). Since we did not centre the predictor, the intercept returned is as if uncentred. Hence, the prior and posterior are on different scales.
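The relationship between an intercept defined for a centred predictor and the intercept reported on the original scale can be illustrated with an ordinary (frequentist) Poisson GLM. This is a minimal sketch on simulated data (not the tutorial's data set), and it uses `glm()` purely as a stand-in for the arithmetic; it does not reproduce what brms does internally.

```r
set.seed(42)
# Illustrative data only (not the tutorial's data set)
x <- runif(50, min = 1, max = 10)
y <- rpois(50, lambda = exp(0.5 + 0.2 * x))

fit_raw     <- glm(y ~ x, family = poisson(link = "log"))
fit_centred <- glm(y ~ I(x - mean(x)), family = poisson(link = "log"))

b_raw     <- unname(coef(fit_raw))
b_centred <- unname(coef(fit_centred))

# Slopes agree; intercepts differ by slope * mean(x)
c(raw             = b_raw[1],
  centred         = b_centred[1],
  back_calculated = b_centred[1] - b_centred[2] * mean(x))
```

The centred intercept is the expected log count at the mean of `x`, whereas the raw intercept is the expected log count at `x = 0`. A prior placed on one is therefore not directly comparable with draws of the other.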
[1] "b_Intercept" "b_x" "Intercept" "prior_Intercept"
[5] "prior_b" "lprior" "lp__" "accept_stat__"
[9] "stepsize__" "treedepth__" "n_leapfrog__" "divergent__"
[13] "energy__"
Conclusions:
- each of the priors is substantially wider than its posterior
- the posteriors are clearly distinct from their respective priors and thus we can conclude that the priors are not driving the posteriors (i.e. they are not unduly influencing the outcomes)
- the priors are simply regularising the parameters such that they are only sampled from plausible regions
\[ \begin{align} y_i \sim{}& Pois(\lambda_i)\\ log(\lambda_i) =& \beta_0 + \beta_1 (x_i - \bar{x})\\ \end{align} \]
When considering priors, it is important to remember that they apply to parameters on the link scale. For Poisson models, the link scale is log. So the following priors might be appropriate:
- \(\beta_0\): Normal prior centred at 1.99 with a variance of 1.33
mean:
variance:
- \(\beta_1\): t distribution (3 degrees of freedom) prior centred at 0 with a variance of 2.05
mean: since effects are differences and we may not want to pre-determine the direction (polarity) of any trends, it is typical to define effects priors with means of 0
variance:
- start by fitting the model and sampling from the priors only
- explore the range of posterior predictions resulting from the priors alone
Conclusions:
- the grey ribbon above represents the credible range of the posterior predictions
- this ribbon is substantially wider than the observed data (black points), suggesting that the priors are not overly narrow
- the range of posterior predictions is not wildly unreasonable (although we could argue that negative predictions might be illogical), since the maxima are not multiple orders of magnitude above the observed data.
- now refit the model such that it samples from both the priors and likelihood (that is allow the data to have an impact on the estimates)
- re-explore the range of posterior predictions resulting from the fitted model
Conclusions:
- the range of the posterior predictions is now substantially reduced (now that the model includes both data and priors)
- this suggests that the patterns are being driven predominantly by the data
- compare the priors and posteriors to further confirm that the priors are not overly influential
When we indicate that the posterior should be informed by both the prior and the likelihood, both prior draws (governed by the priors alone) and posterior draws (governed by both priors and data/likelihood) are returned. These can be compared by exploring the probabilities associated with specific hypotheses, the most obvious of which is that of no effect (that the parameter = 0).
When doing so, ideally the posterior should be distinct from the prior; that is, the posteriors should be clearly distinguishable from the priors. Ideally, the posteriors should be less variable than the priors. If this is not the case, it might suggest that the posteriors are being too strongly driven by the priors.
[1] "b_Intercept" "b_scalexscaleEQFALSE" "Intercept"
[4] "prior_Intercept" "prior_b" "lprior"
[7] "lp__" "accept_stat__" "stepsize__"
[10] "treedepth__" "n_leapfrog__" "divergent__"
[13] "energy__"
Unfortunately, it is not possible to do this comparison sensibly for the intercept. The reason for this is that the prior for intercept was applied to an intercept that is associated with centred continuous predictors (predictors are centred behind the scenes). Since we did not centre the predictor, the intercept returned is as if uncentred. Hence, the prior and posterior are on different scales.
[1] "b_Intercept" "b_scalexscaleEQFALSE" "Intercept"
[4] "prior_Intercept" "prior_b" "lprior"
[7] "lp__" "accept_stat__" "stepsize__"
[10] "treedepth__" "n_leapfrog__" "divergent__"
[13] "energy__"
Conclusions:
- each of the priors is substantially wider than its posterior
- the posteriors are clearly distinct from their respective priors and thus we can conclude that the priors are not driving the posteriors (i.e. they are not unduly influencing the outcomes)
- the priors are simply regularising the parameters such that they are only sampled from plausible regions
\[ \begin{align} y_i \sim{}& Pois(\lambda_i)\\ log(\lambda_i) =& \beta_0 + \beta_1 (x_i - \bar{x})/\sigma_x\\ \end{align} \]
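Standardising the predictor changes the units of the slope: it becomes the change in the log of the expected count per standard deviation of \(x\) rather than per unit of \(x\). The sketch below checks this with an ordinary Poisson GLM on simulated data (illustrative values only, not the tutorial's data set), which may help when judging how wide a slope prior needs to be.

```r
set.seed(7)
# Illustrative data only
x <- runif(60, min = 0, max = 20)
y <- rpois(60, lambda = exp(1 + 0.1 * x))

# scale(x) is (x - mean(x)) / sd(x)
x_std <- (x - mean(x)) / sd(x)
stopifnot(isTRUE(all.equal(as.numeric(scale(x)), x_std)))

slope_raw <- unname(coef(glm(y ~ x, family = poisson))[2])
slope_std <- unname(coef(glm(y ~ x_std, family = poisson))[2])

# A one SD change in x corresponds to sd(x) raw units
c(slope_std = slope_std, slope_raw_times_sd = slope_raw * sd(x))
```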
When considering priors, it is important to remember that they apply to parameters on the link scale. For Poisson models, the link scale is log. So the following priors might be appropriate:
- \(\beta_0\): Normal prior centred at 1.99 with a variance of 1.33
mean:
variance:
- \(\beta_1\): t distribution (3 degrees of freedom) prior centred at 0 with a variance of 1.33
mean: since effects are differences and we may not want to pre-determine the direction (polarity) of any trends, it is typical to define effects priors with means of 0
variance:
- start by fitting the model and sampling from the priors only
- explore the range of posterior predictions resulting from the priors alone
Conclusions:
- the grey ribbon above represents the credible range of the posterior predictions
- this ribbon is substantially wider than the observed data (black points), suggesting that the priors are not overly narrow
- the range of posterior predictions is not wildly unreasonable (although we could argue that negative predictions might be illogical), since the maxima are not multiple orders of magnitude above the observed data.
- now refit the model such that it samples from both the priors and likelihood (that is allow the data to have an impact on the estimates)
- re-explore the range of posterior predictions resulting from the fitted model
Conclusions:
- the range of the posterior predictions is now substantially reduced (now that the model includes both data and priors)
- this suggests that the patterns are being driven predominantly by the data
- compare the priors and posteriors to further confirm that the priors are not overly influential
When we indicate that the posterior should be informed by both the prior and the likelihood, both prior draws (governed by the priors alone) and posterior draws (governed by both priors and data/likelihood) are returned. These can be compared by exploring the probabilities associated with specific hypotheses, the most obvious of which is that of no effect (that the parameter = 0).
When doing so, ideally the posterior should be distinct from the prior; that is, the posteriors should be clearly distinguishable from the priors. Ideally, the posteriors should be less variable than the priors. If this is not the case, it might suggest that the posteriors are being too strongly driven by the priors.
[1] "b_Intercept" "b_scalex" "Intercept" "prior_Intercept"
[5] "prior_b" "lprior" "lp__" "accept_stat__"
[9] "stepsize__" "treedepth__" "n_leapfrog__" "divergent__"
[13] "energy__"
Unfortunately, it is not possible to do this comparison sensibly for the intercept. The reason for this is that the prior for intercept was applied to an intercept that is associated with centred continuous predictors (predictors are centred behind the scenes). Since we did not centre the predictor, the intercept returned is as if uncentred. Hence, the prior and posterior are on different scales.
[1] "b_Intercept" "b_scalex" "Intercept" "prior_Intercept"
[5] "prior_b" "lprior" "lp__" "accept_stat__"
[9] "stepsize__" "treedepth__" "n_leapfrog__" "divergent__"
[13] "energy__"
Conclusions:
- each of the priors is substantially wider than its posterior
- the posteriors are clearly distinct from their respective priors and thus we can conclude that the priors are not driving the posteriors (i.e. they are not unduly influencing the outcomes)
- the priors are simply regularising the parameters such that they are only sampled from plausible regions
\[ \begin{align} y_i \sim{}& NB(\lambda_i, \phi)\\ log(\lambda_i) =& \beta_0 + \beta_1 x_i\\ \end{align} \]
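The extra parameter \(\phi\) controls how much more variable the counts are than under a Poisson model. Assuming \(\phi\) is the shape (dispersion) parameter in the mean/shape parameterisation, which matches the `mu`/`size` form of R's negative binomial functions, the variance is \(\lambda + \lambda^2/\phi\). The sketch below verifies this numerically for arbitrary illustrative values of the mean and shape.

```r
mu    <- 8    # illustrative mean
shape <- 2    # illustrative dispersion (phi); R calls this `size`

# Probability mass over a support wide enough that the tail is negligible
k <- 0:5000
p <- dnbinom(k, mu = mu, size = shape)

nb_mean <- sum(k * p)
nb_var  <- sum((k - nb_mean)^2 * p)

c(mean = nb_mean, variance = nb_var, formula = mu + mu^2 / shape)
```

As the shape grows large the second term vanishes and the distribution approaches a Poisson, for which the variance equals the mean.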
When considering priors, it is important to remember that they apply to parameters on the link scale. For Negative Binomial models, the link scale is log. So the following priors might be appropriate:
- \(\beta_0\): Normal prior centred at 2.07 with a variance of 0.93
mean:
variance:
- \(\beta_1\): t distribution (3 degrees of freedom) prior centred at 0 with a variance of 1.43
mean: since effects are differences and we may not want to pre-determine the direction (polarity) of any trends, it is typical to define effects priors with means of 0
variance:
- start by fitting the model and sampling from the priors only
dat4a.form <- bf(y ~ x, family = negbinomial(link = "log"))
dat4a.brm <- brm(dat4a.form,
                 data = dat4,
                 prior = priors,
                 sample_prior = 'only',
                 iter = 5000,
                 warmup = 1000,
                 chains = 3, cores = 3,
                 thin = 5,
                 backend = "rstan",
                 refresh = 0)
Compiling Stan program...
Start sampling
Warning: There were 1517 divergent transitions after warmup. See
https://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
to find out why this is a problem and how to eliminate them.
Warning: Examine the pairs() plot to diagnose sampling problems
- explore the range of posterior predictions resulting from the priors alone
Warning in scale_y_log10(): log-10 transformation introduced infinite values.
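The `scale_y_log10()` warning arises because some prior predictive counts are exactly zero, and the log of zero is negative infinity. The base R sketch below illustrates this with the intercept prior from above, treating the quoted 0.93 as a standard deviation. The prior on the shape parameter is NOT stated in this section, so the gamma distribution used here is purely a hypothetical placeholder.

```r
set.seed(2024)
n_sims <- 1000

b0    <- rnorm(n_sims, mean = 2.07, sd = 0.93)  # intercept prior (log scale)
shape <- rgamma(n_sims, shape = 2, rate = 1)    # HYPOTHETICAL shape prior

# Prior predictive counts at x = 0 (slope term omitted for simplicity)
y_sim <- rnbinom(n_sims, mu = exp(b0), size = shape)

mean(y_sim == 0)   # proportion of simulated counts that are zero
log10(0)           # -Inf, which a log10 axis cannot display
```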
Conclusions:
- the grey ribbon above represents the credible range of the posterior predictions
- this ribbon is substantially wider than the observed data (black points), suggesting that the priors are not overly narrow
- the range of posterior predictions is not wildly unreasonable (although we could argue that negative predictions might be illogical), since the maxima are not multiple orders of magnitude above the observed data.
- now refit the model such that it samples from both the priors and likelihood (that is allow the data to have an impact on the estimates)
- re-explore the range of posterior predictions resulting from the fitted model
Conclusions:
- the range of the posterior predictions is now substantially reduced (now that the model includes both data and priors)
- this suggests that the patterns are being driven predominantly by the data
- compare the priors and posteriors to further confirm that the priors are not overly influential
When we indicate that the posterior should be informed by both the prior and the likelihood, both prior draws (governed by the priors alone) and posterior draws (governed by both priors and data/likelihood) are returned. These can be compared by exploring the probabilities associated with specific hypotheses, the most obvious of which is that of no effect (that the parameter = 0).
When doing so, ideally the posterior should be distinct from the prior; that is, the posteriors should be clearly distinguishable from the priors. Ideally, the posteriors should be less variable than the priors. If this is not the case, it might suggest that the posteriors are being too strongly driven by the priors.
[1] "b_Intercept" "b_x" "shape" "Intercept"
[5] "prior_Intercept" "prior_b" "prior_shape" "lprior"
[9] "lp__" "accept_stat__" "stepsize__" "treedepth__"
[13] "n_leapfrog__" "divergent__" "energy__"
Unfortunately, it is not possible to do this comparison sensibly for the intercept. The reason for this is that the prior for intercept was applied to an intercept that is associated with centred continuous predictors (predictors are centred behind the scenes). Since we did not centre the predictor, the intercept returned is as if uncentred. Hence, the prior and posterior are on different scales.
[1] "b_Intercept" "b_x" "shape" "Intercept"
[5] "prior_Intercept" "prior_b" "prior_shape" "lprior"
[9] "lp__" "accept_stat__" "stepsize__" "treedepth__"
[13] "n_leapfrog__" "divergent__" "energy__"
Conclusions:
- each of the priors is substantially wider than its posterior
- the posteriors are clearly distinct from their respective priors and thus we can conclude that the priors are not driving the posteriors (i.e. they are not unduly influencing the outcomes)
- the priors are simply regularising the parameters such that they are only sampled from plausible regions
\[ \begin{align} y_i \sim{}& NB(\lambda_i, \phi)\\ log(\lambda_i) =& \beta_0 + \beta_1 (x_i - \bar{x})\\ \end{align} \]
When considering priors, it is important to remember that they apply to parameters on the link scale. For Negative Binomial models, the link scale is log. So the following priors might be appropriate:
- \(\beta_0\): Normal prior centred at 2.07 with a variance of 0.93
mean:
variance:
- \(\beta_1\): t distribution (3 degrees of freedom) prior centred at 0 with a variance of 1.43
mean: since effects are differences and we may not want to pre-determine the direction (polarity) of any trends, it is typical to define effects priors with means of 0
variance:
- start by fitting the model and sampling from the priors only
dat4b.form <- bf(y ~ scale(x, scale = FALSE), family = negbinomial(link = "log"))
dat4b.brm <- brm(dat4b.form,
                 data = dat4,
                 prior = priors,
                 sample_prior = 'only',
                 iter = 5000,
                 warmup = 1000,
                 chains = 3, cores = 3,
                 thin = 5,
                 backend = "rstan",
                 refresh = 0)
Compiling Stan program...
Start sampling
Warning: There were 1641 divergent transitions after warmup. See
https://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
to find out why this is a problem and how to eliminate them.
Warning: Examine the pairs() plot to diagnose sampling problems
- explore the range of posterior predictions resulting from the priors alone
Warning in scale_y_log10(): log-10 transformation introduced infinite values.
Conclusions:
- the grey ribbon above represents the credible range of the posterior predictions
- this ribbon is substantially wider than the observed data (black points), suggesting that the priors are not overly narrow
- the range of posterior predictions is not wildly unreasonable (although we could argue that negative predictions might be illogical), since the maxima are not multiple orders of magnitude above the observed data.
- now refit the model such that it samples from both the priors and likelihood (that is allow the data to have an impact on the estimates)
- re-explore the range of posterior predictions resulting from the fitted model
Conclusions:
- the range of the posterior predictions is now substantially reduced (now that the model includes both data and priors)
- this suggests that the patterns are being driven predominantly by the data
- compare the priors and posteriors to further confirm that the priors are not overly influential
When we indicate that the posterior should be informed by both the prior and the likelihood, both prior draws (governed by the priors alone) and posterior draws (governed by both priors and data/likelihood) are returned. These can be compared by exploring the probabilities associated with specific hypotheses, the most obvious of which is that of no effect (that the parameter = 0).
When doing so, ideally the posterior should be distinct from the prior; that is, the posteriors should be clearly distinguishable from the priors. Ideally, the posteriors should be less variable than the priors. If this is not the case, it might suggest that the posteriors are being too strongly driven by the priors.
[1] "b_Intercept" "b_scalexscaleEQFALSE" "shape"
[4] "Intercept" "prior_Intercept" "prior_b"
[7] "prior_shape" "lprior" "lp__"
[10] "accept_stat__" "stepsize__" "treedepth__"
[13] "n_leapfrog__" "divergent__" "energy__"
Unfortunately, it is not possible to do this comparison sensibly for the intercept. The reason for this is that the prior for intercept was applied to an intercept that is associated with centred continuous predictors (predictors are centred behind the scenes). Since we did not centre the predictor, the intercept returned is as if uncentred. Hence, the prior and posterior are on different scales.
[1] "b_Intercept" "b_scalexscaleEQFALSE" "shape"
[4] "Intercept" "prior_Intercept" "prior_b"
[7] "prior_shape" "lprior" "lp__"
[10] "accept_stat__" "stepsize__" "treedepth__"
[13] "n_leapfrog__" "divergent__" "energy__"
Conclusions:
- each of the priors is substantially wider than its posterior
- the posteriors are clearly distinct from their respective priors and thus we can conclude that the priors are not driving the posteriors (i.e. they are not unduly influencing the outcomes)
- the priors are simply regularising the parameters such that they are only sampled from plausible regions
\[ \begin{align} y_i \sim{}& NB(\lambda_i, \phi)\\ log(\lambda_i) =& \beta_0 + \beta_1 (x_i - \bar{x})/\sigma_x\\ \end{align} \]
When considering priors, it is important to remember that they apply to parameters on the link scale. For Negative Binomial models, the link scale is log. So the following priors might be appropriate:
- \(\beta_0\): Normal prior centred at 2.07 with a variance of 0.93
mean:
variance:
- \(\beta_1\): t distribution (3 degrees of freedom) prior centred at 0 with a variance of 0.93
mean: since effects are differences and we may not want to pre-determine the direction (polarity) of any trends, it is typical to define effects priors with means of 0
variance:
- start by fitting the model and sampling from the priors only
dat4c.form <- bf(y ~ scale(x), family = negbinomial(link = "log"))
dat4c.brm <- brm(dat4c.form,
                 data = dat4,
                 prior = priors,
                 sample_prior = 'only',
                 iter = 5000,
                 warmup = 1000,
                 chains = 3, cores = 3,
                 thin = 5,
                 backend = "rstan",
                 refresh = 0)
Compiling Stan program...
Start sampling
Warning: There were 1537 divergent transitions after warmup. See
https://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
to find out why this is a problem and how to eliminate them.
Warning: Examine the pairs() plot to diagnose sampling problems
- explore the range of posterior predictions resulting from the priors alone
Warning in scale_y_log10(): log-10 transformation introduced infinite values.
Conclusions:
- the grey ribbon above represents the credible range of the posterior predictions
- this ribbon is substantially wider than the observed data (black points), suggesting that the priors are not overly narrow
- the range of posterior predictions is not wildly unreasonable (although we could argue that negative predictions might be illogical), since the maxima are not multiple orders of magnitude above the observed data.
- now refit the model such that it samples from both the priors and likelihood (that is allow the data to have an impact on the estimates)
- re-explore the range of posterior predictions resulting from the fitted model
Conclusions:
- the range of the posterior predictions is now substantially reduced (now that the model includes both data and priors)
- this suggests that the patterns are being driven predominantly by the data
- compare the priors and posteriors to further confirm that the priors are not overly influential
When we indicate that the posterior should be informed by both the prior and the likelihood, both prior draws (governed by the priors alone) and posterior draws (governed by both priors and data/likelihood) are returned. These can be compared by exploring the probabilities associated with specific hypotheses, the most obvious of which is that of no effect (that the parameter = 0).
When doing so, ideally the posterior should be distinct from the prior; that is, the posteriors should be clearly distinguishable from the priors. Ideally, the posteriors should be less variable than the priors. If this is not the case, it might suggest that the posteriors are being too strongly driven by the priors.
[1] "b_Intercept" "b_scalex" "shape" "Intercept"
[5] "prior_Intercept" "prior_b" "prior_shape" "lprior"
[9] "lp__" "accept_stat__" "stepsize__" "treedepth__"
[13] "n_leapfrog__" "divergent__" "energy__"
Unfortunately, it is not possible to do this comparison sensibly for the intercept. The reason for this is that the prior for intercept was applied to an intercept that is associated with centred continuous predictors (predictors are centred behind the scenes). Since we did not centre the predictor, the intercept returned is as if uncentred. Hence, the prior and posterior are on different scales.
[1] "b_Intercept" "b_scalex" "shape" "Intercept"
[5] "prior_Intercept" "prior_b" "prior_shape" "lprior"
[9] "lp__" "accept_stat__" "stepsize__" "treedepth__"
[13] "n_leapfrog__" "divergent__" "energy__"
Conclusions:
- each of the priors is substantially wider than its posterior
- the posteriors are clearly distinct from their respective priors and thus we can conclude that the priors are not driving the posteriors (i.e. they are not unduly influencing the outcomes)
- the priors are simply regularising the parameters such that they are only sampled from plausible regions
\[ \begin{align} y_i \sim{}& Bin(\pi_i, 1)\\ log(\frac{\pi_i}{1-\pi_i}) =& \beta_0 + \beta_1 x_i\\ \end{align} \]
When considering priors, it is important to remember that they apply to parameters on the link scale. For binary (Binomial with a single trial) models, the link scale is logit. Priors for Binomial data are notoriously difficult to define. Nevertheless, the following considerations are useful:
- the observed response values are only ever either 0 or 1
- a linear model is exploring whether the probability of a 1 changes from high to low or low to high according to the linear predictor
- the switch in probability is likely to be somewhere near the middle of the \(x\) range
- with a centered predictor, the mean response is expected to be approximately 0.5
- on a logit (log odds) scale, this corresponds to a value of 0.
- on a logit (log odds) scale, values of -3 and 3 are considered very wide
- on a logit scale, values between -1 and 1 are reasonable.
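The logit values mentioned in the considerations above are easier to judge once they are converted back to probabilities. A minimal base R sketch using the inverse logit function `plogis()`:

```r
logit_vals <- c(-3, -1, 0, 1, 3)

# Inverse logit: 1 / (1 + exp(-x))
probs <- plogis(logit_vals)
round(setNames(probs, logit_vals), 3)
```

Logit values of -3 and 3 correspond to probabilities of roughly 0.05 and 0.95, which is why a prior range spanning them already covers almost the entire probability scale, whereas -1 and 1 correspond to roughly 0.27 and 0.73.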
So the following priors might be appropriate:
- \(\beta_0\): Normal prior centred at 0 with a variance of 1
- \(\beta_1\): t distribution (3 degrees of freedom) prior centred at 0 with a variance of 1
- start by fitting the model and sampling from the priors only
dat5a.form <- bf(y | trials(1) ~ x, family = binomial(link = "logit"))
dat5a.brm <- brm(dat5a.form,
                 data = dat5,
                 prior = priors,
                 sample_prior = 'only',
                 iter = 5000,
                 warmup = 1000,
                 chains = 3, cores = 3,
                 thin = 5,
                 backend = "rstan",
                 refresh = 0)
Only 2 levels detected so that family 'bernoulli' might be a more efficient choice.
Compiling Stan program...
Start sampling
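The message about family 'bernoulli' reflects the fact that a Binomial distribution with a single trial is a Bernoulli distribution, so the two specifications describe the same likelihood. The base R check below illustrates the equivalence for an arbitrary, illustrative probability. In brms the corresponding family is `bernoulli()`, in which case the `trials(1)` term is not needed.

```r
p <- 0.3      # illustrative probability of observing a 1
y <- c(0, 1)  # the only two possible responses

binom_pmf     <- dbinom(y, size = 1, prob = p)  # Binomial with one trial
bernoulli_pmf <- p^y * (1 - p)^(1 - y)          # Bernoulli mass function

rbind(binom_pmf, bernoulli_pmf)
```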
- explore the range of posterior predictions resulting from the priors alone
For Binary data, it is often more useful to explore the predictions on the link scale. Ribbons that extend much beyond -3 and 3 would definitely be considered wide enough.
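A rough version of this link-scale check can be made in base R before fitting anything. The sketch below draws from the two priors stated above (treating the quoted value of 1 as the standard deviation of the Normal and the scale of the t distribution) and summarises the implied linear predictor. The predictor grid is hypothetical and should be replaced with the observed range of \(x\).

```r
set.seed(99)
n_sims <- 4000

b0 <- rnorm(n_sims, mean = 0, sd = 1)  # intercept prior
b1 <- rt(n_sims, df = 3) * 1           # slope prior: t(3), location 0, scale 1

# HYPOTHETICAL predictor grid; replace with the observed range of x
x_grid <- seq(-5, 5, length.out = 11)

# Linear predictor (logit scale): one row per draw, one column per x value
eta <- outer(b1, x_grid) + b0

# Central 95% band of the linear predictor at each x
band <- apply(eta, 2, quantile, probs = c(0.025, 0.975))
colnames(band) <- x_grid
round(band, 2)
```

If the band extends beyond -3 and 3 at the extremes of the grid, the priors would be considered wide enough by the criterion described above.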
Conclusions:
- the grey ribbon above represents the credible range of the posterior predictions
- this ribbon is substantially wider than the observed data (black points), suggesting that the priors are not overly narrow
- the range of posterior predictions is not wildly unreasonable (although we could argue that negative predictions might be illogical), since the maxima are not multiple orders of magnitude above the observed data.
- now refit the model such that it samples from both the priors and likelihood (that is allow the data to have an impact on the estimates)
- re-explore the range of posterior predictions resulting from the fitted model
Conclusions:
- the range of the posterior predictions is now substantially reduced (now that the model includes both data and priors)
- this suggests that the patterns are being driven predominantly by the data
- compare the priors and posteriors to further confirm that the priors are not overly influential
When we indicate that the posterior should be informed by both the prior and the likelihood, both prior draws (governed by the priors alone) and posterior draws (governed by both priors and data/likelihood) are returned. These can be compared by exploring the probabilities associated with specific hypotheses, the most obvious of which is that of no effect (that the parameter = 0).
When doing so, ideally the posterior should be distinct from the prior; that is, the posteriors should be clearly distinguishable from the priors. Ideally, the posteriors should be less variable than the priors. If this is not the case, it might suggest that the posteriors are being too strongly driven by the priors.
[1] "b_Intercept" "b_x" "Intercept" "prior_Intercept"
[5] "prior_b" "lprior" "lp__" "accept_stat__"
[9] "stepsize__" "treedepth__" "n_leapfrog__" "divergent__"
[13] "energy__"
Unfortunately, it is not possible to do this comparison sensibly for the intercept. The reason for this is that the prior for intercept was applied to an intercept that is associated with centred continuous predictors (predictors are centred behind the scenes). Since we did not centre the predictor, the intercept returned is as if uncentred. Hence, the prior and posterior are on different scales.
[1] "b_Intercept" "b_x" "Intercept" "prior_Intercept"
[5] "prior_b" "lprior" "lp__" "accept_stat__"
[9] "stepsize__" "treedepth__" "n_leapfrog__" "divergent__"
[13] "energy__"
Conclusions:
- each of the priors is substantially wider than its posterior
- the posteriors are definitely distinct from their respective priors and thus we can conclude that the priors are not driving the posteriors (i.e. they are not influencing the outcomes)
- the priors are simply regularising the parameters such that they are only sampled from plausible regions
\[ \begin{align} y_i \sim{}& Bin(\pi_i, 1)\\ log(\frac{\pi_i}{1 -\pi_i}) =& \beta_0 + \beta_1 (x_i - \bar{x})\\ \end{align} \]
- \(\beta_0\): Normal prior centred at 0 with a variance of 1
- \(\beta_1\): t distribution (3 degrees of freedom) prior centred at 0 with a variance of 1
- start by fitting the model and sampling from the priors only
dat5b.form <- bf(y | trials(1) ~ scale(x, scale = FALSE), family = binomial(link = "logit"))
dat5b.brm <- brm(dat5b.form,
data=dat5,
prior=priors,
sample_prior = 'only',
iter = 5000,
warmup = 1000,
chains = 3, cores = 3,
thin = 5,
backend = "rstan",
refresh = 0)
Only 2 levels detected so that family 'bernoulli' might be a more efficient choice.
Compiling Stan program...
Start sampling
- explore the range of posterior predictions resulting from the priors alone
Conclusions:
- the grey ribbon above represents the credible range of the posterior predictions
- this ribbon is substantially wider than the observed data (black points), suggesting that the priors are not overly narrow
- the range of posterior predictions is not wildly unreasonable (although we could argue that negative predictions might be illogical), since the maxima are not multiple orders of magnitude above the observed data.
- now refit the model such that it samples from both the priors and likelihood (that is, allow the data to have an impact on the estimates)
- re-explore the range of posterior predictions resulting from the fitted model
Conclusions:
- the range of the posteriors is now substantially reduced (now that the model includes both data and priors)
- this suggests that the patterns are being driven predominantly by the data
- compare the priors and posteriors to further confirm that the priors are not overly influential
When we have indicated that the posterior should be informed by both the prior and the likelihood, both prior (governed by priors alone) and posterior (governed by both priors and data/likelihood) draws are returned. These can be compared by exploring the probabilities associated with specific hypotheses, the most obvious of which is that of no effect (that the parameter = 0).
When doing so, ideally the posterior should be distinct from the prior. By distinct, I mean that the posteriors should be clearly distinguishable from the priors. Ideally, the posteriors should be less variable than the priors. If this is not the case, it might suggest that the posteriors are being too strongly driven by the priors.
[1] "b_Intercept" "b_scalexscaleEQFALSE" "Intercept"
[4] "prior_Intercept" "prior_b" "lprior"
[7] "lp__" "accept_stat__" "stepsize__"
[10] "treedepth__" "n_leapfrog__" "divergent__"
[13] "energy__"
Unfortunately, it is not possible to do this comparison sensibly for the intercept. The reason for this is that the prior for intercept was applied to an intercept that is associated with centred continuous predictors (predictors are centred behind the scenes). Since we did not centre the predictor, the intercept returned is as if uncentred. Hence, the prior and posterior are on different scales.
[1] "b_Intercept" "b_scalexscaleEQFALSE" "Intercept"
[4] "prior_Intercept" "prior_b" "lprior"
[7] "lp__" "accept_stat__" "stepsize__"
[10] "treedepth__" "n_leapfrog__" "divergent__"
[13] "energy__"
Conclusions:
- each of the priors is substantially wider than its posterior
- the posteriors are definitely distinct from their respective priors and thus we can conclude that the priors are not driving the posteriors (i.e. they are not influencing the outcomes)
- the priors are simply regularising the parameters such that they are only sampled from plausible regions
\[ \begin{align} y_i \sim{}& Bin(\pi_i, 1)\\ log(\frac{\pi_i}{1 -\pi_i}) =& \beta_0 + \beta_1 (x_i - \bar{x})/\sigma_x\\ \end{align} \]
- \(\beta_0\): Normal prior centred at 0 with a variance of 1
- \(\beta_1\): t distribution (3 degrees of freedom) prior centred at 0 with a variance of 1
- start by fitting the model and sampling from the priors only
dat5c.form <- bf(y | trials(1) ~ scale(x), family = binomial(link = "logit"))
dat5c.brm <- brm(dat5c.form,
data=dat5,
prior=priors,
sample_prior = 'only',
iter = 5000,
warmup = 1000,
chains = 3, cores = 3,
thin = 5,
backend = "rstan",
refresh = 0)
Only 2 levels detected so that family 'bernoulli' might be a more efficient choice.
Compiling Stan program...
Start sampling
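Because the formula above uses `scale(x)`, the slope now describes the change in log odds per standard deviation of `x` rather than per unit of `x`, which matters when choosing the width of the slope prior. The following is a minimal sketch of that rescaling using `glm()` on simulated data (the data and object names are assumptions for illustration only): the standardised slope equals the raw slope multiplied by `sd(x)`.

```r
set.seed(2)
# Simulated binary data for illustration only
x <- runif(60, min = 1, max = 10)
y <- rbinom(60, size = 1, prob = plogis(-2 + 0.4 * x))

fit_raw <- glm(y ~ x, family = binomial(link = "logit"))
fit_std <- glm(y ~ scale(x), family = binomial(link = "logit"))

slope_raw <- unname(coef(fit_raw)[2])  # change in log odds per unit of x
slope_std <- unname(coef(fit_std)[2])  # change in log odds per SD of x

# The two slopes differ only by the standard deviation of x
all.equal(slope_std, slope_raw * sd(x), tolerance = 1e-6)
```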
- explore the range of posterior predictions resulting from the priors alone
Conclusions:
- the grey ribbon above represents the credible range of the posterior predictions
- this ribbon is substantially wider than the observed data (black points), suggesting that the priors are not overly narrow
- the range of posterior predictions is not wildly unreasonable (although we could argue that negative predictions might be illogical), since the maxima are not multiple orders of magnitude above the observed data.
- now refit the model such that it samples from both the priors and likelihood (that is, allow the data to have an impact on the estimates)
- re-explore the range of posterior predictions resulting from the fitted model
Conclusions:
- the range of the posteriors is now substantially reduced (now that the model includes both data and priors)
- this suggests that the patterns are being driven predominantly by the data
- compare the priors and posteriors to further confirm that the priors are not overly influential
When we have indicated that the posterior should be informed by both the prior and the likelihood, both prior (governed by priors alone) and posterior (governed by both priors and data/likelihood) draws are returned. These can be compared by exploring the probabilities associated with specific hypotheses, the most obvious of which is that of no effect (that the parameter = 0).
When doing so, ideally the posterior should be distinct from the prior. By distinct, I mean that the posteriors should be clearly distinguishable from the priors. Ideally, the posteriors should be less variable than the priors. If this is not the case, it might suggest that the posteriors are being too strongly driven by the priors.
[1] "b_Intercept" "b_scalex" "Intercept" "prior_Intercept"
[5] "prior_b" "lprior" "lp__" "accept_stat__"
[9] "stepsize__" "treedepth__" "n_leapfrog__" "divergent__"
[13] "energy__"
Unfortunately, it is not possible to do this comparison sensibly for the intercept. The reason for this is that the prior for intercept was applied to an intercept that is associated with centred continuous predictors (predictors are centred behind the scenes). Since we did not centre the predictor, the intercept returned is as if uncentred. Hence, the prior and posterior are on different scales.
[1] "b_Intercept" "b_scalex" "Intercept" "prior_Intercept"
[5] "prior_b" "lprior" "lp__" "accept_stat__"
[9] "stepsize__" "treedepth__" "n_leapfrog__" "divergent__"
[13] "energy__"
Conclusions:
- each of the priors is substantially wider than its posterior
- the posteriors are definitely distinct from their respective priors and thus we can conclude that the priors are not driving the posteriors (i.e. they are not influencing the outcomes)
- the priors are simply regularising the parameters such that they are only sampled from plausible regions
\[ \begin{align} y_i \sim{}& Bin(\pi_i, n_i)\\ log(\frac{\pi_i}{1-\pi_i}) =& \beta_0 + \beta_1 x_i\\ \end{align} \]
When considering priors, it is important to remember that they apply to parameters on the link scale. For Binomial models, the link scale is logit. Binomial data is notoriously difficult to define priors for. Nevertheless the following considerations are useful:
- the expected \(\pi\) values are only ever between 0 and 1
- on a logit (log odds) scale, values of -3 and 3 are considered very wide
- on a logit scale, values between -1 and 1 are reasonable.
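To get a feel for these logit values, they can be back-transformed to probabilities with the inverse logit, `plogis()`. This is a small sketch for orientation only; the chosen values simply mirror the ones listed above.

```r
# Log odds values mentioned in the considerations above
logit_values <- c(-3, -1, 0, 1, 3)

# plogis() is the inverse logit: exp(x) / (1 + exp(x))
probabilities <- plogis(logit_values)
round(probabilities, 3)
# 0.047 0.269 0.500 0.731 0.953
```

A range of -3 to 3 on the logit scale therefore already spans probabilities from roughly 0.05 to 0.95, which is why it is regarded as very wide.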
So the following priors might be appropriate:
- \(\beta_0\): Normal prior centred at 0 with a variance of 0
mean:
variance:
- \(\beta_1\): t distribution (3 degrees of freedom) prior centred at 0 with a variance of 0.51
mean: since effects are differences and we may not want to pre-determine the direction (polarity) of any trends, it is typical to define effects priors with means of 0
variance:
- start by fitting the model and sampling from the priors only
- explore the range of posterior predictions resulting from the priors alone
For Binary data, it is often more useful to explore the predictions on the link scale. Ribbons that extend much beyond -3 and 3 would definitely be considered wide enough.
Setting all 'trials' variables to 1 by default if not specified otherwise.
dat6a.brm |>
conditional_effects(conditions = data.frame(total = dat6$total)) |>
plot(points = TRUE)
Conclusions:
- the grey ribbon above represents the credible range of the posterior predictions
- this ribbon is substantially wider than the observed data (black points), suggesting that the priors are not overly narrow
- the range of posterior predictions is not wildly unreasonable (although we could argue that negative predictions might be illogical), since the maxima are not multiple orders of magnitude above the observed data.
- now refit the model such that it samples from both the priors and likelihood (that is, allow the data to have an impact on the estimates)
- re-explore the range of posterior predictions resulting from the fitted model
dat6a.brm2 |>
conditional_effects(conditions = data.frame(total = dat6$total)) |>
plot(points = TRUE)
Conclusions:
- the range of the posteriors is now substantially reduced (now that the model includes both data and priors)
- this suggests that the patterns are being driven predominantly by the data
- compare the priors and posteriors to further confirm that the priors are not overly influential
When we have indicated that the posterior should be informed by both the prior and the likelihood, both prior (governed by priors alone) and posterior (governed by both priors and data/likelihood) draws are returned. These can be compared by exploring the probabilities associated with specific hypotheses, the most obvious of which is that of no effect (that the parameter = 0).
When doing so, ideally the posterior should be distinct from the prior. By distinct, I mean that the posteriors should be clearly distinguishable from the priors. Ideally, the posteriors should be less variable than the priors. If this is not the case, it might suggest that the posteriors are being too strongly driven by the priors.
[1] "b_Intercept" "b_x" "Intercept" "prior_Intercept"
[5] "prior_b" "lprior" "lp__" "accept_stat__"
[9] "stepsize__" "treedepth__" "n_leapfrog__" "divergent__"
[13] "energy__"
Unfortunately, it is not possible to do this comparison sensibly for the intercept. The reason for this is that the prior for intercept was applied to an intercept that is associated with centred continuous predictors (predictors are centred behind the scenes). Since we did not centre the predictor, the intercept returned is as if uncentred. Hence, the prior and posterior are on different scales.
[1] "b_Intercept" "b_x" "Intercept" "prior_Intercept"
[5] "prior_b" "lprior" "lp__" "accept_stat__"
[9] "stepsize__" "treedepth__" "n_leapfrog__" "divergent__"
[13] "energy__"
Conclusions:
- each of the priors is substantially wider than its posterior
- the posteriors are definitely distinct from their respective priors and thus we can conclude that the priors are not driving the posteriors (i.e. they are not influencing the outcomes)
- the priors are simply regularising the parameters such that they are only sampled from plausible regions
\[ \begin{align} y_i \sim{}& Bin(\pi_i, n_i)\\ log(\frac{\pi_i}{1-\pi_i}) =& \beta_0 + \beta_1 (x_i - \bar{x})\\ \end{align} \]
When considering priors, it is important to remember that they apply to parameters on the link scale. For Binomial models, the link scale is logit. Binomial data is notoriously difficult to define priors for. Nevertheless the following considerations are useful:
- the expected \(\pi\) values are only ever between 0 and 1
- on a logit (log odds) scale, values of -3 and 3 are considered very wide
- on a logit scale, values between -1 and 1 are reasonable.
So the following priors might be appropriate:
- \(\beta_0\): Normal prior centred at 0 with a variance of 0
mean:
variance:
- \(\beta_1\): t distribution (3 degrees of freedom) prior centred at 0 with a variance of 0.51
mean: since effects are differences and we may not want to pre-determine the direction (polarity) of any trends, it is typical to define effects priors with means of 0
variance:
- start by fitting the model and sampling from the priors only
dat6b.form <- bf(count | trials(total) ~ scale(x, scale = FALSE),
family = binomial(link = "logit"))
dat6b.brm <- brm(dat6b.form,
data=dat6,
prior=priors,
sample_prior = 'only',
iter = 5000,
warmup = 1000,
chains = 3, cores = 3,
thin = 5,
backend = "rstan",
refresh = 0)
Compiling Stan program...
Start sampling
- explore the range of posterior predictions resulting from the priors alone
For Binomial data, it is often more useful to explore the predictions on the link scale. Ribbons that extend much beyond -3 and 3 would definitely be considered wide enough.
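The link-scale ribbon can also be approximated by hand, by drawing parameters from the priors and pushing them through the linear predictor. The sketch below is illustrative only: the prior scales (a standard deviation of 1 for the intercept and a scale of 1 for the t distributed slope) and the range of centred predictor values are assumptions, not the values used to fit the models in this tutorial.

```r
set.seed(3)
n_draws <- 2000

# Assumed priors, for illustration only
beta0 <- rnorm(n_draws, mean = 0, sd = 1)  # intercept: Normal(0, 1)
beta1 <- 1 * rt(n_draws, df = 3)           # slope: t(3 df), centred at 0, scale 1

# Hypothetical grid of centred predictor values
x_c <- seq(-5, 5, length.out = 25)

# Linear predictor (log odds): one row per prior draw, one column per x value
eta <- outer(beta1, x_c) + beta0

# 95% ribbon of the prior predictions on the link scale
ribbon <- apply(eta, 2, quantile, probs = c(0.025, 0.975))
range(ribbon)
```

If the ribbon extends well beyond -3 and 3, the priors are wide enough on the link scale; `plogis(ribbon)` would express the same ribbon as probabilities.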
Setting all 'trials' variables to 1 by default if not specified otherwise.
dat6b.brm |>
conditional_effects(conditions = data.frame(total = dat6$total)) |>
plot(points = TRUE)
Conclusions:
- the grey ribbon above represents the credible range of the posterior predictions
- this ribbon is substantially wider than the observed data (black points), suggesting that the priors are not overly narrow
- the range of posterior predictions is not wildly unreasonable (although we could argue that negative predictions might be illogical), since the maxima are not multiple orders of magnitude above the observed data.
- now refit the model such that it samples from both the priors and likelihood (that is, allow the data to have an impact on the estimates)
- re-explore the range of posterior predictions resulting from the fitted model
dat6b.brm2 |>
conditional_effects(conditions = data.frame(total = dat6$total)) |>
plot(points = TRUE)
Conclusions:
- the range of the posteriors is now substantially reduced (now that the model includes both data and priors)
- this suggests that the patterns are being driven predominantly by the data
- compare the priors and posteriors to further confirm that the priors are not overly influential
When we have indicated that the posterior should be informed by both the prior and the likelihood, both prior (governed by priors alone) and posterior (governed by both priors and data/likelihood) draws are returned. These can be compared by exploring the probabilities associated with specific hypotheses, the most obvious of which is that of no effect (that the parameter = 0).
When doing so, ideally the posterior should be distinct from the prior. By distinct, I mean that the posteriors should be clearly distinguishable from the priors. Ideally, the posteriors should be less variable than the priors. If this is not the case, it might suggest that the posteriors are being too strongly driven by the priors.
[1] "b_Intercept" "b_scalexscaleEQFALSE" "Intercept"
[4] "prior_Intercept" "prior_b" "lprior"
[7] "lp__" "accept_stat__" "stepsize__"
[10] "treedepth__" "n_leapfrog__" "divergent__"
[13] "energy__"
Unfortunately, it is not possible to do this comparison sensibly for the intercept. The reason for this is that the prior for intercept was applied to an intercept that is associated with centred continuous predictors (predictors are centred behind the scenes). Since we did not centre the predictor, the intercept returned is as if uncentred. Hence, the prior and posterior are on different scales.
[1] "b_Intercept" "b_scalexscaleEQFALSE" "Intercept"
[4] "prior_Intercept" "prior_b" "lprior"
[7] "lp__" "accept_stat__" "stepsize__"
[10] "treedepth__" "n_leapfrog__" "divergent__"
[13] "energy__"
Conclusions:
- each of the priors is substantially wider than its posterior
- the posteriors are definitely distinct from their respective priors and thus we can conclude that the priors are not driving the posteriors (i.e. they are not influencing the outcomes)
- the priors are simply regularising the parameters such that they are only sampled from plausible regions
\[ \begin{align} y_i \sim{}& Bin(\pi_i, n_i)\\ log(\frac{\pi_i}{1-\pi_i}) =& \beta_0 + \beta_1 (x_i - \bar{x})/\sigma_x\\ \end{align} \]
When considering priors, it is important to remember that they apply to parameters on the link scale. For Binomial models, the link scale is logit. Binomial data is notoriously difficult to define priors for. Nevertheless the following considerations are useful:
- the expected \(\pi\) values are only ever between 0 and 1
- on a logit (log odds) scale, values of -3 and 3 are considered very wide
- on a logit scale, values between -1 and 1 are reasonable.
So the following priors might be appropriate:
- \(\beta_0\): Normal prior centred at 0 with a variance of 0
mean:
variance:
- \(\beta_1\): t distribution (3 degrees of freedom) prior centred at 0 with a variance of 0.33
mean: since effects are differences and we may not want to pre-determine the direction (polarity) of any trends, it is typical to define effects priors with means of 0
variance:
- start by fitting the model and sampling from the priors only
- explore the range of posterior predictions resulting from the priors alone
For Binomial data, it is often more useful to explore the predictions on the link scale. Ribbons that extend much beyond -3 and 3 would definitely be considered wide enough.
Setting all 'trials' variables to 1 by default if not specified otherwise.
dat6c.brm |>
conditional_effects(conditions = data.frame(total = dat6$total)) |>
plot(points = TRUE)
Conclusions:
- the grey ribbon above represents the credible range of the posterior predictions
- this ribbon is substantially wider than the observed data (black points), suggesting that the priors are not overly narrow
- the range of posterior predictions is not wildly unreasonable (although we could argue that negative predictions might be illogical), since the maxima are not multiple orders of magnitude above the observed data.
- now refit the model such that it samples from both the priors and likelihood (that is, allow the data to have an impact on the estimates)
- re-explore the range of posterior predictions resulting from the fitted model
dat6c.brm2 |>
conditional_effects(conditions = data.frame(total = dat6$total)) |>
plot(points = TRUE)
Conclusions:
- the range of the posteriors is now substantially reduced (now that the model includes both data and priors)
- this suggests that the patterns are being driven predominantly by the data
- compare the priors and posteriors to further confirm that the priors are not overly influential
When we have indicated that the posterior should be informed by both the prior and the likelihood, both prior (governed by priors alone) and posterior (governed by both priors and data/likelihood) draws are returned. These can be compared by exploring the probabilities associated with specific hypotheses, the most obvious of which is that of no effect (that the parameter = 0).
When doing so, ideally the posterior should be distinct from the prior. By distinct, I mean that the posteriors should be clearly distinguishable from the priors. Ideally, the posteriors should be less variable than the priors. If this is not the case, it might suggest that the posteriors are being too strongly driven by the priors.
[1] "b_Intercept" "b_scalex" "Intercept" "prior_Intercept"
[5] "prior_b" "lprior" "lp__" "accept_stat__"
[9] "stepsize__" "treedepth__" "n_leapfrog__" "divergent__"
[13] "energy__"
Unfortunately, it is not possible to do this comparison sensibly for the intercept. The reason for this is that the prior for intercept was applied to an intercept that is associated with centred continuous predictors (predictors are centred behind the scenes). Since we did not centre the predictor, the intercept returned is as if uncentred. Hence, the prior and posterior are on different scales.
[1] "b_Intercept" "b_scalex" "Intercept" "prior_Intercept"
[5] "prior_b" "lprior" "lp__" "accept_stat__"
[9] "stepsize__" "treedepth__" "n_leapfrog__" "divergent__"
[13] "energy__"
Conclusions:
- each of the priors is substantially wider than its posterior
- the posteriors are definitely distinct from their respective priors and thus we can conclude that the priors are not driving the posteriors (i.e. they are not influencing the outcomes)
- the priors are simply regularising the parameters such that they are only sampled from plausible regions
8 MCMC sampling diagnostics
MCMC sampling behaviour
Since the purpose of MCMC sampling is to approximate an unknown joint posterior distribution, it is important that we explore a range of diagnostics designed to help identify when the resulting approximation might not be accurate.
- traceplots - plots of the individual draws in sequence. Traces that resemble noise suggest that all features of the posterior are likely to have been traversed. Obvious steps or blocks of noise are likely to represent distinct features and could imply that there are yet other features that have not yet been traversed, necessitating additional iterations. Furthermore, each chain should be indistinguishable from the others.
- autocorrelation function - plots of the degree of correlation between pairs of draws for a range of lags (distance along the chains). High levels of correlation (after a lag of 0, which is correlating each draw with itself) suggest a lack of independence between the draws and, therefore, that summaries such as the mean and median may be biased estimates. Ideally, all non-zero lag correlations should be less than 0.2. The left hand figure below demonstrates a clear pattern of autocorrelation, whereas the right hand figure shows no autocorrelation.
- convergence diagnostics - there are a range of diagnostics aimed at exploring whether the multiple chains are likely to have converged upon similar posteriors
- R hat - this metric compares between and within chain model parameter estimates, with the expectation that if the chains have converged, the between and within rank normalised estimates should be very similar (and Rhat should be close to 1). The more one chain deviates from the others, the higher the Rhat value. Values less than 1.05 are considered evidence of convergence.
- Bulk ESS - this is a measure of the effective sample size from the whole (bulk) of the posterior and is a good measure of the sampling efficiency of draws across the entire posterior
- Tail ESS - this is a measure of the effective sample size from the 5% and 95% quantiles (tails) of the posterior and is a good measure of the sampling efficiency of draws from the tail (areas of the posterior with least support and where samplers can get stuck).
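To make the between versus within chain comparison behind Rhat concrete, the following is a minimal sketch of the classic potential scale reduction factor. It is a simplification: the Rhat reported for Stan-based models is a rank normalised, split-chain version, and `rhat_basic()` is a hypothetical helper written for illustration on simulated chains.

```r
# Classic (non rank normalised) potential scale reduction factor.
# `chains` is a matrix with one row per draw and one column per chain.
rhat_basic <- function(chains) {
  n <- nrow(chains)
  W <- mean(apply(chains, 2, var))    # mean within-chain variance
  B <- n * var(colMeans(chains))      # between-chain variance
  var_hat <- (n - 1) / n * W + B / n  # pooled estimate of the posterior variance
  sqrt(var_hat / W)
}

set.seed(4)
# Three simulated chains that all sample the same distribution
mixed <- matrix(rnorm(3000), ncol = 3)

# The same chains, but with the third displaced from the others
stuck <- mixed
stuck[, 3] <- stuck[, 3] + 2

rhat_mixed <- rhat_basic(mixed)  # close to 1
rhat_stuck <- rhat_basic(stuck)  # well above 1.05
```

Displacing a single chain inflates the between-chain variance relative to the within-chain variance, which is what pushes Rhat above 1.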
There are numerous packages in R that support MCMC diagnostics. Popular packages include:
bayesplot
rstan
ggmcmc
Some of the most useful diagnostics are presented in the following table.
Package | Description | Function | rstanarm | brms |
---|---|---|---|---|
bayesplot | Traceplot | mcmc_trace | plot(mod, plotfun='trace') | mcmc_plot(mod, type='trace') |
bayesplot | Density plot | mcmc_dens | plot(mod, plotfun='dens') | mcmc_plot(mod, type='dens') |
bayesplot | Density & Trace | mcmc_combo | plot(mod, plotfun='combo') | mcmc_plot(mod, type='combo') |
bayesplot | ACF | mcmc_acf_bar | plot(mod, plotfun='acf_bar') | mcmc_plot(mod, type='acf_bar') |
bayesplot | Rhat hist | mcmc_rhat_hist | plot(mod, plotfun='rhat_hist') | mcmc_plot(mod, type='rhat_hist') |
bayesplot | No. Effective | mcmc_neff_hist | plot(mod, plotfun='neff_hist') | mcmc_plot(mod, type='neff_hist') |
rstan | Traceplot | stan_trace | stan_trace(mod) | stan_trace(mod) |
rstan | ACF | stan_ac | stan_ac(mod) | stan_ac(mod) |
rstan | Rhat | stan_rhat | stan_rhat(mod) | stan_rhat(mod) |
rstan | No. Effective | stan_ess | stan_ess(mod) | stan_ess(mod) |
rstan | Density plot | stan_dens | stan_dens(mod) | stan_dens(mod) |
ggmcmc | Traceplot | ggs_traceplot | ggs_traceplot(ggs(mod)) | ggs_traceplot(ggs(mod)) |
ggmcmc | ACF | ggs_autocorrelation | ggs_autocorrelation(ggs(mod)) | ggs_autocorrelation(ggs(mod)) |
ggmcmc | Rhat | ggs_Rhat | ggs_Rhat(ggs(mod)) | ggs_Rhat(ggs(mod)) |
ggmcmc | No. Effective | ggs_effective | ggs_effective(ggs(mod)) | ggs_effective(ggs(mod)) |
ggmcmc | Cross correlation | ggs_crosscorrelation | ggs_crosscorrelation(ggs(mod)) | ggs_crosscorrelation(ggs(mod)) |
ggmcmc | Scale reduction | ggs_grb | ggs_grb(ggs(mod)) | ggs_grb(ggs(mod)) |
I personally prefer the rstan versions of the plots and thus these are the ones I will showcase.
Bayesian samplers involve many calls to randomisation functions. As a result, the estimates will vary slightly each time the routines are run. You should expect that the outputs that you obtain will differ slightly from those that I am displaying. Nevertheless, the main conclusions should remain robust across subsequent runs.
Plots of the estimates of each parameter over the post-warmup length of each MCMC chain. Each chain is plotted in a different colour, with each parameter in its own facet. Ideally, each trace should just look like noise without any discernible drift and each of the traces for a specific parameter should look the same (i.e., should not be displaced above or below any other trace for that parameter).
Conclusions:
- the chains appear well mixed and very similar
Conclusions:
- there is no evidence of autocorrelation in the MCMC samples
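The 0.2 rule of thumb for autocorrelation can also be checked numerically with base R's `acf()`. The sketch below uses simulated chains for illustration only (an independent series and a strongly autocorrelated AR(1) series); `max_lag_corr()` is a hypothetical helper, not a function from any of the packages above.

```r
set.seed(5)
n <- 4000

# Simulated draws for illustration: independent versus strongly autocorrelated
independent <- rnorm(n)
correlated  <- as.numeric(arima.sim(model = list(ar = 0.9), n = n))

# Largest absolute autocorrelation across non-zero lags
max_lag_corr <- function(draws, lag.max = 20) {
  rho <- acf(draws, lag.max = lag.max, plot = FALSE)$acf[-1]  # drop lag 0
  max(abs(rho))
}

acf_independent <- max_lag_corr(independent)  # should fall well below 0.2
acf_correlated  <- max_lag_corr(correlated)   # should fall well above 0.2
```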
Rhat is a scale reduction factor measure of convergence between the chains. The closer the values are to 1, the more the chains have converged. Values greater than 1.05 indicate a lack of convergence. There will be an Rhat value for each parameter estimated.
Conclusions:
- all Rhat values are below 1.05, suggesting the chains have converged.
The number of effective samples - the ratio of the effective sample size (the number of independent draws to which the autocorrelated chains are equivalent) to the total number of post-warmup draws provides an indication of the effectiveness (and efficiency) of the MCMC sampler. Ratios that are less than 0.5 for a parameter suggest that the draws are strongly autocorrelated, typically because the sampler spent considerable time in difficult areas of the sampling domain.
If the ratios are low, tightening the priors may help.
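The link between autocorrelation and the effective sample size ratio can be sketched in a few lines. This is a crude approximation for illustration only (Stan's bulk and tail ESS use a more careful estimator across multiple chains); `ess_basic()` is a hypothetical helper and the chains are simulated.

```r
# Crude effective sample size: n / (1 + 2 * sum of autocorrelations),
# truncating the sum at the first non-positive autocorrelation.
ess_basic <- function(draws, lag.max = 200) {
  n <- length(draws)
  rho <- acf(draws, lag.max = lag.max, plot = FALSE)$acf[-1]  # drop lag 0
  first_nonpos <- which(rho <= 0)[1]
  if (!is.na(first_nonpos)) rho <- rho[seq_len(first_nonpos - 1)]
  n / (1 + 2 * sum(rho))
}

set.seed(6)
n <- 4000

# Simulated draws for illustration: independent versus AR(1) with coefficient 0.9
independent <- rnorm(n)
correlated  <- as.numeric(stats::filter(rnorm(n), filter = 0.9, method = "recursive"))

ratio_independent <- ess_basic(independent) / n  # close to 1
ratio_correlated  <- ess_basic(correlated) / n   # far below 0.5
```

The strongly autocorrelated chain contains the same number of draws but is worth only a small fraction of that number in independent draws.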
Conclusions:
- ratios are all very high
Conclusions:
- all the diagnostics appear reasonable
- we can conclude that the chains are all well mixed and have converged on a stable posterior.
Plots of the estimates of each parameter over the post-warmup length of each MCMC chain. Each chain is plotted in a different colour, with each parameter in its own facet. Ideally, each trace should just look like noise without any discernible drift and each of the traces for a specific parameter should look the same (i.e., should not be displaced above or below any other trace for that parameter).
Conclusions:
- the chains appear well mixed and very similar
Conclusions:
- there is no evidence of autocorrelation in the MCMC samples
Rhat is a scale reduction factor measure of convergence between the chains. The closer the values are to 1, the more the chains have converged. Values greater than 1.05 indicate a lack of convergence. There will be an Rhat value for each parameter estimated.
Conclusions:
- all Rhat values are below 1.05, suggesting the chains have converged.
The number of effective samples - the ratio of the effective sample size (the number of independent draws to which the autocorrelated chains are equivalent) to the total number of post-warmup draws provides an indication of the effectiveness (and efficiency) of the MCMC sampler. Ratios that are less than 0.5 for a parameter suggest that the draws are strongly autocorrelated, typically because the sampler spent considerable time in difficult areas of the sampling domain.
If the ratios are low, tightening the priors may help.
Conclusions:
- ratios are all very high
Conclusions:
- all the diagnostics appear reasonable
- we can conclude that the chains are all well mixed and have converged on a stable posterior.
Plots of the estimates of each parameter over the post-warmup length of each MCMC chain. Each chain is plotted in a different colour, with each parameter in its own facet. Ideally, each trace should just look like noise without any discernible drift and each of the traces for a specific parameter should look the same (i.e., should not be displaced above or below any other trace for that parameter).
Conclusions:
- the chains appear well mixed and very similar
Conclusions:
- there is no evidence of autocorrelation in the MCMC samples
Rhat is a scale reduction factor measure of convergence between the chains. The closer the values are to 1, the more the chains have converged. Values greater than 1.05 indicate a lack of convergence. There will be an Rhat value for each parameter estimated.
Conclusions:
- all Rhat values are below 1.05, suggesting the chains have converged.
The number of effective samples - the ratio of the effective sample size (the number of independent draws to which the autocorrelated chains are equivalent) to the total number of post-warmup draws provides an indication of the effectiveness (and efficiency) of the MCMC sampler. Ratios that are less than 0.5 for a parameter suggest that the draws are strongly autocorrelated, typically because the sampler spent considerable time in difficult areas of the sampling domain.
If the ratios are low, tightening the priors may help.
Conclusions:
- ratios are all very high
Conclusions:
- all the diagnostics appear reasonable
- we can conclude that the chains are all well mixed and have converged on a stable posterior.
Plots of the estimates of each parameter over the post-warmup length of each MCMC chain. Each chain is plotted in a different colour, with each parameter in its own facet. Ideally, each trace should just look like noise without any discernible drift and each of the traces for a specific parameter should look the same (i.e., should not be displaced above or below any other trace for that parameter).
Conclusions:
- the chains appear well mixed and very similar
Conclusions:
- there is no evidence of autocorrelation in the MCMC samples
Rhat is a scale reduction factor measure of convergence between the chains. The closer the values are to 1, the more the chains have converged. Values greater than 1.05 indicate a lack of convergence. There will be an Rhat value for each parameter estimated.
Conclusions:
- all Rhat values are below 1.05, suggesting the chains have converged.
The number of effective samples - the ratio of the effective sample size (the number of independent draws to which the autocorrelated chains are equivalent) to the total number of post-warmup draws provides an indication of the effectiveness (and efficiency) of the MCMC sampler. Ratios that are less than 0.5 for a parameter suggest that the draws are strongly autocorrelated, typically because the sampler spent considerable time in difficult areas of the sampling domain.
If the ratios are low, tightening the priors may help.
Conclusions:
- ratios are all very high
Conclusions:
- all the diagnostics appear reasonable
- we can conclude that the chains are all well mixed and have converged on a stable posterior.
Plots of the estimates of each parameter over the post-warmup length of each MCMC chain. Each chain is plotted in a different colour, with each parameter in its own facet. Ideally, each trace should just look like noise without any discernible drift and each of the traces for a specific parameter should look the same (i.e., should not be displaced above or below any other trace for that parameter).
Conclusions:
- the chains appear well mixed and very similar
Conclusions:
- there is no evidence of autocorrelation in the MCMC samples
Rhat is a scale reduction factor measure of convergence between the chains. The closer the values are to 1, the more the chains have converged. Values greater than 1.05 indicate a lack of convergence. There will be an Rhat value for each parameter estimated.
Conclusions:
- all Rhat values are below 1.05, suggesting the chains have converged.
The number of effective samples - the ratio of the number of effective samples (those not rejected by the sampler) to the number of samples provides an indication of the effectiveness (and efficiency) of the MCMC sampler. Ratios that are less than 0.5 for a parameter suggest that the sampler spent considerable time in difficult areas of the sampling domain and rejected more than half of the samples (replacing them with the previous effective sample).
If the ratios are low, tightening the priors may help.
Conclusions:
- ratios are all very high
Conclusions:
- all the diagnostics appear reasonable
- we can conclude that the chains are all well mixed and have converged on a stable posterior.
Plots of the estimates of each parameter over the post-warmup length of each MCMC chain. Each chain is plotted in a different colour, with each parameter in its own facet. Ideally, each trace should just look like noise without any discernible drift and each of the traces for a specific parameter should look the same (i.e, should not be displaced above or below any other trace for that parameter).
Conclusions:
- the chains appear well mixed and very similar
Conclusions:
- there is no evidence of autocorrelation in the MCMC samples
Rhat is a scale reduction factor measure of convergence between the chains. The closer the values are to 1, the more the chains have converged. Values greater than 1.05 indicate a lack of convergence. There will be an Rhat value for each parameter estimated.
Conclusions:
- all Rhat values are below 1.05, suggesting the chains have converged.
The number of effective samples - the ratio of the number of effective samples (those not rejected by the sampler) to the number of samples provides an indication of the effectiveness (and efficiency) of the MCMC sampler. Ratios that are less than 0.5 for a parameter suggest that the sampler spent considerable time in difficult areas of the sampling domain and rejected more than half of the samples (replacing them with the previous effective sample).
If the ratios are low, tightening the priors may help.
Conclusions:
- ratios are all very high
Conclusions:
- all the diagnostics appear reasonable
- we can conclude that the chains are all well mixed and have converged on a stable posterior.
Plots of the estimates of each parameter over the post-warmup length of each MCMC chain. Each chain is plotted in a different colour, with each parameter in its own facet. Ideally, each trace should just look like noise without any discernible drift and each of the traces for a specific parameter should look the same (i.e, should not be displaced above or below any other trace for that parameter).
Conclusions:
- the chains appear well mixed and very similar
Conclusions:
- there is no evidence of autocorrelation in the MCMC samples
Rhat is a scale reduction factor measure of convergence between the chains. The closer the values are to 1, the more the chains have converged. Values greater than 1.05 indicate a lack of convergence. There will be an Rhat value for each parameter estimated.
Conclusions:
- all Rhat values are below 1.05, suggesting the chains have converged.
The number of effective samples - the ratio of the number of effective samples (those not rejected by the sampler) to the number of samples provides an indication of the effectiveness (and efficiency) of the MCMC sampler. Ratios that are less than 0.5 for a parameter suggest that the sampler spent considerable time in difficult areas of the sampling domain and rejected more than half of the samples (replacing them with the previous effective sample).
If the ratios are low, tightening the priors may help.
Conclusions:
- ratios are all very high
Conclusions:
- all the diagnostics appear reasonable
- we can conclude that the chains are all well mixed and have converged on a stable posterior.
Plots of the estimates of each parameter over the post-warmup length of each MCMC chain. Each chain is plotted in a different colour, with each parameter in its own facet. Ideally, each trace should just look like noise without any discernible drift and each of the traces for a specific parameter should look the same (i.e, should not be displaced above or below any other trace for that parameter).
Conclusions:
- the chains appear well mixed and very similar
Conclusions:
- there is no evidence of autocorrelation in the MCMC samples
Rhat is a scale reduction factor measure of convergence between the chains. The closer the values are to 1, the more the chains have converged. Values greater than 1.05 indicate a lack of convergence. There will be an Rhat value for each parameter estimated.
Conclusions:
- all Rhat values are below 1.05, suggesting the chains have converged.
The number of effective samples - the ratio of the number of effective samples (those not rejected by the sampler) to the number of samples provides an indication of the effectiveness (and efficiency) of the MCMC sampler. Ratios that are less than 0.5 for a parameter suggest that the sampler spent considerable time in difficult areas of the sampling domain and rejected more than half of the samples (replacing them with the previous effective sample).
If the ratios are low, tightening the priors may help.
Conclusions:
- ratios are all very high
Conclusions:
- all the diagnostics appear reasonable
- we can conclude that the chains are all well mixed and have converged on a stable posterior.
Plots of the estimates of each parameter over the post-warmup length of each MCMC chain. Each chain is plotted in a different colour, with each parameter in its own facet. Ideally, each trace should just look like noise without any discernible drift and each of the traces for a specific parameter should look the same (i.e, should not be displaced above or below any other trace for that parameter).
Conclusions:
- the chains appear well mixed and very similar
Conclusions:
- there is no evidence of autocorrelation in the MCMC samples
Rhat is a scale reduction factor measure of convergence between the chains. The closer the values are to 1, the more the chains have converged. Values greater than 1.05 indicate a lack of convergence. There will be an Rhat value for each parameter estimated.
Conclusions:
- all Rhat values are below 1.05, suggesting the chains have converged.
The number of effective samples - the ratio of the number of effective samples (those not rejected by the sampler) to the number of samples provides an indication of the effectiveness (and efficiency) of the MCMC sampler. Ratios that are less than 0.5 for a parameter suggest that the sampler spent considerable time in difficult areas of the sampling domain and rejected more than half of the samples (replacing them with the previous effective sample).
If the ratios are low, tightening the priors may help.
Conclusions:
- ratios are all very high
Conclusions:
- all the diagnostics appear reasonable
- we can conclude that the chains are all well mixed and have converged on a stable posterior.
Plots of the estimates of each parameter over the post-warmup length of each MCMC chain. Each chain is plotted in a different colour, with each parameter in its own facet. Ideally, each trace should just look like noise without any discernible drift and each of the traces for a specific parameter should look the same (i.e, should not be displaced above or below any other trace for that parameter).
Conclusions:
- the chains appear well mixed and very similar
Conclusions:
- there is no evidence of autocorrelation in the MCMC samples
Rhat is a scale reduction factor measure of convergence between the chains. The closer the values are to 1, the more the chains have converged. Values greater than 1.05 indicate a lack of convergence. There will be an Rhat value for each parameter estimated.
Conclusions:
- all Rhat values are below 1.05, suggesting the chains have converged.
The number of effective samples - the ratio of the number of effective samples (those not rejected by the sampler) to the number of samples provides an indication of the effectiveness (and efficiency) of the MCMC sampler. Ratios that are less than 0.5 for a parameter suggest that the sampler spent considerable time in difficult areas of the sampling domain and rejected more than half of the samples (replacing them with the previous effective sample).
If the ratios are low, tightening the priors may help.
Conclusions:
- ratios are all very high
Conclusions:
- all the diagnostics appear reasonable
- we can conclude that the chains are all well mixed and have converged on a stable posterior.
Plots of the estimates of each parameter over the post-warmup length of each MCMC chain. Each chain is plotted in a different colour, with each parameter in its own facet. Ideally, each trace should just look like noise without any discernible drift and each of the traces for a specific parameter should look the same (i.e, should not be displaced above or below any other trace for that parameter).
Conclusions:
- the chains appear well mixed and very similar
Conclusions:
- there is no evidence of autocorrelation in the MCMC samples
Rhat is a scale reduction factor measure of convergence between the chains. The closer the values are to 1, the more the chains have converged. Values greater than 1.05 indicate a lack of convergence. There will be an Rhat value for each parameter estimated.
Conclusions:
- all Rhat values are below 1.05, suggesting the chains have converged.
The number of effective samples - the ratio of the number of effective samples (those not rejected by the sampler) to the number of samples provides an indication of the effectiveness (and efficiency) of the MCMC sampler. Ratios that are less than 0.5 for a parameter suggest that the sampler spent considerable time in difficult areas of the sampling domain and rejected more than half of the samples (replacing them with the previous effective sample).
If the ratios are low, tightening the priors may help.
Conclusions:
- ratios are all very high
Conclusions:
- all the diagnostics appear reasonable
- we can conclude that the chains are all well mixed and have converged on a stable posterior.
Plots of the estimates of each parameter over the post-warmup length of each MCMC chain. Each chain is plotted in a different colour, with each parameter in its own facet. Ideally, each trace should just look like noise without any discernible drift and each of the traces for a specific parameter should look the same (i.e, should not be displaced above or below any other trace for that parameter).
Conclusions:
- the chains appear well mixed and very similar
Conclusions:
- there is no evidence of autocorrelation in the MCMC samples
Rhat is a scale reduction factor measure of convergence between the chains. The closer the values are to 1, the more the chains have converged. Values greater than 1.05 indicate a lack of convergence. There will be an Rhat value for each parameter estimated.
Conclusions:
- all Rhat values are below 1.05, suggesting the chains have converged.
The number of effective samples - the ratio of the number of effective samples (those not rejected by the sampler) to the number of samples provides an indication of the effectiveness (and efficiency) of the MCMC sampler. Ratios that are less than 0.5 for a parameter suggest that the sampler spent considerable time in difficult areas of the sampling domain and rejected more than half of the samples (replacing them with the previous effective sample).
If the ratios are low, tightening the priors may help.
Conclusions:
- ratios are all very high
Conclusions:
- all the diagnostics appear reasonable
- we can conclude that the chains are all well mixed and have converged on a stable posterior.
Plots of the estimates of each parameter over the post-warmup length of each MCMC chain. Each chain is plotted in a different colour, with each parameter in its own facet. Ideally, each trace should just look like noise without any discernible drift and each of the traces for a specific parameter should look the same (i.e, should not be displaced above or below any other trace for that parameter).
Conclusions:
- the chains appear well mixed and very similar
Conclusions:
- there is no evidence of autocorrelation in the MCMC samples
Rhat is a scale reduction factor measure of convergence between the chains. The closer the values are to 1, the more the chains have converged. Values greater than 1.05 indicate a lack of convergence. There will be an Rhat value for each parameter estimated.
Conclusions:
- all Rhat values are below 1.05, suggesting the chains have converged.
The number of effective samples - the ratio of the number of effective samples (those not rejected by the sampler) to the number of samples provides an indication of the effectiveness (and efficiency) of the MCMC sampler. Ratios that are less than 0.5 for a parameter suggest that the sampler spent considerable time in difficult areas of the sampling domain and rejected more than half of the samples (replacing them with the previous effective sample).
If the ratios are low, tightening the priors may help.
Conclusions:
- ratios are all very high
Conclusions:
- all the diagnostics appear reasonable
- we can conclude that the chains are all well mixed and have converged on a stable posterior.
Plots of the estimates of each parameter over the post-warmup length of each MCMC chain. Each chain is plotted in a different colour, with each parameter in its own facet. Ideally, each trace should just look like noise without any discernible drift and each of the traces for a specific parameter should look the same (i.e, should not be displaced above or below any other trace for that parameter).
Conclusions:
- the chains appear well mixed and very similar
Conclusions:
- there is no evidence of autocorrelation in the MCMC samples
Rhat is a scale reduction factor measure of convergence between the chains. The closer the values are to 1, the more the chains have converged. Values greater than 1.05 indicate a lack of convergence. There will be an Rhat value for each parameter estimated.
Conclusions:
- all Rhat values are below 1.05, suggesting the chains have converged.
The number of effective samples - the ratio of the number of effective samples (those not rejected by the sampler) to the number of samples provides an indication of the effectiveness (and efficiency) of the MCMC sampler. Ratios that are less than 0.5 for a parameter suggest that the sampler spent considerable time in difficult areas of the sampling domain and rejected more than half of the samples (replacing them with the previous effective sample).
If the ratios are low, tightening the priors may help.
Conclusions:
- ratios are all very high
Conclusions:
- all the diagnostics appear reasonable
- we can conclude that the chains are all well mixed and have converged on a stable posterior.
Plots of the estimates of each parameter over the post-warmup length of each MCMC chain. Each chain is plotted in a different colour, with each parameter in its own facet. Ideally, each trace should just look like noise without any discernible drift and each of the traces for a specific parameter should look the same (i.e, should not be displaced above or below any other trace for that parameter).
Conclusions:
- the chains appear well mixed and very similar
Conclusions:
- there is no evidence of autocorrelation in the MCMC samples
Rhat is a scale reduction factor measure of convergence between the chains. The closer the values are to 1, the more the chains have converged. Values greater than 1.05 indicate a lack of convergence. There will be an Rhat value for each parameter estimated.
Conclusions:
- all Rhat values are below 1.05, suggesting the chains have converged.
The number of effective samples - the ratio of the number of effective samples (those not rejected by the sampler) to the number of samples provides an indication of the effectiveness (and efficiency) of the MCMC sampler. Ratios that are less than 0.5 for a parameter suggest that the sampler spent considerable time in difficult areas of the sampling domain and rejected more than half of the samples (replacing them with the previous effective sample).
If the ratios are low, tightening the priors may help.
Conclusions:
- ratios are all very high
Conclusions:
- all the diagnostics appear reasonable
- we can conclude that the chains are all well mixed and have converged on a stable posterior.
Plots of the estimates of each parameter over the post-warmup length of each MCMC chain. Each chain is plotted in a different colour, with each parameter in its own facet. Ideally, each trace should just look like noise without any discernible drift and each of the traces for a specific parameter should look the same (i.e, should not be displaced above or below any other trace for that parameter).
Conclusions:
- the chains appear well mixed and very similar
Conclusions:
- there is no evidence of autocorrelation in the MCMC samples
Rhat is a scale reduction factor measure of convergence between the chains. The closer the values are to 1, the more the chains have converged. Values greater than 1.05 indicate a lack of convergence. There will be an Rhat value for each parameter estimated.
Conclusions:
- all Rhat values are below 1.05, suggesting the chains have converged.
The number of effective samples - the ratio of the number of effective samples (those not rejected by the sampler) to the number of samples provides an indication of the effectiveness (and efficiency) of the MCMC sampler. Ratios that are less than 0.5 for a parameter suggest that the sampler spent considerable time in difficult areas of the sampling domain and rejected more than half of the samples (replacing them with the previous effective sample).
If the ratios are low, tightening the priors may help.
Conclusions:
- ratios are all very high
Conclusions:
- all the diagnostics appear reasonable
- we can conclude that the chains are all well mixed and have converged on a stable posterior.
Plots of the estimates of each parameter over the post-warmup length of each MCMC chain. Each chain is plotted in a different colour, with each parameter in its own facet. Ideally, each trace should just look like noise without any discernible drift and each of the traces for a specific parameter should look the same (i.e, should not be displaced above or below any other trace for that parameter).
Conclusions:
- the chains appear well mixed and very similar
Conclusions:
- there is no evidence of autocorrelation in the MCMC samples
Rhat is a scale reduction factor measure of convergence between the chains. The closer the values are to 1, the more the chains have converged. Values greater than 1.05 indicate a lack of convergence. There will be an Rhat value for each parameter estimated.
Conclusions:
- all Rhat values are below 1.05, suggesting the chains have converged.
The number of effective samples - the ratio of the number of effective samples (those not rejected by the sampler) to the number of samples provides an indication of the effectiveness (and efficiency) of the MCMC sampler. Ratios that are less than 0.5 for a parameter suggest that the sampler spent considerable time in difficult areas of the sampling domain and rejected more than half of the samples (replacing them with the previous effective sample).
If the ratios are low, tightening the priors may help.
Conclusions:
- ratios are all very high
Conclusions:
- all the diagnostics appear reasonable
- we can conclude that the chains are all well mixed and have converged on a stable posterior.
9 Model validation
Model validation involves exploring the model diagnostics and fit to ensure that the model is broadly appropriate for the data. As such, exploration of the residuals should be routine.
For more complex models (those that contain multiple effects), it is also advisable to plot the residuals against each of the individual predictors. For sampling designs that involve sample collection over space or time, it is also a good idea to explore whether there are any temporal or spatial patterns in the residuals.
There are numerous situations (e.g. when applying specific variance-covariance structures to a model) where raw residuals do not reflect the interior workings of the model. Typically, this is because they do not take into account the variance-covariance matrix or assume a very simple variance-covariance matrix. Since the purpose of exploring residuals is to evaluate the model, for these cases, it is arguably better to draw conclusions based on standardized (or studentized) residuals.
Unfortunately, the definitions of standardised and studentised residuals appear to vary and the two terms get used interchangeably. I will adopt the following definitions:
- Standardized residuals: the raw residuals divided by the true standard deviation of the residuals (which of course is rarely known).
- Studentized residuals: the raw residuals divided by the estimated standard deviation of the residuals. Note that externally studentised residuals are calculated by dividing the raw residuals by a unique standard deviation for each observation that is calculated from regressions having left each successive observation out.
- Pearson residuals: the raw residuals divided by the standard deviation of the response variable.
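As a rough illustration of these definitions, the following sketch uses base R on simulated data. The data and the frequentist `lm()` and `glm()` fits are stand-ins chosen only so that the example runs quickly; they are not the models fitted in this tutorial. Note that base R names its internally studentised residuals `rstandard()`, which is one example of the terminology being used interchangeably.

```r
set.seed(1)
x <- 1:20
y <- 2 + 0.5 * x + rnorm(20, sd = 2)
mod <- lm(y ~ x)

raw <- residuals(mod)        # observed minus fitted
internal <- rstandard(mod)   # raw / (estimated sigma * sqrt(1 - leverage))
external <- rstudent(mod)    # sigma re-estimated with each observation left out in turn

# the internally studentised residuals reproduced by hand
internal_manual <- raw / (sigma(mod) * sqrt(1 - hatvalues(mod)))

# Pearson residuals for a Poisson model: raw residuals divided by the
# expected standard deviation of the response (sqrt of the fitted mean)
counts <- rpois(20, lambda = exp(0.5 + 0.1 * x))
gmod <- glm(counts ~ x, family = poisson)
pearson <- residuals(gmod, type = "pearson")
pearson_manual <- (counts - fitted(gmod)) / sqrt(fitted(gmod))
```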
The mark of a good model is being able to predict well. In an ideal world, we would have a sufficiently large sample size to permit us to hold a fraction (such as 25%) back, thereby allowing us to train the model on 75% of the data and then see how well the model can predict the withheld 25%. Unfortunately, such a luxury is still rare in ecology.
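A hold-out check of this kind might look something like the following sketch. The simulated data frame `dat_sim` and the use of `lm()` in place of a Bayesian model are assumptions made to keep the example short and fast.

```r
set.seed(1)
dat_sim <- data.frame(x = runif(100, 0, 10))
dat_sim$y <- 2 + 0.5 * dat_sim$x + rnorm(100, sd = 1)

# hold back 25% of the observations
train_idx <- sample(nrow(dat_sim), size = 0.75 * nrow(dat_sim))
train <- dat_sim[train_idx, ]
test <- dat_sim[-train_idx, ]

# train on 75% and predict the withheld 25%
mod <- lm(y ~ x, data = train)
pred <- predict(mod, newdata = test)

# root mean squared prediction error on the withheld data
rmse <- sqrt(mean((test$y - pred)^2))
```

A prediction error on the withheld data that is similar to the residual standard deviation of the training fit would suggest that the model is not overfitting.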
The next best option is to see how well the model can predict the observed data. Models tend to struggle most with the extremes of trends and have particular issues when the extremes approach logical boundaries (such as zero for count data and standard deviations). We can use the fitted model to generate random predicted observations and then explore some properties of these compared to the actual observed data.
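The general idea can be sketched in base R as follows. Here `simulate()` on an `lm()` fit is only a stand-in for drawing from a posterior predictive distribution (it conditions on the point estimates rather than the full posterior), and the simulated data and the choice of the standard deviation as the property to compare are assumptions for illustration.

```r
set.seed(1)
x <- runif(50, 0, 10)
y <- 2 + 0.5 * x + rnorm(50, sd = 1)
mod <- lm(y ~ x)                     # stand-in for a fitted Bayesian model

# 200 replicated data sets, one per column
yrep <- simulate(mod, nsim = 200)

# compare a property of the observed data with the same property of each replicate
obs_sd <- sd(y)
rep_sd <- sapply(yrep, sd)

# proportion of replicates with a standard deviation at least as large as observed
p_sd <- mean(rep_sd >= obs_sd)
```

Proportions very close to 0 or 1 would indicate that the model struggles to reproduce that property of the observed data.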
Package | Description | function | rstanarm | brms |
---|---|---|---|---|
bayesplot | Density overlay | ppc_dens_overlay | pp_check(mod, plotfun='dens_overlay') | pp_check(mod, type='dens_overlay') |
bayesplot | Obs vs Pred error | ppc_error_scatter_avg | pp_check(mod, plotfun='error_scatter_avg') | pp_check(mod, type='error_scatter_avg') |
bayesplot | Pred error vs x | ppc_error_scatter_avg_vs_x | pp_check(mod, x=, plotfun='error_scatter_avg_vs_x') | pp_check(mod, x=, type='error_scatter_avg_vs_x') |
bayesplot | Preds vs x | ppc_intervals | pp_check(mod, x=, plotfun='intervals') | pp_check(mod, x=, type='intervals') |
bayesplot | Partial plot | ppc_ribbon | pp_check(mod, x=, plotfun='ribbon') | pp_check(mod, x=, type='ribbon') |
bayesplot PPC module:
ppc_bars
ppc_bars_grouped
ppc_boxplot
ppc_dens
ppc_dens_overlay
ppc_dens_overlay_grouped
ppc_ecdf_overlay
ppc_ecdf_overlay_grouped
ppc_error_binned
ppc_error_hist
ppc_error_hist_grouped
ppc_error_scatter
ppc_error_scatter_avg
ppc_error_scatter_avg_grouped
ppc_error_scatter_avg_vs_x
ppc_freqpoly
ppc_freqpoly_grouped
ppc_hist
ppc_intervals
ppc_intervals_grouped
ppc_km_overlay
ppc_km_overlay_grouped
ppc_loo_intervals
ppc_loo_pit
ppc_loo_pit_overlay
ppc_loo_pit_qq
ppc_loo_ribbon
ppc_pit_ecdf
ppc_pit_ecdf_grouped
ppc_ribbon
ppc_ribbon_grouped
ppc_rootogram
ppc_scatter
ppc_scatter_avg
ppc_scatter_avg_grouped
ppc_stat
ppc_stat_2d
ppc_stat_freqpoly
ppc_stat_freqpoly_grouped
ppc_stat_grouped
ppc_violin_grouped
Bayesian samplers involve many calls to randomisation functions. As a result, the estimates will vary slightly each time the routines are run. You should expect that the outputs that you obtain will differ slightly from those that I am displaying. Nevertheless, the main conclusions should remain robust across subsequent runs.
resid <- resid(dat1a.brm2)[, "Estimate"]
fit <- fitted(dat1a.brm2)[, "Estimate"]
ggplot() +
geom_point(data = NULL, aes(y = resid, x = fit))
We should also plot the residuals against each of the predictor variables (as well as any other important unmodelled predictors - particularly time and space if relevant and available).
resid <- resid(dat1a.brm2)[, "Estimate"]
ggplot() +
geom_point(data = NULL, aes(y = resid, x = dat$x))
Conclusions:
- there does not appear to be any pattern in the residuals
Posterior predictive checks provide additional diagnostics about the fit of the model. Specifically, they provide a comparison between predictions drawn from the model and the observed data used to train the model.
Density overlay
These are plots of the density distribution of the observed data (black line) overlaid on top of 50 density distributions generated from draws from the model (light blue). Ideally, the 50 realisations should be roughly consistent with the observed data.
Conclusions:
- the model draws appear to be consistent with the observed data
Error scatter
These are plots of the observed values against the average residuals. Similar to a residual plot, we do not want to see any patterns in this plot.
Using all posterior draws for ppc type 'error_scatter_avg' by default.
Conclusions:
- there is no obvious pattern in the residuals.
Intervals
These are plots of the observed data overlaid on top of posterior predictions associated with each level of the predictor. Ideally, the observed data should all fall within the predictive intervals.
Using all posterior draws for ppc type 'intervals' by default.
Conclusions:
- the posterior predictions are not inconsistent with the observed data
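The visual check in the intervals plot can be complemented with a simple numerical summary: the proportion of observations that fall inside their predictive intervals. In the sketch below, the matrix `yrep` is a hypothetical set of posterior predictive draws (rows are draws, columns are observations) generated directly from known parameters; with a fitted model, it would instead come from the model's posterior predictions.

```r
set.seed(2)
n <- 40
x <- seq(0, 10, length.out = n)
mu <- 1 + 0.8 * x
y <- mu + rnorm(n, sd = 1.5)

# hypothetical posterior predictive draws: rows = draws, columns = observations
n_draws <- 1000
yrep <- matrix(rnorm(n_draws * n, mean = rep(mu, each = n_draws), sd = 1.5),
               nrow = n_draws)

# 90% predictive interval for each observation
lower <- apply(yrep, 2, quantile, probs = 0.05)
upper <- apply(yrep, 2, quantile, probs = 0.95)

# proportion of observed values inside their interval
coverage <- mean(y >= lower & y <= upper)
```

For a model that is consistent with the data, the coverage should be in the vicinity of the nominal interval width (here 90%).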
Ribbon
These are just an alternative way of expressing the interval plot.
Using all posterior draws for ppc type 'ribbon' by default.
Conclusions:
- the posterior predictions are not inconsistent with the observed data
DHARMa residuals provide very useful diagnostics. Unfortunately, we cannot directly use the simulateResiduals() function to generate the simulated residuals. However, if we are willing to calculate some of the components ourselves, we can still obtain the simulated residuals from the fitted Stan model.
We need to supply:
- simulated (predicted) responses associated with each observation.
- observed values
- fitted (predicted) responses (averaged) associated with each observation
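To show what is done with these components, the following base R sketch calculates scaled (quantile) residuals by hand for a continuous response: each residual is the proportion of simulated values that fall below the corresponding observed value. All of the data here are simulated from known parameters for illustration. In practice, the same components can be passed to DHARMa's `createDHARMa()` function, which expects the simulated responses as a matrix with one row per observation and one column per simulation.

```r
set.seed(3)
n <- 100
x <- runif(n)
mu <- 1 + 2 * x                                   # fitted (expected) responses
observed <- rnorm(n, mean = mu, sd = 0.5)         # observed values

# simulated responses: rows = observations, columns = simulations
n_sim <- 250
simulated <- matrix(rnorm(n * n_sim, mean = mu, sd = 0.5), nrow = n)

# scaled residual = proportion of simulations below the observed value
scaled_resid <- rowMeans(simulated < observed)

# for a well-specified model these should be roughly uniform on [0, 1]
mean_resid <- mean(scaled_resid)
```

If the model is appropriate, the scaled residuals should be approximately uniformly distributed between 0 and 1, which is what the Q-Q plot and KS test assess.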
dat1a.resids <- make_brms_dharma_res(dat1a.brm2, integerResponse = FALSE)
wrap_elements(~ testUniformity(dat1a.resids)) +
wrap_elements(~ plotResiduals(dat1a.resids)) +
wrap_elements(~ testDispersion(dat1a.resids)) +
plot_layout(nrow = 1)
If you are using RStudio (particularly with a Quarto document), the above code may produce a graphic that is too large for the currently available device. The document should still render without problems; it is just the live display within RStudio that cannot accommodate the figure.
To address this, you can either:
- break the multi-panel figure up into three separate figures by removing the outer wrap_elements() and plot_layout() functions.
- copy the above code into the console and view in a larger graphics device
Conclusions:
- the Q-Q plot looks reasonable (points broadly follow the angled line)
- there are no flagged issues with any of the following:
  - KS test: conformity to the nominated distribution (family)
  - Dispersion test: would not normally expect this to be an issue for a Gaussian family unless there are other issues with the residuals
  - Outlier test: the influence of each observation
- there do not appear to be any patterns in the residuals - each of the three quantile trends is considered flat and centered around 0.25, 0.5 and 0.75
- the observed dispersion is well within the simulated range indicating that there is no issue with dispersion (again this is expected for a Gaussian model)
Conclusions:
- there is no evidence of a lack of fit
- the model is likely to be reliable
resid <- resid(dat1b.brm2)[, "Estimate"]
fit <- fitted(dat1b.brm2)[, "Estimate"]
ggplot() +
geom_point(data = NULL, aes(y = resid, x = fit))
We should also plot the residuals against each of the predictor variables (as well as any other important unmodelled predictors - particularly time and space if relevant and available).
resid <- resid(dat1b.brm2)[, "Estimate"]
ggplot() +
geom_point(data = NULL, aes(y = resid, x = dat$x))
Conclusions:
- there does not appear to be any pattern in the residuals
Posterior predictive checks provide additional diagnostics about the fit of the model. Specifically, they provide a comparison between predictions drawn from the model and the observed data used to train the model.
Density overlay
These are plots of the density distribution of the observed data (black line) overlaid on top of 50 density distributions generated from draws from the model (light blue). Ideally, the 50 realisations should be roughly consistent with the observed data.
Conclusions:
- the model draws appear to be consistent with the observed data
Error scatter
These are plots of the observed values against the average residuals. Similar to a residual plot, we do not want to see any patterns in this plot.
Using all posterior draws for ppc type 'error_scatter_avg' by default.
Conclusions:
- there is no obvious pattern in the residuals.
Intervals
These are plots of the observed data overlaid on top of posterior predictions associated with each level of the predictor. Ideally, the observed data should all fall within the predictive intervals.
Using all posterior draws for ppc type 'intervals' by default.
Conclusions:
- the posterior predictions are not inconsistent with the observed data
Ribbon
These are just an alternative way of expressing the interval plot.
Using all posterior draws for ppc type 'ribbon' by default.
Conclusions:
- the posterior predictions are not inconsistent with the observed data
DHARMa residuals provide very useful diagnostics. Unfortunately, we cannot directly use the simulateResiduals() function to generate the simulated residuals. However, if we are willing to calculate some of the components ourselves, we can still obtain the simulated residuals from the fitted Stan model.
We need to supply:
- simulated (predicted) responses associated with each observation.
- observed values
- fitted (predicted) responses (averaged) associated with each observation
dat1b.resids <- make_brms_dharma_res(dat1b.brm2, integerResponse = FALSE)
wrap_elements(~ testUniformity(dat1b.resids)) +
wrap_elements(~ plotResiduals(dat1b.resids)) +
wrap_elements(~ testDispersion(dat1b.resids)) +
plot_layout(nrow = 1)
If you are using RStudio (particularly with a Quarto document), the above code may produce a graphic that is too large for the currently available device. The document should still render without problems; it is just the live display within RStudio that cannot accommodate the figure.
To address this, you can either:
- break the multi-panel figure up into three separate figures by removing the outer wrap_elements() and plot_layout() functions.
- copy the above code into the console and view in a larger graphics device
Conclusions:
- the Q-Q plot looks reasonable (points broadly follow the angled line)
- there are no flagged issues with any of the following:
  - KS test: conformity to the nominated distribution (family)
  - Dispersion test: would not normally expect this to be an issue for a Gaussian family unless there are other issues with the residuals
  - Outlier test: the influence of each observation
- there do not appear to be any patterns in the residuals - each of the three quantile trends is considered flat and centered around 0.25, 0.5 and 0.75
- the observed dispersion is well within the simulated range indicating that there is no issue with dispersion (again this is expected for a Gaussian model)
Conclusions:
- there is no evidence of a lack of fit
- the model is likely to be reliable
resid <- resid(dat1c.brm2)[, "Estimate"]
fit <- fitted(dat1c.brm2)[, "Estimate"]
ggplot() +
geom_point(data = NULL, aes(y = resid, x = fit))
We should also plot the residuals against each of the predictor variables (as well as any other important unmodelled predictors - particularly time and space if relevant and available).
resid <- resid(dat1c.brm2)[, "Estimate"]
ggplot() +
geom_point(data = NULL, aes(y = resid, x = dat$x))
Conclusions:
- there does not appear to be any pattern in the residuals
Posterior predictive checks provide additional diagnostics about the fit of the model. Specifically, they provide a comparison between predictions drawn from the model and the observed data used to train the model.
Density overlay
These are plots of the density distribution of the observed data (black line) overlaid on top of 50 density distributions generated from draws from the model (light blue). Ideally, the 50 realisations should be roughly consistent with the observed data.
Conclusions:
- the model draws appear to be consistent with the observed data
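The idea behind the density overlay can be sketched in base R. The following is only an illustration under stated assumptions: the data are fabricated, an lm() fit stands in for the Bayesian model, and the 50 replicate data sets are simulated from plug-in estimates rather than from genuine posterior draws.

```r
set.seed(4)
n <- 60
x <- runif(n, 0, 10)
y <- 1 + 0.8 * x + rnorm(n, sd = 1.5)   # fabricated data (assumption)
fit <- lm(y ~ x)                        # stand-in for the fitted model

# 50 replicate data sets simulated from the fitted model (one per column)
yrep <- replicate(50, rnorm(n, mean = fitted(fit), sd = sigma(fit)))

# density of the observed data and of each replicate data set
dens_obs <- density(y)
dens_rep <- lapply(seq_len(ncol(yrep)), function(j) density(yrep[, j]))

# overlay: replicates in light blue, observed data in black
ymax <- max(dens_obs$y, sapply(dens_rep, function(d) max(d$y)))
plot(dens_obs, ylim = c(0, ymax), lwd = 2, main = "Density overlay (sketch)")
for (d in dens_rep) lines(d, col = "lightblue")
lines(dens_obs, lwd = 2)
```

If the black curve sits comfortably within the spread of light blue curves, the simulated data are broadly consistent with the observed data.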
Error scatter
These are plots of the observed values against the average residuals. Similar to a residual plot, we do not want to see any patterns in this plot.
Using all posterior draws for ppc type 'error_scatter_avg' by default.
Conclusions:
- there is no obvious pattern in the residuals.
Intervals
These are plots of the observed data overlaid on top of posterior predictions associated with each level of the predictor. Ideally, the observed data should all fall within the predictive intervals.
Using all posterior draws for ppc type 'intervals' by default.
Conclusions:
- the posterior predictions are not inconsistent with the observed data
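The logic of the interval check can be sketched in base R by computing a predictive interval for each observation and recording whether the observed value falls inside it. The fabricated data, the lm() stand-in model and the choice of a 90% interval are all assumptions made for illustration; they are not taken from the models fitted in this tutorial.

```r
set.seed(6)
n <- 60
x <- runif(n, 0, 10)
y <- 1 + 0.8 * x + rnorm(n, sd = 1.5)   # fabricated data (assumption)
fit <- lm(y ~ x)                        # stand-in for the fitted model

# simulated responses: one row per observation, one column per simulation
yrep <- replicate(1000, rnorm(n, mean = fitted(fit), sd = sigma(fit)))

# 90% predictive interval for each observation (interval width is an assumption)
lower <- apply(yrep, 1, quantile, probs = 0.05)
upper <- apply(yrep, 1, quantile, probs = 0.95)

# which observed values fall within their predictive interval?
inside <- y >= lower & y <= upper
mean(inside)   # proportion of observations covered by the interval
```

Note that with a 90% interval, we would expect roughly 90% (rather than strictly all) of the observations to fall inside, so a handful of points outside the interval is not by itself cause for concern.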
Ribbon
These are just an alternative way of expressing the interval plot.
Using all posterior draws for ppc type 'ribbon' by default.
Conclusions:
- the posterior predictions are not inconsistent with the observed data
DHARMa residuals provide very useful diagnostics. Unfortunately, we cannot directly use the simulateResiduals() function to generate the simulated residuals. However, if we are willing to calculate some of the components ourselves, we can still obtain the simulated residuals from the fitted Stan model.
We need to supply:
- simulated (predicted) responses associated with each observation.
- observed values
- fitted (predicted) responses (averaged) associated with each observation
dat1c.resids <- make_brms_dharma_res(dat1c.brm2, integerResponse = FALSE)
wrap_elements(~ testUniformity(dat1c.resids)) +
wrap_elements(~ plotResiduals(dat1c.resids)) +
wrap_elements(~ testDispersion(dat1c.resids)) +
plot_layout(nrow = 1)
If you are using RStudio (particularly with a Quarto document), the above code may produce a graphic that is too large for the currently available device. The document should still render without problems; it is only the live display within RStudio that cannot accommodate the figure.
To address this, you can either:
- break the multi-panel figure up into separate figures by removing the outer wrap_elements() and plot_layout() functions
- copy the above code into the console and view it in a larger graphics device
Conclusions:
- the Q-Q plot looks reasonable (points broadly follow the angled line)
- there are no flagged issues with the:
  - KS test: conformity to the nominated distribution (family)
  - Dispersion test: would not normally expect this to be an issue for a Gaussian family unless there are other issues with the residuals
  - Outlier test: the influence of each observation
- there do not appear to be any patterns in the residuals - each of the three quantile trends is considered flat and centered around 0.25, 0.5 and 0.75
- the observed dispersion is well within the simulated range, indicating that there is no issue with dispersion (again, this is expected for a Gaussian model)
Conclusions:
- there is no evidence of a lack of fit
- the model is likely to be reliable
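The KS test referred to in the conclusions above assesses whether the scaled residuals are consistent with a uniform distribution on [0, 1]. A minimal base R sketch of that idea follows; the fabricated data and the lm() stand-in model are assumptions, and this is an illustration of the principle rather than DHARMa's own implementation.

```r
set.seed(2)
n <- 100
x <- runif(n)
y <- 1 + 2 * x + rnorm(n, sd = 0.5)   # fabricated data (assumption)
fit <- lm(y ~ x)                      # stand-in for the fitted model

# simulate from the fitted model and compute scaled (quantile) residuals
sims <- replicate(1000, rnorm(n, mean = fitted(fit), sd = sigma(fit)))
quant_resid <- rowMeans(sims < y)

# one-sample Kolmogorov-Smirnov test against the uniform distribution;
# simulation-based residuals can contain ties, which triggers a warning
# that is suppressed here for this illustration
ks <- suppressWarnings(ks.test(quant_resid, "punif"))
ks$p.value
```

A small p-value would suggest that the residuals depart from uniformity, and hence that the nominated family may not suit the data.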
resid <- resid(dat2a.brm2)[, "Estimate"]
fit <- fitted(dat2a.brm2)[, "Estimate"]
ggplot() +
geom_point(data = NULL, aes(y = resid, x = fit))
We should also plot the residuals against each of the predictor variables (as well as any other important unmodelled predictors - particularly time and space if relevant and available).
resid <- resid(dat2a.brm2)[, "Estimate"]
ggplot() +
geom_point(data = NULL, aes(y = resid, x = dat2$x))
Conclusions:
- there does not appear to be any pattern in the residuals
Posterior predictive checks provide additional diagnostics about the fit of the model. Specifically, they provide a comparison between predictions drawn from the model and the observed data used to train the model.
Density overlay
These are plots of the density distribution of the observed data (black line) overlaid on top of 50 density distributions generated from draws from the model (light blue). Ideally, the 50 realisations should be roughly consistent with the observed data.
Conclusions:
- the model draws appear to be consistent with the observed data
Error scatter
These are plots of the observed values against the average residuals. Similar to a residual plot, we do not want to see any patterns in this plot.
Using all posterior draws for ppc type 'error_scatter_avg' by default.
Conclusions:
- there is no obvious pattern in the residuals.
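The quantity displayed in an error scatter plot can be sketched in base R as the observed value minus the average of the simulated values for each observation. The fabricated data and the lm() stand-in model below are assumptions for illustration only.

```r
set.seed(5)
n <- 60
x <- runif(n, 0, 10)
y <- 1 + 0.8 * x + rnorm(n, sd = 1.5)   # fabricated data (assumption)
fit <- lm(y ~ x)                        # stand-in for the fitted model

# simulated responses: one row per observation, one column per simulation
yrep <- replicate(500, rnorm(n, mean = fitted(fit), sd = sigma(fit)))

# average predictive error for each observation
avg_error <- y - rowMeans(yrep)

# the errors should be centered close to zero
mean(avg_error)

# plot the observed values against the average errors
plot(avg_error, y, xlab = "Average predictive error", ylab = "Observed y")
```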
DHARMa residuals provide very useful diagnostics. Unfortunately, we cannot directly use the simulateResiduals() function to generate the simulated residuals. However, if we are willing to calculate some of the components ourselves, we can still obtain the simulated residuals from the fitted Stan model.
We need to supply:
- simulated (predicted) responses associated with each observation.
- observed values
- fitted (predicted) responses (averaged) associated with each observation
dat2a.resids <- make_brms_dharma_res(dat2a.brm2, integerResponse = FALSE)
wrap_elements(~ testUniformity(dat2a.resids)) +
wrap_elements(~ plotResiduals(dat2a.resids)) +
wrap_elements(~ testDispersion(dat2a.resids)) +
plot_layout(nrow = 1)
If you are using RStudio (particularly with a Quarto document), the above code may produce a graphic that is too large for the currently available device. The document should still render without problems; it is only the live display within RStudio that cannot accommodate the figure.
To address this, you can either:
- break the multi-panel figure up into separate figures by removing the outer wrap_elements() and plot_layout() functions
- copy the above code into the console and view it in a larger graphics device
Conclusions:
- the Q-Q plot looks reasonable (points broadly follow the angled line)
- there are no flagged issues with the:
  - KS test: conformity to the nominated distribution (family)
  - Dispersion test: would not normally expect this to be an issue for a Gaussian family unless there are other issues with the residuals
  - Outlier test: the influence of each observation
- there do not appear to be any patterns in the residuals - each of the three quantile trends is considered flat and centered around 0.25, 0.5 and 0.75
- the observed dispersion is well within the simulated range, indicating that there is no issue with dispersion (again, this is expected for a Gaussian model)
Conclusions:
- there is no evidence of a lack of fit
- the model is likely to be reliable
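The dispersion comparison mentioned in the conclusions above (observed dispersion relative to the simulated range) can be sketched in base R. This is a simplified illustration in the spirit of a simulation-based dispersion test, not DHARMa's own implementation; the fabricated data, the lm() stand-in model and the use of the standard deviation of raw residuals as the dispersion statistic are assumptions.

```r
set.seed(3)
n <- 80
x <- runif(n, 0, 5)
y <- 3 + 1.5 * x + rnorm(n, sd = 2)   # fabricated data (assumption)
fit <- lm(y ~ x)                      # stand-in for the fitted model
mu <- fitted(fit)

# simulated responses: one row per observation, one column per simulation
sims <- replicate(1000, rnorm(n, mean = mu, sd = sigma(fit)))

# dispersion statistic: standard deviation of the raw residuals
obs_disp <- sd(y - mu)
sim_disp <- apply(sims, 2, function(s) sd(s - mu))

# two-sided simulation-based p-value: how extreme is the observed
# dispersion relative to the dispersion of the simulated data sets?
p_two_sided <- min(1, 2 * min(mean(sim_disp >= obs_disp),
                              mean(sim_disp <= obs_disp)))
p_two_sided
```

An observed value far into either tail of the simulated distribution would point to over- or under-dispersion.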
resid <- resid(dat2b.brm2)[, "Estimate"]
fit <- fitted(dat2b.brm2)[, "Estimate"]
ggplot() +
geom_point(data = NULL, aes(y = resid, x = fit))
We should also plot the residuals against each of the predictor variables (as well as any other important unmodelled predictors - particularly time and space if relevant and available).
resid <- resid(dat2b.brm2)[, "Estimate"]
ggplot() +
geom_point(data = NULL, aes(y = resid, x = dat2$x))
Conclusions:
- there does not appear to be any pattern in the residuals
Posterior predictive checks provide additional diagnostics about the fit of the model. Specifically, they provide a comparison between predictions drawn from the model and the observed data used to train the model.
Density overlay
These are plots of the density distribution of the observed data (black line) overlaid on top of 50 density distributions generated from draws from the model (light blue). Ideally, the 50 realisations should be roughly consistent with the observed data.
Conclusions:
- the model draws appear to be consistent with the observed data
Error scatter
These are plots of the observed values against the average residuals. Similar to a residual plot, we do not want to see any patterns in this plot.
Using all posterior draws for ppc type 'error_scatter_avg' by default.
Conclusions:
- there is no obvious pattern in the residuals.
DHARMa residuals provide very useful diagnostics. Unfortunately, we cannot directly use the simulateResiduals() function to generate the simulated residuals. However, if we are willing to calculate some of the components ourselves, we can still obtain the simulated residuals from the fitted Stan model.
We need to supply:
- simulated (predicted) responses associated with each observation.
- observed values
- fitted (predicted) responses (averaged) associated with each observation
dat2b.resids <- make_brms_dharma_res(dat2b.brm2, integerResponse = FALSE)
wrap_elements(~ testUniformity(dat2b.resids)) +
wrap_elements(~ plotResiduals(dat2b.resids)) +
wrap_elements(~ testDispersion(dat2b.resids)) +
plot_layout(nrow = 1)
If you are using RStudio (particularly with a Quarto document), the above code may produce a graphic that is too large for the currently available device. The document should still render without problems; it is only the live display within RStudio that cannot accommodate the figure.
To address this, you can either:
- break the multi-panel figure up into separate figures by removing the outer wrap_elements() and plot_layout() functions
- copy the above code into the console and view it in a larger graphics device
Conclusions:
- the Q-Q plot looks reasonable (points broadly follow the angled line)
- there are no flagged issues with the:
  - KS test: conformity to the nominated distribution (family)
  - Dispersion test: would not normally expect this to be an issue for a Gaussian family unless there are other issues with the residuals
  - Outlier test: the influence of each observation
- there do not appear to be any patterns in the residuals - each of the three quantile trends is considered flat and centered around 0.25, 0.5 and 0.75
- the observed dispersion is well within the simulated range, indicating that there is no issue with dispersion (again, this is expected for a Gaussian model)
Conclusions:
- there is no evidence of a lack of fit
- the model is likely to be reliable
resid <- resid(dat3a.brm2)[, "Estimate"]
fit <- fitted(dat3a.brm2)[, "Estimate"]
ggplot() +
geom_point(data = NULL, aes(y = resid, x = fit))
We should also plot the residuals against each of the predictor variables (as well as any other important unmodelled predictors - particularly time and space if relevant and available).
resid <- resid(dat3a.brm2)[, "Estimate"]
ggplot() +
geom_point(data = NULL, aes(y = resid, x = dat3$x))
Conclusions:
- there does not appear to be any pattern in the residuals
Density overlay
Conclusions:
- the model draws appear to be consistent with the observed data
Error scatter
These are plots of the observed values against the average residuals. Similar to a residual plot, we do not want to see any patterns in this plot.
Using all posterior draws for ppc type 'error_scatter_avg' by default.
Conclusions:
- there is no obvious pattern in the residuals.
Intervals
These are plots of the observed data overlaid on top of posterior predictions associated with each level of the predictor. Ideally, the observed data should all fall within the predictive intervals.
Using all posterior draws for ppc type 'intervals' by default.
Conclusions:
- the posterior predictions are not inconsistent with the observed data
Ribbon
These are just an alternative way of expressing the interval plot.
Using all posterior draws for ppc type 'ribbon' by default.
Conclusions:
- the posterior predictions are not inconsistent with the observed data
DHARMa residuals provide very useful diagnostics. Unfortunately, we cannot directly use the simulateResiduals() function to generate the simulated residuals. However, if we are willing to calculate some of the components ourselves, we can still obtain the simulated residuals from the fitted Stan model.
We need to supply:
- simulated (predicted) responses associated with each observation.
- observed values
- fitted (predicted) responses (averaged) associated with each observation
dat3a.resids <- make_brms_dharma_res(dat3a.brm2, integerResponse = FALSE)
wrap_elements(~ testUniformity(dat3a.resids)) +
wrap_elements(~ plotResiduals(dat3a.resids)) +
wrap_elements(~ testDispersion(dat3a.resids)) +
plot_layout(nrow = 1)
If you are using RStudio (particularly with a Quarto document), the above code may produce a graphic that is too large for the currently available device. The document should still render without problems; it is only the live display within RStudio that cannot accommodate the figure.
To address this, you can either:
- break the multi-panel figure up into separate figures by removing the outer wrap_elements() and plot_layout() functions
- copy the above code into the console and view it in a larger graphics device
Conclusions:
- the Q-Q plot looks reasonable (points broadly follow the angled line)
- there are no flagged issues with the:
  - KS test: conformity to the nominated distribution (family)
  - Dispersion test: would not normally expect this to be an issue for a Gaussian family unless there are other issues with the residuals
  - Outlier test: the influence of each observation
- there do not appear to be any patterns in the residuals - each of the three quantile trends is considered flat and centered around 0.25, 0.5 and 0.75
- the observed dispersion is well within the simulated range, indicating that there is no issue with dispersion (again, this is expected for a Gaussian model)
Conclusions:
- there is no evidence of a lack of fit
- the model is likely to be reliable
resid <- resid(dat3b.brm2)[, "Estimate"]
fit <- fitted(dat3b.brm2)[, "Estimate"]
ggplot() +
geom_point(data = NULL, aes(y = resid, x = fit))
We should also plot the residuals against each of the predictor variables (as well as any other important unmodelled predictors - particularly time and space if relevant and available).
resid <- resid(dat3b.brm2)[, "Estimate"]
ggplot() +
geom_point(data = NULL, aes(y = resid, x = dat3$x))
Conclusions:
- there does not appear to be any pattern in the residuals
Density overlay
Conclusions:
- the model draws appear to be consistent with the observed data
Error scatter
These are plots of the observed values against the average residuals. Similar to a residual plot, we do not want to see any patterns in this plot.
Using all posterior draws for ppc type 'error_scatter_avg' by default.
Conclusions:
- there is no obvious pattern in the residuals.
Intervals
These are plots of the observed data overlaid on top of posterior predictions associated with each level of the predictor. Ideally, the observed data should all fall within the predictive intervals.
Using all posterior draws for ppc type 'intervals' by default.
Conclusions:
- the posterior predictions are not inconsistent with the observed data
Ribbon
These are just an alternative way of expressing the interval plot.
Using all posterior draws for ppc type 'ribbon' by default.
Conclusions:
- the posterior predictions are not inconsistent with the observed data
DHARMa residuals provide very useful diagnostics. Unfortunately, we cannot directly use the simulateResiduals() function to generate the simulated residuals. However, if we are willing to calculate some of the components ourselves, we can still obtain the simulated residuals from the fitted Stan model.
We need to supply:
- simulated (predicted) responses associated with each observation.
- observed values
- fitted (predicted) responses (averaged) associated with each observation
dat3b.resids <- make_brms_dharma_res(dat3b.brm2, integerResponse = FALSE)
wrap_elements(~ testUniformity(dat3b.resids)) +
wrap_elements(~ plotResiduals(dat3b.resids)) +
wrap_elements(~ testDispersion(dat3b.resids)) +
plot_layout(nrow = 1)
If you are using RStudio (particularly with a Quarto document), the above code may produce a graphic that is too large for the currently available device. The document should still render without problems; it is only the live display within RStudio that cannot accommodate the figure.
To address this, you can either:
- break the multi-panel figure up into separate figures by removing the outer wrap_elements() and plot_layout() functions
- copy the above code into the console and view it in a larger graphics device
Conclusions:
- the Q-Q plot looks reasonable (points broadly follow the angled line)
- there are no flagged issues with the:
  - KS test: conformity to the nominated distribution (family)
  - Dispersion test: would not normally expect this to be an issue for a Gaussian family unless there are other issues with the residuals
  - Outlier test: the influence of each observation
- there do not appear to be any patterns in the residuals - each of the three quantile trends is considered flat and centered around 0.25, 0.5 and 0.75
- the observed dispersion is well within the simulated range, indicating that there is no issue with dispersion (again, this is expected for a Gaussian model)
Conclusions:
- there is no evidence of a lack of fit
- the model is likely to be reliable
resid <- resid(dat3c.brm2)[, "Estimate"]
fit <- fitted(dat3c.brm2)[, "Estimate"]
ggplot() +
geom_point(data = NULL, aes(y = resid, x = fit))
We should also plot the residuals against each of the predictor variables (as well as any other important unmodelled predictors - particularly time and space if relevant and available).
resid <- resid(dat3c.brm2)[, "Estimate"]
ggplot() +
geom_point(data = NULL, aes(y = resid, x = dat3$x))
Conclusions:
- there does not appear to be any pattern in the residuals
Density overlay
Conclusions:
- the model draws appear to be consistent with the observed data
Error scatter
These are plots of the observed values against the average residuals. Similar to a residual plot, we do not want to see any patterns in this plot.
Using all posterior draws for ppc type 'error_scatter_avg' by default.
Conclusions:
- there is no obvious pattern in the residuals.
Intervals
These are plots of the observed data overlaid on top of posterior predictions associated with each level of the predictor. Ideally, the observed data should all fall within the predictive intervals.
Using all posterior draws for ppc type 'intervals' by default.
Conclusions:
- the posterior predictions are not inconsistent with the observed data
Ribbon
These are just an alternative way of expressing the interval plot.
Using all posterior draws for ppc type 'ribbon' by default.
Conclusions:
- the posterior predictions are not inconsistent with the observed data
DHARMa residuals provide very useful diagnostics. Unfortunately, we cannot directly use the simulateResiduals() function to generate the simulated residuals. However, if we are willing to calculate some of the components ourselves, we can still obtain the simulated residuals from the fitted Stan model.
We need to supply:
- simulated (predicted) responses associated with each observation.
- observed values
- fitted (predicted) responses (averaged) associated with each observation
dat3c.resids <- make_brms_dharma_res(dat3c.brm2, integerResponse = FALSE)
wrap_elements(~ testUniformity(dat3c.resids)) +
wrap_elements(~ plotResiduals(dat3c.resids)) +
wrap_elements(~ testDispersion(dat3c.resids)) +
plot_layout(nrow = 1)
If you are using RStudio (particularly with a Quarto document), the above code may produce a graphic that is too large for the currently available device. The document should still render without problems; it is only the live display within RStudio that cannot accommodate the figure.
To address this, you can either:
- break the multi-panel figure up into separate figures by removing the outer wrap_elements() and plot_layout() functions
- copy the above code into the console and view it in a larger graphics device
Conclusions:
- the Q-Q plot looks reasonable (points broadly follow the angled line)
- there are no flagged issues with the:
  - KS test: conformity to the nominated distribution (family)
  - Dispersion test: would not normally expect this to be an issue for a Gaussian family unless there are other issues with the residuals
  - Outlier test: the influence of each observation
- there do not appear to be any patterns in the residuals - each of the three quantile trends is considered flat and centered around 0.25, 0.5 and 0.75
- the observed dispersion is well within the simulated range, indicating that there is no issue with dispersion (again, this is expected for a Gaussian model)
Conclusions:
- there is no evidence of a lack of fit
- the model is likely to be reliable
resid <- resid(dat4a.brm2)[, "Estimate"]
fit <- fitted(dat4a.brm2)[, "Estimate"]
ggplot() +
geom_point(data = NULL, aes(y = resid, x = fit))
We should also plot the residuals against each of the predictor variables (as well as any other important unmodelled predictors - particularly time and space if relevant and available).
resid <- resid(dat4a.brm2)[, "Estimate"]
ggplot() +
geom_point(data = NULL, aes(y = resid, x = dat4$x))
Conclusions:
- there does not appear to be any pattern in the residuals
Density overlay
Conclusions:
- the model draws appear to be consistent with the observed data
Error scatter
These are plots of the observed values against the average residuals. Similar to a residual plot, we do not want to see any patterns in this plot.
Using all posterior draws for ppc type 'error_scatter_avg' by default.
Conclusions:
- there is no obvious pattern in the residuals.
Intervals
These are plots of the observed data overlaid on top of posterior predictions associated with each level of the predictor. Ideally, the observed data should all fall within the predictive intervals.
Using all posterior draws for ppc type 'intervals' by default.
Conclusions:
- the posterior predictions are not inconsistent with the observed data
Ribbon
These are just an alternative way of expressing the interval plot.
Using all posterior draws for ppc type 'ribbon' by default.
Conclusions:
- the posterior predictions are not inconsistent with the observed data
DHARMa residuals provide very useful diagnostics. Unfortunately, we cannot directly use the simulateResiduals() function to generate the simulated residuals. However, if we are willing to calculate some of the components ourselves, we can still obtain the simulated residuals from the fitted Stan model.
We need to supply:
- simulated (predicted) responses associated with each observation.
- observed values
- fitted (predicted) responses (averaged) associated with each observation
dat4a.resids <- make_brms_dharma_res(dat4a.brm2, integerResponse = FALSE)
wrap_elements(~ testUniformity(dat4a.resids)) +
wrap_elements(~ plotResiduals(dat4a.resids)) +
wrap_elements(~ testDispersion(dat4a.resids)) +
plot_layout(nrow = 1)
If you are using RStudio (particularly with a Quarto document), the above code may produce a graphic that is too large for the currently available device. The document should still render without problems; it is only the live display within RStudio that cannot accommodate the figure.
To address this, you can either:
- break the multi-panel figure up into separate figures by removing the outer wrap_elements() and plot_layout() functions
- copy the above code into the console and view it in a larger graphics device
Conclusions:
- the Q-Q plot looks reasonable (points broadly follow the angled line)
- there are no flagged issues with the:
  - KS test: conformity to the nominated distribution (family)
  - Dispersion test: would not normally expect this to be an issue for a Gaussian family unless there are other issues with the residuals
  - Outlier test: the influence of each observation
- there do not appear to be any patterns in the residuals - each of the three quantile trends is considered flat and centered around 0.25, 0.5 and 0.75
- the observed dispersion is well within the simulated range, indicating that there is no issue with dispersion (again, this is expected for a Gaussian model)
Conclusions:
- there is no evidence of a lack of fit
- the model is likely to be reliable
resid <- resid(dat4b.brm2)[, "Estimate"]
fit <- fitted(dat4b.brm2)[, "Estimate"]
ggplot() +
geom_point(data = NULL, aes(y = resid, x = fit))
We should also plot the residuals against each of the predictor variables (as well as any other important unmodelled predictors - particularly time and space if relevant and available).
resid <- resid(dat4b.brm2)[, "Estimate"]
ggplot() +
geom_point(data = NULL, aes(y = resid, x = dat4$x))
Conclusions:
- there does not appear to be any pattern in the residuals
Density overlay
Conclusions:
- the model draws appear to be consistent with the observed data
Error scatter
These are plots of the observed values against the average residuals. Similar to a residual plot, we do not want to see any patterns in this plot.
Using all posterior draws for ppc type 'error_scatter_avg' by default.
Conclusions:
- there is no obvious pattern in the residuals.
Intervals
These are plots of the observed data overlaid on top of posterior predictions associated with each level of the predictor. Ideally, the observed data should all fall within the predictive intervals.
Using all posterior draws for ppc type 'intervals' by default.
Conclusions:
- the posterior predictions are not inconsistent with the observed data
Ribbon
These are just an alternative way of expressing the interval plot.
Using all posterior draws for ppc type 'ribbon' by default.
Conclusions:
- the posterior predictions are not inconsistent with the observed data
DHARMa residuals provide very useful diagnostics. Unfortunately, we cannot directly use the simulateResiduals() function to generate the simulated residuals. However, if we are willing to calculate some of the components ourselves, we can still obtain the simulated residuals from the fitted Stan model.
We need to supply:
- simulated (predicted) responses associated with each observation.
- observed values
- fitted (predicted) responses (averaged) associated with each observation
dat4b.resids <- make_brms_dharma_res(dat4b.brm2, integerResponse = FALSE)
wrap_elements(~ testUniformity(dat4b.resids)) +
wrap_elements(~ plotResiduals(dat4b.resids)) +
wrap_elements(~ testDispersion(dat4b.resids)) +
plot_layout(nrow = 1)
If you are using RStudio (particularly with a Quarto document), the above code may produce a graphic that is too large for the currently available device. The document should still render without problems; it is only the live display within RStudio that cannot accommodate the figure.
To address this, you can either:
- break the multi-panel figure up into separate figures by removing the outer wrap_elements() and plot_layout() functions
- copy the above code into the console and view it in a larger graphics device
Conclusions:
- the Q-Q plot looks reasonable (points broadly follow the angled line)
- there are no flagged issues with the:
  - KS test: conformity to the nominated distribution (family)
  - Dispersion test: would not normally expect this to be an issue for a Gaussian family unless there are other issues with the residuals
  - Outlier test: the influence of each observation
- there do not appear to be any patterns in the residuals - each of the three quantile trends is considered flat and centered around 0.25, 0.5 and 0.75
- the observed dispersion is well within the simulated range, indicating that there is no issue with dispersion (again, this is expected for a Gaussian model)
Conclusions:
- there is no evidence of a lack of fit
- the model is likely to be reliable
resid <- resid(dat4c.brm2)[, "Estimate"]
fit <- fitted(dat4c.brm2)[, "Estimate"]
ggplot() +
geom_point(data = NULL, aes(y = resid, x = fit))
We should also plot the residuals against each of the predictor variables (as well as any other important unmodelled predictors - particularly time and space if relevant and available).
resid <- resid(dat4c.brm2)[, "Estimate"]
ggplot() +
geom_point(data = NULL, aes(y = resid, x = dat4$x))
Conclusions:
- there does not appear to be any pattern in the residuals
Density overlay
Conclusions:
- the model draws appear to be consistent with the observed data
Error scatter
These are plots of the observed values against the average residuals. Similar to a residual plot, we do not want to see any patterns in this plot.
Using all posterior draws for ppc type 'error_scatter_avg' by default.
Conclusions:
- there is no obvious pattern in the residuals.
Intervals
These are plots of the observed data overlaid on top of posterior predictions associated with each level of the predictor. Ideally, the observed data should all fall within the predictive intervals.
Using all posterior draws for ppc type 'intervals' by default.
Conclusions:
- the posterior predictions are not inconsistent with the observed data
Ribbon
These are just an alternative way of expressing the interval plot.
Using all posterior draws for ppc type 'ribbon' by default.
Conclusions:
- the posterior predictions are not inconsistent with the observed data
DHARMa residuals provide very useful diagnostics. Unfortunately, we cannot directly use the simulateResiduals() function to generate the simulated residuals. However, if we are willing to calculate some of the components ourselves, we can still obtain the simulated residuals from the fitted Stan model.
We need to supply:
- simulated (predicted) responses associated with each observation.
- observed values
- fitted (predicted) responses (averaged) associated with each observation
dat4c.resids <- make_brms_dharma_res(dat4c.brm2, integerResponse = FALSE)
wrap_elements(~ testUniformity(dat4c.resids)) +
wrap_elements(~ plotResiduals(dat4c.resids)) +
wrap_elements(~ testDispersion(dat4c.resids)) +
plot_layout(nrow = 1)
If you are using RStudio (particularly with a Quarto document), the above code may produce a graphic that is too large for the currently available device. The document should still render without problems; it is only the live display within RStudio that cannot accommodate the figure.
To address this, you can either:
- break the multi-panel figure up into separate figures by removing the outer wrap_elements() and plot_layout() functions
- copy the above code into the console and view it in a larger graphics device
Conclusions:
- the Q-Q plot looks reasonable (points broadly follow the angled line)
- there are no flagged issues with the:
  - KS test: conformity to the nominated distribution (family)
  - Dispersion test: would not normally expect this to be an issue for a Gaussian family unless there are other issues with the residuals
  - Outlier test: the influence of each observation
- there do not appear to be any patterns in the residuals - each of the three quantile trends is considered flat and centered around 0.25, 0.5 and 0.75
- the observed dispersion is well within the simulated range, indicating that there is no issue with dispersion (again, this is expected for a Gaussian model)
Conclusions:
- there is no evidence of a lack of fit
- the model is likely to be reliable
resid <- resid(dat5a.brm2)[, "Estimate"]
fit <- fitted(dat5a.brm2)[, "Estimate"]
ggplot() +
geom_point(data = NULL, aes(y = resid, x = fit))
We should also plot the residuals against each of the predictor variables (as well as any other important unmodelled predictors - particularly time and space if relevant and available).
resid <- resid(dat5a.brm2)[, "Estimate"]
ggplot() +
geom_point(data = NULL, aes(y = resid, x = dat5$x))
Conclusions:
- the above plots are almost impossible to interpret for binary data.
- they will always feature two curved lines (one for the zeros, the other for the ones)
- it is virtually impossible to diagnose any issues from such plots.
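To see why binary residuals always fall into two bands, here is a minimal sketch using a frequentist glm() on fabricated data as a stand-in (it is not the brms model above, and all names and values are assumptions for illustration). Because each observation is either 0 or 1, every response residual must equal either minus the fitted probability or one minus the fitted probability.

```r
set.seed(42)
n <- 100
x <- runif(n, 0, 10)
y <- rbinom(n, size = 1, prob = plogis(-2 + 0.5 * x))  # fabricated binary data

fit_glm <- glm(y ~ x, family = binomial())
fit <- fitted(fit_glm)                        # fitted probabilities
res <- residuals(fit_glm, type = "response")  # observed minus fitted

# Zeros sit on the line -fit, ones sit on the line 1 - fit
plot(fit, res,
     col = ifelse(y == 1, "blue", "red"),
     xlab = "Fitted probability", ylab = "Response residual")
```

Plotting the same residuals against x instead of the fitted values bends the two bands into curves, which is the pattern described above.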
density overlay
Conclusions:
- the model draws appear to be consistent with the observed data
- note that these density plots are going to be too crude to be completely useful
- all the mass should be at either 0 or 1
Error scatter
These are plots of the observed values against the average residuals. Similar to a residual plot, we do not want to see any patterns in this plot.
Using all posterior draws for ppc type 'error_scatter_avg' by default.
Conclusions:
- there is no obvious pattern in the residuals.
- this sort of plot is of very little value for binary data
Intervals
These are plots of the observed data overlaid on top of posterior predictions associated with each level of the predictor. Ideally, the observed data should all fall within the predictive intervals.
Using all posterior draws for ppc type 'intervals' by default.
Conclusions:
- the posterior predictions are not inconsistent with the observed data
- this sort of plot is of very little value for binary data
Ribbon
These are just an alternative way of expressing the interval plot.
Using all posterior draws for ppc type 'ribbon' by default.
Conclusions:
- the posterior predictions are not inconsistent with the observed data
- this sort of plot is of very little value for binary data
DHARMa residuals provide very useful diagnostics. Unfortunately, we cannot directly use the simulateResiduals() function to generate the simulated residuals. However, if we are willing to calculate some of the components ourselves, we can still obtain the simulated residuals from the fitted Stan model.
We need to supply:
- simulated (predicted) responses associated with each observation.
- observed values
- fitted (predicted) responses (averaged) associated with each observation
In the code below, I have instructed the residual plot to not apply quantile regression to the residuals due to a lack of unique data
Only 2 levels detected so that family 'bernoulli' might be a more efficient choice.
wrap_elements(~ testUniformity(dat5a.resids)) +
wrap_elements(~ plotResiduals(dat5a.resids, quantreg = FALSE)) +
wrap_elements(~ testDispersion(dat5a.resids)) +
plot_layout(nrow = 1)
If you are using RStudio (particularly with a Quarto document), the above code may produce a graphic that is too large for the currently available device. The document should still render without problems; it is only the live display within RStudio that cannot accommodate the figure.
To address this, you can either:
- break the multi-panel figure up into three separate figures by removing the outer wrap_elements() and plot_layout() functions
- copy the above code into the console and view it in a larger graphics device
Conclusions:
- the Q-Q plot looks reasonable (points broadly follow the angled line)
- there are no flagged issues with the:
  - KS test: conformity to the nominated distribution (family)
  - Dispersion test: a binary response cannot be over- or under-dispersed (its variance is fully determined by its mean), so this is not expected to be an issue
  - Outlier test: the influence of each observation
- there do not appear to be any patterns in the residuals
- the observed dispersion is well within the simulated range, indicating that there is no issue with dispersion (again, this is expected for a binary response)
Conclusions:
- there is no evidence of a lack of fit
- the model is likely to be reliable
resid <- resid(dat5b.brm2)[, "Estimate"]
fit <- fitted(dat5b.brm2)[, "Estimate"]
ggplot() +
geom_point(data = NULL, aes(y = resid, x = fit))
We should also plot the residuals against each of the predictor variables (as well as any other important unmodelled predictors - particularly time and space if relevant and available).
resid <- resid(dat5b.brm2)[, "Estimate"]
ggplot() +
geom_point(data = NULL, aes(y = resid, x = dat5$x))
Conclusions:
- the above plots are almost impossible to interpret for binary data.
- they will always feature two curved lines (one for the zeros, the other for the ones)
- it is virtually impossible to diagnose any issues from such plots.
density overlay
Conclusions:
- the model draws appear to be consistent with the observed data
- note that these density plots are going to be too crude to be completely useful
- all the mass should be at either 0 or 1
Error scatter
These are plots of the observed values against the average residuals. Similar to a residual plot, we do not want to see any patterns in this plot.
Using all posterior draws for ppc type 'error_scatter_avg' by default.
Conclusions:
- there is no obvious pattern in the residuals.
- this sort of plot is of very little value for binary data
Intervals
These are plots of the observed data overlaid on top of posterior predictions associated with each level of the predictor. Ideally, the observed data should all fall within the predictive intervals.
Using all posterior draws for ppc type 'intervals' by default.
Conclusions:
- the posterior predictions are not inconsistent with the observed data
- this sort of plot is of very little value for binary data
Ribbon
These are just an alternative way of expressing the interval plot.
Using all posterior draws for ppc type 'ribbon' by default.
Conclusions:
- the posterior predictions are not inconsistent with the observed data
- this sort of plot is of very little value for binary data
DHARMa residuals provide very useful diagnostics. Unfortunately, we cannot directly use the simulateResiduals() function to generate the simulated residuals. However, if we are willing to calculate some of the components ourselves, we can still obtain the simulated residuals from the fitted Stan model.
We need to supply:
- simulated (predicted) responses associated with each observation.
- observed values
- fitted (predicted) responses (averaged) associated with each observation
In the code below, I have instructed the residual plot to not apply quantile regression to the residuals due to a lack of unique data
Only 2 levels detected so that family 'bernoulli' might be a more efficient choice.
wrap_elements(~ testUniformity(dat5b.resids)) +
wrap_elements(~ plotResiduals(dat5b.resids, quantreg = FALSE)) +
wrap_elements(~ testDispersion(dat5b.resids)) +
plot_layout(nrow = 1)
If you are using RStudio (particularly with a Quarto document), the above code may produce a graphic that is too large for the currently available device. The document should still render without problems; it is only the live display within RStudio that cannot accommodate the figure.
To address this, you can either:
- break the multi-panel figure up into three separate figures by removing the outer wrap_elements() and plot_layout() functions
- copy the above code into the console and view it in a larger graphics device
Conclusions:
- the Q-Q plot looks reasonable (points broadly follow the angled line)
- there are no flagged issues with the:
  - KS test: conformity to the nominated distribution (family)
  - Dispersion test: a binary response cannot be over- or under-dispersed (its variance is fully determined by its mean), so this is not expected to be an issue
  - Outlier test: the influence of each observation
- there do not appear to be any patterns in the residuals
- the observed dispersion is well within the simulated range, indicating that there is no issue with dispersion (again, this is expected for a binary response)
Conclusions:
- there is no evidence of a lack of fit
- the model is likely to be reliable
resid <- resid(dat5c.brm2)[, "Estimate"]
fit <- fitted(dat5c.brm2)[, "Estimate"]
ggplot() +
geom_point(data = NULL, aes(y = resid, x = fit))
We should also plot the residuals against each of the predictor variables (as well as any other important unmodelled predictors - particularly time and space if relevant and available).
resid <- resid(dat5c.brm2)[, "Estimate"]
ggplot() +
geom_point(data = NULL, aes(y = resid, x = dat5$x))
Conclusions:
- the above plots are almost impossible to interpret for binary data.
- they will always feature two curved lines (one for the zeros, the other for the ones)
- it is virtually impossible to diagnose any issues from such plots.
density overlay
Conclusions:
- the model draws appear to be consistent with the observed data
- note that these density plots are going to be too crude to be completely useful
- all the mass should be at either 0 or 1
Error scatter
These are plots of the observed values against the average residuals. Similar to a residual plot, we do not want to see any patterns in this plot.
Using all posterior draws for ppc type 'error_scatter_avg' by default.
Conclusions:
- there is no obvious pattern in the residuals.
- this sort of plot is of very little value for binary data
Intervals
These are plots of the observed data overlaid on top of posterior predictions associated with each level of the predictor. Ideally, the observed data should all fall within the predictive intervals.
Using all posterior draws for ppc type 'intervals' by default.
Conclusions:
- the posterior predictions are not inconsistent with the observed data
- this sort of plot is of very little value for binary data
Ribbon
These are just an alternative way of expressing the interval plot.
Using all posterior draws for ppc type 'ribbon' by default.
Conclusions:
- the posterior predictions are not inconsistent with the observed data
- this sort of plot is of very little value for binary data
DHARMa residuals provide very useful diagnostics. Unfortunately, we cannot directly use the simulateResiduals() function to generate the simulated residuals. However, if we are willing to calculate some of the components ourselves, we can still obtain the simulated residuals from the fitted Stan model.
We need to supply:
- simulated (predicted) responses associated with each observation.
- observed values
- fitted (predicted) responses (averaged) associated with each observation
In the code below, I have instructed the residual plot to not apply quantile regression to the residuals due to a lack of unique data
Only 2 levels detected so that family 'bernoulli' might be a more efficient choice.
wrap_elements(~ testUniformity(dat5c.resids)) +
wrap_elements(~ plotResiduals(dat5c.resids, quantreg = FALSE)) +
wrap_elements(~ testDispersion(dat5c.resids)) +
plot_layout(nrow = 1)
If you are using RStudio (particularly with a Quarto document), the above code may produce a graphic that is too large for the currently available device. The document should still render without problems; it is only the live display within RStudio that cannot accommodate the figure.
To address this, you can either:
- break the multi-panel figure up into three separate figures by removing the outer wrap_elements() and plot_layout() functions
- copy the above code into the console and view it in a larger graphics device
Conclusions:
- the Q-Q plot looks reasonable (points broadly follow the angled line)
- there are no flagged issues with the:
  - KS test: conformity to the nominated distribution (family)
  - Dispersion test: a binary response cannot be over- or under-dispersed (its variance is fully determined by its mean), so this is not expected to be an issue
  - Outlier test: the influence of each observation
- there do not appear to be any patterns in the residuals
- the observed dispersion is well within the simulated range, indicating that there is no issue with dispersion (again, this is expected for a binary response)
Conclusions:
- there is no evidence of a lack of fit
- the model is likely to be reliable
resid <- resid(dat6a.brm2)[, "Estimate"]
fit <- fitted(dat6a.brm2)[, "Estimate"]
ggplot() +
geom_point(data = NULL, aes(y = resid, x = fit))
We should also plot the residuals against each of the predictor variables (as well as any other important unmodelled predictors - particularly time and space if relevant and available).
resid <- resid(dat6a.brm2)[, "Estimate"]
ggplot() +
geom_point(data = NULL, aes(y = resid, x = dat6$x))
Conclusions:
- there does not appear to be any pattern in the residuals
density overlay
Conclusions:
- the model draws appear to be consistent with the observed data
Error scatter
These are plots of the observed values against the average residuals. Similar to a residual plot, we do not want to see any patterns in this plot.
Using all posterior draws for ppc type 'error_scatter_avg' by default.
Conclusions:
- there is no obvious pattern in the residuals.
Intervals
These are plots of the observed data overlaid on top of posterior predictions associated with each level of the predictor. Ideally, the observed data should all fall within the predictive intervals.
Using all posterior draws for ppc type 'intervals' by default.
Conclusions:
- the posterior predictions are not inconsistent with the observed data
Ribbon
These are just an alternative way of expressing the interval plot.
Using all posterior draws for ppc type 'ribbon' by default.
Conclusions:
- the posterior predictions are not inconsistent with the observed data
DHARMa residuals provide very useful diagnostics. Unfortunately, we cannot directly use the simulateResiduals() function to generate the simulated residuals. However, if we are willing to calculate some of the components ourselves, we can still obtain the simulated residuals from the fitted Stan model.
We need to supply:
- simulated (predicted) responses associated with each observation.
- observed values
- fitted (predicted) responses (averaged) associated with each observation
dat6a.resids <- make_brms_dharma_res(dat6a.brm2, integerResponse = FALSE)
wrap_elements(~ testUniformity(dat6a.resids)) +
wrap_elements(~ plotResiduals(dat6a.resids)) +
wrap_elements(~ testDispersion(dat6a.resids)) +
plot_layout(nrow = 1)
If you are using RStudio (particularly with a Quarto document), the above code may produce a graphic that is too large for the currently available device. The document should still render without problems; it is only the live display within RStudio that cannot accommodate the figure.
To address this, you can either:
- break the multi-panel figure up into three separate figures by removing the outer wrap_elements() and plot_layout() functions
- copy the above code into the console and view it in a larger graphics device
Conclusions:
- the Q-Q plot looks reasonable (points broadly follow the angled line)
- there are no flagged issues with the:
  - KS test: conformity to the nominated distribution (family)
  - Dispersion test: whether the residuals are more or less variable than the nominated family implies
  - Outlier test: the influence of each observation
- there do not appear to be any patterns in the residuals: each of the three quantile trends is considered flat and centered around 0.25, 0.5 and 0.75
- the observed dispersion is well within the simulated range, indicating that there is no issue with dispersion
Conclusions:
- there is no evidence of a lack of fit
- the model is likely to be reliable
resid <- resid(dat6b.brm2)[, "Estimate"]
fit <- fitted(dat6b.brm2)[, "Estimate"]
ggplot() +
geom_point(data = NULL, aes(y = resid, x = fit))
We should also plot the residuals against each of the predictor variables (as well as any other important unmodelled predictors - particularly time and space if relevant and available).
resid <- resid(dat6b.brm2)[, "Estimate"]
ggplot() +
geom_point(data = NULL, aes(y = resid, x = dat6$x))
Conclusions:
- there does not appear to be any pattern in the residuals
density overlay
Conclusions:
- the model draws appear to be consistent with the observed data
Error scatter
These are plots of the observed values against the average residuals. Similar to a residual plot, we do not want to see any patterns in this plot.
Using all posterior draws for ppc type 'error_scatter_avg' by default.
Conclusions:
- there is no obvious pattern in the residuals.
Intervals
These are plots of the observed data overlaid on top of posterior predictions associated with each level of the predictor. Ideally, the observed data should all fall within the predictive intervals.
Using all posterior draws for ppc type 'intervals' by default.
Conclusions:
- the posterior predictions are not inconsistent with the observed data
Ribbon
These are just an alternative way of expressing the interval plot.
Using all posterior draws for ppc type 'ribbon' by default.
Conclusions:
- the posterior predictions are not inconsistent with the observed data
DHARMa residuals provide very useful diagnostics. Unfortunately, we cannot directly use the simulateResiduals() function to generate the simulated residuals. However, if we are willing to calculate some of the components ourselves, we can still obtain the simulated residuals from the fitted Stan model.
We need to supply:
- simulated (predicted) responses associated with each observation.
- observed values
- fitted (predicted) responses (averaged) associated with each observation
dat6b.resids <- make_brms_dharma_res(dat6b.brm2, integerResponse = FALSE)
wrap_elements(~ testUniformity(dat6b.resids)) +
wrap_elements(~ plotResiduals(dat6b.resids)) +
wrap_elements(~ testDispersion(dat6b.resids)) +
plot_layout(nrow = 1)
If you are using RStudio (particularly with a Quarto document), the above code may produce a graphic that is too large for the currently available device. The document should still render without problems; it is only the live display within RStudio that cannot accommodate the figure.
To address this, you can either:
- break the multi-panel figure up into three separate figures by removing the outer wrap_elements() and plot_layout() functions
- copy the above code into the console and view it in a larger graphics device
Conclusions:
- the Q-Q plot looks reasonable (points broadly follow the angled line)
- there are no flagged issues with the:
  - KS test: conformity to the nominated distribution (family)
  - Dispersion test: whether the residuals are more or less variable than the nominated family implies
  - Outlier test: the influence of each observation
- there do not appear to be any patterns in the residuals: each of the three quantile trends is considered flat and centered around 0.25, 0.5 and 0.75
- the observed dispersion is well within the simulated range, indicating that there is no issue with dispersion
Conclusions:
- there is no evidence of a lack of fit
- the model is likely to be reliable
resid <- resid(dat6c.brm2)[, "Estimate"]
fit <- fitted(dat6c.brm2)[, "Estimate"]
ggplot() +
geom_point(data = NULL, aes(y = resid, x = fit))
We should also plot the residuals against each of the predictor variables (as well as any other important unmodelled predictors - particularly time and space if relevant and available).
resid <- resid(dat6c.brm2)[, "Estimate"]
ggplot() +
geom_point(data = NULL, aes(y = resid, x = dat6$x))
Conclusions:
- there does not appear to be any pattern in the residuals
density overlay
Conclusions:
- the model draws appear to be consistent with the observed data
Error scatter
These are plots of the observed values against the average residuals. Similar to a residual plot, we do not want to see any patterns in this plot.
Using all posterior draws for ppc type 'error_scatter_avg' by default.
Conclusions:
- there is no obvious pattern in the residuals.
Intervals
These are plots of the observed data overlaid on top of posterior predictions associated with each level of the predictor. Ideally, the observed data should all fall within the predictive intervals.
Using all posterior draws for ppc type 'intervals' by default.
Conclusions:
- the posterior predictions are not inconsistent with the observed data
Ribbon
These are just an alternative way of expressing the interval plot.
Using all posterior draws for ppc type 'ribbon' by default.
Conclusions:
- the posterior predictions are not inconsistent with the observed data
DHARMa residuals provide very useful diagnostics. Unfortunately, we cannot directly use the simulateResiduals() function to generate the simulated residuals. However, if we are willing to calculate some of the components ourselves, we can still obtain the simulated residuals from the fitted Stan model.
We need to supply:
- simulated (predicted) responses associated with each observation.
- observed values
- fitted (predicted) responses (averaged) associated with each observation
dat6c.resids <- make_brms_dharma_res(dat6c.brm2, integerResponse = FALSE)
wrap_elements(~ testUniformity(dat6c.resids)) +
wrap_elements(~ plotResiduals(dat6c.resids)) +
wrap_elements(~ testDispersion(dat6c.resids)) +
plot_layout(nrow = 1)
If you are using RStudio (particularly with a Quarto document), the above code may produce a graphic that is too large for the currently available device. The document should still render without problems; it is only the live display within RStudio that cannot accommodate the figure.
To address this, you can either:
- break the multi-panel figure up into three separate figures by removing the outer wrap_elements() and plot_layout() functions
- copy the above code into the console and view it in a larger graphics device
Conclusions:
- the Q-Q plot looks reasonable (points broadly follow the angled line)
- there are no flagged issues with the:
  - KS test: conformity to the nominated distribution (family)
  - Dispersion test: whether the residuals are more or less variable than the nominated family implies
  - Outlier test: the influence of each observation
- there do not appear to be any patterns in the residuals: each of the three quantile trends is considered flat and centered around 0.25, 0.5 and 0.75
- the observed dispersion is well within the simulated range, indicating that there is no issue with dispersion
Conclusions:
- there is no evidence of a lack of fit
- the model is likely to be reliable
10 Partial effects plots
Prior to exploring the modelled numerical estimates, it is worth reviewing simple plots of the predicted trends associated with each predictor. Importantly, these plots typically express the trends on the scale of the response, although for some it is possible to force the trends to be expressed on the link scale. Such plots provide a final visual check of whether the model has yielded sensible outcomes. Furthermore, they usually assist in the interpretation of the major estimated parameters.
Notice that although we had centered the predictor, because we did so in the model formula, conditional_effects() is able to backtransform \(x\) onto the original scale when producing the partial plot.
dat6b.brm2 |>
conditional_effects(conditions = data.frame(total = dat6$total)) |>
plot(points = TRUE)
#OR
dat6b.brm2 |>
conditional_effects(conditions = data.frame(total = dat6$total),
spaghetti = TRUE, ndraws = 200) |>
plot(points = TRUE)
Notice that although we had centered the predictor, because we did so in the model formula, conditional_effects() is able to backtransform \(x\) onto the original scale when producing the partial plot.
dat6c.brm2 |>
conditional_effects(conditions = data.frame(total = dat6$total)) |>
plot(points = TRUE)
#OR
dat6c.brm2 |>
conditional_effects(conditions = data.frame(total = dat6$total),
spaghetti = TRUE, ndraws = 200) |>
plot(points = TRUE)
Notice that although we had centered and scaled the predictor, because we did so in the model formula, conditional_effects() is able to backtransform \(x\) onto the original scale when producing the partial plot.
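The benefit of centering inside the model formula can be illustrated with a simple lm() analogue. This is only a sketch on fabricated data and is not the brms code path; the object names are assumptions. Because the centering is part of the formula, predictions can be requested on the original scale of x and the centering is reapplied automatically, so they match those of an uncentered model.

```r
set.seed(1)
dat_demo <- data.frame(x = 1:10)
dat_demo$y <- 2 + 0.5 * dat_demo$x + rnorm(10)   # fabricated response

# Centering performed inside the formula
fit_centred <- lm(y ~ scale(x, scale = FALSE), data = dat_demo)
# The same model without centering, for comparison
fit_raw <- lm(y ~ x, data = dat_demo)

# New data are supplied on the ORIGINAL scale of x
newdata <- data.frame(x = seq(1, 10, length.out = 5))
pred_centred <- predict(fit_centred, newdata = newdata)
pred_raw <- predict(fit_raw, newdata = newdata)

# The two sets of predictions agree
all.equal(pred_centred, pred_raw)
```

Had the predictor been centered before fitting (as a new column in the data), any new data would need to be centered by hand using the original mean.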
11 Model investigation
Rather than simply return point estimates of each of the model parameters, Bayesian analyses capture the full posterior of each parameter. These are typically stored within the list structure of the output object.
As with most statistical routines, the overloaded summary() function provides an overall summary of the model parameters. Typically, the summaries will include the means / medians along with credibility intervals and perhaps convergence diagnostics (such as R hat). However, more thorough investigation and analysis of the parameter posteriors requires access to the full posteriors.
There is currently a plethora of functions for extracting the full posteriors from models. In part, this is a reflection of a rapidly evolving space with numerous packages providing near equivalent functionality (it should also be noted that, over time, many of the functions have been deprecated due to inconsistencies in their names). Broadly speaking, the functions focus on draws from the posterior of either the parameters (intercept, slope, standard deviation etc), linear predictor, expected values or predicted values. The distinctions between the latter three are highlighted in the following table.
| Property | Description |
|---|---|
| linear predictors | values predicted on the link scale |
| expected values | predictions (on response scale) without residual error (predicting expected mean outcome(s)) |
| predicted values | predictions (on response scale) that incorporate residual error |
| fitted values | predictions on the response scale |
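The distinction between the first three rows of the table can be made concrete with a small base R sketch. It assumes a Poisson model with a log link and uses made-up "posterior draws" of an intercept and slope (none of these numbers come from the models in this tutorial). In brms, the corresponding draws would typically be obtained with posterior_linpred(), posterior_epred() and posterior_predict() respectively.

```r
set.seed(1)
n_draws <- 1000

# Hypothetical posterior draws for a Poisson (log link) model
beta0 <- rnorm(n_draws, mean = 0.5, sd = 0.1)
beta1 <- rnorm(n_draws, mean = 0.2, sd = 0.02)
x_new <- 5                      # predictor value to predict for

eta <- beta0 + beta1 * x_new    # linear predictor: link (log) scale
mu <- exp(eta)                  # expected value: response scale, no residual error
y_rep <- rpois(n_draws, mu)     # predicted value: response scale, with sampling noise

# Expected values vary only through parameter uncertainty, whereas
# predicted values are additionally spread out by the Poisson noise
c(sd_expected = sd(mu), sd_predicted = sd(y_rep))
```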
Family: gaussian
Links: mu = identity; sigma = identity
Formula: y ~ x
Data: dat (Number of observations: 10)
Draws: 3 chains, each with iter = 5000; warmup = 1000; thin = 5;
total post-warmup draws = 2400
Regression Coefficients:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept 0.27 0.38 -0.46 1.06 1.00 2134 2352
x -0.08 0.46 -0.97 0.82 1.00 2459 2330
Further Distributional Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma 1.12 0.36 0.66 2.07 1.00 2402 2248
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
Conclusions:
- in the initial block of information, we are reminded of the formula as well as the chain fitting characteristics. We are also informed that the total number of post-warmup MCMC samples is 2400.
- the Rhat values for each parameter are < 1.01, confirming convergence of the parameter estimates across the chains
- Bulk_ESS: effective sample sizes for the bulk of the posterior range are a high fraction (> 0.5) of the total number of draws. This suggests that the sampling was reasonably efficient and therefore likely to be reliable
- Tail_ESS: effective sample sizes of estimates in the tail of the posterior. The tails (more extreme regions of the posterior) are where the sampler is most likely to get stuck (have divergent transitions etc). The fraction of effective samples from this region is relatively high, implying that the sampler did not get stuck in such areas that are unsupported by data.
- Intercept: when \(x=0\), the expected value of \(y\) is 0.266 and we are 95% confident that the true value is between -0.461 and 1.057. Since \(x=0\) is outside the observed range of \(x\), this y-intercept is of very limited value.
- x: (the slope) - the rate of change in \(y\) per unit (=1) change in \(x\). So for every one unit increase in \(x\), \(y\) changes by (on average) -0.08 units (that is, it decreases by 0.08 units) and we are 95% confident that this change is between -0.967 and 0.816
- sigma is estimated to be 1.12
Note, the estimates are means and quantiles.
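The draw count and the effective sample size fractions referred to above can be checked by hand from the summary output. The following is simple arithmetic on the values printed in that summary; the variable names are arbitrary.

```r
# Sampler settings as reported in the "Draws:" line of the summary
chains <- 3
iter <- 5000
warmup <- 1000
thin <- 5

# Post-warmup draws retained across all chains
total_draws <- chains * (iter - warmup) / thin
total_draws

# Effective sample sizes copied from the summary table
bulk_ess <- c(Intercept = 2134, x = 2459, sigma = 2402)
tail_ess <- c(Intercept = 2352, x = 2330, sigma = 2248)

# Fractions of the total number of draws
round(bulk_ess / total_draws, 2)
round(tail_ess / total_draws, 2)
```

Note that an effective sample size can slightly exceed the number of draws, as it does here for x and sigma.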
As an alternative to the regular summary (which is designed to resemble that traditionally provided by frequentist analyses and thus be instantly familiar), it is possible to define exactly how each of the parameters is summarised.
In the following, I am nominating that I want to summarise each parameter posterior by:
- the median
- the 95% highest probability density interval (credibility interval)
- Rhat
- total number of draws
- bulk and tail effective sample sizes
dat1a.brm2 |>
posterior::summarise_draws(
median,
HDInterval::hdi,
rhat,
length,
ess_bulk, ess_tail
)
# A tibble: 9 × 8
variable median lower upper rhat length ess_bulk ess_tail
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 b_Intercept 0.262 -0.501 1.01 1.00 2400 2134. 2352.
2 b_x -0.0803 -0.973 0.806 1.00 2400 2459. 2330.
3 sigma 1.03 0.588 1.84 1.00 2400 2402. 2248.
4 Intercept 0.277 -0.443 1.04 1.00 2400 2085. 2348.
5 prior_Intercept 30.9 -0.201 59.8 1.00 2400 2231. 2286.
6 prior_b 0.0169 -13.1 11.0 1.00 2400 1968. 2220.
7 prior_sigma 12.0 0.00537 46.8 1.00 2400 2399. 2330.
8 lprior -11.2 -11.3 -11.1 1.00 2400 2148. 2103.
9 lp__ -25.1 -28.6 -23.7 1.00 2400 2241. 2233.
Conclusions:
- the length column confirms that the total number of post-warmup MCMC draws for each parameter is 2400.
- the Rhat values (rhat) for each parameter are < 1.01, confirming convergence of the parameter estimates across the chains
- Bulk_ESS (ess_bulk): effective sample sizes for the bulk of the posterior range are a high fraction (> 0.5) of the total number of draws. This suggests that the sampling was reasonably efficient and therefore likely to be reliable
- Tail_ESS (ess_tail): effective sample sizes of estimates in the tail of the posterior. The tails (more extreme regions of the posterior) are where the sampler is most likely to get stuck (have divergent transitions etc). The fraction of effective samples from this region is relatively high, implying that the sampler did not get stuck in such areas that are unsupported by data.
- b_Intercept: when \(x=0\), the expected value of \(y\) is 0.262 and we are 95% confident that the true value is between -0.501 and 1.013. Since \(x=0\) is outside the observed range of \(x\), this y-intercept is of very limited value.
- b_x: (the slope) - the rate of change in \(y\) per unit (=1) change in \(x\). So for every one unit increase in \(x\), \(y\) changes by (on average) -0.08 units (that is, it decreases by 0.08 units) and we are 95% confident that this change is between -0.973 and 0.806
- sigma is estimated to be 1.03
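To give a feel for what a highest probability density (HPD) interval is, the following base R sketch finds the narrowest interval containing 95% of a set of draws and compares it with the usual equal-tailed quantile interval. The function name hpd_interval() and the skewed example draws are assumptions for illustration; the analyses above use HDInterval::hdi() rather than this hand-rolled version.

```r
# Narrowest interval that spans (approximately) `prob` of the draws
hpd_interval <- function(draws, prob = 0.95) {
  sorted <- sort(draws)
  n <- length(sorted)
  k <- floor(prob * n)                  # number of steps spanned by the interval
  # widths of all candidate intervals running from draw i to draw i + k
  widths <- sorted[(k + 1):n] - sorted[1:(n - k)]
  i <- which.min(widths)                # the narrowest candidate
  c(lower = sorted[i], upper = sorted[i + k])
}

set.seed(1)
draws <- rexp(10000)                    # a right-skewed set of "posterior" draws

hpd <- hpd_interval(draws)
equal_tailed <- quantile(draws, c(0.025, 0.975))

# For a skewed posterior the HPD interval is narrower than the quantile interval
c(hpd_width = unname(diff(hpd)), quantile_width = unname(diff(equal_tailed)))
```

For a symmetric posterior the two types of interval are nearly identical, which is why the summary() quantiles and the HPD intervals above are broadly similar for the intercept and slope but differ more for sigma.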
As yet a more flexible alternative, if we first extract the full posterior draws (with as_draws_df()), we can then use the various tidyverse functions to focus on just the most ecologically interpretable parameters before summarising.
In the following, I will use select() with a regex (regular expression) to match only the columns that:
- start with (^) “b_” followed by any amount of (*) any character (.)
- start with (^) “sigma”
dat1a.brm2 |>
brms::as_draws_df() |>
dplyr::select(matches("^b_.*|^sigma")) |>
posterior::summarise_draws(
median,
HDInterval::hdi,
rhat,
length,
ess_bulk, ess_tail
)
Warning: Dropping 'draws_df' class as required metadata was removed.
# A tibble: 3 × 8
variable median lower upper rhat length ess_bulk ess_tail
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 b_Intercept 0.262 -0.501 1.01 1.00 2400 2100. 2344.
2 b_x -0.0803 -0.973 0.806 1.00 2400 2454. 2322.
3 sigma 1.03 0.588 1.84 1.00 2400 2402. 2236.
Conclusions:
- the length column confirms that each parameter is summarised from a total of 2400 post-warmup MCMC draws.
- the rhat values for each parameter are less than 1.01, which is consistent with convergence of the parameter estimates across the chains
- ess_bulk: effective sample sizes for the bulk of the posterior are a high fraction (> 0.5) of the total number of draws. This suggests that the sampling was reasonably efficient and therefore likely to be reliable
- ess_tail: effective sample sizes of estimates in the tails of the posterior. The tails (the more extreme regions of the posterior) are where the sampler is most likely to get stuck (have divergent transitions etc). The fraction of effective samples from this region is relatively high, implying that the sampler did not get stuck in areas that are poorly supported by the data.
- b_Intercept: when \(x=0\), the expected value of \(y\) is 0.262 and we are 95% confident that the true value is between -0.501 and 1.013. Since \(x=0\) is outside the observed range of \(x\), this y-intercept is of very limited value.
- b_x: (the slope) - the rate of change in \(y\) per unit (=1) change in \(x\). So for every one unit change in \(x\), \(y\) changes by (on average) -0.08 units and we are 95% confident that this change is between -0.973 and 0.806
- sigma is estimated to be 1.03
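The "high fraction" statement about effective sample sizes can be checked by hand. The following is a minimal sketch that simply re-types the (rounded) ess_bulk and ess_tail values printed in the three-row table above and divides them by the total number of draws. The object names (ess, n_draws) are made up for this illustration, and the 0.5 cut-off is the rule of thumb used in the conclusions rather than a hard threshold.

```r
# ess_bulk and ess_tail as printed (rounded) in the three-row summary table
ess <- data.frame(
  variable = c("b_Intercept", "b_x", "sigma"),
  ess_bulk = c(2100, 2454, 2402),
  ess_tail = c(2344, 2322, 2236)
)
n_draws <- 2400  # total post-warmup draws (the length column)

# Express each effective sample size as a fraction of the total draws
ess$bulk_ratio <- ess$ess_bulk / n_draws
ess$tail_ratio <- ess$ess_tail / n_draws

# Flag parameters for which both fractions exceed the 0.5 rule of thumb
ess$efficient <- ess$bulk_ratio > 0.5 & ess$tail_ratio > 0.5
print(ess)
```

Note that a ratio can exceed 1: an effective sample size slightly larger than the number of draws is possible when the draws are negatively autocorrelated.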
The gather_draws() function performs the equivalent of an as_draws_df() followed by a pivot_longer() in order to return the full posteriors in long format where they are more suitable for graphing.
dat1a.brm2 |>
  gather_draws(b_Intercept, b_x, sigma) |>
  ggplot() +
  geom_histogram(aes(x = .value)) +
  facet_wrap(~.variable, scales = "free")
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Alternatively, there are various other representations supported by the ggdist package.
y ymin ymax .width .point .interval
1 0.06517101 2.242159e-09 0.313515 0.95 median hdci
Conclusions:
- 6.517% of the total variability in \(y\) can be explained by its relationship to \(x\)
- we are 95% confident that the strength of this relationship is between 0% and 31.352%
Family: gaussian
Links: mu = identity; sigma = identity
Formula: y ~ scale(x, scale = FALSE)
Data: dat (Number of observations: 10)
Draws: 3 chains, each with iter = 5000; warmup = 1000; thin = 5;
total post-warmup draws = 2400
Regression Coefficients:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept 0.28 0.36 -0.40 0.98 1.00 2309 2290
scalexscaleEQFALSE -0.09 0.46 -0.97 0.85 1.00 2380 2251
Further Distributional Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma 1.10 0.34 0.66 1.94 1.00 2363 2409
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
Conclusions:
- in the initial block of information, we are reminded of the formula as well as the chain fitting characteristics. We are also informed that the total number of post-warmup MCMC draws is 2400.
- the Rhat values for each parameter are less than 1.01, which is consistent with convergence of the parameter estimates across the chains
- Bulk_ESS: effective sample sizes for the bulk of the posterior are a high fraction (> 0.5) of the total number of draws. This suggests that the sampling was reasonably efficient and therefore likely to be reliable
- Tail_ESS: effective sample sizes of estimates in the tails of the posterior. The tails (the more extreme regions of the posterior) are where the sampler is most likely to get stuck (have divergent transitions etc). The fraction of effective samples from this region is relatively high, implying that the sampler did not get stuck in areas that are poorly supported by the data.
- Intercept: when \(x=0\) (its average since it is centered), the expected value of \(y\) is 0.283 and we are 95% confident that the true value is between -0.404 and 0.981. So \(y\) is expected to be 0.283 at the average \(x\).
- x (listed as scalexscaleEQFALSE): (the slope) - the rate of change in \(y\) per unit (=1) change in \(x\). So for every one unit change in \(x\), \(y\) changes by (on average) -0.089 units and we are 95% confident that this change is between -0.971 and 0.849
- sigma is estimated to be 1.1
Note, the estimates are posterior means and the interval bounds are posterior quantiles.
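To see why it matters whether a summary reports means and quantiles or medians and HPD intervals, here is a minimal base R sketch. It uses fabricated, right-skewed draws (not the posterior of any model in this tutorial) and a small helper, hpd(), that I have written purely for this illustration. It returns the shortest interval containing 95% of the draws, which corresponds to an HPD interval when the posterior is unimodal.

```r
set.seed(1)
# Fabricated right-skewed "posterior" draws (log-normal), for illustration only
draws <- rlnorm(4000, meanlog = 0, sdlog = 0.5)

# Equal-tailed (quantile) 95% interval, the style used by the default summary
qi <- unname(quantile(draws, probs = c(0.025, 0.975)))

# Shortest interval containing `prob` of the draws (illustrative helper)
hpd <- function(x, prob = 0.95) {
  x <- sort(x)
  n <- length(x)
  m <- ceiling(prob * n)        # number of draws the interval must contain
  starts <- seq_len(n - m + 1)  # candidate positions for the lower bound
  widths <- x[starts + m - 1] - x[starts]
  i <- which.min(widths)
  c(lower = x[i], upper = x[i + m - 1])
}
hi <- hpd(draws)

# For a skewed posterior the mean and median differ, as do the two intervals
print(c(mean = mean(draws), median = median(draws)))
print(rbind(quantile = qi, hpd = unname(hi)))
```

For a roughly symmetric posterior the two kinds of summary are nearly identical, which is why the means and medians reported in this tutorial are generally close. sigma, whose posterior is right-skewed, is where the two are most likely to differ.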
As an alternative to the regular summary (which is designed to resemble that traditionally provided by frequentist analyses - and thus be instantly familiar), it is possible to define exactly how each of the parameters is summarised.
In the following, I am nominating that I want to summarise each parameter posterior by:
- the median
- the 95% highest posterior density (HPD) interval (credible interval)
- Rhat
- total number of draws
- bulk and tail effective sample sizes
dat1b.brm2 |>
  posterior::summarise_draws(
    median,
    HDInterval::hdi,
    rhat,
    length,
    ess_bulk, ess_tail
  )
# A tibble: 9 × 8
variable median lower upper rhat length ess_bulk ess_tail
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 b_Intercept 0.277 -0.437 0.938 1.00 2400 2309. 2290.
2 b_scalexscaleEQFALSE -0.0841 -0.997 0.803 1.00 2400 2381. 2251.
3 sigma 1.03 0.596 1.76 1.00 2400 2364. 2409.
4 Intercept 0.277 -0.437 0.938 1.00 2400 2309. 2290.
5 prior_Intercept 31.7 1.66 60.9 1.00 2400 2219. 2150.
6 prior_b -0.0664 -12.9 13.2 1.00 2400 2408. 2328.
7 prior_sigma 11.2 0.00997 49.3 1.00 2400 2373. 2459.
8 lprior -11.2 -11.3 -11.1 1.00 2400 2447. 2293.
9 lp__ -25.0 -28.2 -23.7 1.00 2400 2069. 2035.
Conclusions:
- the length column confirms that each parameter is summarised from a total of 2400 post-warmup MCMC draws.
- the rhat values for each parameter are less than 1.01, which is consistent with convergence of the parameter estimates across the chains
- ess_bulk: effective sample sizes for the bulk of the posterior are a high fraction (> 0.5) of the total number of draws. This suggests that the sampling was reasonably efficient and therefore likely to be reliable
- ess_tail: effective sample sizes of estimates in the tails of the posterior. The tails (the more extreme regions of the posterior) are where the sampler is most likely to get stuck (have divergent transitions etc). The fraction of effective samples from this region is relatively high, implying that the sampler did not get stuck in areas that are poorly supported by the data.
- b_Intercept: when \(x=0\) (its average since it is centered), the expected value of \(y\) is 0.277 and we are 95% confident that the true value is between -0.437 and 0.938. So \(y\) is expected to be 0.277 at the average \(x\).
- b_x (listed as b_scalexscaleEQFALSE): (the slope) - the rate of change in \(y\) per unit (=1) change in \(x\). So for every one unit change in \(x\), \(y\) changes by (on average) -0.084 units and we are 95% confident that this change is between -0.997 and 0.803
- sigma is estimated to be 1.03
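The effect of centering on the intercept can be illustrated without fitting a Bayesian model at all. The following is a minimal sketch on fabricated data (dat_demo is not the tutorial's dat), using ordinary least squares via lm() purely as a quick stand-in for brm(). With priors involved, the Bayesian estimates will only follow these identities approximately.

```r
set.seed(123)
# Fabricated data for illustration only
dat_demo <- data.frame(x = 1:10)
dat_demo$y <- 2 + 0.5 * dat_demo$x + rnorm(10)

# lm() is used here as a fast stand-in for brm()
fit_raw      <- lm(y ~ x, data = dat_demo)
fit_centered <- lm(y ~ scale(x, scale = FALSE), data = dat_demo)

# The slopes agree; only the intercept changes
print(coef(fit_raw))
print(coef(fit_centered))

# The centered intercept is the expected y at the mean of x
at_mean_x <- predict(fit_raw, newdata = data.frame(x = mean(dat_demo$x)))
print(at_mean_x)
```

This is why the centered intercept is interpretable here whereas the uncentered one (the expected \(y\) at \(x=0\)) was not.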
As a yet more flexible alternative, if we first extract the full posterior draws (with as_draws_df()), we can then use the various tidyverse functions to focus on just the most ecologically interpretable parameters before summarising.
In the following, I will use select() with a regex (regular expression) to match only the columns that:
- start with (^) “b_” followed by any amount of (*) any character (.)
- start with (^) “sigma”
dat1b.brm2 |>
  brms::as_draws_df() |>
  dplyr::select(matches("^b_.*|^sigma")) |>
  posterior::summarise_draws(
    median,
    HDInterval::hdi,
    rhat,
    length,
    ess_bulk, ess_tail
  )
Warning: Dropping 'draws_df' class as required metadata was removed.
# A tibble: 3 × 8
variable median lower upper rhat length ess_bulk ess_tail
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 b_Intercept 0.277 -0.437 0.938 1.00 2400 2270. 2282.
2 b_scalexscaleEQFALSE -0.0841 -0.997 0.803 1.00 2400 2371. 2243.
3 sigma 1.03 0.596 1.76 1.00 2400 2362. 2402.
Conclusions:
- the length column confirms that each parameter is summarised from a total of 2400 post-warmup MCMC draws.
- the rhat values for each parameter are less than 1.01, which is consistent with convergence of the parameter estimates across the chains
- ess_bulk: effective sample sizes for the bulk of the posterior are a high fraction (> 0.5) of the total number of draws. This suggests that the sampling was reasonably efficient and therefore likely to be reliable
- ess_tail: effective sample sizes of estimates in the tails of the posterior. The tails (the more extreme regions of the posterior) are where the sampler is most likely to get stuck (have divergent transitions etc). The fraction of effective samples from this region is relatively high, implying that the sampler did not get stuck in areas that are poorly supported by the data.
- b_Intercept: when \(x=0\) (its average since it is centered), the expected value of \(y\) is 0.277 and we are 95% confident that the true value is between -0.437 and 0.938. So \(y\) is expected to be 0.277 at the average \(x\).
- b_x (listed as b_scalexscaleEQFALSE): (the slope) - the rate of change in \(y\) per unit (=1) change in \(x\). So for every one unit change in \(x\), \(y\) changes by (on average) -0.084 units and we are 95% confident that this change is between -0.997 and 0.803
- sigma is estimated to be 1.03
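The regular expression used in the select() step can be tried in isolation. The sketch below applies the same pattern to the parameter names of this model using base R's grepl(). This is only an approximation of what matches() does inside select() (for instance, matches() ignores case by default whereas grepl() does not), but for these names the result is the same. The vector vars is typed in by hand for this illustration.

```r
# Parameter names of the centered model, typed in by hand
vars <- c("b_Intercept", "b_scalexscaleEQFALSE", "sigma", "Intercept",
          "prior_Intercept", "prior_b", "prior_sigma", "lprior", "lp__")

# Either: starts with "b_" followed by anything, or: starts with "sigma"
pattern <- "^b_.*|^sigma"

keep <- grepl(pattern, vars)
print(vars[keep])
```

Because both alternatives are anchored with ^, names such as prior_b and prior_sigma are not retained even though they contain "b" and "sigma".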
The gather_draws() function performs the equivalent of an as_draws_df() followed by a pivot_longer() in order to return the full posteriors in long format where they are more suitable for graphing. Note that the name of the slope parameter gets very awkward when \(x\) is centered, so it is more convenient to refer to this parameter via a regular expression.
[1] "b_Intercept" "b_scalexscaleEQFALSE" "sigma"
[4] "Intercept" "prior_Intercept" "prior_b"
[7] "prior_sigma" "lprior" "lp__"
[10] "accept_stat__" "stepsize__" "treedepth__"
[13] "n_leapfrog__" "divergent__" "energy__"
dat1b.brm2 |>
  gather_draws(`b_Intercept`, `b_.*x.*`, `sigma`, regex = TRUE) |>
  ggplot() +
  geom_histogram(aes(x = .value)) +
  facet_wrap(~.variable, scales = "free")
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Alternatively, there are various other representations supported by the ggdist package.
y ymin ymax .width .point .interval
1 0.06623776 2.645785e-07 0.3103406 0.95 median hdci
Conclusions:
- 6.624% of the total variability in \(y\) can be explained by its relationship to \(x\)
- we are 95% confident that the strength of this relationship is between 0% and 31.034%
Family: gaussian
Links: mu = identity; sigma = identity
Formula: y ~ scale(x)
Data: dat (Number of observations: 10)
Draws: 3 chains, each with iter = 5000; warmup = 1000; thin = 5;
total post-warmup draws = 2400
Regression Coefficients:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept 0.28 0.37 -0.43 1.02 1.00 2230 1989
scalex -0.07 0.41 -0.87 0.71 1.00 2201 2292
Further Distributional Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma 1.14 0.37 0.65 2.01 1.00 2300 2224
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
Conclusions:
- in the initial block of information, we are reminded of the formula as well as the chain fitting characteristics. We are also informed that the total number of post-warmup MCMC draws is 2400.
- the Rhat values for each parameter are less than 1.01, which is consistent with convergence of the parameter estimates across the chains
- Bulk_ESS: effective sample sizes for the bulk of the posterior are a high fraction (> 0.5) of the total number of draws. This suggests that the sampling was reasonably efficient and therefore likely to be reliable
- Tail_ESS: effective sample sizes of estimates in the tails of the posterior. The tails (the more extreme regions of the posterior) are where the sampler is most likely to get stuck (have divergent transitions etc). The fraction of effective samples from this region is relatively high, implying that the sampler did not get stuck in areas that are poorly supported by the data.
- Intercept: when \(x=0\) (its average since it is standardised), the expected value of \(y\) is 0.284 and we are 95% confident that the true value is between -0.428 and 1.017. So \(y\) is expected to be 0.284 at the average \(x\).
- x (listed as scalex): (the slope) - the rate of change in \(y\) per unit (=1) change in \(x\). Recall that since \(x\) is standardised, 1 unit represents a span of 1 standard deviation of \(x\). So for every one standard deviation change in \(x\), \(y\) changes by (on average) -0.069 units and we are 95% confident that this change is between -0.87 and 0.712
- sigma is estimated to be 1.14
Note, the estimates are posterior means and the interval bounds are posterior quantiles.
As an alternative to the regular summary (which is designed to resemble that traditionally provided by frequentist analyses - and thus be instantly familiar), it is possible to define exactly how each of the parameters is summarised.
In the following, I am nominating that I want to summarise each parameter posterior by:
- the median
- the 95% highest posterior density (HPD) interval (credible interval)
- Rhat
- total number of draws
- bulk and tail effective sample sizes
dat1c.brm2 |>
  posterior::summarise_draws(
    median,
    HDInterval::hdi,
    rhat,
    length,
    ess_bulk, ess_tail
  )
# A tibble: 9 × 8
variable median lower upper rhat length ess_bulk ess_tail
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 b_Intercept 0.279 -0.426 1.02 1.00 2400 2230. 1989.
2 b_scalex -0.0750 -0.813 0.752 1.00 2400 2201. 2292.
3 sigma 1.06 0.550 1.80 0.999 2400 2300. 2224.
4 Intercept 0.279 -0.426 1.02 1.00 2400 2230. 1989.
5 prior_Intercept 31.8 3.60 62.5 1.00 2400 2343. 2023.
6 prior_b -0.359 -44.6 46.7 1.00 2400 2334. 2368.
7 prior_sigma 11.6 0.0000123 47.0 1.00 2400 2421. 2290.
8 lprior -12.5 -12.6 -12.4 1.00 2400 2226. 1977.
9 lp__ -26.5 -29.6 -25.0 1.00 2400 2390. 2328.
Conclusions:
- the length column confirms that each parameter is summarised from a total of 2400 post-warmup MCMC draws.
- the rhat values for each parameter are less than 1.01, which is consistent with convergence of the parameter estimates across the chains
- ess_bulk: effective sample sizes for the bulk of the posterior are a high fraction (> 0.5) of the total number of draws. This suggests that the sampling was reasonably efficient and therefore likely to be reliable
- ess_tail: effective sample sizes of estimates in the tails of the posterior. The tails (the more extreme regions of the posterior) are where the sampler is most likely to get stuck (have divergent transitions etc). The fraction of effective samples from this region is relatively high, implying that the sampler did not get stuck in areas that are poorly supported by the data.
- b_Intercept: when \(x=0\) (its average since it is standardised), the expected value of \(y\) is 0.279 and we are 95% confident that the true value is between -0.426 and 1.019. So \(y\) is expected to be 0.279 at the average \(x\).
- b_x (listed as b_scalex): (the slope) - the rate of change in \(y\) per unit (=1) change in \(x\). Recall that since \(x\) is standardised, 1 unit represents a span of 1 standard deviation of \(x\). So for every one standard deviation change in \(x\), \(y\) changes by (on average) -0.075 units and we are 95% confident that this change is between -0.813 and 0.752
- sigma is estimated to be 1.06
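The relationship between a slope on the raw scale and a slope on the standardised scale is a simple rescaling. The following is a minimal sketch on fabricated data (dat_demo is not the tutorial's dat), again using lm() as a fast stand-in for brm(), so the identity shown is exact for least squares and only approximate once priors are involved.

```r
set.seed(123)
# Fabricated data for illustration only
dat_demo <- data.frame(x = 1:10)
dat_demo$y <- 2 + 0.5 * dat_demo$x + rnorm(10)

fit_raw <- lm(y ~ x, data = dat_demo)         # slope per unit of x
fit_std <- lm(y ~ scale(x), data = dat_demo)  # slope per standard deviation of x

slope_raw <- unname(coef(fit_raw)[2])
slope_std <- unname(coef(fit_std)[2])

# The standardised slope is the raw slope multiplied by sd(x)
print(c(slope_raw    = slope_raw,
        slope_std    = slope_std,
        raw_times_sd = slope_raw * sd(dat_demo$x)))
```

So a standardised slope can always be converted back to the original units by dividing by the standard deviation of \(x\).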
As a yet more flexible alternative, if we first extract the full posterior draws (with as_draws_df()), we can then use the various tidyverse functions to focus on just the most ecologically interpretable parameters before summarising.
In the following, I will use select() with a regex (regular expression) to match only the columns that:
- start with (^) “b_” followed by any amount of (*) any character (.)
- start with (^) “sigma”
dat1c.brm2 |>
  brms::as_draws_df() |>
  dplyr::select(matches("^b_.*|^sigma")) |>
  posterior::summarise_draws(
    median,
    HDInterval::hdi,
    rhat,
    length,
    ess_bulk, ess_tail
  )
Warning: Dropping 'draws_df' class as required metadata was removed.
# A tibble: 3 × 8
variable median lower upper rhat length ess_bulk ess_tail
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 b_Intercept 0.279 -0.426 1.02 1.00 2400 2235. 1985.
2 b_scalex -0.0750 -0.813 0.752 1.00 2400 2193. 2280.
3 sigma 1.06 0.550 1.80 1.00 2400 2290. 2197.
Conclusions:
- the length column confirms that each parameter is summarised from a total of 2400 post-warmup MCMC draws.
- the rhat values for each parameter are less than 1.01, which is consistent with convergence of the parameter estimates across the chains
- ess_bulk: effective sample sizes for the bulk of the posterior are a high fraction (> 0.5) of the total number of draws. This suggests that the sampling was reasonably efficient and therefore likely to be reliable
- ess_tail: effective sample sizes of estimates in the tails of the posterior. The tails (the more extreme regions of the posterior) are where the sampler is most likely to get stuck (have divergent transitions etc). The fraction of effective samples from this region is relatively high, implying that the sampler did not get stuck in areas that are poorly supported by the data.
- b_Intercept: when \(x=0\) (its average since it is standardised), the expected value of \(y\) is 0.279 and we are 95% confident that the true value is between -0.426 and 1.019. So \(y\) is expected to be 0.279 at the average \(x\).
- b_x (listed as b_scalex): (the slope) - the rate of change in \(y\) per unit (=1) change in \(x\). Recall that since \(x\) is standardised, 1 unit represents a span of 1 standard deviation of \(x\). So for every one standard deviation change in \(x\), \(y\) changes by (on average) -0.075 units and we are 95% confident that this change is between -0.813 and 0.752
- sigma is estimated to be 1.06
The gather_draws() function performs the equivalent of an as_draws_df() followed by a pivot_longer() in order to return the full posteriors in long format where they are more suitable for graphing. Note that the name of the slope parameter changes when \(x\) is centered or standardised, so it is more convenient to refer to this parameter via a regular expression.
[1] "b_Intercept" "b_scalex" "sigma" "Intercept"
[5] "prior_Intercept" "prior_b" "prior_sigma" "lprior"
[9] "lp__" "accept_stat__" "stepsize__" "treedepth__"
[13] "n_leapfrog__" "divergent__" "energy__"
dat1c.brm2 |>
  gather_draws(`b_Intercept`, `b_.*x`, `sigma`, regex = TRUE) |>
  ggplot() +
  geom_histogram(aes(x = .value)) +
  facet_wrap(~.variable, scales = "free")
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Alternatively, there are various other representations supported by the ggdist package.
y ymin ymax .width .point .interval
1 0.06728658 1.499005e-09 0.3197585 0.95 median hdci
Conclusions:
- 6.729% of the total variability in \(y\) can be explained by its relationship to \(x\)
- we are 95% confident that the strength of this relationship is between 0% and 31.976%
Family: gaussian
Links: mu = identity; sigma = identity
Formula: y ~ x
Data: dat2 (Number of observations: 12)
Draws: 3 chains, each with iter = 5000; warmup = 1000; thin = 5;
total post-warmup draws = 2400
Regression Coefficients:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept 20.89 2.03 16.62 24.89 1.00 2291 2382
xmedium -0.83 2.85 -6.31 5.12 1.00 2347 2180
xhigh -8.72 3.09 -14.74 -2.34 1.00 2321 2345
Further Distributional Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma 4.48 1.11 2.86 7.18 1.00 2605 2498
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
Conclusions:
- in the initial block of information, we are reminded of the formula as well as the chain fitting characteristics. We are also informed that the total number of post-warmup MCMC draws is 2400.
- the Rhat values for each parameter are less than 1.01, which is consistent with convergence of the parameter estimates across the chains
- Bulk_ESS: effective sample sizes for the bulk of the posterior are a high fraction (> 0.5) of the total number of draws. This suggests that the sampling was reasonably efficient and therefore likely to be reliable
- Tail_ESS: effective sample sizes of estimates in the tails of the posterior. The tails (the more extreme regions of the posterior) are where the sampler is most likely to get stuck (have divergent transitions etc). The fraction of effective samples from this region is relatively high, implying that the sampler did not get stuck in areas that are poorly supported by the data.
- Intercept: when \(x\) is “control”, the expected value of \(y\) is 20.89 and we are 95% confident that the true value is between 16.616 and 24.891.
- x*: (the effects) - the difference in \(y\) between the first (control) group and each of the other levels of \(x\).
  - xmedium: on average, \(y\) in the “medium” group differs from the “control” group by -0.829 units (that is, it is 0.829 units lower) and we are 95% confident that this difference is between -6.312 and 5.116.
  - xhigh: on average, \(y\) in the “high” group differs from the “control” group by -8.715 units (that is, it is 8.715 units lower) and we are 95% confident that this difference is between -14.736 and -2.344.
- sigma is estimated to be 4.48
Note, the estimates are posterior means and the interval bounds are posterior quantiles.
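The interpretation of the intercept as the “control” mean and of the remaining coefficients as differences from “control” follows from the way R codes a categorical predictor (treatment contrasts). The following is a minimal sketch on fabricated data (dat_demo is not the tutorial's dat2), using lm() as a fast stand-in for brm(). With least squares the identities are exact, whereas the Bayesian estimates will be pulled slightly by the priors.

```r
set.seed(42)
# Fabricated three-group data for illustration only
dat_demo <- data.frame(
  x = factor(rep(c("control", "medium", "high"), each = 4),
             levels = c("control", "medium", "high"))
)
dat_demo$y <- c(20, 19, 11)[as.integer(dat_demo$x)] + rnorm(12, sd = 2)

# Treatment contrasts: an intercept column plus one dummy per non-reference level
print(unique(model.matrix(~ x, data = dat_demo)))

fit <- lm(y ~ x, data = dat_demo)
group_means <- sapply(split(dat_demo$y, dat_demo$x), mean)

# Intercept = control mean; other coefficients = differences from control
print(coef(fit))
print(group_means)
print(group_means[c("medium", "high")] - group_means[["control"]])
```

The unique rows of the model matrix show that “control” observations contribute only to the intercept, which is why it acts as the reference level.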
As an alternative to the regular summary (which is designed to resemble that traditionally provided by frequentist analyses - and thus be instantly familiar), it is possible to define exactly how each of the parameters is summarised.
In the following, I am nominating that I want to summarise each parameter posterior by:
- the median
- the 95% highest posterior density (HPD) interval (credible interval)
- Rhat
- total number of draws
- bulk and tail effective sample sizes
dat2a.brm2 |>
  posterior::summarise_draws(
    median,
    HDInterval::hdi,
    rhat,
    length,
    ess_bulk, ess_tail
  )
# A tibble: 10 × 8
variable median lower upper rhat length ess_bulk ess_tail
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 b_Intercept 20.9 17.0 25.1 1.00 2400 2290. 2382.
2 b_xmedium -0.889 -6.44 4.89 1.00 2400 2347. 2180.
3 b_xhigh -8.73 -14.8 -2.38 1.00 2400 2320. 2345.
4 sigma 4.30 2.67 6.71 1.00 2400 2605. 2498.
5 Intercept 17.7 15.7 20.0 1.00 2400 2485. 2204.
6 prior_Intercept 19.7 16.1 23.2 1.00 2400 2516. 2283.
7 prior_b 0.0306 -18.9 20.7 1.00 2400 2384. 2381.
8 prior_sigma 3.55 0.00499 14.0 1.00 2400 2374. 2369.
9 lprior -11.4 -13.1 -10.1 1.00 2400 2361. 2414.
10 lp__ -44.3 -47.8 -42.5 1.00 2400 2231. 2367.
Conclusions:
- the length column confirms that each parameter is summarised from a total of 2400 post-warmup MCMC draws.
- the rhat values for each parameter are less than 1.01, which is consistent with convergence of the parameter estimates across the chains
- ess_bulk: effective sample sizes for the bulk of the posterior are a high fraction (> 0.5) of the total number of draws. This suggests that the sampling was reasonably efficient and therefore likely to be reliable
- ess_tail: effective sample sizes of estimates in the tails of the posterior. The tails (the more extreme regions of the posterior) are where the sampler is most likely to get stuck (have divergent transitions etc). The fraction of effective samples from this region is relatively high, implying that the sampler did not get stuck in areas that are poorly supported by the data.
- b_Intercept: when \(x\) is “control”, the expected value of \(y\) is 20.875 and we are 95% confident that the true value is between 16.993 and 25.064.
- b_x*: (the effects) - the difference in \(y\) between the first (control) group and each of the other levels of \(x\).
  - b_xmedium: on average, \(y\) in the “medium” group differs from the “control” group by -0.889 units (that is, it is 0.889 units lower) and we are 95% confident that this difference is between -6.445 and 4.886.
  - b_xhigh: on average, \(y\) in the “high” group differs from the “control” group by -8.726 units (that is, it is 8.726 units lower) and we are 95% confident that this difference is between -14.756 and -2.383.
- sigma is estimated to be 4.30
As a yet more flexible alternative, if we first extract the full posterior draws (with as_draws_df()), we can then use the various tidyverse functions to focus on just the most ecologically interpretable parameters before summarising.
In the following, I will use select() with a regex (regular expression) to match only the columns that:
- start with (^) “b_” followed by any amount of (*) any character (.)
- start with (^) “sigma”
dat2a.brm2 |>
  brms::as_draws_df() |>
  dplyr::select(matches("^b_.*|^sigma")) |>
  posterior::summarise_draws(
    median,
    HDInterval::hdi,
    rhat,
    length,
    ess_bulk, ess_tail
  )
Warning: Dropping 'draws_df' class as required metadata was removed.
# A tibble: 4 × 8
variable median lower upper rhat length ess_bulk ess_tail
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 b_Intercept 20.9 17.0 25.1 1.00 2400 2270. 2367.
2 b_xmedium -0.889 -6.44 4.89 1.00 2400 2329. 2158.
3 b_xhigh -8.73 -14.8 -2.38 1.00 2400 2294. 2322.
4 sigma 4.30 2.67 6.71 1.00 2400 2597. 2492.
Conclusions:
- the length column confirms that each parameter is summarised from a total of 2400 post-warmup MCMC draws.
- the rhat values for each parameter are less than 1.01, which is consistent with convergence of the parameter estimates across the chains
- ess_bulk: effective sample sizes for the bulk of the posterior are a high fraction (> 0.5) of the total number of draws. This suggests that the sampling was reasonably efficient and therefore likely to be reliable
- ess_tail: effective sample sizes of estimates in the tails of the posterior. The tails (the more extreme regions of the posterior) are where the sampler is most likely to get stuck (have divergent transitions etc). The fraction of effective samples from this region is relatively high, implying that the sampler did not get stuck in areas that are poorly supported by the data.
- b_Intercept: when \(x\) is “control”, the expected value of \(y\) is 20.875 and we are 95% confident that the true value is between 16.993 and 25.064.
- b_x*: (the effects) - the difference in \(y\) between the first (control) group and each of the other levels of \(x\).
  - b_xmedium: on average, \(y\) in the “medium” group differs from the “control” group by -0.889 units (that is, it is 0.889 units lower) and we are 95% confident that this difference is between -6.445 and 4.886.
  - b_xhigh: on average, \(y\) in the “high” group differs from the “control” group by -8.726 units (that is, it is 8.726 units lower) and we are 95% confident that this difference is between -14.756 and -2.383.
- sigma is estimated to be 4.30
The gather_draws() function performs the equivalent of an as_draws_df() followed by a pivot_longer() in order to return the full posteriors in long format where they are more suitable for graphing.
[1] "b_Intercept" "b_xmedium" "b_xhigh" "sigma"
[5] "Intercept" "prior_Intercept" "prior_b" "prior_sigma"
[9] "lprior" "lp__" "accept_stat__" "stepsize__"
[13] "treedepth__" "n_leapfrog__" "divergent__" "energy__"
dat2a.brm2 |>
  gather_draws(`b_Intercept`, `b_x.*`, `sigma`, regex = TRUE) |>
  ggplot() +
  geom_histogram(aes(x = .value)) +
  facet_wrap(~.variable, scales = "free")
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Alternatively, there are various other representations supported by the ggdist package.
y ymin ymax .width .point .interval
1 0.5424268 0.1636153 0.7253359 0.95 median hdci
Conclusions:
- 54.243% of the total variability in \(y\) can be explained by its relationship to \(x\)
- we are 95% confident that the strength of this relationship is between 16.362% and 72.534%
Family: gaussian
Links: mu = identity; sigma = identity
Formula: y ~ -1 + x
Data: dat2 (Number of observations: 12)
Draws: 3 chains, each with iter = 5000; warmup = 1000; thin = 5;
total post-warmup draws = 2400
Regression Coefficients:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
xcontrol 20.26 2.17 15.79 24.43 1.00 2277 2411
xmedium 18.68 2.15 14.38 23.03 1.00 2406 2232
xhigh 11.10 2.23 6.67 15.83 1.00 2547 2347
Further Distributional Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma 4.45 1.14 2.82 7.24 1.00 2385 2312
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
Conclusions:
- in the initial block of information, we are reminded of the formula as well as the chain fitting characteristics. We are also informed that the total number of post-warmup MCMC draws is 2400.
- the Rhat values for each parameter are less than 1.01, which is consistent with convergence of the parameter estimates across the chains
- Bulk_ESS: effective sample sizes for the bulk of the posterior are a high fraction (> 0.5) of the total number of draws. This suggests that the sampling was reasonably efficient and therefore likely to be reliable
- Tail_ESS: effective sample sizes of estimates in the tails of the posterior. The tails (the more extreme regions of the posterior) are where the sampler is most likely to get stuck (have divergent transitions etc). The fraction of effective samples from this region is relatively high, implying that the sampler did not get stuck in areas that are poorly supported by the data.
- x*: (the group means).
  - xcontrol: the expected value of \(y\) in the “control” group is 20.26 (95% credibility interval is between 15.788 and 24.425)
  - xmedium: the expected value of \(y\) in the “medium” group is 18.68 (95% credibility interval is between 14.383 and 23.026)
  - xhigh: the expected value of \(y\) in the “high” group is 11.10 (95% credibility interval is between 6.669 and 15.832)
- sigma is estimated to be 4.45
Note, the estimates are posterior means and the interval bounds are posterior quantiles.
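Removing the intercept (-1 + x) switches the model from an effects parameterisation to a means parameterisation, in which each coefficient is a group mean. The following is a minimal sketch on fabricated data (dat_demo is not the tutorial's dat2), using lm() as a fast stand-in for brm(), so the correspondence is exact here and approximate for the Bayesian fit.

```r
set.seed(42)
# Fabricated three-group data for illustration only
dat_demo <- data.frame(
  x = factor(rep(c("control", "medium", "high"), each = 4),
             levels = c("control", "medium", "high"))
)
dat_demo$y <- c(20, 19, 11)[as.integer(dat_demo$x)] + rnorm(12, sd = 2)

# Means parameterisation: one coefficient per group, no intercept
fit_means <- lm(y ~ -1 + x, data = dat_demo)
group_means <- sapply(split(dat_demo$y, dat_demo$x), mean)

print(coef(fit_means))
print(group_means)

# Effects (differences from control) can still be derived from the group means
effects <- coef(fit_means)[c("xmedium", "xhigh")] - coef(fit_means)[["xcontrol"]]
print(effects)
```

In the Bayesian setting the same derivation would be applied to each posterior draw, yielding a full posterior for each difference rather than a single number.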
As an alternative to the regular summary (which is designed to resemble that traditionally provided by frequentist analyses - and thus be instantly familiar), it is possible to define exactly how each of the parameters is summarised.
In the following, I am nominating that I want to summarise each parameter posterior by:
- the median
- the 95% highest posterior density (HPD) interval (credible interval)
- Rhat
- total number of draws
- bulk and tail effective sample sizes
dat2b.brm2 |>
  posterior::summarise_draws(
    median,
    HDInterval::hdi,
    rhat,
    length,
    ess_bulk, ess_tail
  )
# A tibble: 8 × 8
variable median lower upper rhat length ess_bulk ess_tail
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 b_xcontrol 20.3 15.8 24.5 0.999 2400 2278. 2411.
2 b_xmedium 18.6 14.7 23.4 1.00 2400 2406. 2232.
3 b_xhigh 11.0 6.49 15.6 1.00 2400 2547. 2347.
4 sigma 4.26 2.52 6.64 1.00 2400 2385. 2312.
5 prior_b 16.2 -6.13 36.0 0.999 2400 2363. 2497.
6 prior_sigma 3.53 0.000386 13.9 1.00 2400 2463. 2124.
7 lprior -11.8 -12.8 -11.1 1.00 2400 2174. 2293.
8 lp__ -44.6 -48.3 -42.7 1.00 2400 2347. 2301.
Conclusions:
- the length column confirms that each parameter is summarised from 2400 post-warmup MCMC draws
- the rhat values for each parameter are all <1.01, suggesting convergence of the parameter estimates across the chains
- ess_bulk: effective sample sizes for the bulk of the posterior are a high fraction (> 0.5) of the total number of draws. This suggests that the sampling was reasonably efficient and therefore likely to be reliable
- ess_tail: effective sample sizes of estimates in the tails of the posterior. The tails (more extreme regions of the posterior) are where the sampler is most likely to get stuck (have divergent transitions etc). The fraction of effective samples from this region is relatively high, implying that the sampler did not get stuck in areas that are unsupported by data.
- x*: (the means of each group)
  - xcontrol: the expected value of \(y\) in the “control” group is 20.276 (95% credibility interval is between 15.847 and 24.478)
  - xmedium: the expected value of \(y\) in the “medium” group is 18.635 (95% credibility interval is between 14.745 and 23.371)
  - xhigh: the expected value of \(y\) in the “high” group is 11.028 (95% credibility interval is between 6.491 and 15.58)
- sigma is estimated to be 4.26
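To make the difference between a quantile-based interval and the highest density interval requested above a little more concrete, the following is a minimal base R sketch. It uses made-up, right-skewed draws (not the fitted model) and a hypothetical helper, hpd_interval(), that simply finds the shortest interval containing 95% of the draws. It is an illustration of the idea, not the implementation used by HDInterval::hdi().

```r
# Hypothetical helper: the shortest interval that contains `prob` of the draws
hpd_interval <- function(draws, prob = 0.95) {
  sorted <- sort(draws)
  n <- length(sorted)
  k <- ceiling(prob * n)         # number of draws the interval must contain
  starts <- seq_len(n - k + 1)   # every possible left end point
  widths <- sorted[starts + k - 1] - sorted[starts]
  i <- which.min(widths)         # the narrowest candidate window
  c(lower = sorted[i], upper = sorted[i + k - 1])
}

set.seed(1)
draws <- rgamma(2400, shape = 2, rate = 1)  # made-up, right-skewed "posterior"

hpd <- hpd_interval(draws)                  # shortest 95% interval
eti <- quantile(draws, c(0.025, 0.975))     # equal-tailed 95% interval
hpd
eti
```

For a skewed posterior the shortest interval sits closer to the mode than the equal-tailed interval, and it is never wider.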
As a yet more flexible alternative, if we first extract the full posterior draws (with as_draws_df()), we can then use the various tidyverse functions to focus on just the most ecologically interpretable parameters before summarising.
In the following, I will use select() with a regex (regular expression) to match only the columns that:
- start with (^) “b_” followed by any number of (*) any character (.)
- start with (^) “sigma”
dat2b.brm2 |>
  brms::as_draws_df() |>
  dplyr::select(matches("^b_.*|^sigma")) |>
  posterior::summarise_draws(
    median,
    HDInterval::hdi,
    rhat,
    length,
    ess_bulk, ess_tail
  )
Warning: Dropping 'draws_df' class as required metadata was removed.
# A tibble: 4 × 8
variable median lower upper rhat length ess_bulk ess_tail
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 b_xcontrol 20.3 15.8 24.5 1.00 2400 2266. 2404.
2 b_xmedium 18.6 14.7 23.4 1.00 2400 2387. 2190.
3 b_xhigh 11.0 6.49 15.6 1.00 2400 2539. 2305.
4 sigma 4.26 2.52 6.64 1.00 2400 2380. 2290.
Conclusions:
- the length column confirms that each parameter is summarised from 2400 post-warmup MCMC draws
- the rhat values for each parameter are all <1.01, suggesting convergence of the parameter estimates across the chains
- ess_bulk: effective sample sizes for the bulk of the posterior are a high fraction (> 0.5) of the total number of draws. This suggests that the sampling was reasonably efficient and therefore likely to be reliable
- ess_tail: effective sample sizes of estimates in the tails of the posterior. The tails (more extreme regions of the posterior) are where the sampler is most likely to get stuck (have divergent transitions etc). The fraction of effective samples from this region is relatively high, implying that the sampler did not get stuck in areas that are unsupported by data.
- x*: (the means of each group)
  - xcontrol: the expected value of \(y\) in the “control” group is 20.276 (95% credibility interval is between 15.847 and 24.478)
  - xmedium: the expected value of \(y\) in the “medium” group is 18.635 (95% credibility interval is between 14.745 and 23.371)
  - xhigh: the expected value of \(y\) in the “high” group is 11.028 (95% credibility interval is between 6.491 and 15.58)
- sigma is estimated to be 4.26
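To see which columns the regular expression used above retains, it can be tried against the variable names directly. The sketch below uses base R's grepl() on the names from the full summary table. The vector vars is typed in by hand for illustration rather than extracted from the model.

```r
# Variable names as they appear in the full summary table (typed in by hand)
vars <- c("b_xcontrol", "b_xmedium", "b_xhigh", "sigma",
          "prior_b", "prior_sigma", "lprior", "lp__")

# Same pattern as used in select(matches(...)):
# names starting with "b_" OR names starting with "sigma"
keep <- grepl("^b_.*|^sigma", vars)
kept <- vars[keep]
kept
```

Because both alternatives are anchored with ^, names such as prior_b and prior_sigma are excluded even though they contain “b” and “sigma”.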
The gather_draws() function performs the equivalent of an as_draws_df() followed by a pivot_longer() in order to return the full posteriors in long format where they are more suitable for graphing.
[1] "b_xcontrol" "b_xmedium" "b_xhigh" "sigma"
[5] "prior_b" "prior_sigma" "lprior" "lp__"
[9] "accept_stat__" "stepsize__" "treedepth__" "n_leapfrog__"
[13] "divergent__" "energy__"
dat2b.brm2 |>
  gather_draws(`b_x.*`, `sigma`, regex = TRUE) |>
  ggplot() +
  geom_histogram(aes(x = .value)) +
  facet_wrap(~.variable, scales = "free")
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Alternatively, there are various other representations supported by the ggdist package.
y ymin ymax .width .point .interval
1 0.5540797 0.1893543 0.7253041 0.95 median hdci
Conclusions:
- 55.408% of the total variability in \(y\) can be explained by its relationship to \(x\)
- we are 95% confident that the strength of this relationship is between 18.935% and 72.53%
Family: poisson
Links: mu = log
Formula: y ~ x
Data: dat3 (Number of observations: 10)
Draws: 3 chains, each with iter = 5000; warmup = 1000; thin = 5;
total post-warmup draws = 2400
Regression Coefficients:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept 0.02 0.35 -0.69 0.67 1.00 2108 1861
x 0.35 0.04 0.27 0.43 1.00 2142 2040
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
The results presented by the summary() function are on the link scale (in this case, the log scale). As such, they can be awkward to interpret and are particularly punishing for your audience.
Conclusions:
- in the initial block of information, we are reminded of the formula as well as chain fitting characteristics. We are also informed that the total number of post-warmup MCMC samples is 2400.
- the Rhat values for each parameter are all <1.01, suggesting convergence of the parameter estimates across the chains
- Bulk_ESS: effective sample sizes for the bulk of the posterior are a high fraction (> 0.5) of the total number of draws. This suggests that the sampling was reasonably efficient and therefore likely to be reliable
- Tail_ESS: effective sample sizes of estimates in the tails of the posterior. The tails (more extreme regions of the posterior) are where the sampler is most likely to get stuck (have divergent transitions etc). The fraction of effective samples from this region is relatively high, implying that the sampler did not get stuck in areas that are unsupported by data.
- Intercept: when \(x=0\), the expected value of \(y\) (on the log scale) is 0.021 and we are 95% confident that the true value is between -0.693 and 0.672. Since \(x=0\) is outside the observed range of \(x\), this y-intercept is of very limited value.
- x: (the slope) - the rate of change in \(y\) (on the log scale) per unit (=1) change in \(x\). So for every one unit change in \(x\), \(y\) increases by (on average) 0.345 units on the log scale and we are 95% confident that this change is between 0.266 and 0.432
Note, the estimates are means and quantiles.
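Since the slope above is on the log scale, it can help to see what it implies on the response scale. The following is a small worked sketch of the arithmetic only, using the reported slope estimate (0.345). The variable names are mine and nothing here is extracted from the fitted model.

```r
slope_log <- 0.345                         # reported slope on the log (link) scale

# A difference on the log scale is a ratio on the response scale
factor_change <- exp(slope_log)

# Express the ratio as a percentage change per unit increase in x
percent_change <- (factor_change - 1) * 100

round(c(factor = factor_change, percent = percent_change), 2)
```

That is, an additive change of 0.345 on the log scale corresponds to multiplying the expected value of \(y\) by roughly 1.41 (about a 41% increase) for each unit increase in \(x\).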
As an alternative to the regular summary (which is designed to resemble that traditionally provided by frequentist analyses - and thus be instantly familiar), it is possible to define exactly how each of the parameters is summarised.
In the following, I am nominating that I want to summarise each parameter posterior by:
- the median
- the 95% highest probability density interval (credibility interval)
- Rhat
- total number of draws
- bulk and tail effective sample sizes
dat3a.brm2 |>
  posterior::summarise_draws(
    median,
    HDInterval::hdi,
    rhat,
    length,
    ess_bulk, ess_tail
  )
# A tibble: 7 × 8
variable median lower upper rhat length ess_bulk ess_tail
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 b_Intercept 0.0368 -0.693 0.672 1.00 2400 2108. 1861.
2 b_x 0.345 0.265 0.430 1.00 2400 2141. 2040.
3 Intercept 1.93 1.63 2.18 1.00 2400 2160. 2129.
4 prior_Intercept 2.01 -0.701 4.48 1.00 2400 1982. 1809.
5 prior_b 0.0274 -6.08 6.64 1.00 2400 2399. 2209.
6 lprior -2.92 -2.96 -2.91 1.00 2400 2116. 2069.
7 lp__ -26.8 -29.1 -26.1 1.00 2400 2158. 2326.
The results presented by the above function are on the link scale (in this case, the log scale). As such, they can be awkward to interpret and are particularly punishing for your audience.
Conclusions:
- the length column confirms that each parameter is summarised from 2400 post-warmup MCMC draws
- the rhat values for each parameter are all <1.01, suggesting convergence of the parameter estimates across the chains
- ess_bulk: effective sample sizes for the bulk of the posterior are a high fraction (> 0.5) of the total number of draws. This suggests that the sampling was reasonably efficient and therefore likely to be reliable
- ess_tail: effective sample sizes of estimates in the tails of the posterior. The tails (more extreme regions of the posterior) are where the sampler is most likely to get stuck (have divergent transitions etc). The fraction of effective samples from this region is relatively high, implying that the sampler did not get stuck in areas that are unsupported by data.
- b_Intercept: when \(x=0\), the expected value of \(y\) (on the log scale) is 0.037 and we are 95% confident that the true value is between -0.693 and 0.672. Since \(x=0\) is outside the observed range of \(x\), this y-intercept is of very limited value.
- b_x: (the slope) - the rate of change in \(y\) (on the log scale) per unit (=1) change in \(x\). So for every one unit change in \(x\), \(y\) increases by (on average) 0.345 units on the log scale and we are 95% confident that this change is between 0.265 and 0.43
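The table above lists both b_Intercept and Intercept. In brms, Intercept is the intercept that corresponds to mean-centred predictors, so it should be approximately b_Intercept plus the slope multiplied by the mean of \(x\). The sketch below checks that arithmetic with the reported medians. It assumes that \(x\) takes the values 1 to 10 (an assumption on my part, since the data are not shown in this section), and medians of different parameters need not add up exactly.

```r
b_intercept <- 0.0368    # reported median of b_Intercept (log scale)
b_x <- 0.345             # reported median of b_x (log scale)
mean_x <- mean(1:10)     # ASSUMED predictor values 1..10, so mean(x) = 5.5

# Expected log(y) at the mean of x
intercept_centred <- b_intercept + b_x * mean_x
round(intercept_centred, 2)
```

Under that assumption the result is close to the 1.93 reported in the Intercept row.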
As a yet more flexible alternative, if we first extract the full posterior draws (with as_draws_df()), we can then use the various tidyverse functions to focus on just the most ecologically interpretable parameters before summarising.
In the following, I will use select() with a regex (regular expression) to match only the columns that:
- start with (^) “b_” followed by any number of (*) any character (.)
- start with (^) “sigma”
I will also use mutate() to transform the results back onto the scale of the response (by exponentiating the posteriors) before summarising.
dat3a.brm2 |>
  brms::as_draws_df() |>
  dplyr::select(matches("^b_.*|^sigma")) |>
  mutate(across(everything(), exp)) |>
  posterior::summarise_draws(
    median,
    HDInterval::hdi,
    rhat,
    length,
    ess_bulk, ess_tail
  )
Warning: Dropping 'draws_df' class as required metadata was removed.
# A tibble: 2 × 8
variable median lower upper rhat length ess_bulk ess_tail
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 b_Intercept 1.04 0.437 1.84 1.00 2400 2103. 1798.
2 b_x 1.41 1.30 1.53 1.00 2400 2136. 1970.
Conclusions:
- the length column confirms that each parameter is summarised from 2400 post-warmup MCMC draws
- the rhat values for each parameter are all <1.01, suggesting convergence of the parameter estimates across the chains
- ess_bulk: effective sample sizes for the bulk of the posterior are a high fraction (> 0.5) of the total number of draws. This suggests that the sampling was reasonably efficient and therefore likely to be reliable
- ess_tail: effective sample sizes of estimates in the tails of the posterior. The tails (more extreme regions of the posterior) are where the sampler is most likely to get stuck (have divergent transitions etc). The fraction of effective samples from this region is relatively high, implying that the sampler did not get stuck in areas that are unsupported by data.
- b_Intercept: when \(x=0\), the expected value of \(y\) is 1.038 and we are 95% confident that the true value is between 0.437 and 1.84. Since \(x=0\) is outside the observed range of \(x\), this y-intercept is of very limited value.
- b_x: (the slope) - the rate of change in \(y\) per unit (=1) change in \(x\). So for every one unit change in \(x\), \(y\) increases by (on average) a factor of 1.411 and we are 95% confident that this factor is between 1.30 and 1.53. This represents a 41.1% increase ((value - 1) * 100) in \(y\) per unit increase in \(x\).
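The reason for exponentiating the individual draws before summarising (rather than exponentiating a summary) can be illustrated with a few lines of base R. The sketch below uses made-up draws on the log scale, not the fitted model. The number of draws (2401) is deliberately odd so that the median is a single draw.

```r
set.seed(2024)
log_draws <- rnorm(2401, mean = 0.02, sd = 0.35)  # made-up draws on the log scale

# Means do not commute with a non-linear transformation
mean_then_exp <- exp(mean(log_draws))
exp_then_mean <- mean(exp(log_draws))

# Medians do, because exp() preserves the ordering of the draws
median_then_exp <- exp(median(log_draws))
exp_then_median <- median(exp(log_draws))

c(mean_then_exp = mean_then_exp, exp_then_mean = exp_then_mean)
c(median_then_exp = median_then_exp, exp_then_median = exp_then_median)
```

Transforming the draws first means that any summary (mean, median or interval) is calculated on the scale on which it will be reported.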
The gather_draws() function performs the equivalent of an as_draws_df() followed by a pivot_longer() in order to return the full posteriors in long format where they are more suitable for graphing.
dat3a.brm2 |>
  gather_draws(b_Intercept, b_x) |>
  # back-transform only the posterior draws (not the chain/iteration indices)
  mutate(.value = exp(.value)) |>
  ggplot() +
  geom_histogram(aes(x = .value)) +
  facet_wrap(~.variable, scales = "free")
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
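For readers unfamiliar with the long format, the following base R sketch mimics the reshaping on a tiny made-up set of draws. It uses stack() as a stand-in for the pivoting step and only borrows the column names .variable and .value for illustration. It does not use gather_draws() or the fitted model.

```r
set.seed(42)
# Five made-up draws per parameter, in wide format (one column per parameter)
wide <- data.frame(
  b_Intercept = rnorm(5, mean = 0.02, sd = 0.35),
  b_x         = rnorm(5, mean = 0.35, sd = 0.04)
)

# Long format: one row per draw per parameter
long <- stack(wide)
names(long) <- c(".value", ".variable")

# Back-transform only the column holding the draws
long$.value <- exp(long$.value)
long
```

In long format the parameter name is itself a column, which is what allows facet_wrap() to produce one panel per parameter.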
Alternatively, there are various other representations supported by the ggdist package.
y ymin ymax .width .point .interval
1 0.922404 0.8093995 0.9343621 0.95 median hdci
Conclusions:
- 92.24% of the total variability in \(y\) can be explained by its relationship to \(x\)
- we are 95% confident that the strength of this relationship is between 80.94% and 93.436%
Family: poisson
Links: mu = log
Formula: y ~ scale(x, scale = FALSE)
Data: dat3 (Number of observations: 10)
Draws: 3 chains, each with iter = 5000; warmup = 1000; thin = 5;
total post-warmup draws = 2400
Regression Coefficients:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept 1.93 0.14 1.65 2.19 1.00 2222 2142
scalexscaleEQFALSE 0.34 0.04 0.26 0.43 1.00 2132 1947
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
The results presented by the summary() function are on the link scale (in this case, the log scale). As such, they can be awkward to interpret and are particularly punishing for your audience.
Conclusions:
- in the initial block of information, we are reminded of the formula as well as chain fitting characteristics. We are also informed that the total number of post-warmup MCMC samples is 2400.
- the Rhat values for each parameter are all <1.01, suggesting convergence of the parameter estimates across the chains
- Bulk_ESS: effective sample sizes for the bulk of the posterior are a high fraction (> 0.5) of the total number of draws. This suggests that the sampling was reasonably efficient and therefore likely to be reliable
- Tail_ESS: effective sample sizes of estimates in the tails of the posterior. The tails (more extreme regions of the posterior) are where the sampler is most likely to get stuck (have divergent transitions etc). The fraction of effective samples from this region is relatively high, implying that the sampler did not get stuck in areas that are unsupported by data.
- Intercept: when centred \(x=0\) (that is, at the mean of \(x\)), the expected value of \(y\) (on the log scale) is 1.926 and we are 95% confident that the true value is between 1.649 and 2.191. Because \(x\) has been centred, this intercept lies within the observed range of \(x\) and is directly interpretable.
- scalexscaleEQFALSE: (the slope) - the rate of change in \(y\) (on the log scale) per unit (=1) change in \(x\). So for every one unit change in \(x\), \(y\) increases by (on average) 0.342 units on the log scale and we are 95% confident that this change is between 0.257 and 0.433
Note, the estimates are means and quantiles.
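The Rhat statistic referred to above compares the variability between chains to the variability within chains. The following base R sketch implements the classic (non-split) potential scale reduction factor on made-up chains, purely to show the idea. The helper rhat_basic() is hypothetical and is simpler than the split-chain version reported in the summary.

```r
# Hypothetical helper: classic potential scale reduction factor
# `chains` is a matrix with one column per chain and one row per iteration
rhat_basic <- function(chains) {
  n <- nrow(chains)
  W <- mean(apply(chains, 2, var))     # average within-chain variance
  B <- n * var(colMeans(chains))       # between-chain variance
  var_plus <- (n - 1) / n * W + B / n  # pooled estimate of the posterior variance
  sqrt(var_plus / W)
}

set.seed(123)
good <- matrix(rnorm(800 * 3), ncol = 3)  # three chains sampling the same distribution
bad <- good
bad[, 3] <- bad[, 3] + 3                  # third chain stuck in a different region

rhat_good <- rhat_basic(good)
rhat_bad <- rhat_basic(bad)
c(good = rhat_good, bad = rhat_bad)
```

When the chains agree, the ratio is very close to 1. When one chain explores a different region, the between-chain variance inflates the ratio well above the 1.01 threshold.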
As an alternative to the regular summary (which is designed to resemble that traditionally provided by frequentist analyses - and thus be instantly familiar), it is possible to define exactly how each of the parameters is summarised.
In the following, I am nominating that I want to summarise each parameter posterior by:
- the median
- the 95% highest probability density interval (credibility interval)
- Rhat
- total number of draws
- bulk and tail effective sample sizes
dat3b.brm2 |>
  posterior::summarise_draws(
    median,
    HDInterval::hdi,
    rhat,
    length,
    ess_bulk, ess_tail
  )
# A tibble: 7 × 8
variable median lower upper rhat length ess_bulk ess_tail
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 b_Intercept 1.93 1.65 2.19 1.00 2400 2222. 2142.
2 b_scalexscaleEQFALSE 0.342 0.263 0.438 1.00 2400 2132. 1947.
3 Intercept 1.93 1.65 2.19 1.00 2400 2222. 2142.
4 prior_Intercept 2.00 -0.452 4.56 1.00 2400 2307. 2368.
5 prior_b -0.0445 -6.25 5.77 1.00 2400 2196. 2306.
6 lprior -2.92 -2.95 -2.91 1.00 2400 2217. 2187.
7 lp__ -26.8 -29.2 -26.1 1.00 2400 2241. 2284.
The results presented by the above function are on the link scale (in this case, the log scale). As such, they can be awkward to interpret and are particularly punishing for your audience.
Conclusions:
- the length column confirms that each parameter is summarised from 2400 post-warmup MCMC draws
- the rhat values for each parameter are all <1.01, suggesting convergence of the parameter estimates across the chains
- ess_bulk: effective sample sizes for the bulk of the posterior are a high fraction (> 0.5) of the total number of draws. This suggests that the sampling was reasonably efficient and therefore likely to be reliable
- ess_tail: effective sample sizes of estimates in the tails of the posterior. The tails (more extreme regions of the posterior) are where the sampler is most likely to get stuck (have divergent transitions etc). The fraction of effective samples from this region is relatively high, implying that the sampler did not get stuck in areas that are unsupported by data.
- b_Intercept: when centred \(x=0\) (that is, at the mean of \(x\)), the expected value of \(y\) (on the log scale) is 1.928 and we are 95% confident that the true value is between 1.646 and 2.187. Because \(x\) has been centred, this intercept lies within the observed range of \(x\) and is directly interpretable.
- b_scalexscaleEQFALSE: (the slope) - the rate of change in \(y\) (on the log scale) per unit (=1) change in \(x\). So for every one unit change in \(x\), \(y\) increases by (on average) 0.342 units on the log scale and we are 95% confident that this change is between 0.263 and 0.438
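Both the total number of draws and the "high fraction" of effective samples referred to above can be verified with simple arithmetic. The sketch below uses the chain settings reported in the model summary and the ess_bulk values from the table above. The variable names are mine.

```r
# Chain settings as reported in the model summary
chains <- 3
iter <- 5000
warmup <- 1000
thin <- 5

# Post-warmup draws retained across all chains
total_draws <- chains * (iter - warmup) / thin
total_draws

# Bulk effective sample sizes copied from the table above
ess_bulk_reported <- c(b_Intercept = 2222, b_scalexscaleEQFALSE = 2132)
ess_fraction <- ess_bulk_reported / total_draws
round(ess_fraction, 2)
```

Both fractions are well above 0.5, which is what the conclusion about sampling efficiency rests on.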
As a yet more flexible alternative, if we first extract the full posterior draws (with as_draws_df()), we can then use the various tidyverse functions to focus on just the most ecologically interpretable parameters before summarising.
In the following, I will use select() with a regex (regular expression) to match only the columns that:
- start with (^) “b_” followed by any number of (*) any character (.)
- start with (^) “sigma”
I will also use mutate() to transform the results back onto the scale of the response (by exponentiating the posteriors) before summarising.
dat3b.brm2 |>
  brms::as_draws_df() |>
  dplyr::select(matches("^b_.*|^sigma")) |>
  mutate(across(everything(), exp)) |>
  posterior::summarise_draws(
    median,
    HDInterval::hdi,
    rhat,
    length,
    ess_bulk, ess_tail
  )
Warning: Dropping 'draws_df' class as required metadata was removed.
# A tibble: 2 × 8
variable median lower upper rhat length ess_bulk ess_tail
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 b_Intercept 6.87 5.11 8.83 1.00 2400 2211. 2121.
2 b_scalexscaleEQFALSE 1.41 1.29 1.54 1.00 2400 2123. 1929.
Conclusions:
- the length column confirms that each parameter is summarised from 2400 post-warmup MCMC draws
- the rhat values for each parameter are all <1.01, suggesting convergence of the parameter estimates across the chains
- ess_bulk: effective sample sizes for the bulk of the posterior are a high fraction (> 0.5) of the total number of draws. This suggests that the sampling was reasonably efficient and therefore likely to be reliable
- ess_tail: effective sample sizes of estimates in the tails of the posterior. The tails (more extreme regions of the posterior) are where the sampler is most likely to get stuck (have divergent transitions etc). The fraction of effective samples from this region is relatively high, implying that the sampler did not get stuck in areas that are unsupported by data.
- b_Intercept: when centred \(x=0\) (that is, at the mean of \(x\)), the expected value of \(y\) is 6.873 and we are 95% confident that the true value is between 5.11 and 8.83. Because \(x\) has been centred, this intercept lies within the observed range of \(x\) and is directly interpretable.
- b_scalexscaleEQFALSE: (the slope) - the rate of change in \(y\) per unit (=1) change in \(x\). So for every one unit change in \(x\), \(y\) increases by (on average) a factor of 1.408 and we are 95% confident that this factor is between 1.29 and 1.54. This represents a 40.8% increase ((value - 1) * 100) in \(y\) per unit increase in \(x\).
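The intervals in the table above are calculated after the draws have been exponentiated. A tempting shortcut is to exponentiate the end points of the log-scale interval instead, yet the two approaches do not generally give the same highest density interval. The sketch below illustrates this with made-up draws and a hypothetical helper, hpd_interval(), which finds the shortest interval containing 95% of the draws (it is not the HDInterval::hdi() implementation).

```r
# Hypothetical helper: the shortest interval that contains `prob` of the draws
hpd_interval <- function(draws, prob = 0.95) {
  sorted <- sort(draws)
  n <- length(sorted)
  k <- ceiling(prob * n)
  starts <- seq_len(n - k + 1)
  i <- which.min(sorted[starts + k - 1] - sorted[starts])
  c(lower = sorted[i], upper = sorted[i + k - 1])
}

set.seed(11)
log_draws <- rnorm(2400, mean = 0.02, sd = 0.35)  # made-up draws on the log scale

back_transformed <- exp(hpd_interval(log_draws))  # interval first, then exp()
direct <- hpd_interval(exp(log_draws))            # exp() first, then interval

rbind(back_transformed, direct)
```

Both intervals contain 95% of the draws, but only the second is the shortest such interval on the response scale. Quantile-based intervals do not have this issue, since quantiles are preserved by monotonic transformations.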
The gather_draws() function performs the equivalent of an as_draws_df() followed by a pivot_longer() in order to return the full posteriors in long format where they are more suitable for graphing.
dat3b.brm2 |>
  gather_draws(`b_Intercept`, `b_.*x.*`, regex = TRUE) |>
  # back-transform only the posterior draws (not the chain/iteration indices)
  mutate(.value = exp(.value)) |>
  ggplot() +
  geom_histogram(aes(x = .value)) +
  facet_wrap(~.variable, scales = "free")
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Alternatively, there are various other representations supported by the ggdist package.
y ymin ymax .width .point .interval
1 0.9215583 0.8041622 0.9343709 0.95 median hdci
Conclusions:
- 92.156% of the total variability in \(y\) can be explained by its relationship to \(x\)
- we are 95% confident that the strength of this relationship is between 80.416% and 93.437%
Family: poisson
Links: mu = log
Formula: y ~ scale(x)
Data: dat3 (Number of observations: 10)
Draws: 3 chains, each with iter = 5000; warmup = 1000; thin = 5;
total post-warmup draws = 2400
Regression Coefficients:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept 1.93 0.14 1.64 2.20 1.00 2142 2163
scalex 1.03 0.13 0.78 1.30 1.00 2155 2327
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
The results presented by the summary() function are on the link scale (in this case, the log scale). As such, they can be awkward to interpret and are particularly punishing for your audience.
Conclusions:
- in the initial block of information, we are reminded of the formula as well as chain fitting characteristics. We are also informed that the total number of post-warmup MCMC samples is 2400.
- the Rhat values for each parameter are all <1.01, suggesting convergence of the parameter estimates across the chains
- Bulk_ESS: effective sample sizes for the bulk of the posterior are a high fraction (> 0.5) of the total number of draws. This suggests that the sampling was reasonably efficient and therefore likely to be reliable
- Tail_ESS: effective sample sizes of estimates in the tails of the posterior. The tails (more extreme regions of the posterior) are where the sampler is most likely to get stuck (have divergent transitions etc). The fraction of effective samples from this region is relatively high, implying that the sampler did not get stuck in areas that are unsupported by data.
- Intercept: when \(x=0\) (its average since it is standardised), the expected value of \(y\) (on the log scale) is 1.929 and we are 95% confident that the true value is between 1.643 and 2.195. Because standardised \(x=0\) corresponds to the mean of \(x\), this intercept lies within the observed range of \(x\) and is directly interpretable.
- scalex: (the slope) - the rate of change in \(y\) (on the log scale) per unit (=1) change in \(x\). Recall that since \(x\) is standardised, 1 unit represents a span of 1 standard deviation of \(x\). So for every one standard deviation change in \(x\), \(y\) increases by (on average) 1.035 units on the log scale and we are 95% confident that this change is between 0.782 and 1.305
Note, the estimates are means and quantiles.
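To relate the standardised slope above to the per-unit slope from the centred model, recall that scale() subtracts the mean and divides by the standard deviation. The sketch below assumes that \(x\) takes the values 1 to 10 (an assumption on my part, since the data are not shown in this section) and uses the reported standardised slope of 1.035.

```r
x <- 1:10                      # ASSUMED predictor values
z <- as.numeric(scale(x))      # centred on the mean and divided by sd(x)
c(mean = mean(z), sd = sd(z))  # standardised predictor: mean 0, sd 1

slope_per_sd <- 1.035          # reported slope (log scale) per 1 SD of x

# One SD of x spans sd(x) raw units, so divide to get the per-unit slope
slope_per_unit <- slope_per_sd / sd(x)
round(slope_per_unit, 3)
```

Under that assumption the result agrees (to rounding) with the slope of approximately 0.342 obtained when \(x\) was only centred.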
As an alternative to the regular summary (which is designed to resemble that traditionally provided by frequentist analyses - and thus be instantly familiar), it is possible to define exactly how each of the parameters is summarised.
In the following, I am nominating that I want to summarise each parameter posterior by:
- the median
- the 95% highest probability density interval (credibility interval)
- Rhat
- total number of draws
- bulk and tail effective sample sizes
dat3c.brm2 |>
  posterior::summarise_draws(
    median,
    HDInterval::hdi,
    rhat,
    length,
    ess_bulk, ess_tail
  )
# A tibble: 7 × 8
variable median lower upper rhat length ess_bulk ess_tail
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 b_Intercept 1.93 1.66 2.22 1.00 2400 2141. 2163.
2 b_scalex 1.03 0.768 1.28 1.00 2400 2155. 2327.
3 Intercept 1.93 1.66 2.22 1.00 2400 2141. 2163.
4 prior_Intercept 2.00 -0.830 4.47 1.00 2400 2402. 2329.
5 prior_b 0.00164 -4.98 3.99 1.00 2400 2195. 2105.
6 lprior -2.86 -3.04 -2.70 1.00 2400 2139. 2328.
7 lp__ -26.8 -28.9 -26.0 1.00 2400 2190. 2273.
Conclusions:
- the length column confirms that each parameter is summarised from 2400 post-warmup MCMC draws
- the rhat values for each parameter are all <1.01, suggesting convergence of the parameter estimates across the chains
- ess_bulk: effective sample sizes for the bulk of the posterior are a high fraction (> 0.5) of the total number of draws. This suggests that the sampling was reasonably efficient and therefore likely to be reliable
- ess_tail: effective sample sizes of estimates in the tails of the posterior. The tails (more extreme regions of the posterior) are where the sampler is most likely to get stuck (have divergent transitions etc). The fraction of effective samples from this region is relatively high, implying that the sampler did not get stuck in areas that are unsupported by data.
- b_Intercept: when \(x=0\) (its average since it is standardised), the expected value of \(y\) (on the log scale) is 1.932 and we are 95% confident that the true value is between 1.665 and 2.218. Because standardised \(x=0\) corresponds to the mean of \(x\), this intercept lies within the observed range of \(x\) and is directly interpretable.
- b_scalex: (the slope) - the rate of change in \(y\) (on the log scale) per unit (=1) change in \(x\). Recall that since \(x\) is standardised, 1 unit represents a span of 1 standard deviation of \(x\). So for every one standard deviation change in \(x\), \(y\) increases by (on average) 1.034 units on the log scale and we are 95% confident that this change is between 0.768 and 1.279
As a yet more flexible alternative, if we first extract the full posterior draws (with as_draws_df()), we can then use the various tidyverse functions to focus on just the most ecologically interpretable parameters before summarising.
In the following, I will use select() with a regex (regular expression) to match only the columns that:
- start with (^) “b_” followed by any number of (*) any character (.)
- start with (^) “sigma”
I will also use mutate() to transform the results back onto the scale of the response (by exponentiating the posteriors) before summarising.
dat3c.brm2 |>
  brms::as_draws_df() |>
  dplyr::select(matches("^b_.*|^sigma")) |>
  mutate(across(everything(), exp)) |>
  posterior::summarise_draws(
    median,
    HDInterval::hdi,
    rhat,
    length,
    ess_bulk, ess_tail
  )
Warning: Dropping 'draws_df' class as required metadata was removed.
# A tibble: 2 × 8
variable median lower upper rhat length ess_bulk ess_tail
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 b_Intercept 6.90 5.05 8.85 1.00 2400 2126. 2156.
2 b_scalex 2.81 2.13 3.55 1.00 2400 2121. 2322.
Conclusions:
- the length column confirms that each parameter is summarised from 2400 post-warmup MCMC draws
- the rhat values for each parameter are all <1.01, suggesting convergence of the parameter estimates across the chains
- ess_bulk: effective sample sizes for the bulk of the posterior are a high fraction (> 0.5) of the total number of draws. This suggests that the sampling was reasonably efficient and therefore likely to be reliable
- ess_tail: effective sample sizes of estimates in the tails of the posterior. The tails (more extreme regions of the posterior) are where the sampler is most likely to get stuck (have divergent transitions etc). The fraction of effective samples from this region is relatively high, implying that the sampler did not get stuck in areas that are unsupported by data.
- b_Intercept: when \(x=0\) (its average since it is standardised), the expected value of \(y\) is 6.903 and we are 95% confident that the true value is between 5.05 and 8.85. Because standardised \(x=0\) corresponds to the mean of \(x\), this intercept lies within the observed range of \(x\) and is directly interpretable.
- b_scalex: (the slope) - the rate of change in \(y\) per unit (=1) change in \(x\). Recall that since \(x\) is standardised, 1 unit represents a span of 1 standard deviation of \(x\). So for every one standard deviation change in \(x\), \(y\) increases by (on average) a factor of 2.812 and we are 95% confident that this factor is between 2.13 and 3.55. This represents a 181.2% increase ((value - 1) * 100) in \(y\) per one standard deviation increase in \(x\).
The gather_draws() function performs the equivalent of an as_draws_df() followed by a pivot_longer() in order to return the full posteriors in long format where they are more suitable for graphing. Note that the name of the slope parameter gets very awkward when \(x\) is centered, so it is more convenient to refer to this parameter via a regular expression.
[1] "b_Intercept" "b_scalex" "Intercept" "prior_Intercept"
[5] "prior_b" "lprior" "lp__" "accept_stat__"
[9] "stepsize__" "treedepth__" "n_leapfrog__" "divergent__"
[13] "energy__"
dat3c.brm2 |>
  gather_draws(`b_Intercept`, `b_.*x`, regex = TRUE) |>
  # back-transform only the posterior draws (not the chain/iteration indices)
  mutate(.value = exp(.value)) |>
  ggplot() +
  geom_histogram(aes(x = .value)) +
  facet_wrap(~.variable, scales = "free")
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Alternatively, there are various other representations supported by the ggdist package.
y ymin ymax .width .point .interval
1 0.9211877 0.8045365 0.9343754 0.95 median hdci
Conclusions:
- 92.119% of the total variability in \(y\) can be explained by its relationship to \(x\)
- we are 95% confident that the strength of this relationship is between 80.454% and 93.438%
Family: negbinomial
Links: mu = log; shape = identity
Formula: y ~ x
Data: dat4 (Number of observations: 10)
Draws: 3 chains, each with iter = 5000; warmup = 1000; thin = 5;
total post-warmup draws = 2400
Regression Coefficients:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept 0.36 0.40 -0.43 1.11 1.00 2545 2355
x 0.28 0.05 0.18 0.39 1.00 2474 2236
Further Distributional Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
shape 52.46 64.80 3.26 229.15 1.00 2142 2369
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
The results presented by the summary() function are on the link scale (in this case, the log scale). As such, they can be awkward to interpret and are particularly punishing for your audience.
Conclusions:
- in the initial block of information, we are reminded of the formula as well as chain fitting characteristics. We are also informed that the total number of post-warmup MCMC samples is 2400.
- the Rhat values for each parameter are all <1.01, suggesting convergence of the parameter estimates across the chains
- Bulk_ESS: effective sample sizes for the bulk of the posterior are a high fraction (> 0.5) of the total number of draws. This suggests that the sampling was reasonably efficient and therefore likely to be reliable
- Tail_ESS: effective sample sizes of estimates in the tails of the posterior. The tails (more extreme regions of the posterior) are where the sampler is most likely to get stuck (have divergent transitions etc). The fraction of effective samples from this region is relatively high, implying that the sampler did not get stuck in areas that are unsupported by data.
- Intercept: when \(x=0\), the expected value of \(y\) (on the log scale) is 0.357 and we are 95% confident that the true value is between -0.434 and 1.11. Since \(x=0\) is outside the observed range of \(x\), this y-intercept is of very limited value.
- x: (the slope) - the rate of change in \(y\) (on the log scale) per unit (=1) change in \(x\). So for every one unit change in \(x\), \(y\) increases by (on average) 0.284 units on the log scale and we are 95% confident that this change is between 0.183 and 0.393
Note, the estimates are means and quantiles.
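The summary above also reports a shape parameter, which controls how much extra (over)dispersion the negative binomial allows relative to a Poisson. The sketch below assumes the mean and shape parameterisation in which the variance is \(\mu + \mu^2/shape\) (the one used by R's rnbinom(mu = , size = ) and, to my understanding, by the negbinomial family here). The expected count of 7 is an arbitrary illustrative value, and the shape values are the reported estimate and interval limits.

```r
# Variance of a negative binomial under the mean/shape parameterisation
nb_variance <- function(mu, shape) mu + mu^2 / shape

mu <- 7   # arbitrary illustrative expected count (not taken from the model)
shapes <- c(lower = 3.26, estimate = 52.46, upper = 229.15)  # reported shape values

nb_var <- nb_variance(mu, shapes)
round(nb_var, 2)
```

As shape increases, the variance approaches the mean (the Poisson case). The very wide interval on shape therefore spans everything from substantial overdispersion to almost none.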
As an alternative to the regular summary (which is designed to resemble that traditionally provided by frequentist analyses - and thus be instantly familiar), it is possible to define exactly how each of the parameters is summarised.
In the following, I am nominating that I want to summarise each parameter posterior by:
- the median
- the 95% highest probability density interval (credibility interval)
- Rhat
- total number of draws
- bulk and tail effective sample sizes
dat4a.brm2 |>
  posterior::summarise_draws(
    median,
    HDInterval::hdi,
    rhat,
    length,
    ess_bulk, ess_tail
  )
# A tibble: 9 × 8
variable median lower upper rhat length ess_bulk ess_tail
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 b_Intercept 3.66e- 1 -3.65e- 1 1.16 1.00 2400 2545. 2355.
2 b_x 2.83e- 1 1.82e- 1 0.390 1.00 2400 2475. 2236.
3 shape 3.04e+ 1 1.09e+ 0 179. 1.00 2400 2143. 2369.
4 Intercept 1.92e+ 0 1.62e+ 0 2.25 1.00 2400 2559. 2144.
5 prior_Intercept 1.97e+ 0 -1.75e- 1 3.74 1.00 2400 2326. 2298.
6 prior_b 2.13e- 2 -4.67e+ 0 4.70 1.00 2400 2275. 2180.
7 prior_shape 2.15e-28 1.56e-305 0.250 1.00 2400 2329. 2291.
8 lprior -1.07e+ 1 -1.42e+ 1 -8.01 1.00 2400 2137. 2369.
9 lp__ -3.18e+ 1 -3.49e+ 1 -30.6 0.999 2400 2105. 2325.
The results presented by the above function are on the link scale (in this case, the log scale). As such, they can be awkward to interpret and are particularly punishing for your audience.
Conclusions:
- the length column confirms that each summary is based on all 2400 post-warmup MCMC draws.
- the rhat values for each parameter are < 1.01, consistent with convergence of the parameter estimates across the chains
- ess_bulk: effective sample sizes for the bulk of the posterior are a high fraction (> 0.5) of the total number of draws. This suggests that the sampling was reasonably efficient and therefore likely to be reliable
- ess_tail: effective sample sizes of estimates in the tails of the posterior. The tails (more extreme regions of the posterior) are where the sampler is most likely to get stuck (have divergent transitions etc). The fraction of effective samples from this region is relatively high, implying that the sampler did not get stuck in areas that are poorly supported by the data.
- b_Intercept: when \(x=0\), the expected value of \(y\) (on the log scale) is 0.366 and we are 95% confident that the true value is between -0.365 and 1.162. Since \(x=0\) is outside the observed range of \(x\), this y-intercept is of very limited value.
- b_x (the slope): the rate of change in \(y\) (on the log scale) per unit (=1) change in \(x\). So for every one unit change in \(x\), \(y\) increases by (on average) 0.283 units on the log scale and we are 95% confident that this change is between 0.182 and 0.39.
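The diagnostic checks described above can be sketched as simple arithmetic. The following is a minimal illustration only: the values are copied by hand from the first three rows of the table above, and the object name checks is my own (it is not produced by brms or posterior). Note that effective sample sizes can occasionally exceed the total number of draws, so fractions slightly above 1 are not a cause for concern.

```r
# Values copied by hand from the summarise_draws() table above
checks <- data.frame(
  variable = c("b_Intercept", "b_x", "shape"),
  rhat     = c(1.00, 1.00, 1.00),
  n_draws  = c(2400, 2400, 2400),
  ess_bulk = c(2545, 2475, 2143),
  ess_tail = c(2355, 2236, 2369)
)

# Effective sample sizes expressed as a fraction of the total draws
checks$bulk_fraction <- checks$ess_bulk / checks$n_draws
checks$tail_fraction <- checks$ess_tail / checks$n_draws

# Apply the rules of thumb used in this tutorial
checks$ok <- checks$rhat < 1.01 &
  checks$bulk_fraction > 0.5 &
  checks$tail_fraction > 0.5

print(checks)
```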
As yet a more flexible alternative, if we first extract the full posterior draws (with as_draws_df()), we can then use the various tidyverse functions to focus on just the most ecologically interpretable parameters before summarising.
In the following, I will use select() with a regex (regular expression) to match only the columns that:
- start with (^) “b_” followed by any number of (*) any character (.)
- start with (^) “sigma” (this model has no sigma parameter, so only the “b_” columns are matched)
I will also use mutate() to transform the results back onto the scale of the response (by exponentiating the posteriors) before summarising.
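To see which columns the pattern picks out, the same regular expression can be tried on a plain character vector. This is only a sketch: the names are typed in by hand from the summary table above, and base R's grepl() stands in for the matches() helper (which, unlike grepl(), ignores case by default, a difference that does not matter for these names).

```r
# Variable names typed in from the summary table above
vars <- c("b_Intercept", "b_x", "shape", "Intercept",
          "prior_Intercept", "prior_b", "prior_shape", "lprior", "lp__")

# TRUE where the name starts with "b_" or starts with "sigma"
keep <- grepl("^b_.*|^sigma", vars)

# Only the regression coefficients are retained;
# "prior_b" is excluded because "b_" is not at the start of the name
vars[keep]
```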
dat4a.brm2 |>
brms::as_draws_df() |>
dplyr::select(matches("^b_.*|^sigma")) |>
mutate(across(everything(), exp)) |>
posterior::summarise_draws(
median,
HDInterval::hdi,
rhat,
length,
ess_bulk, ess_tail
)
Warning: Dropping 'draws_df' class as required metadata was removed.
# A tibble: 2 × 8
variable median lower upper rhat length ess_bulk ess_tail
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 b_Intercept 1.44 0.568 2.83 1.00 2400 2533. 2330.
2 b_x 1.33 1.20 1.48 1.00 2400 2465. 2218.
Conclusions:
- the length column confirms that each summary is based on all 2400 post-warmup MCMC draws.
- the rhat values for each parameter are < 1.01, consistent with convergence of the parameter estimates across the chains
- ess_bulk: effective sample sizes for the bulk of the posterior are a high fraction (> 0.5) of the total number of draws. This suggests that the sampling was reasonably efficient and therefore likely to be reliable
- ess_tail: effective sample sizes of estimates in the tails of the posterior. The tails (more extreme regions of the posterior) are where the sampler is most likely to get stuck (have divergent transitions etc). The fraction of effective samples from this region is relatively high, implying that the sampler did not get stuck in areas that are poorly supported by the data.
- b_Intercept: when \(x=0\), the expected value of \(y\) is 1.442 and we are 95% confident that the true value is between 0.568 and 2.83. Since \(x=0\) is outside the observed range of \(x\), this y-intercept is of very limited value.
- b_x (the slope): the rate of change in \(y\) per unit (=1) change in \(x\). So for every one unit change in \(x\), \(y\) increases by (on average) a factor of 1.327 and we are 95% confident that this factor is between 1.2 and 1.477. This represents a 32.7% increase ((value - 1) * 100) in \(y\) per unit increase in \(x\).
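One subtlety when back-transforming: quantile-based intervals can be exponentiated directly, whereas the HPD interval of exponentiated draws is generally not the same as the exponentiated HPD interval of the original draws. The following sketch illustrates this with simulated values. The draws are a hypothetical stand-in (not the actual posterior from the model above) and shortest_interval() is a deliberately simple helper of my own, not the HDInterval::hdi() implementation.

```r
set.seed(1)
# Hypothetical stand-in for 2400 posterior draws of a log-scale parameter
draws_log <- rnorm(2400, mean = 0.37, sd = 0.4)

# Shortest interval containing `prob` of the draws (a simple HPD estimate)
shortest_interval <- function(x, prob = 0.95) {
  x <- sort(x)
  n <- length(x)
  k <- floor(prob * n)                  # number of steps spanned by each window
  widths <- x[(k + 1):n] - x[1:(n - k)] # width of every candidate window
  i <- which.min(widths)
  c(lower = x[i], upper = x[i + k])
}

# Quantile intervals (type = 1 uses observed values, no interpolation)
q_log <- quantile(draws_log, c(0.025, 0.975), type = 1)
q_exp <- quantile(exp(draws_log), c(0.025, 0.975), type = 1)

# HPD intervals: transform the interval vs transform the draws
hdi_then_exp <- exp(shortest_interval(draws_log))
exp_then_hdi <- shortest_interval(exp(draws_log))

rbind(exp_of_quantiles = exp(q_log), quantiles_of_exp = q_exp)
rbind(hdi_then_exp, exp_then_hdi)
```

Because exponentiating stretches the upper tail, the shortest interval on the response scale sits lower than the exponentiated link-scale interval. This is why it is preferable to transform the draws first and summarise second, as is done in the code above.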
The gather_draws() function performs the equivalent of an as_draws_df() followed by a pivot_longer() in order to return the full posteriors in long format where they are more suitable for graphing.
dat4a.brm2 |>
  gather_draws(b_Intercept, b_x) |>
  # back-transform only the draws (not the .chain/.iteration/.draw indices)
  mutate(.value = exp(.value)) |>
  ggplot() +
  geom_histogram(aes(x = .value)) +
  facet_wrap(~.variable, scales = "free")
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Alternatively, there are various other representations supported by the ggdist package.
y ymin ymax .width .point .interval
1 0.9003317 0.6621915 0.9214525 0.95 median hdci
Conclusions:
- 90.033% of the total variability in \(y\) can be explained by its relationship to \(x\)
- we are 95% confident that the strength of this relationship is between 66.219% and 92.145%
Family: negbinomial
Links: mu = log; shape = identity
Formula: y ~ scale(x, scale = FALSE)
Data: dat4 (Number of observations: 10)
Draws: 3 chains, each with iter = 5000; warmup = 1000; thin = 5;
total post-warmup draws = 2400
Regression Coefficients:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept 1.92 0.15 1.61 2.21 1.00 2358 2422
scalexscaleEQFALSE 0.28 0.05 0.18 0.40 1.00 2376 2327
Further Distributional Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
shape 49.96 59.36 3.36 203.02 1.00 2163 1994
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
The results presented by the summary() function are on the link scale (in this case, the log scale). As such, they can be awkward to interpret and are particularly punishing for your audience.
Conclusions:
- in the initial block of information, we are reminded of the formula as well as the chain fitting characteristics. We are also informed that the total number of post-warmup MCMC samples is 2400.
- the Rhat values for each parameter are < 1.01, consistent with convergence of the parameter estimates across the chains
- Bulk_ESS: effective sample sizes for the bulk of the posterior are a high fraction (> 0.5) of the total number of draws. This suggests that the sampling was reasonably efficient and therefore likely to be reliable
- Tail_ESS: effective sample sizes of estimates in the tails of the posterior. The tails (more extreme regions of the posterior) are where the sampler is most likely to get stuck (have divergent transitions etc). The fraction of effective samples from this region is relatively high, implying that the sampler did not get stuck in areas that are poorly supported by the data.
- Intercept: since \(x\) is centred, the intercept applies when \(x\) is at its mean. At the mean of \(x\), the expected value of \(y\) (on the log scale) is 1.918 and we are 95% confident that the true value is between 1.608 and 2.214. Because the mean of \(x\) lies within the observed range of \(x\), this intercept is directly interpretable.
- scalexscaleEQFALSE (the slope): the rate of change in \(y\) (on the log scale) per unit (=1) change in \(x\). So for every one unit change in \(x\), \(y\) increases by (on average) 0.285 units on the log scale and we are 95% confident that this change is between 0.18 and 0.396.
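The effect of centring on the two coefficients can be checked with a small simulation. This is a sketch under stated assumptions: the data are simulated (not dat4), and a frequentist Poisson glm() is swapped in for the Bayesian negative binomial model purely because it fits instantly. The point being illustrated (centring shifts the intercept but leaves the slope alone) does not depend on that choice.

```r
set.seed(42)
# Simulated stand-in data (not the tutorial's dat4)
x <- 1:10
y <- rpois(length(x), lambda = exp(0.4 + 0.28 * x))

# Model on the raw predictor
fit_raw <- glm(y ~ x, family = poisson())

# Model on the centred predictor
x_c <- x - mean(x)
fit_centred <- glm(y ~ x_c, family = poisson())

b_raw <- unname(coef(fit_raw))
b_cen <- unname(coef(fit_centred))

# The slope is unchanged by centring
c(raw_slope = b_raw[2], centred_slope = b_cen[2])

# The centred intercept is the raw linear predictor evaluated at mean(x)
c(centred_intercept = b_cen[1],
  raw_at_mean_x     = b_raw[1] + b_raw[2] * mean(x))
```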
Note, the estimates are means and quantiles.
As an alternative to the regular summary (which is designed to resemble that traditionally provided by frequentist analyses - and thus be instantly familiar), it is possible to define exactly how each of the parameters are summarised.
In the following, I am nominating that I want to summarise each parameter posterior by:
- the median
- the 95% highest posterior density (HPD) interval (credible interval)
- Rhat
- total number of draws
- bulk and tail effective sample sizes
dat4b.brm2 |>
posterior::summarise_draws(
median,
HDInterval::hdi,
rhat,
length,
ess_bulk, ess_tail
)
# A tibble: 9 × 8
variable median lower upper rhat length ess_bulk ess_tail
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 b_Intercept 1.92e+ 0 1.62 2.22 1.00 2400 2358. 2422.
2 b_scalexscaleEQFALSE 2.84e- 1 0.176 0.391 1.00 2400 2376. 2327.
3 shape 2.93e+ 1 0.992 164. 0.999 2400 2163. 1994.
4 Intercept 1.92e+ 0 1.62 2.22 1.00 2400 2358. 2422.
5 prior_Intercept 2.06e+ 0 0.190 3.91 1.00 2400 2310. 2313.
6 prior_b 1.04e- 2 -4.52 4.81 1.00 2400 2340. 2167.
7 prior_shape 1.07e-28 0 0.325 1.00 2400 1967. 2130.
8 lprior -1.05e+ 1 -13.9 -7.90 0.999 2400 2151. 1995.
9 lp__ -3.17e+ 1 -34.6 -30.5 1.00 2400 2377. 2328.
The results presented by the above function are on the link scale (in this case, the log scale). As such, they can be awkward to interpret and are particularly punishing for your audience.
Conclusions:
- the length column confirms that each summary is based on all 2400 post-warmup MCMC draws.
- the rhat values for each parameter are < 1.01, consistent with convergence of the parameter estimates across the chains
- ess_bulk: effective sample sizes for the bulk of the posterior are a high fraction (> 0.5) of the total number of draws. This suggests that the sampling was reasonably efficient and therefore likely to be reliable
- ess_tail: effective sample sizes of estimates in the tails of the posterior. The tails (more extreme regions of the posterior) are where the sampler is most likely to get stuck (have divergent transitions etc). The fraction of effective samples from this region is relatively high, implying that the sampler did not get stuck in areas that are poorly supported by the data.
- b_Intercept: since \(x\) is centred, the intercept applies when \(x\) is at its mean. At the mean of \(x\), the expected value of \(y\) (on the log scale) is 1.92 and we are 95% confident that the true value is between 1.617 and 2.221. Because the mean of \(x\) lies within the observed range of \(x\), this intercept is directly interpretable.
- b_scalexscaleEQFALSE (the slope): the rate of change in \(y\) (on the log scale) per unit (=1) change in \(x\). So for every one unit change in \(x\), \(y\) increases by (on average) 0.284 units on the log scale and we are 95% confident that this change is between 0.176 and 0.391.
As yet a more flexible alternative, if we first extract the full posterior draws (with as_draws_df()), we can then use the various tidyverse functions to focus on just the most ecologically interpretable parameters before summarising.
In the following, I will use select() with a regex (regular expression) to match only the columns that:
- start with (^) “b_” followed by any number of (*) any character (.)
- start with (^) “sigma” (this model has no sigma parameter, so only the “b_” columns are matched)
I will also use mutate() to transform the results back onto the scale of the response (by exponentiating the posteriors) before summarising.
dat4b.brm2 |>
brms::as_draws_df() |>
dplyr::select(matches("^b_.*|^sigma")) |>
mutate(across(everything(), exp)) |>
posterior::summarise_draws(
median,
HDInterval::hdi,
rhat,
length,
ess_bulk, ess_tail
)
Warning: Dropping 'draws_df' class as required metadata was removed.
# A tibble: 2 × 8
variable median lower upper rhat length ess_bulk ess_tail
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 b_Intercept 6.84 4.87 8.96 1.00 2400 2338. 2403.
2 b_scalexscaleEQFALSE 1.33 1.19 1.48 1.00 2400 2366. 2320.
Conclusions:
- the length column confirms that each summary is based on all 2400 post-warmup MCMC draws.
- the rhat values for each parameter are < 1.01, consistent with convergence of the parameter estimates across the chains
- ess_bulk: effective sample sizes for the bulk of the posterior are a high fraction (> 0.5) of the total number of draws. This suggests that the sampling was reasonably efficient and therefore likely to be reliable
- ess_tail: effective sample sizes of estimates in the tails of the posterior. The tails (more extreme regions of the posterior) are where the sampler is most likely to get stuck (have divergent transitions etc). The fraction of effective samples from this region is relatively high, implying that the sampler did not get stuck in areas that are poorly supported by the data.
- b_Intercept: since \(x\) is centred, the intercept applies when \(x\) is at its mean. At the mean of \(x\), the expected value of \(y\) is 6.839 and we are 95% confident that the true value is between 4.87 and 8.96. Because the mean of \(x\) lies within the observed range of \(x\), this intercept is directly interpretable.
- b_scalexscaleEQFALSE (the slope): the rate of change in \(y\) per unit (=1) change in \(x\). So for every one unit change in \(x\), \(y\) increases by (on average) a factor of 1.328 and we are 95% confident that this factor is between 1.193 and 1.479. This represents a 32.8% increase ((value - 1) * 100) in \(y\) per unit increase in \(x\).
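The conversion from a log-scale slope to a percentage change is just a couple of lines of arithmetic. The sketch below uses the posterior median slope (0.284) reported in the link-scale table earlier. The three-unit example is my own addition, included to show that multiplicative effects compound rather than add.

```r
# Posterior median of the slope on the log (link) scale, from the table above
slope_log <- 0.284

# Multiplicative change in the expected value of y per unit increase in x
factor_change <- exp(slope_log)

# The same effect expressed as a percentage change
percent_change <- (factor_change - 1) * 100

round(c(factor = factor_change, percent = percent_change), 3)

# Over three units of x the factor compounds (it is not 3 * factor_change)
factor_three_units <- exp(3 * slope_log)
all.equal(factor_three_units, factor_change^3)
```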
The gather_draws() function performs the equivalent of an as_draws_df() followed by a pivot_longer() in order to return the full posteriors in long format where they are more suitable for graphing.
dat4b.brm2 |>
  gather_draws(`b_Intercept`, `b_.*x.*`, regex = TRUE) |>
  # back-transform only the draws (not the .chain/.iteration/.draw indices)
  mutate(.value = exp(.value)) |>
  ggplot() +
  geom_histogram(aes(x = .value)) +
  facet_wrap(~.variable, scales = "free")
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Alternatively, there are various other representations supported by the ggdist package.
y ymin ymax .width .point .interval
1 0.8982169 0.670107 0.9214525 0.95 median hdci
Conclusions:
- 89.822% of the total variability in \(y\) can be explained by its relationship to \(x\)
- we are 95% confident that the strength of this relationship is between 67.011% and 92.145%
Family: negbinomial
Links: mu = log; shape = identity
Formula: y ~ scale(x)
Data: dat4 (Number of observations: 10)
Draws: 3 chains, each with iter = 5000; warmup = 1000; thin = 5;
total post-warmup draws = 2400
Regression Coefficients:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept 1.93 0.16 1.60 2.24 1.00 2358 2298
scalex 0.84 0.16 0.53 1.16 1.00 2470 2238
Further Distributional Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
shape 49.05 56.26 3.31 207.81 1.00 2285 2062
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
The results presented by the summary() function are on the link scale (in this case, the log scale). As such, they can be awkward to interpret and are particularly punishing for your audience.
Conclusions:
- in the initial block of information, we are reminded of the formula as well as the chain fitting characteristics. We are also informed that the total number of post-warmup MCMC samples is 2400.
- the Rhat values for each parameter are < 1.01, consistent with convergence of the parameter estimates across the chains
- Bulk_ESS: effective sample sizes for the bulk of the posterior are a high fraction (> 0.5) of the total number of draws. This suggests that the sampling was reasonably efficient and therefore likely to be reliable
- Tail_ESS: effective sample sizes of estimates in the tails of the posterior. The tails (more extreme regions of the posterior) are where the sampler is most likely to get stuck (have divergent transitions etc). The fraction of effective samples from this region is relatively high, implying that the sampler did not get stuck in areas that are poorly supported by the data.
- Intercept: when the standardised \(x=0\) (that is, when \(x\) is at its average), the expected value of \(y\) (on the log scale) is 1.927 and we are 95% confident that the true value is between 1.601 and 2.238. Because the mean of \(x\) lies within the observed range of \(x\), this intercept is directly interpretable.
- scalex (the slope): the rate of change in \(y\) (on the log scale) per unit (=1) change in standardised \(x\). Recall that since \(x\) is standardised, 1 unit represents a span of 1 standard deviation of \(x\). So for every one standard deviation change in \(x\), \(y\) increases by (on average) 0.842 units on the log scale and we are 95% confident that this change is between 0.531 and 1.161.
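The link between a per-unit slope and a per-standard-deviation slope can be illustrated in the same spirit. Again this is only a sketch: the data are simulated (not dat4) and a frequentist Poisson glm() is used as a quick stand-in for the Bayesian model. With priors in play, the Bayesian estimates need not satisfy this relationship exactly.

```r
set.seed(42)
# Simulated stand-in data (not the tutorial's dat4)
x <- 1:10
y <- rpois(length(x), lambda = exp(0.4 + 0.28 * x))

# Standardise x: subtract the mean, divide by the standard deviation
x_z <- as.numeric(scale(x))

fit_raw <- glm(y ~ x, family = poisson())
fit_std <- glm(y ~ x_z, family = poisson())

slope_raw <- unname(coef(fit_raw)["x"])
slope_std <- unname(coef(fit_std)["x_z"])

# The standardised slope is the per-unit slope multiplied by sd(x)
c(standardised = slope_std, raw_times_sd = slope_raw * sd(x))
```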
Note, the estimates are means and quantiles.
As an alternative to the regular summary (which is designed to resemble that traditionally provided by frequentist analyses - and thus be instantly familiar), it is possible to define exactly how each of the parameters are summarised.
In the following, I am nominating that I want to summarise each parameter posterior by:
- the median
- the 95% highest posterior density (HPD) interval (credible interval)
- Rhat
- total number of draws
- bulk and tail effective sample sizes
dat4c.brm2 |>
posterior::summarise_draws(
median,
HDInterval::hdi,
rhat,
length,
ess_bulk, ess_tail
)
# A tibble: 9 × 8
variable median lower upper rhat length ess_bulk ess_tail
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 b_Intercept 1.93e+ 0 1.58 2.21 1.00 2400 2359. 2298.
2 b_scalex 8.39e- 1 0.521 1.14 1.00 2400 2471. 2238.
3 shape 2.99e+ 1 0.934 161. 1.00 2400 2285. 2062.
4 Intercept 1.93e+ 0 1.58 2.21 1.00 2400 2359. 2298.
5 prior_Intercept 2.07e+ 0 0.280 3.79 1.00 2400 2361. 2244.
6 prior_b -2.50e- 2 -4.69 4.45 1.00 2400 2327. 2298.
7 prior_shape 6.04e-29 0 0.246 1.00 2400 2451. 2350.
8 lprior -1.08e+ 1 -13.9 -7.91 1.00 2400 2276. 2123.
9 lp__ -3.18e+ 1 -34.9 -30.7 1.00 2400 1955. 2287.
Conclusions:
- the length column confirms that each summary is based on all 2400 post-warmup MCMC draws.
- the rhat values for each parameter are < 1.01, consistent with convergence of the parameter estimates across the chains
- ess_bulk: effective sample sizes for the bulk of the posterior are a high fraction (> 0.5) of the total number of draws. This suggests that the sampling was reasonably efficient and therefore likely to be reliable
- ess_tail: effective sample sizes of estimates in the tails of the posterior. The tails (more extreme regions of the posterior) are where the sampler is most likely to get stuck (have divergent transitions etc). The fraction of effective samples from this region is relatively high, implying that the sampler did not get stuck in areas that are poorly supported by the data.
- b_Intercept: when the standardised \(x=0\) (that is, when \(x\) is at its average), the expected value of \(y\) (on the log scale) is 1.927 and we are 95% confident that the true value is between 1.584 and 2.208. Because the mean of \(x\) lies within the observed range of \(x\), this intercept is directly interpretable.
- b_scalex (the slope): the rate of change in \(y\) (on the log scale) per unit (=1) change in standardised \(x\). Recall that since \(x\) is standardised, 1 unit represents a span of 1 standard deviation of \(x\). So for every one standard deviation change in \(x\), \(y\) increases by (on average) 0.839 units on the log scale and we are 95% confident that this change is between 0.521 and 1.143.
As yet a more flexible alternative, if we first extract the full posterior draws (with as_draws_df()), we can then use the various tidyverse functions to focus on just the most ecologically interpretable parameters before summarising.
In the following, I will use select() with a regex (regular expression) to match only the columns that:
- start with (^) “b_” followed by any number of (*) any character (.)
- start with (^) “sigma” (this model has no sigma parameter, so only the “b_” columns are matched)
I will also use mutate() to transform the results back onto the scale of the response (by exponentiating the posteriors) before summarising.
dat4c.brm2 |>
brms::as_draws_df() |>
dplyr::select(matches("^b_.*|^sigma")) |>
mutate(across(everything(), exp)) |>
posterior::summarise_draws(
median,
HDInterval::hdi,
rhat,
length,
ess_bulk, ess_tail
)
Warning: Dropping 'draws_df' class as required metadata was removed.
# A tibble: 2 × 8
variable median lower upper rhat length ess_bulk ess_tail
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 b_Intercept 6.87 4.84 9.05 1.00 2400 2353. 2230.
2 b_scalex 2.31 1.67 3.12 1.00 2400 2466. 2173.
Conclusions:
- the length column confirms that each summary is based on all 2400 post-warmup MCMC draws.
- the rhat values for each parameter are < 1.01, consistent with convergence of the parameter estimates across the chains
- ess_bulk: effective sample sizes for the bulk of the posterior are a high fraction (> 0.5) of the total number of draws. This suggests that the sampling was reasonably efficient and therefore likely to be reliable
- ess_tail: effective sample sizes of estimates in the tails of the posterior. The tails (more extreme regions of the posterior) are where the sampler is most likely to get stuck (have divergent transitions etc). The fraction of effective samples from this region is relatively high, implying that the sampler did not get stuck in areas that are poorly supported by the data.
- b_Intercept: when the standardised \(x=0\) (that is, when \(x\) is at its average), the expected value of \(y\) is 6.868 and we are 95% confident that the true value is between 4.84 and 9.05. Because the mean of \(x\) lies within the observed range of \(x\), this intercept is directly interpretable.
- b_scalex (the slope): the rate of change in \(y\) per unit (=1) change in standardised \(x\). Recall that since \(x\) is standardised, 1 unit represents a span of 1 standard deviation of \(x\). So for every one standard deviation change in \(x\), \(y\) increases by (on average) a factor of 2.314 and we are 95% confident that this factor is between 1.67 and 3.12. This represents a 131.4% increase ((value - 1) * 100) in \(y\) per standard deviation increase in \(x\).
The gather_draws() function performs the equivalent of an as_draws_df() followed by a pivot_longer() in order to return the full posteriors in long format where they are more suitable for graphing. Note that the name of the slope parameter gets very awkward when \(x\) is centered, so it is more convenient to refer to this parameter via a regular expression.
[1] "b_Intercept" "b_scalex" "shape" "Intercept"
[5] "prior_Intercept" "prior_b" "prior_shape" "lprior"
[9] "lp__" "accept_stat__" "stepsize__" "treedepth__"
[13] "n_leapfrog__" "divergent__" "energy__"
dat4c.brm2 |>
  gather_draws(`b_Intercept`, `b_.*x`, regex = TRUE) |>
  # back-transform only the draws (not the .chain/.iteration/.draw indices)
  mutate(.value = exp(.value)) |>
  ggplot() +
  geom_histogram(aes(x = .value)) +
  facet_wrap(~.variable, scales = "free")
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Alternatively, there are various other representations supported by the ggdist package.
y ymin ymax .width .point .interval
1 0.8982864 0.6326737 0.9214458 0.95 median hdci
Conclusions:
- 89.829% of the total variability in \(y\) can be explained by its relationship to \(x\)
- we are 95% confident that the strength of this relationship is between 63.267% and 92.145%
Family: binomial
Links: mu = logit
Formula: y | trials(1) ~ x
Data: dat5 (Number of observations: 10)
Draws: 3 chains, each with iter = 5000; warmup = 1000; thin = 5;
total post-warmup draws = 2400
Regression Coefficients:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept -5.60 3.07 -13.48 -0.98 1.00 2231 2230
x 1.11 0.56 0.26 2.50 1.00 2230 2157
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
The results presented by the summary() function are on the link scale (in this case, the logit, or log-odds scale). As such, they can be awkward to interpret and are particularly punishing for your audience.
Conclusions:
- in the initial block of information, we are reminded of the formula as well as the chain fitting characteristics. We are also informed that the total number of post-warmup MCMC samples is 2400.
- the Rhat values for each parameter are < 1.01, consistent with convergence of the parameter estimates across the chains
- Bulk_ESS: effective sample sizes for the bulk of the posterior are a high fraction (> 0.5) of the total number of draws. This suggests that the sampling was reasonably efficient and therefore likely to be reliable
- Tail_ESS: effective sample sizes of estimates in the tails of the posterior. The tails (more extreme regions of the posterior) are where the sampler is most likely to get stuck (have divergent transitions etc). The fraction of effective samples from this region is relatively high, implying that the sampler did not get stuck in areas that are poorly supported by the data.
- Intercept: when \(x=0\), the expected value of \(y\) on the logit scale (the log-odds that \(y=1\)) is -5.602 and we are 95% confident that the true value is between -13.478 and -0.977. Since \(x=0\) is outside the observed range of \(x\), this y-intercept is of very limited value.
- x (the slope): the rate of change in the log-odds of \(y\) per unit (=1) change in \(x\). So for every one unit change in \(x\), the log-odds of \(y\) increase by (on average) 1.114 units and we are 95% confident that this change is between 0.256 and 2.499.
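To make the logit-scale estimates more tangible, they can be converted to probabilities with the inverse-logit function (plogis() in base R). The sketch below plugs in the posterior means from the summary above. The x values are hypothetical, chosen only for illustration, and plugging in point estimates is a shortcut: a full analysis would transform every posterior draw and then summarise.

```r
# Posterior means on the logit (log-odds) scale, from the summary above
b0 <- -5.60
b1 <- 1.11

# Hypothetical values of x at which to evaluate the fitted curve
x_new <- c(2, 4, 6, 8)

# Linear predictor (log-odds), then inverse-logit to get probabilities
log_odds <- b0 + b1 * x_new
prob <- plogis(log_odds)

data.frame(x = x_new, log_odds = log_odds, probability = round(prob, 3))

# Exponentiating the slope gives the odds ratio per unit increase in x
odds_ratio <- exp(b1)
odds_ratio
```

Note that the change in probability per unit of \(x\) is not constant (the curve is sigmoidal), whereas the odds ratio is. This is why slopes from logistic models are usually reported as odds ratios.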
Note, the estimates are means and quantiles.
As an alternative to the regular summary (which is designed to resemble that traditionally provided by frequentist analyses - and thus be instantly familiar), it is possible to define exactly how each of the parameters are summarised.
In the following, I am nominating that I want to summarise each parameter posterior by:
- the median
- the 95% highest posterior density (HPD) interval (credible interval)
- Rhat
- total number of draws
- bulk and tail effective sample sizes
dat5a.brm2 |>
posterior::summarise_draws(
median,
HDInterval::hdi,
rhat,
length,
ess_bulk, ess_tail
)
# A tibble: 7 × 8
variable median lower upper rhat length ess_bulk ess_tail
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 b_Intercept -5.16 -11.7 -0.0398 1.00 2400 2231. 2230.
2 b_x 1.02 0.180 2.33 1.00 2400 2230. 2157.
3 Intercept 0.518 -0.814 2.00 1.00 2400 2385. 2404.
4 prior_Intercept 0.0137 -1.91 1.96 1.00 2400 2296. 2256.
5 prior_b -0.0190 -3.17 2.75 1.00 2400 2424. 2410.
6 lprior -2.84 -4.85 -1.92 1.00 2400 2241. 2190.
7 lp__ -6.01 -8.56 -5.26 1.00 2400 2125. 2339.
The results presented by the above function are on the link scale (in this case, the logit, or log-odds scale). As such, they can be awkward to interpret and are particularly punishing for your audience.
Conclusions:
- the length column confirms that each summary is based on all 2400 post-warmup MCMC draws.
- the rhat values for each parameter are < 1.01, consistent with convergence of the parameter estimates across the chains
- ess_bulk: effective sample sizes for the bulk of the posterior are a high fraction (> 0.5) of the total number of draws. This suggests that the sampling was reasonably efficient and therefore likely to be reliable
- ess_tail: effective sample sizes of estimates in the tails of the posterior. The tails (more extreme regions of the posterior) are where the sampler is most likely to get stuck (have divergent transitions etc). The fraction of effective samples from this region is relatively high, implying that the sampler did not get stuck in areas that are poorly supported by the data.
- b_Intercept: when \(x=0\), the expected value of \(y\) on the logit scale (the log-odds that \(y=1\)) is -5.16 and we are 95% confident that the true value is between -11.682 and -0.04. Since \(x=0\) is outside the observed range of \(x\), this y-intercept is of very limited value.
- b_x (the slope): the rate of change in the log-odds of \(y\) per unit (=1) change in \(x\). So for every one unit change in \(x\), the log-odds of \(y\) increase by (on average) 1.02 units and we are 95% confident that this change is between 0.18 and 2.335.
As yet a more flexible alternative, if we first extract the full posterior draws (with as_draws_df()), we can then use the various tidyverse functions to focus on just the most ecologically interpretable parameters before summarising.
In the following, I will use select() with a regex (regular expression) to match only the columns that:
- start with (^) “b_” followed by any number of (*) any character (.)
- start with (^) “sigma” (this model has no sigma parameter, so only the “b_” columns are matched)
I will also use mutate() to exponentiate the posteriors before summarising. For a logit link, this converts log-odds into odds (for the intercept) and odds ratios (for the slope), rather than onto the scale of the response itself.
dat5a.brm2 |>
brms::as_draws_df() |>
dplyr::select(matches("^b_.*|^sigma")) |>
mutate(across(everything(), exp)) |>
posterior::summarise_draws(
median,
HDInterval::hdi,
rhat,
length,
ess_bulk, ess_tail
)
Warning: Dropping 'draws_df' class as required metadata was removed.
# A tibble: 2 × 8
variable median lower upper rhat length ess_bulk ess_tail
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 b_Intercept 0.00572 0.00000000196 0.251 1.00 2400 2214. 2206.
2 b_x 2.77 0.959 8.85 1.00 2400 2218. 2145.
Conclusions:
- the length column confirms that each summary is based on all 2400 post-warmup MCMC draws.
- the rhat values for each parameter are < 1.01, consistent with convergence of the parameter estimates across the chains
- ess_bulk: effective sample sizes for the bulk of the posterior are a high fraction (> 0.5) of the total number of draws. This suggests that the sampling was reasonably efficient and therefore likely to be reliable
- ess_tail: effective sample sizes of estimates in the tails of the posterior. The tails (more extreme regions of the posterior) are where the sampler is most likely to get stuck (have divergent transitions etc). The fraction of effective samples from this region is relatively high, implying that the sampler did not get stuck in areas that are poorly supported by the data.
- b_Intercept: when \(x=0\), the odds that \(y=1\) are 0.006 and we are 95% confident that the true value is between approximately 0 and 0.251. Since \(x=0\) is outside the observed range of \(x\), this y-intercept is of very limited value.
- b_x (the slope): the multiplicative change in the odds of \(y\) per unit (=1) change in \(x\) (the odds ratio). So for every one unit change in \(x\), the odds of \(y\) increase by (on average) a factor of 2.772 and we are 95% confident that this factor is between 0.959 and 8.85. This represents a 177.2% increase ((value - 1) * 100) in the odds of \(y\) per unit increase in \(x\).
The gather_draws() function performs the equivalent of an as_draws_df() followed by a pivot_longer() in order to return the full posteriors in long format where they are more suitable for graphing.
dat5a.brm2 |>
  gather_draws(b_Intercept, b_x) |>
  # back-transform only the draws (not the .chain/.iteration/.draw indices)
  mutate(.value = exp(.value)) |>
  ggplot() +
  geom_histogram(aes(x = .value)) +
  facet_wrap(~.variable, scales = "free")
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Alternatively, there are various other representations supported by the ggdist package.
y ymin ymax .width .point .interval
1 0.6241225 0.2563045 0.6997611 0.95 median hdci
Conclusions:
- 62.412% of the total variability in \(y\) can be explained by its relationship to \(x\)
- we are 95% confident that the strength of this relationship is between 25.63% and 69.976%
Family: binomial
Links: mu = logit
Formula: y | trials(1) ~ scale(x, scale = FALSE)
Data: dat5 (Number of observations: 10)
Draws: 3 chains, each with iter = 5000; warmup = 1000; thin = 5;
total post-warmup draws = 2400
Regression Coefficients:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept 0.52 0.73 -0.83 1.97 1.00 2356 2269
scalexscaleEQFALSE 1.13 0.59 0.24 2.57 1.00 2010 1941
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
The results presented by the summary() function are on the link scale (in this case, the logit, or log-odds scale). As such, they can be awkward to interpret and are particularly punishing for your audience.
Conclusions:
- in the initial block of information, we are reminded of the formula as well as the chain fitting characteristics. We are also informed that the total number of post-warmup MCMC samples is 2400.
- the Rhat values for each parameter are <1.01, indicating convergence of the parameter estimates across the chains.
- Bulk_ESS: effective sample sizes for the bulk of the posterior range are a high fraction (> 0.5) of the total number of draws. This suggests that the sampling was reasonably efficient and therefore likely to be reliable.
- Tail_ESS: effective sample sizes of estimates in the tails of the posterior. The tails (the more extreme regions of the posterior) are where the sampler is most likely to get stuck (have divergent transitions etc). The fraction of effective samples from this region is relatively high, implying that the sampler did not get stuck in regions that are poorly supported by the data.
- Intercept: when \(x\) is at its mean (the centred \(x=0\)), the expected value of \(y\) on the logit (log-odds) scale is 0.523 and we are 95% confident that the true value is between -0.834 and 1.971. Because \(x\) has been centred, this intercept lies within the observed range of \(x\) and is directly interpretable.
- scalexscaleEQFALSE (the slope): the rate of change in \(y\) (on the log-odds scale) per unit (=1) change in \(x\). So for every one unit change in \(x\), the log-odds of \(y=1\) increase by (on average) 1.13 units and we are 95% confident that this change is between 0.242 and 2.575.
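The total of 2400 post-warmup draws and the "high fraction" of effective samples can both be verified by hand. The sketch below simply redoes that arithmetic in base R, using the sampler settings and ESS values printed in the summary above; the 0.5 threshold is the rule of thumb used in this tutorial rather than a formal cut-off.

```r
# Sampler settings as reported in the model summary
chains <- 3
iter   <- 5000
warmup <- 1000
thin   <- 5

# Post-warmup draws retained across all chains
total_draws <- chains * (iter - warmup) / thin
total_draws

# Effective sample sizes (Bulk_ESS and Tail_ESS) from the summary table
ess <- c(Intercept_bulk = 2356, Intercept_tail = 2269,
         slope_bulk = 2010, slope_tail = 1941)

# Fraction of the total draws that are "effective"
round(ess / total_draws, 2)
```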
Note, the estimates are means and quantiles.
As an alternative to the regular summary (which is designed to resemble that traditionally provided by frequentist analyses, and thus be instantly familiar), it is possible to define exactly how each of the parameters is summarised.
In the following, I am nominating that I want to summarise each parameter posterior by:
- the median
- the 95% highest posterior density interval (credibility interval)
- Rhat
- total number of draws
- bulk and tail effective sample sizes
dat5b.brm2 |>
    posterior::summarise_draws(
        median,
        HDInterval::hdi,
        rhat,
        length,
        ess_bulk, ess_tail
    )
# A tibble: 7 × 8
variable median lower upper rhat length ess_bulk ess_tail
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 b_Intercept 0.515 -0.825 1.98 1.00 2400 2356. 2269.
2 b_scalexscaleEQFALSE 1.03 0.150 2.34 1.00 2400 2010. 1941.
3 Intercept 0.515 -0.825 1.98 1.00 2400 2356. 2269.
4 prior_Intercept 0.00929 -1.91 1.95 1.00 2400 2481. 2152.
5 prior_b -0.0000282 -2.88 3.74 0.999 2400 2327. 2192.
6 lprior -2.86 -4.78 -1.93 1.00 2400 1983. 2062.
7 lp__ -6.04 -8.77 -5.26 1.00 2400 2053. 2202.
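The idea behind summarising a posterior by its median and a highest density interval can be illustrated without a fitted model. The following sketch uses simulated draws as a stand-in for a real posterior (an assumption made purely for illustration) and a hypothetical helper, `hpd_interval()`, that searches for the narrowest interval containing 95% of the draws. It is not the code used by HDInterval::hdi(), although it follows the same general idea.

```r
set.seed(1)
# Simulated, right-skewed "posterior" draws (illustrative only)
draws <- rgamma(2400, shape = 2, rate = 1)

# Hypothetical helper: narrowest interval containing `prob` of the draws
hpd_interval <- function(x, prob = 0.95) {
  x <- sort(x)
  n <- length(x)
  k <- ceiling(prob * n)              # number of draws inside the interval
  starts <- seq_len(n - k + 1)        # candidate lower-bound positions
  widths <- x[starts + k - 1] - x[starts]
  best <- which.min(widths)           # narrowest candidate
  c(lower = x[best], upper = x[best + k - 1])
}

median(draws)
hpd <- hpd_interval(draws)
eti <- quantile(draws, c(0.025, 0.975))   # equal-tailed interval for comparison
hpd
eti
```

For a skewed posterior such as this one, the highest density interval is shifted towards the mode and is never wider than the equal-tailed (quantile) interval of the same coverage, which is one reason the two types of interval reported in this tutorial differ.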
The results presented by the above function are on the link scale (in this case, the logit, or log-odds scale). As such, they can be awkward to interpret and are particularly punishing for your audience.
Conclusions:
- the length column confirms that each parameter is summarised from all 2400 post-warmup MCMC samples.
- the Rhat values for each parameter are <1.01, indicating convergence of the parameter estimates across the chains.
- Bulk_ESS: effective sample sizes for the bulk of the posterior range are a high fraction (> 0.5) of the total number of draws. This suggests that the sampling was reasonably efficient and therefore likely to be reliable.
- Tail_ESS: effective sample sizes of estimates in the tails of the posterior. The tails (the more extreme regions of the posterior) are where the sampler is most likely to get stuck (have divergent transitions etc). The fraction of effective samples from this region is relatively high, implying that the sampler did not get stuck in regions that are poorly supported by the data.
- b_Intercept: when \(x\) is at its mean (the centred \(x=0\)), the expected value of \(y\) on the logit (log-odds) scale is 0.515 and we are 95% confident that the true value is between -0.825 and 1.977. Because \(x\) has been centred, this intercept lies within the observed range of \(x\) and is directly interpretable.
- b_scalexscaleEQFALSE (the slope): the rate of change in \(y\) (on the log-odds scale) per unit (=1) change in \(x\). So for every one unit change in \(x\), the log-odds of \(y=1\) increase by (on average) 1.026 units and we are 95% confident that this change is between 0.15 and 2.337.
As yet a more flexible alternative, if we first extract the full posterior draws (with as_draws_df()), we can then use the various tidyverse functions to focus on just the most ecologically interpretable parameters before summarising.
In the following, I will use select() with a regex (regular expression) to match only the columns that:
- start with (^) “b_” followed by any amount of (*) any character (.)
- start with (^) “sigma”
I will also use mutate() to transform the posteriors from the link (log-odds) scale onto the odds scale (by exponentiating them) before summarising. Note that a binomial model has no sigma parameter, so only the “b_” columns are matched here.
dat5b.brm2 |>
    brms::as_draws_df() |>
    dplyr::select(matches("^b_.*|^sigma")) |>
    ## exponentiate every selected column (log-odds -> odds)
    mutate(across(everything(), exp)) |>
    posterior::summarise_draws(
        median,
        HDInterval::hdi,
        rhat,
        length,
        ess_bulk, ess_tail
    )
Warning: Dropping 'draws_df' class as required metadata was removed.
# A tibble: 2 × 8
variable median lower upper rhat length ess_bulk ess_tail
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 b_Intercept 1.67 0.226 5.70 1.00 2400 2345. 2253.
2 b_scalexscaleEQFALSE 2.79 0.869 9.24 1.00 2400 1997. 1918.
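The effect of the regular expression passed to matches() can be checked on the variable names alone, without needing the fitted model. The sketch below applies the same pattern with base R's grepl() to the variable names listed in the earlier summarise_draws() output.

```r
# Variable names as listed in the summarise_draws() output for this model
vars <- c("b_Intercept", "b_scalexscaleEQFALSE", "Intercept",
          "prior_Intercept", "prior_b", "lprior", "lp__")

# Same pattern as passed to matches(): starts with "b_" OR starts with "sigma"
keep <- grepl("^b_.*|^sigma", vars)
vars[keep]
```

Note that prior_b is not retained: although it contains "b", it does not start with "b_", which is exactly what the ^ anchor enforces.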
Conclusions:
- the length column confirms that each parameter is summarised from all 2400 post-warmup MCMC samples.
- the Rhat values for each parameter are <1.01, indicating convergence of the parameter estimates across the chains.
- Bulk_ESS: effective sample sizes for the bulk of the posterior range are a high fraction (> 0.5) of the total number of draws. This suggests that the sampling was reasonably efficient and therefore likely to be reliable.
- Tail_ESS: effective sample sizes of estimates in the tails of the posterior. The tails (the more extreme regions of the posterior) are where the sampler is most likely to get stuck (have divergent transitions etc). The fraction of effective samples from this region is relatively high, implying that the sampler did not get stuck in regions that are poorly supported by the data.
- b_Intercept: when \(x\) is at its mean (the centred \(x=0\)), the odds of \(y=1\) are 1.674 and we are 95% confident that the true value is between 0.226 and 5.70. Because \(x\) has been centred, this intercept lies within the observed range of \(x\).
- b_scalexscaleEQFALSE (the slope): the multiplicative change in the odds of \(y=1\) per unit (=1) change in \(x\). So for every one unit change in \(x\), the odds increase by (on average) a factor of 2.789 and we are 95% confident that this factor is between 0.869 and 9.24. This represents a ((value - 1) * 100) 178.9% increase in the odds per unit increase in \(x\).
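One reason for back-transforming every draw before summarising, rather than exponentiating a finished summary, is that not every summary survives the transformation unchanged. The sketch below uses simulated log-odds draws (a stand-in for a real posterior, assumed here purely for illustration) to show that the median commutes with exp() whereas the mean does not. Highest density intervals are similarly not preserved under non-linear transformations.

```r
set.seed(42)
# Simulated log-odds draws standing in for a posterior (illustrative only);
# an odd number of draws keeps the median equal to a single observed draw
log_odds <- rnorm(2401, mean = 1, sd = 0.6)

# Medians commute with monotonic transformations such as exp()
exp(median(log_odds))
median(exp(log_odds))

# Means do not: the mean of the exponentiated draws is larger
exp(mean(log_odds))
mean(exp(log_odds))
```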
The gather_draws() function performs the equivalent of an as_draws_df() followed by a pivot_longer() in order to return the full posteriors in long format where they are more suitable for graphing.
dat5b.brm2 |>
    gather_draws(`b_Intercept`, `b_.*x.*`, regex = TRUE) |>
    ## exponentiate only the draws (.value), not the .chain, .iteration and .draw indices
    mutate(.value = exp(.value)) |>
    ggplot() +
    geom_histogram(aes(x = .value)) +
    facet_wrap(~.variable, scales = "free")
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
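The wide-to-long reshaping described above can be mimicked on a small made-up data frame. This sketch uses base R's stack() in place of pivot_longer() so that it runs without a fitted model. The column names and values are invented for illustration, and the output columns are renamed to resemble the .variable and .value columns that gather_draws() returns.

```r
set.seed(123)
# Made-up wide "draws": one column per parameter, one row per draw
wide <- data.frame(
  b_Intercept = rnorm(5, 0.5, 0.7),
  b_x         = rnorm(5, 1.0, 0.6)
)

# Long format: one row per draw per parameter
long <- stack(wide)
names(long) <- c(".value", ".variable")

# Back-transform the draws only (log-odds -> odds)
long$.value_odds <- exp(long$.value)
long
```

Having a single value column and a single label column is what allows facet_wrap(~.variable) to produce one panel per parameter.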
Alternatively, there are various other representations supported by the ggdist package.
y ymin ymax .width .point .interval
1 0.6238714 0.2277985 0.7055103 0.95 median hdci
Conclusions:
- 62.387% of the total variability in \(y\) can be explained by its relationship to \(x\)
- we are 95% confident that the strength of this relationship is between 22.78% and 70.551%
Family: binomial
Links: mu = logit
Formula: y | trials(1) ~ scale(x)
Data: dat5 (Number of observations: 10)
Draws: 3 chains, each with iter = 5000; warmup = 1000; thin = 5;
total post-warmup draws = 2400
Regression Coefficients:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept 0.42 0.68 -0.83 1.82 1.00 2297 2232
scalex 2.08 1.28 0.22 5.17 1.00 2331 2289
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
The results presented by the summary() function are on the link scale (in this case, the logit, or log-odds scale). As such, they can be awkward to interpret and are particularly punishing for your audience.
Conclusions:
- in the initial block of information, we are reminded of the formula as well as the chain fitting characteristics. We are also informed that the total number of post-warmup MCMC samples is 2400.
- the Rhat values for each parameter are <1.01, indicating convergence of the parameter estimates across the chains.
- Bulk_ESS: effective sample sizes for the bulk of the posterior range are a high fraction (> 0.5) of the total number of draws. This suggests that the sampling was reasonably efficient and therefore likely to be reliable.
- Tail_ESS: effective sample sizes of estimates in the tails of the posterior. The tails (the more extreme regions of the posterior) are where the sampler is most likely to get stuck (have divergent transitions etc). The fraction of effective samples from this region is relatively high, implying that the sampler did not get stuck in regions that are poorly supported by the data.
- Intercept: when \(x=0\) (its average, since it is standardised), the expected value of \(y\) on the logit (log-odds) scale is 0.424 and we are 95% confident that the true value is between -0.831 and 1.817. Because \(x=0\) now corresponds to the mean of \(x\), this intercept lies within the observed range of \(x\) and is directly interpretable.
- scalex (the slope): the rate of change in \(y\) (on the log-odds scale) per unit (=1) change in \(x\). Recall that since \(x\) is standardised, 1 unit represents a span of 1 standard deviation of \(x\). So for every one standard deviation change in \(x\), the log-odds of \(y=1\) increase by (on average) 2.079 units and we are 95% confident that this change is between 0.215 and 5.17.
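To make the "per standard deviation" interpretation concrete, the sketch below standardises a hypothetical predictor (these are not the tutorial's data) and shows how a slope per standard deviation can be re-expressed per original unit. Because the priors of the centred and standardised models are specified on different scales, the converted value will not necessarily match the slope estimated by the centred model.

```r
# Hypothetical predictor values (not the tutorial's data)
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)

# Standardising gives a predictor with mean 0 and standard deviation 1
x_std <- as.numeric(scale(x))
c(mean = mean(x_std), sd = sd(x_std))

# A slope per standard deviation can be re-expressed per original unit
# by dividing by sd(x)
slope_per_sd   <- 2.079            # posterior mean slope from the summary above
slope_per_unit <- slope_per_sd / sd(x)
slope_per_unit
```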
Note, the estimates are means and quantiles.
As an alternative to the regular summary (which is designed to resemble that traditionally provided by frequentist analyses, and thus be instantly familiar), it is possible to define exactly how each of the parameters is summarised.
In the following, I am nominating that I want to summarise each parameter posterior by:
- the median
- the 95% highest posterior density interval (credibility interval)
- Rhat
- total number of draws
- bulk and tail effective sample sizes
dat5c.brm2 |>
    posterior::summarise_draws(
        median,
        HDInterval::hdi,
        rhat,
        length,
        ess_bulk, ess_tail
    )
# A tibble: 7 × 8
variable median lower upper rhat length ess_bulk ess_tail
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 b_Intercept 0.409 -0.899 1.73 1.00 2400 2297. 2232.
2 b_scalex 1.84 0.0104 4.69 1.00 2400 2331. 2289.
3 Intercept 0.409 -0.899 1.73 1.00 2400 2297. 2232.
4 prior_Intercept -0.0150 -2.01 1.96 1.00 2400 2303. 2410.
5 prior_b -0.0234 -3.46 2.82 1.00 2400 2384. 2367.
6 lprior -3.73 -6.57 -1.92 1.00 2400 2362. 2214.
7 lp__ -7.42 -10.1 -6.56 1.00 2400 2418. 2289.
Conclusions:
- the length column confirms that each parameter is summarised from all 2400 post-warmup MCMC samples.
- the Rhat values for each parameter are <1.01, indicating convergence of the parameter estimates across the chains.
- Bulk_ESS: effective sample sizes for the bulk of the posterior range are a high fraction (> 0.5) of the total number of draws. This suggests that the sampling was reasonably efficient and therefore likely to be reliable.
- Tail_ESS: effective sample sizes of estimates in the tails of the posterior. The tails (the more extreme regions of the posterior) are where the sampler is most likely to get stuck (have divergent transitions etc). The fraction of effective samples from this region is relatively high, implying that the sampler did not get stuck in regions that are poorly supported by the data.
- b_Intercept: when \(x=0\) (its average, since it is standardised), the expected value of \(y\) on the logit (log-odds) scale is 0.409 and we are 95% confident that the true value is between -0.899 and 1.734. Because \(x=0\) now corresponds to the mean of \(x\), this intercept lies within the observed range of \(x\) and is directly interpretable.
- b_scalex (the slope): the rate of change in \(y\) (on the log-odds scale) per unit (=1) change in \(x\). Recall that since \(x\) is standardised, 1 unit represents a span of 1 standard deviation of \(x\). So for every one standard deviation change in \(x\), the log-odds of \(y=1\) increase by (on average) 1.844 units and we are 95% confident that this change is between 0.01 and 4.691.
As yet a more flexible alternative, if we first extract the full posterior draws (with as_draws_df()), we can then use the various tidyverse functions to focus on just the most ecologically interpretable parameters before summarising.
In the following, I will use select() with a regex (regular expression) to match only the columns that:
- start with (^) “b_” followed by any amount of (*) any character (.)
- start with (^) “sigma”
I will also use mutate() to transform the posteriors from the link (log-odds) scale onto the odds scale (by exponentiating them) before summarising. Note that a binomial model has no sigma parameter, so only the “b_” columns are matched here.
dat5c.brm2 |>
    brms::as_draws_df() |>
    dplyr::select(matches("^b_.*|^sigma")) |>
    ## exponentiate every selected column (log-odds -> odds)
    mutate(across(everything(), exp)) |>
    posterior::summarise_draws(
        median,
        HDInterval::hdi,
        rhat,
        length,
        ess_bulk, ess_tail
    )
Warning: Dropping 'draws_df' class as required metadata was removed.
# A tibble: 2 × 8
variable median lower upper rhat length ess_bulk ess_tail
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 b_Intercept 1.51 0.176 4.75 1.00 2400 2297. 2198.
2 b_scalex 6.32 0.767 94.0 1.00 2400 2312. 2283.
Conclusions:
- the length column confirms that each parameter is summarised from all 2400 post-warmup MCMC samples.
- the Rhat values for each parameter are <1.01, indicating convergence of the parameter estimates across the chains.
- Bulk_ESS: effective sample sizes for the bulk of the posterior range are a high fraction (> 0.5) of the total number of draws. This suggests that the sampling was reasonably efficient and therefore likely to be reliable.
- Tail_ESS: effective sample sizes of estimates in the tails of the posterior. The tails (the more extreme regions of the posterior) are where the sampler is most likely to get stuck (have divergent transitions etc). The fraction of effective samples from this region is relatively high, implying that the sampler did not get stuck in regions that are poorly supported by the data.
- b_Intercept: when \(x=0\) (its average, since it is standardised), the odds of \(y=1\) are 1.506 and we are 95% confident that the true value is between 0.176 and 4.75. Because \(x=0\) now corresponds to the mean of \(x\), this intercept lies within the observed range of \(x\).
- b_scalex (the slope): the multiplicative change in the odds of \(y=1\) per unit (=1) change in \(x\). Recall that since \(x\) is standardised, 1 unit represents a span of 1 standard deviation of \(x\). So for every one standard deviation change in \(x\), the odds increase by (on average) a factor of 6.32 and we are 95% confident that this factor is between 0.767 and 94.0. This represents a ((value - 1) * 100) 532% increase in the odds per standard deviation increase in \(x\).
The gather_draws() function performs the equivalent of an as_draws_df() followed by a pivot_longer() in order to return the full posteriors in long format where they are more suitable for graphing. Note that the name of the slope parameter becomes awkward when \(x\) is centred or standardised, so it is more convenient to refer to this parameter via a regular expression.
[1] "b_Intercept" "b_scalex" "Intercept" "prior_Intercept"
[5] "prior_b" "lprior" "lp__" "accept_stat__"
[9] "stepsize__" "treedepth__" "n_leapfrog__" "divergent__"
[13] "energy__"
dat5c.brm2 |>
    gather_draws(`b_Intercept`, `b_.*x`, regex = TRUE) |>
    ## exponentiate only the draws (.value), not the .chain, .iteration and .draw indices
    mutate(.value = exp(.value)) |>
    ggplot() +
    geom_histogram(aes(x = .value)) +
    facet_wrap(~.variable, scales = "free")
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Alternatively, there are various other representations supported by the ggdist package.
y ymin ymax .width .point .interval
1 0.4896796 0.04655454 0.6721203 0.95 median hdci
Conclusions:
- 48.968% of the total variability in \(y\) can be explained by its relationship to \(x\)
- we are 95% confident that the strength of this relationship is between 4.655% and 67.212%
Family: binomial
Links: mu = logit
Formula: count | trials(total) ~ x
Data: dat6 (Number of observations: 10)
Draws: 3 chains, each with iter = 5000; warmup = 1000; thin = 5;
total post-warmup draws = 2400
Regression Coefficients:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept -3.26 0.94 -5.21 -1.56 1.00 2503 2367
x 0.65 0.17 0.36 1.00 1.00 2470 2347
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
The results presented by the summary() function are on the link scale (in this case, the logit, or log-odds scale). As such, they can be awkward to interpret and are particularly punishing for your audience.
Conclusions:
- in the initial block of information, we are reminded of the formula as well as the chain fitting characteristics. We are also informed that the total number of post-warmup MCMC samples is 2400.
- the Rhat values for each parameter are <1.01, indicating convergence of the parameter estimates across the chains.
- Bulk_ESS: effective sample sizes for the bulk of the posterior range are a high fraction (> 0.5) of the total number of draws. This suggests that the sampling was reasonably efficient and therefore likely to be reliable.
- Tail_ESS: effective sample sizes of estimates in the tails of the posterior. The tails (the more extreme regions of the posterior) are where the sampler is most likely to get stuck (have divergent transitions etc). The fraction of effective samples from this region is relatively high, implying that the sampler did not get stuck in regions that are poorly supported by the data.
- Intercept: when \(x=0\), the expected value of \(y\) on the logit (log-odds) scale is -3.262 and we are 95% confident that the true value is between -5.207 and -1.561. Since \(x=0\) is outside the observed range of \(x\), this y-intercept is of very limited value.
- x (the slope): the rate of change in \(y\) (on the log-odds scale) per unit (=1) change in \(x\). So for every one unit change in \(x\), the log-odds of a success increase by (on average) 0.65 units and we are 95% confident that this change is between 0.355 and 0.997.
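Estimates on the log-odds scale become easier to communicate once they are converted to probabilities. The following sketch plugs the posterior means from the summary above into the linear predictor and applies the inverse logit (plogis() in base R). The x values are hypothetical, and plugging in point estimates is only an approximation; a full treatment would transform every posterior draw before summarising.

```r
# Posterior means from the summary above (link/logit scale)
intercept <- -3.262
slope     <- 0.65

# Hypothetical x values at which to evaluate the fitted relationship
x <- c(2, 5, 8)

log_odds <- intercept + slope * x   # linear predictor (log-odds)
prob     <- plogis(log_odds)        # inverse logit: probability of success
data.frame(x, log_odds, prob)
```

Because the slope is positive, the implied probability of success rises with \(x\), but not by a constant amount per unit of \(x\), which is why effects are usually reported as odds ratios instead.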
Note, the estimates are means and quantiles.
As an alternative to the regular summary (which is designed to resemble that traditionally provided by frequentist analyses, and thus be instantly familiar), it is possible to define exactly how each of the parameters is summarised.
In the following, I am nominating that I want to summarise each parameter posterior by:
- the median
- the 95% highest posterior density interval (credibility interval)
- Rhat
- total number of draws
- bulk and tail effective sample sizes
dat6a.brm2 |>
    posterior::summarise_draws(
        median,
        HDInterval::hdi,
        rhat,
        length,
        ess_bulk, ess_tail
    )
# A tibble: 7 × 8
variable median lower upper rhat length ess_bulk ess_tail
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 b_Intercept -3.20 -5.11 -1.52 1.00 2400 2503. 2367.
2 b_x 0.640 0.346 0.982 1.00 2400 2470. 2347.
3 Intercept 0.317 -0.163 0.791 1.00 2400 2453. 2543.
4 prior_Intercept -0.233 -0.869 0.402 1.00 2400 2390. 2225.
5 prior_b -0.00697 -1.65 1.70 0.999 2400 2442. 2287.
6 lprior -2.37 -5.29 -0.417 1.00 2400 2447. 2412.
7 lp__ -14.4 -16.8 -13.7 1.00 2400 2354. 2168.
The results presented by the above function are on the link scale (in this case, the logit, or log-odds scale). As such, they can be awkward to interpret and are particularly punishing for your audience.
Conclusions:
- the length column confirms that each parameter is summarised from all 2400 post-warmup MCMC samples.
- the Rhat values for each parameter are <1.01, indicating convergence of the parameter estimates across the chains.
- Bulk_ESS: effective sample sizes for the bulk of the posterior range are a high fraction (> 0.5) of the total number of draws. This suggests that the sampling was reasonably efficient and therefore likely to be reliable.
- Tail_ESS: effective sample sizes of estimates in the tails of the posterior. The tails (the more extreme regions of the posterior) are where the sampler is most likely to get stuck (have divergent transitions etc). The fraction of effective samples from this region is relatively high, implying that the sampler did not get stuck in regions that are poorly supported by the data.
- b_Intercept: when \(x=0\), the expected value of \(y\) on the logit (log-odds) scale is -3.20 and we are 95% confident that the true value is between -5.111 and -1.516. Since \(x=0\) is outside the observed range of \(x\), this y-intercept is of very limited value. (The value of 0.317 listed for Intercept is the intercept at the mean of \(x\), not at \(x=0\).)
- b_x (the slope): the rate of change in \(y\) (on the log-odds scale) per unit (=1) change in \(x\). So for every one unit change in \(x\), the log-odds of a success increase by (on average) 0.64 units and we are 95% confident that this change is between 0.346 and 0.982.
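The table above lists both b_Intercept and Intercept. In brms, Intercept is the intercept of the model after the predictors have been mean-centred internally, whereas b_Intercept is expressed for the predictors on their original scale, so that Intercept = b_Intercept + b_x × mean(x). The following sketch rearranges this relationship using the posterior medians above. Since medians of different parameters need not combine exactly, the implied mean of \(x\) should be read as approximate.

```r
# Posterior medians from the summarise_draws() table above
b_Intercept <- -3.20    # intercept at x = 0
b_x         <- 0.640    # slope
Intercept   <- 0.317    # intercept at the mean of x

# Rearranging Intercept = b_Intercept + b_x * mean(x)
implied_mean_x <- (Intercept - b_Intercept) / b_x
implied_mean_x
```

This also explains why the Intercept row closely resembles the intercept of the model fitted to the explicitly centred predictor.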
As yet a more flexible alternative, if we first extract the full posterior draws (with as_draws_df()), we can then use the various tidyverse functions to focus on just the most ecologically interpretable parameters before summarising.
In the following, I will use select() with a regex (regular expression) to match only the columns that:
- start with (^) “b_” followed by any amount of (*) any character (.)
- start with (^) “sigma”
I will also use mutate() to transform the posteriors from the link (log-odds) scale onto the odds scale (by exponentiating them) before summarising. Note that a binomial model has no sigma parameter, so only the “b_” columns are matched here.
dat6a.brm2 |>
    brms::as_draws_df() |>
    dplyr::select(matches("^b_.*|^sigma")) |>
    ## exponentiate every selected column (log-odds -> odds)
    mutate(across(everything(), exp)) |>
    posterior::summarise_draws(
        median,
        HDInterval::hdi,
        rhat,
        length,
        ess_bulk, ess_tail
    )
Warning: Dropping 'draws_df' class as required metadata was removed.
# A tibble: 2 × 8
variable median lower upper rhat length ess_bulk ess_tail
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 b_Intercept 0.0406 0.000717 0.164 1.00 2400 2491. 2362.
2 b_x 1.90 1.37 2.62 1.00 2400 2459. 2329.
Conclusions:
- the length column confirms that each parameter is summarised from all 2400 post-warmup MCMC samples.
- the Rhat values for each parameter are <1.01, indicating convergence of the parameter estimates across the chains.
- Bulk_ESS: effective sample sizes for the bulk of the posterior range are a high fraction (> 0.5) of the total number of draws. This suggests that the sampling was reasonably efficient and therefore likely to be reliable.
- Tail_ESS: effective sample sizes of estimates in the tails of the posterior. The tails (the more extreme regions of the posterior) are where the sampler is most likely to get stuck (have divergent transitions etc). The fraction of effective samples from this region is relatively high, implying that the sampler did not get stuck in regions that are poorly supported by the data.
- b_Intercept: when \(x=0\), the odds of a success are 0.041 and we are 95% confident that the true value is between 0.0007 and 0.164. Since \(x=0\) is outside the observed range of \(x\), this y-intercept is of very limited value.
- b_x (the slope): the multiplicative change in the odds of a success per unit (=1) change in \(x\). So for every one unit change in \(x\), the odds increase by (on average) a factor of 1.896 and we are 95% confident that this factor is between 1.37 and 2.62. This represents a ((value - 1) * 100) 89.6% increase in the odds per unit increase in \(x\).
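Exponentiated logit-scale estimates are odds rather than probabilities. As a small worked example, the sketch below converts the exponentiated intercept reported above (treated here as odds) into the implied probability of a success, using only base R.

```r
# Odds reported above for the intercept (posterior median)
odds <- 0.041

# Probability implied by the odds: p = odds / (1 + odds)
prob <- odds / (1 + odds)
prob

# Equivalent route via the inverse logit of the log-odds
plogis(log(odds))
```

When the odds are small, as they are here, the odds and the probability are nearly identical; the two diverge increasingly as the odds grow.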
The gather_draws() function performs the equivalent of an as_draws_df() followed by a pivot_longer() in order to return the full posteriors in long format where they are more suitable for graphing.
dat6a.brm2 |>
    gather_draws(b_Intercept, b_x) |>
    ## exponentiate only the draws (.value), not the .chain, .iteration and .draw indices
    mutate(.value = exp(.value)) |>
    ggplot() +
    geom_histogram(aes(x = .value)) +
    facet_wrap(~.variable, scales = "free")
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Alternatively, there are various other representations supported by the ggdist package.
y ymin ymax .width .point .interval
1 0.7603549 0.5651125 0.8234649 0.95 median hdci
Conclusions:
- 76.035% of the total variability in \(y\) can be explained by its relationship to \(x\)
- we are 95% confident that the strength of this relationship is between 56.511% and 82.346%
Family: binomial
Links: mu = logit
Formula: count | trials(total) ~ scale(x, scale = FALSE)
Data: dat6 (Number of observations: 10)
Draws: 3 chains, each with iter = 5000; warmup = 1000; thin = 5;
total post-warmup draws = 2400
Regression Coefficients:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept 0.31 0.25 -0.19 0.80 1.00 2202 2411
scalexscaleEQFALSE 0.66 0.17 0.37 1.02 1.00 2344 2208
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
The results presented by the summary() function are on the link scale (in this case, the logit, or log-odds scale). As such, they can be awkward to interpret and are particularly punishing for your audience.
Conclusions:
- in the initial block of information, we are reminded of the formula as well as the chain fitting characteristics. We are also informed that the total number of post-warmup MCMC samples is 2400.
- the Rhat values for each parameter are <1.01, indicating convergence of the parameter estimates across the chains.
- Bulk_ESS: effective sample sizes for the bulk of the posterior range are a high fraction (> 0.5) of the total number of draws. This suggests that the sampling was reasonably efficient and therefore likely to be reliable.
- Tail_ESS: effective sample sizes of estimates in the tails of the posterior. The tails (the more extreme regions of the posterior) are where the sampler is most likely to get stuck (have divergent transitions etc). The fraction of effective samples from this region is relatively high, implying that the sampler did not get stuck in regions that are poorly supported by the data.
- Intercept: when \(x\) is at its mean (the centred \(x=0\)), the expected value of \(y\) on the logit (log-odds) scale is 0.308 and we are 95% confident that the true value is between -0.191 and 0.797. Because \(x\) has been centred, this intercept lies within the observed range of \(x\) and is directly interpretable.
- scalexscaleEQFALSE (the slope): the rate of change in \(y\) (on the log-odds scale) per unit (=1) change in \(x\). So for every one unit change in \(x\), the log-odds of a success increase by (on average) 0.659 units and we are 95% confident that this change is between 0.367 and 1.024.
Note, the estimates are means and quantiles.
As an alternative to the regular summary (which is designed to resemble that traditionally provided by frequentist analyses, and thus be instantly familiar), it is possible to define exactly how each of the parameters is summarised.
In the following, I am nominating that I want to summarise each parameter posterior by:
- the median
- the 95% highest posterior density interval (credibility interval)
- Rhat
- total number of draws
- bulk and tail effective sample sizes
dat6b.brm2 |>
    posterior::summarise_draws(
        median,
        HDInterval::hdi,
        rhat,
        length,
        ess_bulk, ess_tail
    )
# A tibble: 7 × 8
variable median lower upper rhat length ess_bulk ess_tail
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 b_Intercept 0.310 -0.174 0.805 1.00 2400 2202. 2411.
2 b_scalexscaleEQFALSE 0.647 0.366 1.01 1.00 2400 2344. 2208.
3 Intercept 0.310 -0.174 0.805 1.00 2400 2202. 2411.
4 prior_Intercept -0.218 -0.864 0.403 1.00 2400 2308. 2427.
5 prior_b 0.00307 -1.83 1.59 1.00 2400 2471. 2498.
6 lprior -2.34 -5.30 -0.580 1.00 2400 2316. 2445.
7 lp__ -14.4 -16.9 -13.7 1.00 2400 2339. 2368.
The results presented by the above function are on the link scale (in this case, the logit, or log-odds scale). As such, they can be awkward to interpret and are particularly punishing for your audience.
Conclusions:
- the length column confirms that each parameter is summarised from all 2400 post-warmup MCMC samples.
- the Rhat values for each parameter are <1.01, indicating convergence of the parameter estimates across the chains.
- Bulk_ESS: effective sample sizes for the bulk of the posterior range are a high fraction (> 0.5) of the total number of draws. This suggests that the sampling was reasonably efficient and therefore likely to be reliable.
- Tail_ESS: effective sample sizes of estimates in the tails of the posterior. The tails (the more extreme regions of the posterior) are where the sampler is most likely to get stuck (have divergent transitions etc). The fraction of effective samples from this region is relatively high, implying that the sampler did not get stuck in regions that are poorly supported by the data.
- b_Intercept: when \(x\) is at its mean (the centred \(x=0\)), the expected value of \(y\) on the logit (log-odds) scale is 0.31 and we are 95% confident that the true value is between -0.174 and 0.805. Because \(x\) has been centred, this intercept lies within the observed range of \(x\) and is directly interpretable.
- b_scalexscaleEQFALSE (the slope): the rate of change in \(y\) (on the log-odds scale) per unit (=1) change in \(x\). So for every one unit change in \(x\), the log-odds of a success increase by (on average) 0.647 units and we are 95% confident that this change is between 0.366 and 1.015.
As yet a more flexible alternative, if we first extract the full posterior draws (with as_draws_df()), we can then use the various tidyverse functions to focus on just the most ecologically interpretable parameters before summarising.
In the following, I will use select() with a regex (regular expression) to match only the columns that:
- start with (^) “b_” followed by any amount of (*) any character (.)
- start with (^) “sigma”
I will also use mutate() to transform the posteriors from the link (log-odds) scale onto the odds scale (by exponentiating them) before summarising. Note that a binomial model has no sigma parameter, so only the “b_” columns are matched here.
dat6b.brm2 |>
    brms::as_draws_df() |>
    dplyr::select(matches("^b_.*|^sigma")) |>
    ## exponentiate every selected column (log-odds -> odds)
    mutate(across(everything(), exp)) |>
    posterior::summarise_draws(
        median,
        HDInterval::hdi,
        rhat,
        length,
        ess_bulk, ess_tail
    )
Warning: Dropping 'draws_df' class as required metadata was removed.
# A tibble: 2 × 8
variable median lower upper rhat length ess_bulk ess_tail
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 b_Intercept 1.36 0.730 2.07 1.00 2400 2178. 2383.
2 b_scalexscaleEQFALSE 1.91 1.37 2.65 1.00 2400 2336. 2153.
Conclusions:
- the length column confirms that each parameter is summarised from all 2400 post-warmup MCMC samples.
- the Rhat values for each parameter are <1.01, indicating convergence of the parameter estimates across the chains.
- Bulk_ESS: effective sample sizes for the bulk of the posterior range are a high fraction (> 0.5) of the total number of draws. This suggests that the sampling was reasonably efficient and therefore likely to be reliable.
- Tail_ESS: effective sample sizes of estimates in the tails of the posterior. The tails (the more extreme regions of the posterior) are where the sampler is most likely to get stuck (have divergent transitions etc). The fraction of effective samples from this region is relatively high, implying that the sampler did not get stuck in regions that are poorly supported by the data.
- b_Intercept: when \(x\) is at its mean (the centred \(x=0\)), the odds of a success are 1.363 and we are 95% confident that the true value is between 0.730 and 2.07. Because \(x\) has been centred, this intercept lies within the observed range of \(x\).
- b_scalexscaleEQFALSE (the slope): the multiplicative change in the odds of a success per unit (=1) change in \(x\). So for every one unit change in \(x\), the odds increase by (on average) a factor of 1.91 and we are 95% confident that this factor is between 1.37 and 2.65. This represents a ((value - 1) * 100) 91% increase in the odds per unit increase in \(x\).
The gather_draws() function performs the equivalent of an as_draws_df() followed by a pivot_longer() in order to return the full posteriors in long format, where they are more suitable for graphing.
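To make the wide-to-long idea concrete, here is a minimal base R sketch. It is not how tidybayes implements gather_draws(); it only mimics the resulting shape using stack(), and the draws themselves are hypothetical values.

```r
# Hypothetical wide draws: one row per draw, one column per parameter
wide <- data.frame(
  b_Intercept = c(0.21, 0.35, 0.28),
  b_scalex    = c(1.55, 1.70, 1.61)
)

# Base R stack() returns two columns: `values` and `ind`
long <- stack(wide)
names(long) <- c(".value", ".variable")

# Keep track of which draw each value came from
long$.draw <- rep(seq_len(nrow(wide)), times = ncol(wide))

print(long)
```

Each parameter now occupies its own block of rows, which is the layout that facet_wrap(~.variable) relies on.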
dat6b.brm2 |>
gather_draws(`b_Intercept`, `b_.*x.*`, regex = TRUE) |>
mutate(.value = exp(.value)) |> # exponentiate only the draws, not the .chain/.iteration/.draw index columns
ggplot() +
geom_histogram(aes(x = .value)) +
facet_wrap(~.variable, scales = "free")
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Alternatively, there are various other representations supported by the ggdist package.
y ymin ymax .width .point .interval
1 0.7621002 0.5838024 0.8209675 0.95 median hdci
Conclusions:
- 76.21% of the total variability in \(y\) can be explained by its relationship to \(x\)
- we are 95% confident that the strength of this relationship is between 58.38% and 82.097%
Family: binomial
Links: mu = logit
Formula: count | trials(total) ~ scale(x)
Data: dat6 (Number of observations: 10)
Draws: 3 chains, each with iter = 5000; warmup = 1000; thin = 5;
total post-warmup draws = 2400
Regression Coefficients:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept 0.28 0.25 -0.19 0.76 1.00 2332 1948
scalex 1.62 0.51 0.70 2.72 1.00 2297 2293
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
The results presented by the summary() function are on the link scale (in this case, the logit, or log-odds scale). As such, they can be awkward to interpret and are particularly punishing for your audience.
Conclusions:
- in the initial block of information, we are reminded of the formula as well as chain fitting characteristics. We are also informed that the total number of post-warmup MCMC samples is 2400.
- the Rhat values for each parameter are < 1.01, confirming convergence of the parameter estimates across the chains
- Bulk_ESS: effective sample sizes for the bulk of the posterior range are a high fraction (> 0.5) of the total number of draws. This suggests that the sampling was reasonably efficient and therefore likely to be reliable
- Tail_ESS: effective sample sizes of estimates in the tail of the posterior. The tails (more extreme regions of the posterior) are where the sampler is most likely to get stuck (have divergent transitions etc). The fraction of effective samples from this region is relatively high, implying that the sampler did not get stuck in such areas that are unsupported by data.
- Intercept: when \(x=0\) (its average since it is standardised), the expected value of \(y\) on the link (log-odds) scale is 0.28 and we are 95% confident that the true value is between -0.189 and 0.765. Because \(x\) is standardised, \(x=0\) lies within the observed range of \(x\), so this intercept is directly interpretable.
- x: (the slope) - the rate of change in \(y\) (on the log-odds scale) per unit (=1) change in \(x\). Recall that since \(x\) is standardised, 1 unit represents a span of 1 standard deviation of \(x\). So for every one standard deviation unit change in \(x\), \(y\) increases by (on average) 1.622 units on the log-odds scale and we are 95% confident that this change is between 0.701 and 2.721
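Because log-odds are hard to read, it can help to convert the point estimates to odds and probabilities. The following is a minimal sketch using the rounded estimates from the summary() output above; it uses only base R (plogis() is the inverse logit) and converts point estimates only, so it is not a substitute for transforming the full posterior.

```r
# Rounded posterior summaries taken from the summary() output above
# (link scale = log-odds)
intercept_logit <- 0.28
slope_logit     <- 1.62

# Odds and probability when the standardised x is 0 (i.e. at the mean of x)
odds_at_mean <- exp(intercept_logit)
prob_at_mean <- plogis(intercept_logit)  # inverse logit: exp(a) / (1 + exp(a))

# Multiplicative change in the odds per 1 SD increase in x (an odds ratio)
odds_ratio <- exp(slope_logit)

# Probability one SD above the mean of x
prob_plus_1sd <- plogis(intercept_logit + slope_logit)

print(round(c(odds = odds_at_mean, prob = prob_at_mean,
              odds_ratio = odds_ratio, prob_plus_1sd = prob_plus_1sd), 3))
```

Note that the odds ratio is constant across \(x\), whereas the change in probability depends on where along \(x\) it is evaluated.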
Note, the estimates are means and quantiles.
As an alternative to the regular summary (which is designed to resemble that traditionally provided by frequentist analyses - and thus be instantly familiar), it is possible to define exactly how each of the parameters is summarised.
In the following, I am nominating that I want to summarise each parameter posterior by:
- the median
- the 95% highest posterior density (HPD) interval (credible interval)
- Rhat
- total number of draws
- bulk and tail effective sample sizes
dat6c.brm2 |>
posterior::summarise_draws(
median,
HDInterval::hdi,
rhat,
length,
ess_bulk, ess_tail
)
# A tibble: 7 × 8
variable median lower upper rhat length ess_bulk ess_tail
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 b_Intercept 0.279 -0.189 0.765 1.00 2400 2332. 1948.
2 b_scalex 1.61 0.642 2.59 0.999 2400 2298. 2293.
3 Intercept 0.279 -0.189 0.765 1.00 2400 2332. 1948.
4 prior_Intercept -0.234 -0.855 0.410 1.00 2400 2359. 2366.
5 prior_b -0.00711 -0.923 1.01 1.00 2400 2094. 2290.
6 lprior -5.26 -8.95 -2.08 1.00 2400 2217. 2188.
7 lp__ -17.9 -20.3 -17.1 0.999 2400 2155. 2148.
Conclusions:
- the length column confirms that the total number of post-warmup MCMC samples is 2400.
- the Rhat values for each parameter are < 1.01, confirming convergence of the parameter estimates across the chains
- Bulk_ESS: effective sample sizes for the bulk of the posterior range are a high fraction (> 0.5) of the total number of draws. This suggests that the sampling was reasonably efficient and therefore likely to be reliable
- Tail_ESS: effective sample sizes of estimates in the tail of the posterior. The tails (more extreme regions of the posterior) are where the sampler is most likely to get stuck (have divergent transitions etc). The fraction of effective samples from this region is relatively high, implying that the sampler did not get stuck in such areas that are unsupported by data.
- b_Intercept: when \(x=0\) (its average since it is standardised), the expected value of \(y\) on the link (log-odds) scale is 0.279 and we are 95% confident that the true value is between -0.189 and 0.765. Because \(x\) is standardised, \(x=0\) lies within the observed range of \(x\), so this intercept is directly interpretable.
- b_x: (the slope) - the rate of change in \(y\) (on the log-odds scale) per unit (=1) change in \(x\). Recall that since \(x\) is standardised, 1 unit represents a span of 1 standard deviation of \(x\). So for every one standard deviation unit change in \(x\), \(y\) increases by (on average) 1.609 units on the log-odds scale and we are 95% confident that this change is between 0.642 and 2.59
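The diagnostic checks described above can also be applied programmatically. This is a minimal base R sketch that copies the relevant values from the summarise_draws() output into a data frame and applies two commonly used rules of thumb (Rhat below 1.01, effective sample size above half of the total draws). The thresholds are conventions assumed here, not hard limits.

```r
# Diagnostics copied from the summarise_draws() output above
diag <- data.frame(
  variable = c("b_Intercept", "b_scalex"),
  rhat     = c(1.00, 0.999),
  length   = c(2400, 2400),
  ess_bulk = c(2332, 2298),
  ess_tail = c(1948, 2293)
)

# Rule-of-thumb checks (assumed thresholds)
diag$rhat_ok    <- diag$rhat < 1.01
diag$bulk_ratio <- diag$ess_bulk / diag$length
diag$tail_ratio <- diag$ess_tail / diag$length
diag$ess_ok     <- diag$bulk_ratio > 0.5 & diag$tail_ratio > 0.5

print(diag)
```

With a real fit, the same columns could be computed directly on the tibble returned by summarise_draws() rather than typed in by hand.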
As yet a more flexible alternative, if we first extract the full posterior draws (with as_draws_df()), we can then use the various tidyverse functions to focus on just the most ecologically interpretable parameters before summarising.
In the following, I will use select() with a regex (regular expression) to match only the columns that:
- start with (^) “b_” followed by any amount of (*) any character (.)
- start with (^) “sigma”
It will also use mutate() to transform the results from the log-odds (link) scale onto the odds scale (by exponentiating the posteriors) before summarising.
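To see what the regular expression selects, it can be tried against the model's variable names with base R's grepl(). This is only a sketch of the pattern matching; it does not touch the fitted model. Setting ignore.case = TRUE mirrors the default behaviour of tidyselect's matches().

```r
# Variable names as reported for this model elsewhere in the tutorial
vars <- c("b_Intercept", "b_scalex", "Intercept", "prior_Intercept",
          "prior_b", "lprior", "lp__")

# ^b_.*   : starts with "b_", followed by any number of any characters
# ^sigma  : starts with "sigma"
# |       : either of the two alternatives
pattern <- "^b_.*|^sigma"

# ignore.case = TRUE mirrors the default of tidyselect's matches()
selected <- vars[grepl(pattern, vars, ignore.case = TRUE)]
print(selected)
```

Note that "prior_b" is not selected because the ^ anchor requires "b_" at the very start of the name, and nothing matches "^sigma" since a binomial model has no sigma parameter.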
dat6c.brm2 |>
brms::as_draws_df() |>
dplyr::select(matches("^b_.*|^sigma")) |>
mutate(across(everything(), exp)) |>
posterior::summarise_draws(
median,
HDInterval::hdi,
rhat,
length,
ess_bulk, ess_tail
)
Warning: Dropping 'draws_df' class as required metadata was removed.
# A tibble: 2 × 8
variable median lower upper rhat length ess_bulk ess_tail
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 b_Intercept 1.32 0.756 2.07 1.00 2400 2318. 1883.
2 b_scalex 5.00 1.47 12.3 1.00 2400 2272. 2260.
Conclusions:
- the length column confirms that the total number of post-warmup MCMC samples is 2400.
- the Rhat values for each parameter are < 1.01, confirming convergence of the parameter estimates across the chains
- Bulk_ESS: effective sample sizes for the bulk of the posterior range are a high fraction (> 0.5) of the total number of draws. This suggests that the sampling was reasonably efficient and therefore likely to be reliable
- Tail_ESS: effective sample sizes of estimates in the tail of the posterior. The tails (more extreme regions of the posterior) are where the sampler is most likely to get stuck (have divergent transitions etc). The fraction of effective samples from this region is relatively high, implying that the sampler did not get stuck in such areas that are unsupported by data.
- b_Intercept: when \(x=0\) (its average since it is standardised), the expected odds are 1.322 and we are 95% confident that the true value is between 0.756 and 2.07. Because \(x\) is standardised, \(x=0\) lies within the observed range of \(x\), so this intercept is directly interpretable.
- b_x: (the slope) - the rate of change in the odds per unit (=1) change in \(x\). Recall that since \(x\) is standardised, 1 unit represents a span of 1 standard deviation of \(x\). So for every one standard deviation unit change in \(x\), the odds increase by (on average) a factor of 4.999 and we are 95% confident that this change is between 1.47 and 12.3. This represents a 399.9% increase ((4.999 - 1) × 100) in the odds per one standard deviation increase in \(x\).
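One subtlety when back-transforming is that medians survive a monotonic transformation such as exp(), whereas highest density intervals do not: the HDI of exponentiated draws is generally not the exponentiated HDI of the link-scale draws. The following base R sketch illustrates this. The helper `hdi_base` is a hypothetical, simplified stand-in (it is not the HDInterval implementation), and the "draws" are evenly spaced normal quantiles with an assumed mean and standard deviation rather than real MCMC output.

```r
# A simple highest-density interval for a vector of draws:
# the narrowest window containing `prob` of the sorted draws.
# `hdi_base` is a hypothetical helper, not the HDInterval implementation.
hdi_base <- function(x, prob = 0.95) {
  x <- sort(x)
  n <- length(x)
  k <- floor(prob * n)                      # draws spanned by each window
  widths <- x[(k + 1):n] - x[1:(n - k)]     # width of every candidate window
  i <- which.min(widths)                    # narrowest window
  c(lower = x[i], upper = x[i + k])
}

# Stand-in for posterior draws on the link scale: evenly spaced normal
# quantiles (assumed mean and sd, for illustration only)
link_draws <- qnorm(ppoints(2400), mean = 1.62, sd = 0.51)

# The median commutes with exp() (up to interpolation between middle draws)
med_then_exp <- exp(median(link_draws))
exp_then_med <- median(exp(link_draws))

# The HDI does not: back-transforming the link-scale HDI differs from
# computing the HDI of the back-transformed draws
hdi_then_exp <- exp(hdi_base(link_draws))
exp_then_hdi <- hdi_base(exp(link_draws))

print(round(rbind(hdi_then_exp, exp_then_hdi), 2))
```

For a right-skewed posterior such as this one, the HDI computed after exponentiating sits lower than the exponentiated link-scale HDI, so it is worth stating which of the two is being reported.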
The gather_draws() function performs the equivalent of an as_draws_df() followed by a pivot_longer() in order to return the full posteriors in long format, where they are more suitable for graphing. Note that the name of the slope parameter gets very awkward when \(x\) is centered, so it is more convenient to refer to this parameter via a regular expression.
[1] "b_Intercept" "b_scalex" "Intercept" "prior_Intercept"
[5] "prior_b" "lprior" "lp__" "accept_stat__"
[9] "stepsize__" "treedepth__" "n_leapfrog__" "divergent__"
[13] "energy__"
dat6c.brm2 |>
gather_draws(`b_Intercept`,`b_.*x`, regex = TRUE) |>
mutate(.value = exp(.value)) |> # exponentiate only the draws, not the .chain/.iteration/.draw index columns
ggplot() +
geom_histogram(aes(x = .value)) +
facet_wrap(~.variable, scales = "free")
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Alternatively, there are various other representations supported by the ggdist package.
y ymin ymax .width .point .interval
1 0.710588 0.3429293 0.8207865 0.95 median hdci
Conclusions:
- 71.059% of the total variability in \(y\) can be explained by its relationship to \(x\)
- we are 95% confident that the strength of this relationship is between 34.293% and 82.079%
12 Predictions
Whilst linear models are useful for estimating effects (relative differences), because they are low dimensional (only focus on a small number of covariates) they are not good at absolute predictions. Nevertheless, predicting values from linear models provides the basis for investigating/estimating additional effects and generating various graphics to visualise the estimates.
There are a large number of candidate routines for performing prediction. We will go through some of these. It is worth noting that in this context prediction is technically the act of estimating what we expect to get if we were to collect a single new observation from a particular population (e.g. a specific level of fertilizer concentration). Often this is not what we want. Often we want the fitted values - estimates of what we expect to get if we were to collect multiple new observations and average them.
So while fitted values represent the expected underlying processes occurring in the system, predicted values represent our expectations from sampling from such processes.
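The distinction between fitted and predicted values can be illustrated with a small simulation. The sketch below uses hypothetical posterior draws for the mean and residual standard deviation of a Gaussian model (none of the numbers come from the models in this tutorial) and, for simplicity, summarises with quantile intervals rather than HPD intervals.

```r
set.seed(123)
n_draws <- 4000

# Hypothetical posterior draws for a Gaussian model at one value of x
mu_draws    <- rnorm(n_draws, mean = 0, sd = 1.3)         # fitted (expected) value
sigma_draws <- abs(rnorm(n_draws, mean = 1.5, sd = 0.2))  # residual sd

# Fitted values: uncertainty in the mean only
fitted_int <- quantile(mu_draws, probs = c(0.025, 0.975))

# Predicted values: one simulated new observation per draw, so the
# residual variation (sigma) is added on top of the uncertainty in the mean
pred_draws <- rnorm(n_draws, mean = mu_draws, sd = sigma_draws)
pred_int   <- quantile(pred_draws, probs = c(0.025, 0.975))

print(rbind(fitted = fitted_int, predicted = pred_int))
```

The predicted interval is wider because it carries both sources of uncertainty, while both are centred on roughly the same value.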
Package | Function | Description | Summarise with |
---|---|---|---|
emmeans | emmeans | Estimated marginal means from which posteriors can be drawn (via tidy_draws or gather_emmeans_draws()) | median_hdci() |
rstantools | posterior_predict | Draw from the posterior of a prediction (includes sigma) - predicts single observations | summarise_draws() |
rstantools | posterior_linpred | Draw from the posterior of the fitted values (on the link scale) - predicts average observations | summarise_draws() |
rstantools | posterior_epred | Draw from the posterior of the fitted values (on the response scale) - predicts average observations | summarise_draws() |
tidybayes | predicted_draws | Extract the posterior of prediction values | median_hdci() |
tidybayes | epred_draws | Extract the posterior of expected values | median_hdci() |
tidybayes | fitted_draws | | median_hdci() |
tidybayes | add_predicted_draws | Adds draws from the posterior of predictions to a data frame (of prediction data) | median_hdci() |
tidybayes | add_fitted_draws | Adds draws from the posterior of fitted values to a data frame (of prediction data) | median_hdci() |
For simple models, prediction is essentially taking the model formula complete with parameter (coefficient) estimates and solving for new values of the predictor. To explore this, we will use the fitted models to predict \(y\) for specific new values of \(x\).
We will therefore start by establishing this prediction domain as a data frame to use across all of the prediction routines.
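As a rough illustration of what "solving the model formula for new values of the predictor" means, the sketch below builds a prediction data frame, converts it to a design matrix and multiplies it by a handful of hypothetical posterior draws of the intercept and slope. The draws are made up for illustration and this is not how brms is called; it only shows the underlying arithmetic for a simple linear predictor.

```r
# Prediction domain: the values of x at which to predict
newdata <- data.frame(x = c(2.5, 5))

# Design matrix (a column of 1s for the intercept, then x)
X <- model.matrix(~ x, data = newdata)

# Hypothetical posterior draws of the intercept and slope (three draws only)
draws <- cbind(b_Intercept = c(0.9, 1.0, 1.1),
               b_x         = c(1.9, 2.0, 2.1))

# One row per draw, one column per row of newdata
mu <- draws %*% t(X)

# Summarise each column (each new x) across draws
fitted_median <- apply(mu, 2, median)
print(fitted_median)
```

Each column of `mu` is a posterior distribution of the fitted value at one new \(x\), which is the same draws-by-observations layout that the prediction functions in the table return.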
In each case, we will predict \(y\) when \(x\) is 2.5 and 5.
Note, for a Gaussian model, emmeans, posterior_epred and posterior_linpred will all yield the same outputs. posterior_predict will yield wider credible intervals as it is predicting for individual observations rather than predicting means of large samples.
x emmean lower.HPD upper.HPD
2.5 0.0595 -2.59 2.55
5.0 -0.1437 -4.81 4.67
Point estimate displayed: median
HPD interval probability: 0.95
# OR with more control over the way posteriors are summarised
dat1a.brm2 |>
emmeans(~x, at = list(x = c(2.5, 5))) |>
gather_emmeans_draws() |>
median_hdci()
# A tibble: 2 × 7
x .value .lower .upper .width .point .interval
<dbl> <dbl> <dbl> <dbl> <dbl> <chr> <chr>
1 2.5 0.0595 -2.59 2.55 0.95 median hdci
2 5 -0.144 -4.81 4.67 0.95 median hdci
Conclusions:
- the predicted (estimated) mean \(y\) associated with an \(x\) of 2.5 is 0.06
- the predicted (estimated) mean \(y\) associated with an \(x\) of 5 is -0.144
- 95% HPD intervals also given
dat1a.brm2 |>
posterior_predict(newdata = data.frame(x = c(2.5, 5))) |>
summarise_draws(median, HDInterval::hdi)
# A tibble: 2 × 4
variable median lower upper
<chr> <dbl> <dbl> <dbl>
1 ...1 0.0813 -3.54 3.38
2 ...2 -0.138 -5.86 4.89
# Or for even more control and ability to add other summaries
dat1a.brm2 |>
add_predicted_draws(newdata = data.frame(x = c(2.5, 5))) |>
dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
summarise_draws(
median,
HDInterval::hdi
)
# A tibble: 2 × 6
# Groups: x, .row [2]
x .row variable median lower upper
<dbl> <int> <chr> <dbl> <dbl> <dbl>
1 2.5 1 .prediction 0.100 -3.45 3.36
2 5 2 .prediction -0.168 -5.27 5.11
Conclusions:
- the predicted (estimated) mean \(y\) associated with an \(x\) of 2.5 is 0.025
- the predicted (estimated) mean \(y\) associated with an \(x\) of 5 is -0.207
- 95% HPD intervals also given
dat1a.brm2 |>
posterior_epred(newdata = data.frame(x = c(2.5, 5))) |>
summarise_draws(median, HDInterval::hdi)
# A tibble: 2 × 4
variable median lower upper
<chr> <dbl> <dbl> <dbl>
1 ...1 0.0595 -2.59 2.55
2 ...2 -0.144 -4.81 4.67
# Or for even more control and ability to add other summaries
dat1a.brm2 |>
add_epred_draws(newdata = data.frame(x = c(2.5, 5))) |>
dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
summarise_draws(
median,
HDInterval::hdi
)
# A tibble: 2 × 6
# Groups: x, .row [2]
x .row variable median lower upper
<dbl> <int> <chr> <dbl> <dbl> <dbl>
1 2.5 1 .epred 0.0595 -2.59 2.55
2 5 2 .epred -0.144 -4.81 4.67
Conclusions:
- the fitted mean \(y\) associated with an \(x\) of 2.5 is 0.06
- the fitted mean \(y\) associated with an \(x\) of 5 is -0.144
- 95% HPD intervals also given
dat1a.brm2 |>
posterior_linpred(newdata = data.frame(x = c(2.5, 5))) |>
summarise_draws(median, HDInterval::hdi)
# A tibble: 2 × 4
variable median lower upper
<chr> <dbl> <dbl> <dbl>
1 ...1 0.0595 -2.59 2.55
2 ...2 -0.144 -4.81 4.67
# Or for even more control and ability to add other summaries
dat1a.brm2 |>
add_linpred_draws(newdata = data.frame(x = c(2.5, 5))) |>
dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
summarise_draws(
median,
HDInterval::hdi
)
# A tibble: 2 × 6
# Groups: x, .row [2]
x .row variable median lower upper
<dbl> <int> <chr> <dbl> <dbl> <dbl>
1 2.5 1 .linpred 0.0595 -2.59 2.55
2 5 2 .linpred -0.144 -4.81 4.67
Conclusions:
- the fitted mean \(y\) associated with an \(x\) of 2.5 is 0.06
- the fitted mean \(y\) associated with an \(x\) of 5 is -0.144
- 95% HPD intervals also given
In each case, we will predict \(y\) when \(x\) is 2.5 and 5.
Note, for a Gaussian model, emmeans, posterior_epred and posterior_linpred will all yield the same outputs. posterior_predict will yield wider credible intervals as it is predicting for individual observations rather than predicting means of large samples.
x emmean lower.HPD upper.HPD
2.5 0.0433 -2.59 2.51
5.0 -0.1834 -4.86 4.57
Point estimate displayed: median
HPD interval probability: 0.95
# OR with more control over the way posteriors are summarised
dat1b.brm2 |>
emmeans(~x, at = list(x = c(2.5, 5))) |>
gather_emmeans_draws() |>
median_hdci()
# A tibble: 2 × 7
x .value .lower .upper .width .point .interval
<dbl> <dbl> <dbl> <dbl> <dbl> <chr> <chr>
1 2.5 0.0433 -2.59 2.51 0.95 median hdci
2 5 -0.183 -4.86 4.57 0.95 median hdci
Conclusions:
- the predicted (estimated) mean \(y\) associated with an \(x\) of 2.5 is 0.043
- the predicted (estimated) mean \(y\) associated with an \(x\) of 5 is -0.183
- 95% HPD intervals also given
dat1b.brm2 |>
posterior_predict(newdata = data.frame(x = c(2.5, 5))) |>
summarise_draws(median, HDInterval::hdi)
# A tibble: 2 × 4
variable median lower upper
<chr> <dbl> <dbl> <dbl>
1 ...1 0.0305 -3.42 3.56
2 ...2 -0.206 -5.36 5.40
# Or for even more control and ability to add other summaries
dat1b.brm2 |>
add_predicted_draws(newdata = data.frame(x = c(2.5, 5))) |>
dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
summarise_draws(
median,
HDInterval::hdi
)
# A tibble: 2 × 6
# Groups: x, .row [2]
x .row variable median lower upper
<dbl> <int> <chr> <dbl> <dbl> <dbl>
1 2.5 1 .prediction 0.0336 -3.33 3.44
2 5 2 .prediction -0.177 -5.31 5.11
Conclusions:
- the predicted (estimated) mean \(y\) associated with an \(x\) of 2.5 is 0.021
- the predicted (estimated) mean \(y\) associated with an \(x\) of 5 is -0.205
- 95% HPD intervals also given
dat1b.brm2 |>
posterior_epred(newdata = data.frame(x = c(2.5, 5))) |>
summarise_draws(median, HDInterval::hdi)
# A tibble: 2 × 4
variable median lower upper
<chr> <dbl> <dbl> <dbl>
1 ...1 0.0433 -2.59 2.51
2 ...2 -0.183 -4.86 4.57
# Or for even more control and ability to add other summaries
dat1b.brm2 |>
add_epred_draws(newdata = data.frame(x = c(2.5, 5))) |>
dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
summarise_draws(
median,
HDInterval::hdi
)
# A tibble: 2 × 6
# Groups: x, .row [2]
x .row variable median lower upper
<dbl> <int> <chr> <dbl> <dbl> <dbl>
1 2.5 1 .epred 0.0433 -2.59 2.51
2 5 2 .epred -0.183 -4.86 4.57
Conclusions:
- the fitted mean \(y\) associated with an \(x\) of 2.5 is 0.043
- the fitted mean \(y\) associated with an \(x\) of 5 is -0.183
- 95% HPD intervals also given
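dat1b.brm2 |>
posterior_linpred(newdata = data.frame(x = c(2.5, 5))) |>
summarise_draws(median, HDInterval::hdi)
# A tibble: 2 × 4
variable median lower upper
<chr> <dbl> <dbl> <dbl>
1 ...1 0.0433 -2.59 2.51
2 ...2 -0.183 -4.86 4.57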
# Or for even more control and ability to add other summaries
dat1b.brm2 |>
add_linpred_draws(newdata = data.frame(x = c(2.5, 5))) |>
dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
summarise_draws(
median,
HDInterval::hdi
)
# A tibble: 2 × 6
# Groups: x, .row [2]
x .row variable median lower upper
<dbl> <int> <chr> <dbl> <dbl> <dbl>
1 2.5 1 .linpred 0.0433 -2.59 2.51
2 5 2 .linpred -0.183 -4.86 4.57
Conclusions:
- the fitted mean \(y\) associated with an \(x\) of 2.5 is 0.043
- the fitted mean \(y\) associated with an \(x\) of 5 is -0.183
- 95% HPD intervals also given
In each case, we will predict \(y\) when \(x\) is 2.5 and 5.
Note, for a Gaussian model, emmeans, posterior_epred and posterior_linpred will all yield the same outputs. posterior_predict will yield wider credible intervals as it is predicting for individual observations rather than predicting means of large samples.
x emmean lower.HPD upper.HPD
2.5 0.037 -2.48 2.74
5.0 -0.181 -5.10 4.80
Point estimate displayed: median
HPD interval probability: 0.95
# OR with more control over the way posteriors are summarised
dat1c.brm2 |>
emmeans(~x, at = list(x = c(2.5, 5))) |>
gather_emmeans_draws() |>
median_hdci()
# A tibble: 2 × 7
x .value .lower .upper .width .point .interval
<dbl> <dbl> <dbl> <dbl> <dbl> <chr> <chr>
1 2.5 0.0370 -2.48 2.74 0.95 median hdci
2 5 -0.181 -5.06 4.85 0.95 median hdci
Conclusions:
- the predicted (estimated) mean \(y\) associated with an \(x\) of 2.5 is 0.037
- the predicted (estimated) mean \(y\) associated with an \(x\) of 5 is -0.181
- 95% HPD intervals also given
dat1c.brm2 |>
posterior_predict(newdata = data.frame(x = c(2.5, 5))) |>
summarise_draws(median, HDInterval::hdi)
# A tibble: 2 × 4
variable median lower upper
<chr> <dbl> <dbl> <dbl>
1 ...1 0.0862 -3.68 3.66
2 ...2 -0.197 -6.05 5.08
# Or for even more control and ability to add other summaries
dat1c.brm2 |>
add_predicted_draws(newdata = data.frame(x = c(2.5, 5))) |>
dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
summarise_draws(
median,
HDInterval::hdi
)
# A tibble: 2 × 6
# Groups: x, .row [2]
x .row variable median lower upper
<dbl> <int> <chr> <dbl> <dbl> <dbl>
1 2.5 1 .prediction 0.0852 -3.74 3.54
2 5 2 .prediction -0.178 -5.26 6.03
Conclusions:
- the predicted (estimated) mean \(y\) associated with an \(x\) of 2.5 is 0.096
- the predicted (estimated) mean \(y\) associated with an \(x\) of 5 is -0.143
- 95% HPD intervals also given
dat1c.brm2 |>
posterior_epred(newdata = data.frame(x = c(2.5, 5))) |>
summarise_draws(median, HDInterval::hdi)
# A tibble: 2 × 4
variable median lower upper
<chr> <dbl> <dbl> <dbl>
1 ...1 0.0370 -2.48 2.74
2 ...2 -0.181 -5.10 4.80
# Or for even more control and ability to add other summaries
dat1c.brm2 |>
add_epred_draws(newdata = data.frame(x = c(2.5, 5))) |>
dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
summarise_draws(
median,
HDInterval::hdi
)
# A tibble: 2 × 6
# Groups: x, .row [2]
x .row variable median lower upper
<dbl> <int> <chr> <dbl> <dbl> <dbl>
1 2.5 1 .epred 0.0370 -2.48 2.74
2 5 2 .epred -0.181 -5.10 4.80
Conclusions:
- the fitted mean \(y\) associated with an \(x\) of 2.5 is 0.037
- the fitted mean \(y\) associated with an \(x\) of 5 is -0.181
- 95% HPD intervals also given
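dat1c.brm2 |>
posterior_linpred(newdata = data.frame(x = c(2.5, 5))) |>
summarise_draws(median, HDInterval::hdi)
# A tibble: 2 × 4
variable median lower upper
<chr> <dbl> <dbl> <dbl>
1 ...1 0.0370 -2.48 2.74
2 ...2 -0.181 -5.10 4.80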
# Or for even more control and ability to add other summaries
dat1c.brm2 |>
add_linpred_draws(newdata = data.frame(x = c(2.5, 5))) |>
dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
summarise_draws(
median,
HDInterval::hdi
)
# A tibble: 2 × 6
# Groups: x, .row [2]
x .row variable median lower upper
<dbl> <int> <chr> <dbl> <dbl> <dbl>
1 2.5 1 .linpred 0.0370 -2.48 2.74
2 5 2 .linpred -0.181 -5.10 4.80
Conclusions:
- the fitted mean \(y\) associated with an \(x\) of 2.5 is 0.037
- the fitted mean \(y\) associated with an \(x\) of 5 is -0.181
- 95% HPD intervals also given
In each case, we will predict \(y\) when \(x\) is “control”, “medium” and “high”.
Note, for a Gaussian model, emmeans, posterior_epred and posterior_linpred will all yield the same outputs. posterior_predict will yield wider credible intervals as it is predicting for individual observations rather than predicting means of large samples.
x emmean lower.HPD upper.HPD
control 20.9 16.99 25.1
medium 20.0 16.08 24.2
high 12.1 7.95 16.4
Point estimate displayed: median
HPD interval probability: 0.95
# OR with more control over the way posteriors are summarised
dat2a.brm2 |>
emmeans(~x) |>
gather_emmeans_draws() |>
median_hdci()
# A tibble: 3 × 7
x .value .lower .upper .width .point .interval
<fct> <dbl> <dbl> <dbl> <dbl> <chr> <chr>
1 control 20.9 17.0 25.1 0.95 median hdci
2 medium 20.0 16.3 24.4 0.95 median hdci
3 high 12.1 7.95 16.4 0.95 median hdci
Conclusions:
- the predicted (estimated) mean \(y\) associated with an \(x\) of “control” is 20.875
- the predicted (estimated) mean \(y\) associated with an \(x\) of “medium” is 20.007
- the predicted (estimated) mean \(y\) associated with an \(x\) of “high” is 12.094
- 95% HPD intervals also given
dat2a.brm2 |>
posterior_predict(newdata = data.frame(x =
c("control", "medium", "high"))) |>
summarise_draws(median, HDInterval::hdi)
# A tibble: 3 × 4
variable median lower upper
<chr> <dbl> <dbl> <dbl>
1 ...1 20.9 10.3 30.6
2 ...2 19.9 10.3 30.0
3 ...3 11.9 2.17 22.7
# Or for even more control and ability to add other summaries
dat2a.brm2 |>
add_predicted_draws(newdata = data.frame(x = c("control", "medium", "high"))) |>
dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
summarise_draws(
median,
HDInterval::hdi
)
# A tibble: 3 × 6
# Groups: x, .row [3]
x .row variable median lower upper
<chr> <int> <chr> <dbl> <dbl> <dbl>
1 control 1 .prediction 20.7 11.1 31.4
2 high 3 .prediction 12.1 2.93 23.4
3 medium 2 .prediction 20.0 10.8 30.2
Conclusions:
- the predicted (estimated) mean \(y\) associated with an \(x\) of “control” is 20.925
- the predicted (estimated) mean \(y\) associated with an \(x\) of “medium” is 19.809
- the predicted (estimated) mean \(y\) associated with an \(x\) of “high” is 12.089
- 95% HPD intervals also given
dat2a.brm2 |>
posterior_epred(newdata = data.frame(x = c("control", "medium", "high"))) |>
summarise_draws(median, HDInterval::hdi)
# A tibble: 3 × 4
variable median lower upper
<chr> <dbl> <dbl> <dbl>
1 ...1 20.9 17.0 25.1
2 ...2 20.0 16.1 24.2
3 ...3 12.1 7.95 16.4
# Or for even more control and ability to add other summaries
dat2a.brm2 |>
add_epred_draws(newdata = data.frame(x = c("control", "medium", "high"))) |>
dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
summarise_draws(
median,
HDInterval::hdi
)
# A tibble: 3 × 6
# Groups: x, .row [3]
x .row variable median lower upper
<chr> <int> <chr> <dbl> <dbl> <dbl>
1 control 1 .epred 20.9 17.0 25.1
2 high 3 .epred 12.1 7.95 16.4
3 medium 2 .epred 20.0 16.1 24.2
Conclusions:
- the predicted (estimated) mean \(y\) associated with an \(x\) of “control” is 20.875
- the predicted (estimated) mean \(y\) associated with an \(x\) of “medium” is 20.007
- the predicted (estimated) mean \(y\) associated with an \(x\) of “high” is 12.094
- 95% HPD intervals also given
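dat2a.brm2 |>
posterior_linpred(newdata = data.frame(x = c("control", "medium", "high"))) |>
summarise_draws(median, HDInterval::hdi) # summarise each column separately rather than pooling the whole matrix
# A tibble: 3 × 4
variable median lower upper
<chr> <dbl> <dbl> <dbl>
1 ...1 20.9 17.0 25.1
2 ...2 20.0 16.1 24.2
3 ...3 12.1 7.95 16.4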
# Or for even more control and ability to add other summaries
dat2a.brm2 |>
add_linpred_draws(newdata = data.frame(x = c("control", "medium", "high"))) |>
dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
summarise_draws(
median,
HDInterval::hdi
)
# A tibble: 3 × 6
# Groups: x, .row [3]
x .row variable median lower upper
<chr> <int> <chr> <dbl> <dbl> <dbl>
1 control 1 .linpred 20.9 17.0 25.1
2 high 3 .linpred 12.1 7.95 16.4
3 medium 2 .linpred 20.0 16.1 24.2
Conclusions:
- the predicted (estimated) mean \(y\) associated with an \(x\) of “control” is 20.875
- the predicted (estimated) mean \(y\) associated with an \(x\) of “medium” is 20.007
- the predicted (estimated) mean \(y\) associated with an \(x\) of “high” is 12.094
- 95% HPD intervals also given
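When summarising a draws matrix (one row per draw, one column per prediction), it matters whether the summary is applied to each column or to the matrix as a whole. The base R sketch below uses hypothetical draws shaped like the output of posterior_linpred() to show the difference; the group means and spread are made up for illustration.

```r
set.seed(1)
# Hypothetical draws matrix: 1000 draws (rows) for three groups (columns),
# mimicking the shape returned by posterior_linpred()
draws <- cbind(
  control = rnorm(1000, mean = 21, sd = 2),
  medium  = rnorm(1000, mean = 20, sd = 2),
  high    = rnorm(1000, mean = 12, sd = 2)
)

# Pooled: treats all 3000 values as one sample, so group structure is lost
pooled_median <- median(draws)

# Per column: one summary per group, which is usually what is wanted
group_median <- apply(draws, 2, median)

print(pooled_median)
print(round(group_median, 1))
```

A single-row summary where one row per prediction was expected is a sign that the draws were pooled; column-wise tools such as summarise_draws() avoid this.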
In each case, we will predict \(y\) when \(x\) is “control”, “medium” and “high”.
Note, for a Gaussian model, emmeans, posterior_epred and posterior_linpred will all yield the same outputs. posterior_predict will yield wider credible intervals as it is predicting for individual observations rather than predicting means of large samples.
x emmean lower.HPD upper.HPD
control 20.3 15.85 24.5
medium 18.6 14.74 23.4
high 11.0 6.49 15.6
Point estimate displayed: median
HPD interval probability: 0.95
# OR with more control over the way posteriors are summarised
dat2b.brm2 |>
emmeans(~x) |>
gather_emmeans_draws() |>
median_hdci()
# A tibble: 3 × 7
x .value .lower .upper .width .point .interval
<fct> <dbl> <dbl> <dbl> <dbl> <chr> <chr>
1 control 20.3 15.8 24.5 0.95 median hdci
2 medium 18.6 14.5 23.1 0.95 median hdci
3 high 11.0 6.29 15.5 0.95 median hdci
Conclusions:
- the predicted (estimated) mean \(y\) associated with an \(x\) of “control” is 20.276
- the predicted (estimated) mean \(y\) associated with an \(x\) of “medium” is 18.635
- the predicted (estimated) mean \(y\) associated with an \(x\) of “high” is 11.028
- 95% HPD intervals also given
dat2b.brm2 |>
posterior_predict(newdata = data.frame(x =
c("control", "medium", "high"))) |>
summarise_draws(median, HDInterval::hdi)
# A tibble: 3 × 4
variable median lower upper
<chr> <dbl> <dbl> <dbl>
1 ...1 20.1 9.96 29.8
2 ...2 18.8 8.45 29.2
3 ...3 11.1 0.795 21.6
# Or for even more control and ability to add other summaries
dat2b.brm2 |>
add_predicted_draws(newdata = data.frame(x = c("control", "medium", "high"))) |>
dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
summarise_draws(
median,
HDInterval::hdi
)
# A tibble: 3 × 6
# Groups: x, .row [3]
x .row variable median lower upper
<chr> <int> <chr> <dbl> <dbl> <dbl>
1 control 1 .prediction 20.5 10.6 30.7
2 high 3 .prediction 11.0 1.68 21.9
3 medium 2 .prediction 18.8 9.04 29.4
Conclusions:
- the predicted (estimated) mean \(y\) associated with an \(x\) of “control” is 20.363
- the predicted (estimated) mean \(y\) associated with an \(x\) of “medium” is 18.851
- the predicted (estimated) mean \(y\) associated with an \(x\) of “high” is 11.024
- 95% HPD intervals also given
dat2b.brm2 |>
posterior_epred(newdata = data.frame(x = c("control", "medium", "high"))) |>
summarise_draws(median, HDInterval::hdi)
# A tibble: 3 × 4
variable median lower upper
<chr> <dbl> <dbl> <dbl>
1 ...1 20.3 15.8 24.5
2 ...2 18.6 14.7 23.4
3 ...3 11.0 6.49 15.6
# Or for even more control and ability to add other summaries
dat2b.brm2 |>
add_epred_draws(newdata = data.frame(x = c("control", "medium", "high"))) |>
dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
summarise_draws(
median,
HDInterval::hdi
)
# A tibble: 3 × 6
# Groups: x, .row [3]
x .row variable median lower upper
<chr> <int> <chr> <dbl> <dbl> <dbl>
1 control 1 .epred 20.3 15.8 24.5
2 high 3 .epred 11.0 6.49 15.6
3 medium 2 .epred 18.6 14.7 23.4
Conclusions:
- the predicted (estimated) mean \(y\) associated with an \(x\) of “control” is 20.276
- the predicted (estimated) mean \(y\) associated with an \(x\) of “medium” is 18.635
- the predicted (estimated) mean \(y\) associated with an \(x\) of “high” is 11.028
- 95% HPD intervals also given
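dat2b.brm2 |>
posterior_linpred(newdata = data.frame(x = c("control", "medium", "high"))) |>
summarise_draws(median, HDInterval::hdi) # summarise each column separately rather than pooling the whole matrix
# A tibble: 3 × 4
variable median lower upper
<chr> <dbl> <dbl> <dbl>
1 ...1 20.3 15.8 24.5
2 ...2 18.6 14.7 23.4
3 ...3 11.0 6.49 15.6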
# Or for even more control and ability to add other summaries
dat2b.brm2 |>
add_linpred_draws(newdata = data.frame(x = c("control", "medium", "high"))) |>
dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
summarise_draws(
median,
HDInterval::hdi
)
# A tibble: 3 × 6
# Groups: x, .row [3]
x .row variable median lower upper
<chr> <int> <chr> <dbl> <dbl> <dbl>
1 control 1 .linpred 20.3 15.8 24.5
2 high 3 .linpred 11.0 6.49 15.6
3 medium 2 .linpred 18.6 14.7 23.4
Conclusions:
- the predicted (estimated) mean \(y\) associated with an \(x\) of “control” is 20.276
- the predicted (estimated) mean \(y\) associated with an \(x\) of “medium” is 18.635
- the predicted (estimated) mean \(y\) associated with an \(x\) of “high” is 11.028
- 95% HPD intervals also given
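In each case, we will predict \(y\) when \(x\) is 2.5 and 5.
Note, for a Gaussian model, emmeans, posterior_epred and posterior_linpred will all yield the same outputs. This model, however, uses a log link, so posterior_linpred (and emmeans draws before back-transformation) are on the log scale, whereas posterior_epred is on the response scale. posterior_predict will yield wider credible intervals as it is predicting for individual observations rather than predicting means of large samples.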
x rate lower.HPD upper.HPD
2.5 2.45 1.41 3.73
5.0 5.79 4.07 7.56
Point estimate displayed: median
Results are back-transformed from the log scale
HPD interval probability: 0.95
# OR with more control over the way posteriors are summarised
dat3a.brm2 |>
emmeans(~x, at = list(x = c(2.5, 5))) |>
gather_emmeans_draws() |>
mutate(.value = exp(.value)) |>
median_hdci()
# A tibble: 2 × 7
x .value .lower .upper .width .point .interval
<dbl> <dbl> <dbl> <dbl> <dbl> <chr> <chr>
1 2.5 2.45 1.41 3.73 0.95 median hdci
2 5 5.79 4.07 7.56 0.95 median hdci
# OR with yet more control over the way posteriors are summarised (values remain on the log scale)
dat3a.brm2 |>
emmeans(~x, at = list(x = c(2.5, 5))) |>
gather_emmeans_draws() |>
dplyr::select(-.chain) |>
summarise_draws(
median,
HDInterval::hdi
)
# A tibble: 2 × 5
# Groups: x [2]
x variable median lower upper
<dbl> <chr> <dbl> <dbl> <dbl>
1 2.5 .value 0.894 0.391 1.36
2 5 .value 1.76 1.42 2.04
Conclusions:
- the predicted (estimated) mean \(y\) associated with an \(x\) of 2.5 is 2.445
- the predicted (estimated) mean \(y\) associated with an \(x\) of 5 is 5.794
- 95% HPD intervals also given
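The back-transformed medians above (2.45 and 5.79) are the exponentials of the log-scale medians (0.894 and 1.76), whereas the back-transformed HPD limits are not simply the exponentials of the log-scale limits. The sketch below illustrates why, using simulated stand-in draws rather than the fitted model; shortest_interval is a hypothetical helper and the mean and standard deviation are illustrative assumptions.

```r
set.seed(123)
# Stand-in for posterior draws of the linear predictor (log scale).
# An odd number of draws is used so that the median is a single draw.
eta <- rnorm(4001, mean = 0.9, sd = 0.25)

# The median commutes with a monotone transformation such as exp()
median_then_exp <- exp(median(eta))
exp_then_median <- median(exp(eta))

# A shortest-interval (HPD-style) summary need not commute with exp()
shortest_interval <- function(draws, prob = 0.95) {
  sorted <- sort(draws)
  n_in <- ceiling(prob * length(sorted))
  starts <- seq_len(length(sorted) - n_in + 1)
  widths <- sorted[starts + n_in - 1] - sorted[starts]
  i <- which.min(widths)
  c(lower = sorted[i], upper = sorted[i + n_in - 1])
}
exp_of_interval <- exp(shortest_interval(eta))  # interval found on the log scale
interval_of_exp <- shortest_interval(exp(eta))  # interval found on the response scale

print(c(median_then_exp, exp_then_median))
print(rbind(exp_of_interval, interval_of_exp))
```

This is one reason to back-transform the draws first and summarise afterwards when intervals on the response scale are wanted.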
dat3a.brm2 |>
posterior_predict(newdata = data.frame(x = c(2.5, 5))) |>
summarise_draws(median, HDInterval::hdi)
# A tibble: 2 × 4
variable median lower upper
<chr> <dbl> <dbl> <dbl>
1 ...1 2 0 6
2 ...2 5 1 10
# Or for even more control and ability to add other summaries
dat3a.brm2 |>
add_predicted_draws(newdata = data.frame(x = c(2.5, 5))) |>
dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
summarise_draws(
median,
HDInterval::hdi
)
# A tibble: 2 × 6
# Groups: x, .row [2]
x .row variable median lower upper
<dbl> <int> <chr> <dbl> <dbl> <dbl>
1 2.5 1 .prediction 2 0 6
2 5 2 .prediction 5.5 1 10
Conclusions:
- the predicted (estimated) mean \(y\) associated with an \(x\) of 2.5 is 2
- the predicted (estimated) mean \(y\) associated with an \(x\) of 5 is 5.5 (predictions are simulated, so this value varies slightly between runs)
- 95% HPD intervals also given
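The intervals for predicted observations (0 to 6 at \(x\) = 2.5) are much wider than those for the expected value (1.41 to 3.73) because each prediction adds observation-level sampling variation on top of the uncertainty in the mean. The following base R sketch mimics this with simulated stand-in draws. The log-scale mean and standard deviation are illustrative assumptions and equal-tailed intervals are used only to avoid extra packages.

```r
set.seed(2024)
# Stand-in posterior draws of the expected count at one value of x
# (illustrative values on the log scale, not taken from the fitted model)
mu <- exp(rnorm(4000, mean = 0.9, sd = 0.25))

# One simulated new observation per posterior draw, as a Poisson count
y_new <- rpois(length(mu), lambda = mu)

# Equal-tailed 95% intervals, used here only to keep the sketch in base R
mean_interval <- quantile(mu, c(0.025, 0.975))
pred_interval <- quantile(y_new, c(0.025, 0.975))

print(mean_interval)
print(pred_interval)
```

The predicted values are whole counts, which is also why the summaries of posterior_predict output are integers or half-integers.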
dat3a.brm2 |>
posterior_epred(newdata = data.frame(x = c(2.5, 5))) |>
summarise_draws(median, HDInterval::hdi)
# A tibble: 2 × 4
variable median lower upper
<chr> <dbl> <dbl> <dbl>
1 ...1 2.45 1.41 3.73
2 ...2 5.79 4.07 7.56
# Or for even more control and ability to add other summaries
dat3a.brm2 |>
add_epred_draws(newdata = data.frame(x = c(2.5, 5))) |>
dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
summarise_draws(
median,
HDInterval::hdi
)
# A tibble: 2 × 6
# Groups: x, .row [2]
x .row variable median lower upper
<dbl> <int> <chr> <dbl> <dbl> <dbl>
1 2.5 1 .epred 2.45 1.41 3.73
2 5 2 .epred 5.79 4.07 7.56
Conclusions:
- the fitted mean \(y\) associated with an \(x\) of 2.5 is 2.445
- the fitted mean \(y\) associated with an \(x\) of 5 is 5.794
- 95% HPD intervals also given
dat3a.brm2 |>
posterior_linpred(newdata = data.frame(x = c(2.5, 5))) |>
exp() |>
summarise_draws(median, HDInterval::hdi)
# A tibble: 2 × 4
variable median lower upper
<chr> <dbl> <dbl> <dbl>
1 ...1 2.45 1.41 3.73
2 ...2 5.79 4.07 7.56
# Or for even more control and ability to add other summaries
dat3a.brm2 |>
add_linpred_draws(newdata = data.frame(x = c(2.5, 5))) |>
mutate(.linpred = exp(.linpred)) |>
dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
summarise_draws(
median,
HDInterval::hdi
)
# A tibble: 2 × 6
# Groups: x, .row [2]
x .row variable median lower upper
<dbl> <int> <chr> <dbl> <dbl> <dbl>
1 2.5 1 .linpred 2.45 1.41 3.73
2 5 2 .linpred 5.79 4.07 7.56
Conclusions:
- the fitted mean \(y\) associated with an \(x\) of 2.5 is 2.445
- the fitted mean \(y\) associated with an \(x\) of 5 is 5.794
- 95% HPD intervals also given
In each case, we will predict \(y\) when \(x\) is 2.5 and 5.
Note, for a Gaussian model, emmeans, posterior_epred and posterior_linpred will all yield the same outputs. posterior_predict will yield wider credible intervals as it is predicting for individual observations rather than predicting means of large samples.
x emmean lower.HPD upper.HPD
2.5 0.901 0.377 1.37
5.0 1.757 1.450 2.05
Point estimate displayed: median
Results are given on the log (not the response) scale.
HPD interval probability: 0.95
# OR with more control over the way posteriors are summarised
dat3b.brm2 |>
emmeans(~x, at = list(x = c(2.5, 5))) |>
gather_emmeans_draws() |>
median_hdci()
# A tibble: 2 × 7
x .value .lower .upper .width .point .interval
<dbl> <dbl> <dbl> <dbl> <dbl> <chr> <chr>
1 2.5 0.901 0.388 1.38 0.95 median hdci
2 5 1.76 1.45 2.06 0.95 median hdci
Conclusions:
- the predicted (estimated) mean \(y\) associated with an \(x\) of 2.5 is 0.901 on the log (link) scale
- the predicted (estimated) mean \(y\) associated with an \(x\) of 5 is 1.757 on the log (link) scale
- 95% HPD intervals also given
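Estimates such as 0.901 and 1.757 are on the scale of the linear predictor. As a worked illustration, assuming a log link with a single slope (the symbols \(\eta_i\), \(\beta_0\) and \(\beta_1\) are notation introduced here), exponentiating the link-scale medians recovers values on the response scale:

```latex
\begin{aligned}
\log(\mu_i) &= \eta_i = \beta_0 + \beta_1 x_i \\
\mu_i &= e^{\eta_i} \\
e^{0.901} &\approx 2.46, \qquad e^{1.757} \approx 5.80
\end{aligned}
```

These agree with the response-scale medians reported by posterior_epred for the same model.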
dat3b.brm2 |>
posterior_predict(newdata = data.frame(x = c(2.5, 5))) |>
summarise_draws(median, HDInterval::hdi)
# A tibble: 2 × 4
variable median lower upper
<chr> <dbl> <dbl> <dbl>
1 ...1 2 0 5
2 ...2 6 1 10
# Or for even more control and ability to add other summaries
dat3b.brm2 |>
add_predicted_draws(newdata = data.frame(x = c(2.5, 5))) |>
dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
summarise_draws(
median,
HDInterval::hdi
)
# A tibble: 2 × 6
# Groups: x, .row [2]
x .row variable median lower upper
<dbl> <int> <chr> <dbl> <dbl> <dbl>
1 2.5 1 .prediction 2 0 6
2 5 2 .prediction 6 0 10
Conclusions:
- the predicted (estimated) mean \(y\) associated with an \(x\) of 2.5 is 2
- the predicted (estimated) mean \(y\) associated with an \(x\) of 5 is 6
- 95% HPD intervals also given
dat3b.brm2 |>
posterior_epred(newdata = data.frame(x = c(2.5, 5))) |>
summarise_draws(median, HDInterval::hdi)
# A tibble: 2 × 4
variable median lower upper
<chr> <dbl> <dbl> <dbl>
1 ...1 2.46 1.40 3.84
2 ...2 5.80 4.26 7.79
# Or for even more control and ability to add other summaries
dat3b.brm2 |>
add_epred_draws(newdata = data.frame(x = c(2.5, 5))) |>
dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
summarise_draws(
median,
HDInterval::hdi
)
# A tibble: 2 × 6
# Groups: x, .row [2]
x .row variable median lower upper
<dbl> <int> <chr> <dbl> <dbl> <dbl>
1 2.5 1 .epred 2.46 1.40 3.84
2 5 2 .epred 5.80 4.26 7.79
Conclusions:
- the fitted mean \(y\) associated with an \(x\) of 2.5 is 2.463
- the fitted mean \(y\) associated with an \(x\) of 5 is 5.795
- 95% HPD intervals also given
dat3b.brm2 |>
posterior_linpred(newdata = data.frame(x = c(2.5, 5))) |>
exp() |>
summarise_draws(median, HDInterval::hdi)
# A tibble: 2 × 4
variable median lower upper
<chr> <dbl> <dbl> <dbl>
1 ...1 2.46 1.40 3.84
2 ...2 5.80 4.26 7.79
# Or for even more control and ability to add other summaries
dat3b.brm2 |>
add_linpred_draws(newdata = data.frame(x = c(2.5, 5))) |>
mutate(.linpred = exp(.linpred)) |>
dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
summarise_draws(
median,
HDInterval::hdi
)
# A tibble: 2 × 6
# Groups: x, .row [2]
x .row variable median lower upper
<dbl> <int> <chr> <dbl> <dbl> <dbl>
1 2.5 1 .linpred 2.46 1.40 3.84
2 5 2 .linpred 5.80 4.26 7.79
Conclusions:
- the fitted mean \(y\) associated with an \(x\) of 2.5 is 2.463
- the fitted mean \(y\) associated with an \(x\) of 5 is 5.795
- 95% HPD intervals also given
In each case, we will predict \(y\) when \(x\) is 2.5 and 5.
Note, for a Gaussian model, emmeans, posterior_epred and posterior_linpred will all yield the same outputs. posterior_predict will yield wider credible intervals as it is predicting for individual observations rather than predicting means of large samples.
x emmean lower.HPD upper.HPD
2.5 0.91 0.421 1.40
5.0 1.76 1.442 2.06
Point estimate displayed: median
Results are given on the log (not the response) scale.
HPD interval probability: 0.95
# OR with more control over the way posteriors are summarised
dat3c.brm2 |>
emmeans(~x, at = list(x = c(2.5, 5))) |>
gather_emmeans_draws() |>
median_hdci()
# A tibble: 2 × 7
x .value .lower .upper .width .point .interval
<dbl> <dbl> <dbl> <dbl> <dbl> <chr> <chr>
1 2.5 0.910 0.421 1.40 0.95 median hdci
2 5 1.76 1.44 2.06 0.95 median hdci
Conclusions:
- the predicted (estimated) mean \(y\) associated with an \(x\) of 2.5 is 0.91 on the log (link) scale
- the predicted (estimated) mean \(y\) associated with an \(x\) of 5 is 1.761 on the log (link) scale
- 95% HPD intervals also given
dat3c.brm2 |>
posterior_predict(newdata = data.frame(x = c(2.5, 5))) |>
summarise_draws(median, HDInterval::hdi)
# A tibble: 2 × 4
variable median lower upper
<chr> <dbl> <dbl> <dbl>
1 ...1 2 0 6
2 ...2 6 1 10
# Or for even more control and ability to add other summaries
dat3c.brm2 |>
add_predicted_draws(newdata = data.frame(x = c(2.5, 5))) |>
dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
summarise_draws(
median,
HDInterval::hdi
)
# A tibble: 2 × 6
# Groups: x, .row [2]
x .row variable median lower upper
<dbl> <int> <chr> <dbl> <dbl> <dbl>
1 2.5 1 .prediction 2 0 6
2 5 2 .prediction 6 1 11
Conclusions:
- the predicted (estimated) mean \(y\) associated with an \(x\) of 2.5 is 2
- the predicted (estimated) mean \(y\) associated with an \(x\) of 5 is 6
- 95% HPD intervals also given
dat3c.brm2 |>
posterior_epred(newdata = data.frame(x = c(2.5, 5))) |>
summarise_draws(median, HDInterval::hdi)
# A tibble: 2 × 4
variable median lower upper
<chr> <dbl> <dbl> <dbl>
1 ...1 2.48 1.45 3.88
2 ...2 5.82 4.17 7.76
# Or for even more control and ability to add other summaries
dat3c.brm2 |>
add_epred_draws(newdata = data.frame(x = c(2.5, 5))) |>
dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
summarise_draws(
median,
HDInterval::hdi
)
# A tibble: 2 × 6
# Groups: x, .row [2]
x .row variable median lower upper
<dbl> <int> <chr> <dbl> <dbl> <dbl>
1 2.5 1 .epred 2.48 1.45 3.88
2 5 2 .epred 5.82 4.17 7.76
Conclusions:
- the fitted mean \(y\) associated with an \(x\) of 2.5 is 2.483
- the fitted mean \(y\) associated with an \(x\) of 5 is 5.818
- 95% HPD intervals also given
dat3c.brm2 |>
posterior_linpred(newdata = data.frame(x = c(2.5, 5))) |>
exp() |>
summarise_draws(median, HDInterval::hdi)
# A tibble: 2 × 4
variable median lower upper
<chr> <dbl> <dbl> <dbl>
1 ...1 2.48 1.45 3.88
2 ...2 5.82 4.17 7.76
# Or for even more control and ability to add other summaries
dat3c.brm2 |>
add_linpred_draws(newdata = data.frame(x = c(2.5, 5))) |>
mutate(.linpred = exp(.linpred)) |>
dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
summarise_draws(
median,
HDInterval::hdi
)
# A tibble: 2 × 6
# Groups: x, .row [2]
x .row variable median lower upper
<dbl> <int> <chr> <dbl> <dbl> <dbl>
1 2.5 1 .linpred 2.48 1.45 3.88
2 5 2 .linpred 5.82 4.17 7.76
Conclusions:
- the fitted mean \(y\) associated with an \(x\) of 2.5 is 2.483
- the fitted mean \(y\) associated with an \(x\) of 5 is 5.818
- 95% HPD intervals also given
In each case, we will predict \(y\) when \(x\) is 2.5 and 5.
Note, for a Gaussian model, emmeans, posterior_epred and posterior_linpred will all yield the same outputs. posterior_predict will yield wider credible intervals as it is predicting for individual observations rather than predicting means of large samples.
x prob lower.HPD upper.HPD
2.5 2.93 1.50 4.68
5.0 5.92 4.02 8.11
Point estimate displayed: median
Results are back-transformed from the log scale
HPD interval probability: 0.95
# OR with more control over the way posteriors are summarised
dat4a.brm2 |>
emmeans(~x, at = list(x = c(2.5, 5))) |>
gather_emmeans_draws() |>
mutate(.value = exp(.value)) |>
median_hdci()
# A tibble: 2 × 7
x .value .lower .upper .width .point .interval
<dbl> <dbl> <dbl> <dbl> <dbl> <chr> <chr>
1 2.5 2.93 1.50 4.68 0.95 median hdci
2 5 5.92 4.02 8.11 0.95 median hdci
# OR with yet more control over the way posteriors are summarised (values remain on the log scale)
dat4a.brm2 |>
emmeans(~x, at = list(x = c(2.5, 5))) |>
gather_emmeans_draws() |>
dplyr::select(-.chain) |>
summarise_draws(
median,
HDInterval::hdi
)
# A tibble: 2 × 5
# Groups: x [2]
x variable median lower upper
<dbl> <chr> <dbl> <dbl> <dbl>
1 2.5 .value 1.07 0.563 1.63
2 5 .value 1.78 1.46 2.14
Conclusions:
- the predicted (estimated) mean \(y\) associated with an \(x\) of 2.5 is 2.928
- the predicted (estimated) mean \(y\) associated with an \(x\) of 5 is 5.92
- 95% HPD intervals also given
dat4a.brm2 |>
posterior_predict(newdata = data.frame(x = c(2.5, 5))) |>
summarise_draws(median, HDInterval::hdi)
# A tibble: 2 × 4
variable median lower upper
<chr> <dbl> <dbl> <dbl>
1 ...1 3 0 7
2 ...2 6 0 11
# Or for even more control and ability to add other summaries
dat4a.brm2 |>
add_predicted_draws(newdata = data.frame(x = c(2.5, 5))) |>
dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
summarise_draws(
median,
HDInterval::hdi
)
# A tibble: 2 × 6
# Groups: x, .row [2]
x .row variable median lower upper
<dbl> <int> <chr> <dbl> <dbl> <dbl>
1 2.5 1 .prediction 3 0 7
2 5 2 .prediction 6 0 11
Conclusions:
- the predicted (estimated) mean \(y\) associated with an \(x\) of 2.5 is 3
- the predicted (estimated) mean \(y\) associated with an \(x\) of 5 is 6
- 95% HPD intervals also given
dat4a.brm2 |>
posterior_epred(newdata = data.frame(x = c(2.5, 5))) |>
summarise_draws(median, HDInterval::hdi)
# A tibble: 2 × 4
variable median lower upper
<chr> <dbl> <dbl> <dbl>
1 ...1 2.93 1.50 4.68
2 ...2 5.92 4.02 8.11
# Or for even more control and ability to add other summaries
dat4a.brm2 |>
add_epred_draws(newdata = data.frame(x = c(2.5, 5))) |>
dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
summarise_draws(
median,
HDInterval::hdi
)
# A tibble: 2 × 6
# Groups: x, .row [2]
x .row variable median lower upper
<dbl> <int> <chr> <dbl> <dbl> <dbl>
1 2.5 1 .epred 2.93 1.50 4.68
2 5 2 .epred 5.92 4.02 8.11
Conclusions:
- the fitted mean \(y\) associated with an \(x\) of 2.5 is 2.928
- the fitted mean \(y\) associated with an \(x\) of 5 is 5.92
- 95% HPD intervals also given
dat4a.brm2 |>
posterior_linpred(newdata = data.frame(x = c(2.5, 5))) |>
exp() |>
summarise_draws(median, HDInterval::hdi)
# A tibble: 2 × 4
variable median lower upper
<chr> <dbl> <dbl> <dbl>
1 ...1 2.93 1.50 4.68
2 ...2 5.92 4.02 8.11
# Or for even more control and ability to add other summaries
dat4a.brm2 |>
add_linpred_draws(newdata = data.frame(x = c(2.5, 5))) |>
mutate(.linpred = exp(.linpred)) |>
dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
summarise_draws(
median,
HDInterval::hdi
)
# A tibble: 2 × 6
# Groups: x, .row [2]
x .row variable median lower upper
<dbl> <int> <chr> <dbl> <dbl> <dbl>
1 2.5 1 .linpred 2.93 1.50 4.68
2 5 2 .linpred 5.92 4.02 8.11
Conclusions:
- the fitted mean \(y\) associated with an \(x\) of 2.5 is 2.928
- the fitted mean \(y\) associated with an \(x\) of 5 is 5.92
- 95% HPD intervals also given
In each case, we will predict \(y\) when \(x\) is 2.5 and 5.
Note, for a Gaussian model, emmeans, posterior_epred and posterior_linpred will all yield the same outputs. posterior_predict will yield wider credible intervals as it is predicting for individual observations rather than predicting means of large samples.
x emmean lower.HPD upper.HPD
2.5 1.07 0.54 1.61
5.0 1.78 1.44 2.09
Point estimate displayed: median
Results are given on the log (not the response) scale.
HPD interval probability: 0.95
# OR with more control over the way posteriors are summarised
dat4b.brm2 |>
emmeans(~x, at = list(x = c(2.5, 5))) |>
gather_emmeans_draws() |>
median_hdci()
# A tibble: 2 × 7
x .value .lower .upper .width .point .interval
<dbl> <dbl> <dbl> <dbl> <dbl> <chr> <chr>
1 2.5 1.07 0.533 1.60 0.95 median hdci
2 5 1.78 1.44 2.09 0.95 median hdci
Conclusions:
- the predicted (estimated) mean \(y\) associated with an \(x\) of 2.5 is 1.068 on the log (link) scale
- the predicted (estimated) mean \(y\) associated with an \(x\) of 5 is 1.78 on the log (link) scale
- 95% HPD intervals also given
dat4b.brm2 |>
posterior_predict(newdata = data.frame(x = c(2.5, 5))) |>
summarise_draws(median, HDInterval::hdi)
# A tibble: 2 × 4
variable median lower upper
<chr> <dbl> <dbl> <dbl>
1 ...1 3 0 7
2 ...2 6 0 11
# Or for even more control and ability to add other summaries
dat4b.brm2 |>
add_predicted_draws(newdata = data.frame(x = c(2.5, 5))) |>
dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
summarise_draws(
median,
HDInterval::hdi
)
# A tibble: 2 × 6
# Groups: x, .row [2]
x .row variable median lower upper
<dbl> <int> <chr> <dbl> <dbl> <dbl>
1 2.5 1 .prediction 3 0 7
2 5 2 .prediction 6 0 11
Conclusions:
- the predicted (estimated) mean \(y\) associated with an \(x\) of 2.5 is 3
- the predicted (estimated) mean \(y\) associated with an \(x\) of 5 is 6
- 95% HPD intervals also given
dat4b.brm2 |>
posterior_epred(newdata = data.frame(x = c(2.5, 5))) |>
summarise_draws(median, HDInterval::hdi)
# A tibble: 2 × 4
variable median lower upper
<chr> <dbl> <dbl> <dbl>
1 ...1 2.91 1.50 4.55
2 ...2 5.93 4.08 7.97
# Or for even more control and ability to add other summaries
dat4b.brm2 |>
add_epred_draws(newdata = data.frame(x = c(2.5, 5))) |>
dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
summarise_draws(
median,
HDInterval::hdi
)
# A tibble: 2 × 6
# Groups: x, .row [2]
x .row variable median lower upper
<dbl> <int> <chr> <dbl> <dbl> <dbl>
1 2.5 1 .epred 2.91 1.50 4.55
2 5 2 .epred 5.93 4.08 7.97
Conclusions:
- the fitted mean \(y\) associated with an \(x\) of 2.5 is 2.911
- the fitted mean \(y\) associated with an \(x\) of 5 is 5.929
- 95% HPD intervals also given
dat4b.brm2 |>
posterior_linpred(newdata = data.frame(x = c(2.5, 5))) |>
exp() |>
summarise_draws(median, HDInterval::hdi)
# A tibble: 2 × 4
variable median lower upper
<chr> <dbl> <dbl> <dbl>
1 ...1 2.91 1.50 4.55
2 ...2 5.93 4.08 7.97
# Or for even more control and ability to add other summaries
dat4b.brm2 |>
add_linpred_draws(newdata = data.frame(x = c(2.5, 5))) |>
mutate(.linpred = exp(.linpred)) |>
dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
summarise_draws(
median,
HDInterval::hdi
)
# A tibble: 2 × 6
# Groups: x, .row [2]
x .row variable median lower upper
<dbl> <int> <chr> <dbl> <dbl> <dbl>
1 2.5 1 .linpred 2.91 1.50 4.55
2 5 2 .linpred 5.93 4.08 7.97
Conclusions:
- the fitted mean \(y\) associated with an \(x\) of 2.5 is 2.911
- the fitted mean \(y\) associated with an \(x\) of 5 is 5.929
- 95% HPD intervals also given
In each case, we will predict \(y\) when \(x\) is 2.5 and 5.
Note, for a Gaussian model, emmeans, posterior_epred and posterior_linpred will all yield the same outputs. posterior_predict will yield wider credible intervals as it is predicting for individual observations rather than predicting means of large samples.
x emmean lower.HPD upper.HPD
2.5 1.10 0.584 1.61
5.0 1.79 1.421 2.09
Point estimate displayed: median
Results are given on the log (not the response) scale.
HPD interval probability: 0.95
# OR with more control over the way posteriors are summarised
dat4c.brm2 |>
emmeans(~x, at = list(x = c(2.5, 5))) |>
gather_emmeans_draws() |>
median_hdci()
# A tibble: 2 × 7
x .value .lower .upper .width .point .interval
<dbl> <dbl> <dbl> <dbl> <dbl> <chr> <chr>
1 2.5 1.10 0.587 1.61 0.95 median hdci
2 5 1.79 1.43 2.11 0.95 median hdci
Conclusions:
- the predicted (estimated) mean \(y\) associated with an \(x\) of 2.5 is 1.1 on the log (link) scale
- the predicted (estimated) mean \(y\) associated with an \(x\) of 5 is 1.79 on the log (link) scale
- 95% HPD intervals also given
dat4c.brm2 |>
posterior_predict(newdata = data.frame(x = c(2.5, 5))) |>
summarise_draws(median, HDInterval::hdi)
# A tibble: 2 × 4
variable median lower upper
<chr> <dbl> <dbl> <dbl>
1 ...1 3 0 7
2 ...2 6 1 12
# Or for even more control and ability to add other summaries
dat4c.brm2 |>
add_predicted_draws(newdata = data.frame(x = c(2.5, 5))) |>
dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
summarise_draws(
median,
HDInterval::hdi
)
# A tibble: 2 × 6
# Groups: x, .row [2]
x .row variable median lower upper
<dbl> <int> <chr> <dbl> <dbl> <dbl>
1 2.5 1 .prediction 3 0 7
2 5 2 .prediction 6 1 12
Conclusions:
- the predicted (estimated) mean \(y\) associated with an \(x\) of 2.5 is 3
- the predicted (estimated) mean \(y\) associated with an \(x\) of 5 is 6
- 95% HPD intervals also given
dat4c.brm2 |>
posterior_epred(newdata = data.frame(x = c(2.5, 5))) |>
summarise_draws(median, HDInterval::hdi)
# A tibble: 2 × 4
variable median lower upper
<chr> <dbl> <dbl> <dbl>
1 ...1 3.00 1.68 4.83
2 ...2 5.99 4.14 8.11
# Or for even more control and ability to add other summaries
dat4c.brm2 |>
add_epred_draws(newdata = data.frame(x = c(2.5, 5))) |>
dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
summarise_draws(
median,
HDInterval::hdi
)
# A tibble: 2 × 6
# Groups: x, .row [2]
x .row variable median lower upper
<dbl> <int> <chr> <dbl> <dbl> <dbl>
1 2.5 1 .epred 3.00 1.68 4.83
2 5 2 .epred 5.99 4.14 8.11
Conclusions:
- the fitted mean \(y\) associated with an \(x\) of 2.5 is 3.004
- the fitted mean \(y\) associated with an \(x\) of 5 is 5.989
- 95% HPD intervals also given
dat4c.brm2 |>
posterior_linpred(newdata = data.frame(x = c(2.5, 5))) |>
exp() |>
summarise_draws(median, HDInterval::hdi)
# A tibble: 2 × 4
variable median lower upper
<chr> <dbl> <dbl> <dbl>
1 ...1 3.00 1.68 4.83
2 ...2 5.99 4.14 8.11
# Or for even more control and ability to add other summaries
dat4c.brm2 |>
add_linpred_draws(newdata = data.frame(x = c(2.5, 5))) |>
mutate(.linpred = exp(.linpred)) |>
dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
summarise_draws(
median,
HDInterval::hdi
)
# A tibble: 2 × 6
# Groups: x, .row [2]
x .row variable median lower upper
<dbl> <int> <chr> <dbl> <dbl> <dbl>
1 2.5 1 .linpred 3.00 1.68 4.83
2 5 2 .linpred 5.99 4.14 8.11
Conclusions:
- the fitted mean \(y\) associated with an \(x\) of 2.5 is 3.004
- the fitted mean \(y\) associated with an \(x\) of 5 is 5.989
- 95% HPD intervals also given
In each case, we will predict \(y\) when \(x\) is 2.5 and 5.
Note, for a Gaussian model, emmeans, posterior_epred and posterior_linpred will all yield the same outputs. posterior_predict will yield wider credible intervals as it is predicting for individual observations rather than predicting means of large samples.
x prob lower.HPD upper.HPD
2.5 0.0704 2.59e-05 0.387
5.0 0.4975 1.78e-01 0.783
Point estimate displayed: median
Results are back-transformed from the logit scale
HPD interval probability: 0.95
# OR with more control over the way posteriors are summarised
dat5a.brm2 |>
emmeans(~x, at = list(x = c(2.5, 5))) |>
gather_emmeans_draws() |>
mutate(.value = plogis(.value)) |>
median_hdci()
# A tibble: 2 × 7
x .value .lower .upper .width .point .interval
<dbl> <dbl> <dbl> <dbl> <dbl> <chr> <chr>
1 2.5 0.0704 0.0000258 0.386 0.95 median hdci
2 5 0.498 0.179 0.785 0.95 median hdci
# OR with yet more control over the way posteriors are summarised (values remain on the logit scale)
dat5a.brm2 |>
emmeans(~x, at = list(x = c(2.5, 5))) |>
gather_emmeans_draws() |>
dplyr::select(-.chain) |>
summarise_draws(
median,
HDInterval::hdi
)
# A tibble: 2 × 5
# Groups: x [2]
x variable median lower upper
<dbl> <chr> <dbl> <dbl> <dbl>
1 2.5 .value -2.58 -6.73 0.0680
2 5 .value -0.00980 -1.53 1.28
Conclusions:
- the predicted (estimated) mean \(y\) associated with an \(x\) of 2.5 is 0.07
- the predicted (estimated) mean \(y\) associated with an \(x\) of 5 is 0.498
- 95% HPD intervals also given
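The probabilities in these conclusions are the inverse-logit transformations of the logit-scale medians shown in the last table above (-2.58 and -0.00980). As a small worked check, the sketch below applies the inverse logit by hand and with plogis(). The two input values are copied from that output; the vector names are labels added here for readability.

```r
# Logit-scale medians copied from the summary table (names are just labels)
eta <- c(`2.5` = -2.58, `5` = -0.00980)

# Inverse logit by hand and via plogis(); the two should agree
by_hand <- exp(eta) / (1 + exp(eta))
via_plogis <- plogis(eta)

print(round(via_plogis, 3))
```

The rounded results match the back-transformed medians of roughly 0.07 and 0.498 reported above.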
dat5a.brm2 |>
posterior_predict(newdata = data.frame(x = c(2.5, 5))) |>
summarise_draws(median, HDInterval::hdi)
# A tibble: 2 × 4
variable median lower upper
<chr> <dbl> <dbl> <dbl>
1 ...1 0 0 1
2 ...2 0 0 1
# Or for even more control and ability to add other summaries
dat5a.brm2 |>
add_predicted_draws(newdata = data.frame(x = c(2.5, 5))) |>
dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
summarise_draws(
median,
HDInterval::hdi
)
# A tibble: 2 × 6
# Groups: x, .row [2]
x .row variable median lower upper
<dbl> <int> <chr> <dbl> <dbl> <dbl>
1 2.5 1 .prediction 0 0 1
2 5 2 .prediction 0 0 1
Conclusions:
- the predicted (estimated) mean \(y\) associated with an \(x\) of 2.5 is 0
- the predicted (estimated) mean \(y\) associated with an \(x\) of 5 is 0
- 95% HPD intervals also given
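For a binary response, every predicted observation is either 0 or 1, so the median of the predictions can only be 0 or 1 (or 0.5 in the case of a tie) and the 95% interval will usually span 0 to 1. The following base R sketch shows this with simulated stand-in draws. The Beta(2, 20) distribution for the probability is an illustrative assumption, not the posterior from the fitted model.

```r
set.seed(7)
# Stand-in posterior draws of the probability of a 1 (illustrative values only)
p_low <- rbeta(4000, 2, 20)   # centred near 0.09

# One simulated 0/1 outcome per posterior draw
y_low <- rbinom(length(p_low), size = 1, prob = p_low)

print(median(y_low))  # the median of mostly-zero outcomes
print(mean(y_low))    # the proportion of 1s tracks the underlying probability
```

When a probability is the quantity of interest, the expected values (posterior_epred) are therefore more informative than the median of the predicted observations.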
dat5a.brm2 |>
posterior_epred(newdata = data.frame(x = c(2.5, 5))) |>
summarise_draws(median, HDInterval::hdi)
# A tibble: 2 × 4
variable median lower upper
<chr> <dbl> <dbl> <dbl>
1 ...1 0.0704 0.0000258 0.387
2 ...2 0.498 0.178 0.783
# Or for even more control and ability to add other summaries
dat5a.brm2 |>
add_epred_draws(newdata = data.frame(x = c(2.5, 5))) |>
dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
summarise_draws(
median,
HDInterval::hdi
)
# A tibble: 2 × 6
# Groups: x, .row [2]
x .row variable median lower upper
<dbl> <int> <chr> <dbl> <dbl> <dbl>
1 2.5 1 .epred 0.0704 0.0000258 0.387
2 5 2 .epred 0.498 0.178 0.783
Conclusions:
- the fitted mean \(y\) associated with an \(x\) of 2.5 is 0.07
- the fitted mean \(y\) associated with an \(x\) of 5 is 0.498
- 95% HPD intervals also given
dat5a.brm2 |>
posterior_linpred(newdata = data.frame(x = c(2.5, 5))) |>
plogis() |>
summarise_draws(median, HDInterval::hdi)
# A tibble: 2 × 4
variable median lower upper
<chr> <dbl> <dbl> <dbl>
1 ...1 0.0704 0.0000258 0.387
2 ...2 0.498 0.178 0.783
# Or for even more control and ability to add other summaries
dat5a.brm2 |>
add_linpred_draws(newdata = data.frame(x = c(2.5, 5))) |>
mutate(.linpred = plogis(.linpred)) |>
dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
summarise_draws(
median,
HDInterval::hdi
)
# A tibble: 2 × 6
# Groups: x, .row [2]
x .row variable median lower upper
<dbl> <int> <chr> <dbl> <dbl> <dbl>
1 2.5 1 .linpred 0.0704 0.0000258 0.387
2 5 2 .linpred 0.498 0.178 0.783
Conclusions:
- the fitted mean \(y\) associated with an \(x\) of 2.5 is 0.07
- the fitted mean \(y\) associated with an \(x\) of 5 is 0.498
- 95% HPD intervals also given
In each case, we will predict \(y\) when \(x\) is 2.5 and 5.
Note, for a Gaussian model, emmeans, posterior_epred and posterior_linpred will all yield the same outputs. posterior_predict will yield wider credible intervals as it is predicting for individual observations rather than predicting means of large samples.
x emmean lower.HPD upper.HPD
2.5 -2.6167 -6.70 0.244
5.0 -0.0311 -1.45 1.501
Point estimate displayed: median
Results are given on the logit (not the response) scale.
HPD interval probability: 0.95
# OR with more control over the way posteriors are summarised
dat5b.brm2 |>
emmeans(~x, at = list(x = c(2.5, 5))) |>
gather_emmeans_draws() |>
median_hdci()
# A tibble: 2 × 7
x .value .lower .upper .width .point .interval
<dbl> <dbl> <dbl> <dbl> <dbl> <chr> <chr>
1 2.5 -2.62 -6.53 0.429 0.95 median hdci
2 5 -0.0311 -1.58 1.41 0.95 median hdci
Conclusions:
- the predicted (estimated) mean \(y\) associated with an \(x\) of 2.5 is -2.617 on the logit (link) scale
- the predicted (estimated) mean \(y\) associated with an \(x\) of 5 is -0.031 on the logit (link) scale
- 95% HPD intervals also given
dat5b.brm2 |>
posterior_predict(newdata = data.frame(x = c(2.5, 5))) |>
summarise_draws(median, HDInterval::hdi)
# A tibble: 2 × 4
variable median lower upper
<chr> <dbl> <dbl> <dbl>
1 ...1 0 0 1
2 ...2 1 0 1
# Or for even more control and ability to add other summaries
dat5b.brm2 |>
add_predicted_draws(newdata = data.frame(x = c(2.5, 5))) |>
dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
summarise_draws(
median,
HDInterval::hdi
)
# A tibble: 2 × 6
# Groups: x, .row [2]
x .row variable median lower upper
<dbl> <int> <chr> <dbl> <dbl> <dbl>
1 2.5 1 .prediction 0 0 1
2 5 2 .prediction 0 0 1
Conclusions:
- the predicted (estimated) mean \(y\) associated with an \(x\) of 2.5 is 0
- the predicted (estimated) mean \(y\) associated with an \(x\) of 5 is 0
- 95% HPD intervals also given
dat5b.brm2 |>
posterior_epred(newdata = data.frame(x = c(2.5, 5))) |>
summarise_draws(median, HDInterval::hdi)
# A tibble: 2 × 4
variable median lower upper
<chr> <dbl> <dbl> <dbl>
1 ...1 0.0681 0.00000402 0.396
2 ...2 0.492 0.189 0.818
# Or for even more control and ability to add other summaries
dat5b.brm2 |>
add_epred_draws(newdata = data.frame(x = c(2.5, 5))) |>
dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
summarise_draws(
median,
HDInterval::hdi
)
# A tibble: 2 × 6
# Groups: x, .row [2]
x .row variable median lower upper
<dbl> <int> <chr> <dbl> <dbl> <dbl>
1 2.5 1 .epred 0.0681 0.00000402 0.396
2 5 2 .epred 0.492 0.189 0.818
Conclusions:
- the fitted mean \(y\) associated with an \(x\) of 2.5 is 0.068
- the fitted mean \(y\) associated with an \(x\) of 5 is 0.492
- 95% HPD intervals also given
dat5b.brm2 |>
posterior_linpred(newdata = data.frame(x = c(2.5, 5))) |>
plogis() |>
summarise_draws(median, HDInterval::hdi)
# A tibble: 2 × 4
variable median lower upper
<chr> <dbl> <dbl> <dbl>
1 ...1 0.0681 0.00000402 0.396
2 ...2 0.492 0.189 0.818
# Or for even more control and ability to add other summaries
dat5b.brm2 |>
add_linpred_draws(newdata = data.frame(x = c(2.5, 5))) |>
mutate(.linpred = plogis(.linpred)) |>
dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
summarise_draws(
median,
HDInterval::hdi
)
# A tibble: 2 × 6
# Groups: x, .row [2]
x .row variable median lower upper
<dbl> <int> <chr> <dbl> <dbl> <dbl>
1 2.5 1 .linpred 0.0681 0.00000402 0.396
2 5 2 .linpred 0.492 0.189 0.818
Conclusions:
- the fitted mean \(y\) associated with an \(x\) of 2.5 is 0.068
- the fitted mean \(y\) associated with an \(x\) of 5 is 0.492
- 95% HPD intervals also given
In each case, we will predict \(y\) when \(x\) is 2.5 and 5.
Note, for a Gaussian model, emmeans, posterior_epred and posterior_linpred will all yield the same outputs. posterior_predict will yield wider credible intervals as it is predicting for individual observations rather than predicting means of large samples.
x emmean lower.HPD upper.HPD
2.5 -1.4481 -4.37 0.737
5.0 0.0908 -1.29 1.378
Point estimate displayed: median
Results are given on the logit (not the response) scale.
HPD interval probability: 0.95
# OR with more control over the way posteriors are summarised
dat5c.brm2 |>
emmeans(~x, at = list(x = c(2.5, 5))) |>
gather_emmeans_draws() |>
median_hdci()
# A tibble: 2 × 7
x .value .lower .upper .width .point .interval
<dbl> <dbl> <dbl> <dbl> <dbl> <chr> <chr>
1 2.5 -1.45 -4.37 0.737 0.95 median hdci
2 5 0.0908 -1.29 1.38 0.95 median hdci
Conclusions:
- the predicted (estimated) mean \(y\) associated with an \(x\) of 2.5 is -1.448 on the logit (link) scale
- the predicted (estimated) mean \(y\) associated with an \(x\) of 5 is 0.091 on the logit (link) scale
- 95% HPD intervals also given
dat5c.brm2 |>
posterior_predict(newdata = data.frame(x = c(2.5, 5))) |>
summarise_draws(median, HDInterval::hdi)
# A tibble: 2 × 4
variable median lower upper
<chr> <dbl> <dbl> <dbl>
1 ...1 0 0 1
2 ...2 1 0 1
# Or for even more control and ability to add other summaries
dat5c.brm2 |>
add_predicted_draws(newdata = data.frame(x = c(2.5, 5))) |>
dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
summarise_draws(
median,
HDInterval::hdi
)
# A tibble: 2 × 6
# Groups: x, .row [2]
x .row variable median lower upper
<dbl> <int> <chr> <dbl> <dbl> <dbl>
1 2.5 1 .prediction 0 0 1
2 5 2 .prediction 1 0 1
Conclusions:
- the predicted (estimated) mean \(y\) associated with an \(x\) of 2.5 is 0
- the predicted (estimated) mean \(y\) associated with an \(x\) of 5 is 1
- 95% HPD intervals also given
dat5c.brm2 |>
posterior_epred(newdata = data.frame(x = c(2.5, 5))) |>
summarise_draws(median, HDInterval::hdi)
# A tibble: 2 × 4
variable median lower upper
<chr> <dbl> <dbl> <dbl>
1 ...1 0.190 0.000832 0.551
2 ...2 0.523 0.232 0.815
# Or for even more control and ability to add other summaries
dat5c.brm2 |>
add_epred_draws(newdata = data.frame(x = c(2.5, 5))) |>
dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
summarise_draws(
median,
HDInterval::hdi
)
# A tibble: 2 × 6
# Groups: x, .row [2]
x .row variable median lower upper
<dbl> <int> <chr> <dbl> <dbl> <dbl>
1 2.5 1 .epred 0.190 0.000832 0.551
2 5 2 .epred 0.523 0.232 0.815
Conclusions:
- the fitted mean \(y\) associated with an \(x\) of 2.5 is 0.19
- the fitted mean \(y\) associated with an \(x\) of 5 is 0.523
- 95% HPD intervals also given
dat5c.brm2 |>
posterior_linpred(newdata = data.frame(x = c(2.5, 5))) |>
plogis() |>
summarise_draws(median, HDInterval::hdi)
# A tibble: 2 × 4
variable median lower upper
<chr> <dbl> <dbl> <dbl>
1 ...1 0.190 0.000832 0.551
2 ...2 0.523 0.232 0.815
# Or for even more control and ability to add other summaries
dat5c.brm2 |>
add_linpred_draws(newdata = data.frame(x = c(2.5, 5))) |>
mutate(.linpred = plogis(.linpred)) |>
dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
summarise_draws(
median,
HDInterval::hdi
)
# A tibble: 2 × 6
# Groups: x, .row [2]
x .row variable median lower upper
<dbl> <int> <chr> <dbl> <dbl> <dbl>
1 2.5 1 .linpred 0.190 0.000832 0.551
2 5 2 .linpred 0.523 0.232 0.815
Conclusions:
- the fitted mean \(y\) associated with an \(x\) of 2.5 is 0.19
- the fitted mean \(y\) associated with an \(x\) of 5 is 0.523
- 95% HPD intervals also given
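The agreement between the back-transformed linear predictor and the expected value can be sketched by hand. The coefficient draws below (b0 and b1 are hypothetical names) are simulated stand-ins, not draws from dat5c.brm2.

```r
set.seed(2)
n_draws <- 4000
b0 <- rnorm(n_draws, mean = -3, sd = 1)     # hypothetical intercept draws
b1 <- rnorm(n_draws, mean = 0.6, sd = 0.2)  # hypothetical slope draws
newx <- c(2.5, 5)

# Linear predictor: one row per draw, one column per value of x
eta <- outer(b0, rep(1, length(newx))) + outer(b1, newx)

# The inverse logit converts the linear predictor to a probability
p <- plogis(eta)

# Posterior median of the probability at each value of x
apply(p, 2, median)
```

For a binary response with a logit link, this probability is the expected value, which is why the linpred and epred summaries coincide once plogis() is applied.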
In each case, we will predict \(y\) when \(x\) is 2.5 and 5
Note, for a Gaussian model, emmeans, posterior_epred and posterior_linpred will all yield the same outputs. posterior_predict will yield wider credible intervals as it is predicting for individual observations rather than predicting means of large samples.
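To see why predictions for individual observations have wider intervals than expected values, consider the following sketch for a Gaussian response. All draws are simulated (mu and sigma are hypothetical), and quantile intervals are used in place of HPD intervals for brevity.

```r
set.seed(3)
n_draws <- 4000
mu    <- rnorm(n_draws, mean = 10, sd = 0.5)      # hypothetical draws of the mean
sigma <- abs(rnorm(n_draws, mean = 2, sd = 0.2))  # hypothetical draws of the residual SD

# Expected value: the mean only
epred <- mu
# Prediction: the mean plus observation-level noise
ypred <- rnorm(n_draws, mean = mu, sd = sigma)

# 95% quantile intervals for each
int_epred <- quantile(epred, c(0.025, 0.975))
int_pred  <- quantile(ypred, c(0.025, 0.975))
rbind(epred = int_epred, predict = int_pred)
```

The prediction interval carries both the uncertainty in the mean and the residual variation, so it is the wider of the two.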
x prob lower.HPD upper.HPD
2.5 0.167 0.0482 0.326
5.0 0.499 0.3747 0.613
Point estimate displayed: median
Results are back-transformed from the logit scale
HPD interval probability: 0.95
# OR with more control over the way posteriors are summarised
dat6a.brm2 |>
emmeans(~x, at = list(x = c(2.5, 5))) |>
gather_emmeans_draws() |>
mutate(.value = plogis(.value)) |>
median_hdci()
# A tibble: 2 × 7
x .value .lower .upper .width .point .interval
<dbl> <dbl> <dbl> <dbl> <dbl> <chr> <chr>
1 2.5 0.167 0.0482 0.326 0.95 median hdci
2 5 0.499 0.375 0.613 0.95 median hdci
# OR with yet more control over the way posteriors are summarised
dat6a.brm2 |>
emmeans(~x, at = list(x = c(2.5, 5))) |>
gather_emmeans_draws() |>
dplyr::select(-.chain) |>
summarise_draws(
median,
HDInterval::hdi
)
# A tibble: 2 × 5
# Groups: x [2]
x variable median lower upper
<dbl> <chr> <dbl> <dbl> <dbl>
1 2.5 .value -1.61 -2.76 -0.659
2 5 .value -0.00525 -0.512 0.459
Conclusions:
- the predicted (estimated) mean \(y\) associated with an \(x\) of 2.5 is 0.167
- the predicted (estimated) mean \(y\) associated with an \(x\) of 5 is 0.499
- 95% HPD intervals also given
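Note that the last summary above is on the logit scale, since those draws were not passed through plogis(). Back-transforming interval limits afterwards is not equivalent to summarising back-transformed draws: the median is preserved, but the HPD interval generally is not. A minimal sketch with simulated logit-scale draws follows; hdi_simple() is a hypothetical stand-in for HDInterval::hdi(), written so that the example has no dependencies.

```r
set.seed(4)
eta <- rnorm(4001, mean = -1.6, sd = 0.55)  # hypothetical logit-scale draws

# Shortest interval containing (approximately) `prob` of the draws
hdi_simple <- function(x, prob = 0.95) {
  x <- sort(x)
  n <- length(x)
  k <- floor(prob * n)                   # index offset spanned by each candidate interval
  widths <- x[(k + 1):n] - x[1:(n - k)]  # width of every candidate interval
  i <- which.min(widths)
  c(lower = x[i], upper = x[i + k])
}

# Median: the order of transforming and summarising does not matter
c(median(plogis(eta)), plogis(median(eta)))

# HPD interval: the order does matter
rbind(
  hdi_of_probs  = hdi_simple(plogis(eta)),
  plogis_of_hdi = plogis(hdi_simple(eta))
)
```

This is why the draws are back-transformed with mutate() before being summarised in the earlier alternative.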
dat6a.brm2 |>
posterior_predict(newdata = data.frame(x = c(2.5, 5), total = 1)) |>
summarise_draws(median, HDInterval::hdi)
# A tibble: 2 × 4
variable median lower upper
<chr> <dbl> <dbl> <dbl>
1 ...1 0 0 1
2 ...2 1 0 1
# Or for even more control and ability to add other summaries
dat6a.brm2 |>
add_predicted_draws(newdata = data.frame(x = c(2.5, 5), total = 1)) |>
dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
summarise_draws(
median,
HDInterval::hdi
)
# A tibble: 2 × 7
# Groups: x, total, .row [2]
x total .row variable median lower upper
<dbl> <dbl> <int> <chr> <dbl> <dbl> <dbl>
1 2.5 1 1 .prediction 0 0 1
2 5 1 2 .prediction 1 0 1
Conclusions:
- the median of the predicted observations of \(y\) associated with an \(x\) of 2.5 is 0
- the median of the predicted observations of \(y\) associated with an \(x\) of 5 is 1
- 95% HPD intervals also given
dat6a.brm2 |>
posterior_epred(newdata = data.frame(x = c(2.5, 5), total = 1)) |>
summarise_draws(median, HDInterval::hdi)
# A tibble: 2 × 4
variable median lower upper
<chr> <dbl> <dbl> <dbl>
1 ...1 0.167 0.0482 0.326
2 ...2 0.499 0.375 0.613
# Or for even more control and ability to add other summaries
dat6a.brm2 |>
add_epred_draws(newdata = data.frame(x = c(2.5, 5), total = 1)) |>
dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
summarise_draws(
median,
HDInterval::hdi
)
# A tibble: 2 × 7
# Groups: x, total, .row [2]
x total .row variable median lower upper
<dbl> <dbl> <int> <chr> <dbl> <dbl> <dbl>
1 2.5 1 1 .epred 0.167 0.0482 0.326
2 5 1 2 .epred 0.499 0.375 0.613
Conclusions:
- the fitted mean \(y\) associated with an \(x\) of 2.5 is 0.167
- the fitted mean \(y\) associated with an \(x\) of 5 is 0.499
- 95% HPD intervals also given
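The total = 1 column in newdata sets the number of trials. As a sketch, under the assumption that dat6a.brm2 is a binomial model with a trials term (which the total column suggests), the expected value is the number of trials multiplied by the probability of success, so a single trial returns the probability itself. The probability draws below are simulated.

```r
set.seed(5)
# Hypothetical posterior draws of the probability of success
p_draws <- plogis(rnorm(4000, mean = -1.6, sd = 0.55))

# Expected number of successes for different numbers of trials
epred_1  <- 1 * p_draws   # one trial: the probability of success
epred_10 <- 10 * p_draws  # ten trials: expected count out of 10

c(median_1 = median(epred_1), median_10 = median(epred_10))
```

Supplying a different total would rescale the expected values accordingly, so it is worth checking which scale a summary is on before interpreting it.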
dat6a.brm2 |>
posterior_linpred(newdata = data.frame(x = c(2.5, 5), total = 1)) |>
plogis() |>
summarise_draws(median, HDInterval::hdi)
# A tibble: 2 × 4
variable median lower upper
<chr> <dbl> <dbl> <dbl>
1 ...1 0.167 0.0482 0.326
2 ...2 0.499 0.375 0.613
# Or for even more control and ability to add other summaries
dat6a.brm2 |>
add_linpred_draws(newdata = data.frame(x = c(2.5, 5), total = 1)) |>
mutate(.linpred = plogis(.linpred)) |>
dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
summarise_draws(
median,
HDInterval::hdi
)
# A tibble: 2 × 7
# Groups: x, total, .row [2]
x total .row variable median lower upper
<dbl> <dbl> <int> <chr> <dbl> <dbl> <dbl>
1 2.5 1 1 .linpred 0.167 0.0482 0.326
2 5 1 2 .linpred 0.499 0.375 0.613
Conclusions:
- the fitted mean \(y\) associated with an \(x\) of 2.5 is 0.167
- the fitted mean \(y\) associated with an \(x\) of 5 is 0.499
- 95% HPD intervals also given
In each case, we will predict \(y\) when \(x\) is 2.5 and 5
Note, for a Gaussian model, emmeans, posterior_epred and posterior_linpred will all yield the same outputs. posterior_predict will yield wider credible intervals as it is predicting for individual observations rather than predicting means of large samples.
x emmean lower.HPD upper.HPD
2.5 -1.6495 -2.746 -0.650
5.0 -0.0187 -0.521 0.481
Point estimate displayed: median
Results are given on the logit (not the response) scale.
HPD interval probability: 0.95
# OR with more control over the way posteriors are summarised
dat6b.brm2 |>
emmeans(~x, at = list(x = c(2.5, 5))) |>
gather_emmeans_draws() |>
median_hdci()
# A tibble: 2 × 7
x .value .lower .upper .width .point .interval
<dbl> <dbl> <dbl> <dbl> <dbl> <chr> <chr>
1 2.5 -1.65 -2.75 -0.650 0.95 median hdci
2 5 -0.0187 -0.555 0.452 0.95 median hdci
Conclusions:
- the estimated mean \(y\), on the logit scale, associated with an \(x\) of 2.5 is -1.65
- the estimated mean \(y\), on the logit scale, associated with an \(x\) of 5 is -0.019
- 95% HPD intervals also given
dat6b.brm2 |>
posterior_predict(newdata = data.frame(x = c(2.5, 5), total = 1)) |>
summarise_draws(median, HDInterval::hdi)
# A tibble: 2 × 4
variable median lower upper
<chr> <dbl> <dbl> <dbl>
1 ...1 0 0 1
2 ...2 0 0 1
# Or for even more control and ability to add other summaries
dat6b.brm2 |>
add_predicted_draws(newdata = data.frame(x = c(2.5, 5), total = 1)) |>
dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
summarise_draws(
median,
HDInterval::hdi
)
# A tibble: 2 × 7
# Groups: x, total, .row [2]
x total .row variable median lower upper
<dbl> <dbl> <int> <chr> <dbl> <dbl> <dbl>
1 2.5 1 1 .prediction 0 0 1
2 5 1 2 .prediction 0 0 1
Conclusions:
- the median of the predicted observations of \(y\) associated with an \(x\) of 2.5 is 0
- the median of the predicted observations of \(y\) associated with an \(x\) of 5 is 0
- 95% HPD intervals also given
dat6b.brm2 |>
posterior_epred(newdata = data.frame(x = c(2.5, 5), total = 1)) |>
summarise_draws(median, HDInterval::hdi)
# A tibble: 2 × 4
variable median lower upper
<chr> <dbl> <dbl> <dbl>
1 ...1 0.161 0.0480 0.315
2 ...2 0.495 0.373 0.618
# Or for even more control and ability to add other summaries
dat6b.brm2 |>
add_epred_draws(newdata = data.frame(x = c(2.5, 5), total = 1)) |>
dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
summarise_draws(
median,
HDInterval::hdi
)
# A tibble: 2 × 7
# Groups: x, total, .row [2]
x total .row variable median lower upper
<dbl> <dbl> <int> <chr> <dbl> <dbl> <dbl>
1 2.5 1 1 .epred 0.161 0.0480 0.315
2 5 1 2 .epred 0.495 0.373 0.618
Conclusions:
- the fitted mean \(y\) associated with an \(x\) of 2.5 is 0.161
- the fitted mean \(y\) associated with an \(x\) of 5 is 0.495
- 95% HPD intervals also given
dat6b.brm2 |>
posterior_linpred(newdata = data.frame(x = c(2.5, 5), total = 1)) |>
plogis() |>
summarise_draws(median, HDInterval::hdi)
# A tibble: 2 × 4
variable median lower upper
<chr> <dbl> <dbl> <dbl>
1 ...1 0.161 0.0480 0.315
2 ...2 0.495 0.373 0.618
# Or for even more control and ability to add other summaries
dat6b.brm2 |>
add_linpred_draws(newdata = data.frame(x = c(2.5, 5), total = 1)) |>
mutate(.linpred = plogis(.linpred)) |>
dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
summarise_draws(
median,
HDInterval::hdi
)
# A tibble: 2 × 7
# Groups: x, total, .row [2]
x total .row variable median lower upper
<dbl> <dbl> <int> <chr> <dbl> <dbl> <dbl>
1 2.5 1 1 .linpred 0.161 0.0480 0.315
2 5 1 2 .linpred 0.495 0.373 0.618
Conclusions:
- the fitted mean \(y\) associated with an \(x\) of 2.5 is 0.161
- the fitted mean \(y\) associated with an \(x\) of 5 is 0.495
- 95% HPD intervals also given
In each case, we will predict \(y\) when \(x\) is 2.5 and 5
Note, for a Gaussian model, emmeans, posterior_epred and posterior_linpred will all yield the same outputs. posterior_predict will yield wider credible intervals as it is predicting for individual observations rather than predicting means of large samples.
x emmean lower.HPD upper.HPD
2.5 -1.3104 -2.459 -0.322
5.0 0.0105 -0.485 0.502
Point estimate displayed: median
Results are given on the logit (not the response) scale.
HPD interval probability: 0.95
# OR with more control over the way posteriors are summarised
dat6c.brm2 |>
emmeans(~x, at = list(x = c(2.5, 5))) |>
gather_emmeans_draws() |>
median_hdci()
# A tibble: 2 × 7
x .value .lower .upper .width .point .interval
<dbl> <dbl> <dbl> <dbl> <dbl> <chr> <chr>
1 2.5 -1.31 -2.43 -0.279 0.95 median hdci
2 5 0.0105 -0.485 0.502 0.95 median hdci
Conclusions:
- the estimated mean \(y\), on the logit scale, associated with an \(x\) of 2.5 is -1.31
- the estimated mean \(y\), on the logit scale, associated with an \(x\) of 5 is 0.011
- 95% HPD intervals also given
dat6c.brm2 |>
posterior_predict(newdata = data.frame(x = c(2.5, 5), total = 1)) |>
summarise_draws(median, HDInterval::hdi)
# A tibble: 2 × 4
variable median lower upper
<chr> <dbl> <dbl> <dbl>
1 ...1 0 0 1
2 ...2 1 0 1
# Or for even more control and ability to add other summaries
dat6c.brm2 |>
add_predicted_draws(newdata = data.frame(x = c(2.5, 5), total = 1)) |>
dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
summarise_draws(
median,
HDInterval::hdi
)
# A tibble: 2 × 7
# Groups: x, total, .row [2]
x total .row variable median lower upper
<dbl> <dbl> <int> <chr> <dbl> <dbl> <dbl>
1 2.5 1 1 .prediction 0 0 1
2 5 1 2 .prediction 1 0 1
Conclusions:
- the median of the predicted observations of \(y\) associated with an \(x\) of 2.5 is 0
- the median of the predicted observations of \(y\) associated with an \(x\) of 5 is 1
- 95% HPD intervals also given
dat6c.brm2 |>
posterior_epred(newdata = data.frame(x = c(2.5, 5), total = 1)) |>
summarise_draws(median, HDInterval::hdi)
# A tibble: 2 × 4
variable median lower upper
<chr> <dbl> <dbl> <dbl>
1 ...1 0.212 0.0560 0.388
2 ...2 0.503 0.381 0.623
# Or for even more control and ability to add other summaries
dat6c.brm2 |>
add_epred_draws(newdata = data.frame(x = c(2.5, 5), total = 1)) |>
dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
summarise_draws(
median,
HDInterval::hdi
)
# A tibble: 2 × 7
# Groups: x, total, .row [2]
x total .row variable median lower upper
<dbl> <dbl> <int> <chr> <dbl> <dbl> <dbl>
1 2.5 1 1 .epred 0.212 0.0560 0.388
2 5 1 2 .epred 0.503 0.381 0.623
Conclusions:
- the fitted mean \(y\) associated with an \(x\) of 2.5 is 0.212
- the fitted mean \(y\) associated with an \(x\) of 5 is 0.503
- 95% HPD intervals also given
dat6c.brm2 |>
posterior_linpred(newdata = data.frame(x = c(2.5, 5), total = 1)) |>
plogis() |>
summarise_draws(median, HDInterval::hdi)
# A tibble: 2 × 4
variable median lower upper
<chr> <dbl> <dbl> <dbl>
1 ...1 0.212 0.0560 0.388
2 ...2 0.503 0.381 0.623
# Or for even more control and ability to add other summaries
dat6c.brm2 |>
add_linpred_draws(newdata = data.frame(x = c(2.5, 5), total = 1)) |>
mutate(.linpred = plogis(.linpred)) |>
dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
summarise_draws(
median,
HDInterval::hdi
)
# A tibble: 2 × 7
# Groups: x, total, .row [2]
x total .row variable median lower upper
<dbl> <dbl> <int> <chr> <dbl> <dbl> <dbl>
1 2.5 1 1 .linpred 0.212 0.0560 0.388
2 5 1 2 .linpred 0.503 0.381 0.623
Conclusions:
- the fitted mean \(y\) associated with an \(x\) of 2.5 is 0.212
- the fitted mean \(y\) associated with an \(x\) of 5 is 0.503
- 95% HPD intervals also given
13 Further investigations
Since we have the entire posterior, we are able to make probability statements. We simply count up the number of MCMC sample draws that satisfy a condition (e.g. those that represent a slope greater than 0) and then divide by the total number of MCMC samples.
Now that we have the full posteriors, we are free to use these to garner evidence on a range of hypotheses. To demonstrate, we will consider the following hypotheses:
- a change in \(x\) is associated with an increase in \(y\)
- a doubling of \(x\) (from 2.5 to 5) is associated with an increase in \(y\) of > 50%
Hypothesis Tests for class b:
Hypothesis Estimate Est.Error CI.Lower CI.Upper Evid.Ratio Post.Prob Star
1 (x) > 0 -0.08 0.46 -0.84 0.66 0.74 0.42
---
'CI': 90%-CI for one-sided and 95%-CI for two-sided hypotheses.
'*': For one-sided hypotheses, the posterior probability exceeds 95%;
for two-sided hypotheses, the value tested against lies outside the 95%-CI.
Posterior probabilities of point hypotheses assume equal prior probabilities.
Conclusions:
- the parameter (b_x) minus 0 is -0.08
- Evid.Ratio: the ratio of evidence for the hypothesis vs the evidence against it. In this case, the evidence ratio is 0.735. An evidence ratio of Inf would arise if the divisor was 0 (no evidence against the hypothesis).
- Post.Prob: the probability of the hypothesis is 0.424
- there is little evidence for this hypothesis
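For a one-sided hypothesis, the evidence ratio is the posterior probability of the hypothesis divided by the posterior probability of its alternative. The arithmetic can be checked with the rounded values from the table above (the result differs slightly in the third decimal place because the table uses the unrounded probability):

```r
post_prob  <- 0.424  # Post.Prob (rounded) from the hypothesis() output
evid_ratio <- post_prob / (1 - post_prob)
evid_ratio

# If every draw supported the hypothesis, the divisor would be 0 and the ratio Inf
1 / (1 - 1)
```

An evidence ratio below 1 therefore means that more of the posterior lies against the hypothesis than in favour of it.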
Alternatively, we could use gather_draws to achieve a similar outcome.
In the following, in addition to median and HPD intervals, we will calculate the probability that the slope (b_x) is greater than 0. To calculate such a probability, we could simply count up the number of posterior b_x values that are greater than zero and then divide by the total number of posterior b_x values. In R, we could do this as sum(b_x > 0)/length(b_x) (where b_x > 0 will return TRUE for each case in which it is true and FALSE otherwise; these are treated as 1 and 0 respectively, and thus summing is like counting). Dividing a sum by its length equates to a mean and thus we can obtain the probability by calculating the mean of b_x > 0.
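The counting argument can be tried out on simulated values. In this sketch, b_x is a vector of simulated stand-ins for the posterior draws of the slope rather than draws from a fitted model.

```r
set.seed(6)
b_x <- rnorm(4000, mean = -0.08, sd = 0.46)  # simulated stand-in for slope draws

p_count <- sum(b_x > 0) / length(b_x)  # count the draws above zero, divide by the total
p_mean  <- mean(b_x > 0)               # the same quantity written as a mean

c(p_count = p_count, p_mean = p_mean)
```

Both expressions return the proportion of draws that satisfy the condition, which is the estimate of the posterior probability.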
# A tibble: 1 × 6
# Groups: .variable [1]
.variable variable median lower upper P
<chr> <chr> <dbl> <dbl> <dbl> <dbl>
1 b_x .value -0.0803 -0.973 0.806 0.424
The summarise_draws() function expects a set of one or more summary or diagnostic functions (such as median etc). These can be supplied either as the name of the function (as in the case for median in the example above) or, if more arguments or information is required by the function, the function can be written out in full. In this case, the function must be preceded by a ~ and the variable is denoted by a . (such as in P = ~mean(. > 0) above).
Conclusions:
- the parameter (b_x) minus 0 is -0.08
- P: the probability of the hypothesis is 0.424
- there is little evidence for this hypothesis
dat1a.brm2 |>
emmeans::emmeans(~x, at = list(x = c(2.5, 5))) |>
gather_emmeans_draws() |>
ungroup() |>
group_by(.draw) |>
summarise(ES = 100 * diff(.value) / .value[1]) |>
hypothesis("ES > 50")
Hypothesis Tests for class :
Hypothesis Estimate Est.Error CI.Lower CI.Upper Evid.Ratio Post.Prob Star
1 (ES)-(50) > 0 22.54 2159.24 -178.62 257.92 2.94 0.75
---
'CI': 90%-CI for one-sided and 95%-CI for two-sided hypotheses.
'*': For one-sided hypotheses, the posterior probability exceeds 95%;
for two-sided hypotheses, the value tested against lies outside the 95%-CI.
Posterior probabilities of point hypotheses assume equal prior probabilities.
Conclusions:
- the difference between the percentage change in estimated \(y\) as \(x\) increases from 2.5 to 5 and 50% is 22.54
- the evidence ratio in support of the hypothesis that the percentage change exceeds 50% is 2.941
- the probability that the change in \(y\) exceeds 50% is 0.746
- the evidence for such a change is very weak
dat1a.brm2 |>
emmeans::emmeans(~x, at = list(x = c(2.5, 5))) |>
gather_emmeans_draws() |>
ungroup() |>
group_by(.draw) |>
summarise(ES = 100 * diff(.value) / .value[1]) |>
summarise_draws(
mean, median,
HDInterval::hdi,
P = ~ mean(. > 50)
)
# A tibble: 1 × 6
variable mean median lower upper P
<chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 ES 72.5 84.5 -352. 502. 0.746
Conclusions:
- the mean percentage change in estimated \(y\) as \(x\) increases from 2.5 to 5 is 72.54 (median 84.5)
- the probability that the change in \(y\) exceeds 50% is 0.746
- the evidence for such a change is very weak
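The gap between the mean and median of ES, and the very wide interval, are consistent with dividing by a quantity whose posterior comes close to zero. A sketch with simulated draws illustrates the behaviour; the fitted values y_25 and y_5 are hypothetical and are not taken from dat1a.brm2.

```r
set.seed(7)
n_draws <- 4000
y_25 <- rnorm(n_draws, mean = 1, sd = 0.8)           # hypothetical fitted value at x = 2.5
y_5  <- y_25 + rnorm(n_draws, mean = 0.8, sd = 0.5)  # hypothetical fitted value at x = 5

# Percentage change per draw, as in the summarise() step above
ES <- 100 * (y_5 - y_25) / y_25

# The mean and SD are dominated by draws with a denominator near zero;
# the median and the exceedance probability are far more stable
c(mean = mean(ES), median = median(ES), sd = sd(ES), P = mean(ES > 50))
```

When a percentage change is built on a baseline that may be near zero, the median and the exceedance probability are usually safer summaries than the mean.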
The procedure highlighted above for calculating exceedance probabilities evaluates the degree of evidence for an effect in a particular direction. However, there are other instances where there is a desire to evaluate the evidence that something has changed (either increased or decreased). Such purposes are similar to the Frequentist pursuit of testing a null hypothesis (e.g. effect = 0).
The Region of Practical Equivalence (ROPE) evaluates evidence that an effect is “practically equivalent” to a value (e.g. 0) by calculating the proportion of effects that are within a nominated range. Kruschke (2018) argued that for standardized parameters, the range of -0.1 to 0.1 would envelop a negligible effect based on Cohen (1988). Kruschke (2018) also suggested that this range could be extended to non-standardized parameters by multiplying by the standard deviation of the response. Accordingly, calculating the proportion of posterior density within this ROPE could act as a form of “null-hypothesis” testing in a Bayesian framework.
- if the HDI of the focal parameter falls completely outside the ROPE, there is strong evidence that there is an effect
- if the HDI of the focal parameter falls completely inside the ROPE, there is strong evidence that there is not an effect
- otherwise there is not clear evidence either way
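The three rules above can be written as a small function. This is a sketch only: rope_decision() is a hypothetical helper, and the labels mirror those shown in the H0 column of the equivalence test output.

```r
# Decision rule for an equivalence test, given an HDI and a ROPE (each c(lower, upper))
rope_decision <- function(hdi, rope) {
  if (hdi[1] > rope[2] || hdi[2] < rope[1]) {
    "Rejected"   # HDI entirely outside the ROPE: evidence for an effect
  } else if (hdi[1] >= rope[1] && hdi[2] <= rope[2]) {
    "Accepted"   # HDI entirely inside the ROPE: evidence for no practical effect
  } else {
    "Undecided"  # HDI straddles a ROPE boundary
  }
}

rope_decision(hdi = c(-0.97, 0.82), rope = c(-0.09, 0.09))  # "Undecided"
rope_decision(hdi = c(0.30, 0.90),  rope = c(-0.09, 0.09))  # "Rejected"
rope_decision(hdi = c(-0.05, 0.04), rope = c(-0.09, 0.09))  # "Accepted"
```

The first call uses the HDI and ROPE reported for the slope in this section; the other two use made-up intervals to show the remaining outcomes.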
ROPE and equivalence tests are of most use when you decide that there is not enough evidence to support a hypothesis that there is an effect. Such a “non-significant” result may be because there genuinely is no effect OR because you do not have enough power to detect the effect. Performing an equivalence test provides a mechanism to tease these two apart.
I provide the following example purely to illustrate how such a test would be performed. In this case, the exceedance probabilities above did not provide clear evidence for an effect, which is the situation in which an equivalence test is most informative.
## Calculate ROPE range manually
ROPE <- c(-0.1, 0.1) * with(dat, sd(y))
## OR calculate ROPE range via rope_range function
ROPE <- bayestestR::rope_range(dat1a.brm2)
dat1a.brm2 |> bayestestR::equivalence_test(parameters = "b_x", range = ROPE, ci_method = "HDI")
# Test for Practical Equivalence
ROPE: [-0.09 0.09]
Parameter | H0 | inside ROPE | 95% HDI
--------------------------------------------------
x | Undecided | 15.88 % | [-0.97 0.82]
dat1a.brm2 |>
bayestestR::equivalence_test(parameters = "b_x", range = ROPE, ci_method = "HDI") |>
plot()
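Conclusions:
- the proportion of the HPD interval for the slope that is inside the ROPE is 0.159 (15.88%)
- the HPD interval is neither completely inside nor completely outside the ROPE, so the test is undecided: there is no clear evidence either way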
OR using the rope function.
## Calculate the proportion of the posterior inside the ROPE
dat1a.brm2 |> bayestestR::rope(parameters = "x", range = ROPE, ci_method = "HDI")
# Proportion of samples inside the ROPE [-0.09, 0.09]:
Parameter | inside ROPE
-----------------------
x | 15.87 %
The above demonstration was applied to the simple comparison that the slope was not equal to 0; however, it can similarly be applied to any hypothesis (although typically only if there is no evidence of an effect).
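The percentage inside the ROPE can be approximated by hand as the share of the draws within the 95% interval that also lie within the ROPE. The sketch below uses simulated draws and, for brevity, a quantile interval rather than an HPD interval, so it approximates the idea rather than reimplementing bayestestR.

```r
set.seed(8)
b_x  <- rnorm(4000, mean = -0.08, sd = 0.46)  # simulated stand-in for slope draws
rope <- c(-0.09, 0.09)

# Keep the draws inside the central 95% interval
ci <- quantile(b_x, c(0.025, 0.975))
inside_ci <- b_x[b_x >= ci[1] & b_x <= ci[2]]

# Share of those draws that fall inside the ROPE
in_rope <- mean(inside_ci >= rope[1] & inside_ci <= rope[2])
round(100 * in_rope, 2)
```

A small percentage means little of the credible mass is practically equivalent to zero, whereas a percentage near 100 supports practical equivalence.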
Now that we have the full posteriors, we are free to use these to garner evidence on a range of hypotheses. To demonstrate, we will consider the following hypotheses:
- a change in \(x\) is associated with an increase in \(y\)
- a doubling of \(x\) (from 2.5 to 5) is associated with an increase in \(y\) of > 50%
[1] "b_Intercept" "b_scalexscaleEQFALSE" "sigma"
[4] "Intercept" "prior_Intercept" "prior_b"
[7] "prior_sigma" "lprior" "lp__"
[10] "accept_stat__" "stepsize__" "treedepth__"
[13] "n_leapfrog__" "divergent__" "energy__"
Hypothesis Tests for class b:
Hypothesis Estimate Est.Error CI.Lower CI.Upper Evid.Ratio
1 (scalexscaleEQFALSE) > 0 -0.09 0.46 -0.81 0.65 0.68
Post.Prob Star
1 0.41
---
'CI': 90%-CI for one-sided and 95%-CI for two-sided hypotheses.
'*': For one-sided hypotheses, the posterior probability exceeds 95%;
for two-sided hypotheses, the value tested against lies outside the 95%-CI.
Posterior probabilities of point hypotheses assume equal prior probabilities.
Conclusions:
- the parameter (b_scalexscaleEQFALSE) minus 0 is -0.089
- Evid.Ratio: the ratio of evidence for the hypothesis vs the evidence against it. In this case, the evidence ratio is 0.683. An evidence ratio of Inf would arise if the divisor was 0 (no evidence against the hypothesis).
- Post.Prob: the probability of the hypothesis is 0.406
- there is little evidence for this hypothesis
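The coefficient name b_scalexscaleEQFALSE suggests that the predictor entered the model as scale(x, scale = FALSE), that is, centred but not standardised. This is an inference from the name, since the model formula is not shown in this section. Centring moves the intercept but leaves the slope unchanged, as a quick least-squares sketch on simulated data illustrates:

```r
set.seed(9)
x <- 1:10
y <- 2 + 0.5 * x + rnorm(10)

fit_raw     <- lm(y ~ x)
fit_centred <- lm(y ~ scale(x, scale = FALSE))

# Columns are intercept and slope: the slopes match, the intercepts do not
rbind(raw = unname(coef(fit_raw)), centred = unname(coef(fit_centred)))
```

With a centred predictor, the intercept becomes the expected response at the mean of x, which is often easier to set a prior for.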
Alternatively, we could use gather_draws to achieve a similar outcome.
In the following, in addition to median and HPD intervals, we will calculate the probability that the slope (b_x) is greater than 0. To calculate such a probability, we could simply count up the number of posterior b_x values that are greater than zero and then divide by the total number of posterior b_x values. In R, we could do this as sum(b_x > 0)/length(b_x) (where b_x > 0 will return TRUE for each case in which it is true and FALSE otherwise; these are treated as 1 and 0 respectively, and thus summing is like counting). Dividing a sum by its length equates to a mean and thus we can obtain the probability by calculating the mean of b_x > 0.
dat1b.brm2 |>
gather_draws(`b_.*x.*`, regex = TRUE) |>
summarise_draws(median,
HDInterval::hdi,
P = ~mean(. > 0)
)
# A tibble: 1 × 6
# Groups: .variable [1]
.variable variable median lower upper P
<chr> <chr> <dbl> <dbl> <dbl> <dbl>
1 b_scalexscaleEQFALSE .value -0.0841 -0.997 0.803 0.406
The summarise_draws() function expects a set of one or more summary or diagnostic functions (such as median etc). These can be supplied either as the name of the function (as in the case for median in the example above) or, if more arguments or information is required by the function, the function can be written out in full. In this case, the function must be preceded by a ~ and the variable is denoted by a . (such as in P = ~mean(. > 0) above).
Conclusions:
- the parameter (b_scalexscaleEQFALSE) minus 0 is -0.084
- P: the probability of the hypothesis is 0.406
- there is little evidence for this hypothesis
dat1b.brm2 |>
emmeans::emmeans(~x, at = list(x = c(2.5, 5))) |>
gather_emmeans_draws() |>
ungroup() |>
group_by(.draw) |>
summarise(ES = 100 * diff(.value) / .value[1]) |>
hypothesis("ES > 50")
Hypothesis Tests for class :
Hypothesis Estimate Est.Error CI.Lower CI.Upper Evid.Ratio Post.Prob Star
1 (ES)-(50) > 0 -34.88 1950.39 -205.63 207.74 2.85 0.74
---
'CI': 90%-CI for one-sided and 95%-CI for two-sided hypotheses.
'*': For one-sided hypotheses, the posterior probability exceeds 95%;
for two-sided hypotheses, the value tested against lies outside the 95%-CI.
Posterior probabilities of point hypotheses assume equal prior probabilities.
Conclusions:
- the difference between the percentage change in estimated \(y\) as \(x\) increases from 2.5 to 5 and 50% is -34.885
- the evidence ratio in support of the hypothesis that the percentage change exceeds 50% is 2.852
- the probability that the change in \(y\) exceeds 50% is 0.74
- the evidence for such a change is very weak
dat1b.brm2 |>
emmeans::emmeans(~x, at = list(x = c(2.5, 5))) |>
gather_emmeans_draws() |>
ungroup() |>
group_by(.draw) |>
summarise(ES = 100 * diff(.value) / .value[1]) |>
summarise_draws(
mean, median,
HDInterval::hdi,
P = ~ mean(. > 50)
)
# A tibble: 1 × 6
variable mean median lower upper P
<chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 ES 15.1 84.2 -430. 389. 0.740
Conclusions:
- the mean percentage change in estimated \(y\) as \(x\) increases from 2.5 to 5 is 15.115 (median 84.2)
- the probability that the change in \(y\) exceeds 50% is 0.74
- the evidence for such a change is very weak
The procedure highlighted above for calculating exceedance probabilities evaluates the degree of evidence for an effect in a particular direction. However, there are other instances where there is a desire to evaluate the evidence that something has changed (either increased or decreased). Such purposes are similar to the Frequentist pursuit of testing a null hypothesis (e.g. effect = 0).
The Region of Practical Equivalence (ROPE) evaluates evidence that an effect is “practically equivalent” to a value (e.g. 0) by calculating the proportion of effects that are within a nominated range. Kruschke (2018) argued that for standardized parameters, the range of -0.1 to 0.1 would envelop a negligible effect based on Cohen (1988). Kruschke (2018) also suggested that this range could be extended to non-standardized parameters by multiplying by the standard deviation of the response. Accordingly, calculating the proportion of posterior density within this ROPE could act as a form of “null-hypothesis” testing in a Bayesian framework.
- if the HDI of the focal parameter falls completely outside the ROPE, there is strong evidence that there is an effect
- if the HDI of the focal parameter falls completely inside the ROPE, there is strong evidence that there is not an effect
- otherwise there is not clear evidence either way
ROPE and equivalence tests are of most use when you decide that there is not enough evidence to support a hypothesis that there is an effect. Such a “non-significant” result may be because there genuinely is no effect OR because you do not have enough power to detect the effect. Performing an equivalence test provides a mechanism to tease these two apart.
I provide the following example purely to illustrate how such a test would be performed. In this case, the exceedance probabilities above did not provide clear evidence for an effect, which is the situation in which an equivalence test is most informative.
## Calculate ROPE range manually
ROPE <- c(-0.1, 0.1) * with(dat, sd(y))
## OR calculate ROPE range via rope_range function
ROPE <- bayestestR::rope_range(dat1b.brm2)
dat1b.brm2 |> bayestestR::equivalence_test(parameters = "b_.*x.*", range = ROPE, ci_method = "HDI")
# Test for Practical Equivalence
ROPE: [-0.09 0.09]
Parameter | H0 | inside ROPE | 95% HDI
-----------------------------------------------------------
scalexscaleEQFALSE | Undecided | 18.33 % | [-0.97 0.85]
dat1b.brm2 |>
bayestestR::equivalence_test(parameters = "b_.*x.*", range = ROPE, ci_method = "HDI") |>
plot()
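Conclusions:
- the proportion of the HPD interval for the slope that is inside the ROPE is 0.183 (18.33%)
- the HPD interval is neither completely inside nor completely outside the ROPE, so the test is undecided: there is no clear evidence either way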
OR using the rope function.
## Calculate the proportion of the posterior inside the ROPE
dat1b.brm2 |> bayestestR::rope(parameters = "x", range = ROPE, ci_method = "HDI")
# Proportion of samples inside the ROPE [-0.09, 0.09]:
Parameter | inside ROPE
--------------------------------
scalexscaleEQFALSE | 18.33 %
The above demonstration was applied to the simple comparison that the slope was not equal to 0; however, it can similarly be applied to any hypothesis (although typically only if there is no evidence of an effect).
Now that we have the full posteriors, we are free to use these to garner evidence on a range of hypotheses. To demonstrate, we will consider the following hypotheses:
- a change in \(x\) is associated with an increase in \(y\)
- a doubling of \(x\) (from 2.5 to 5) is associated with an increase in \(y\) of > 50%
[1] "b_Intercept" "b_scalex" "sigma" "Intercept"
[5] "prior_Intercept" "prior_b" "prior_sigma" "lprior"
[9] "lp__" "accept_stat__" "stepsize__" "treedepth__"
[13] "n_leapfrog__" "divergent__" "energy__"
Hypothesis Tests for class b:
Hypothesis Estimate Est.Error CI.Lower CI.Upper Evid.Ratio Post.Prob Star
1 (scalex) > 0 -0.07 0.41 -0.71 0.58 0.72 0.42
---
'CI': 90%-CI for one-sided and 95%-CI for two-sided hypotheses.
'*': For one-sided hypotheses, the posterior probability exceeds 95%;
for two-sided hypotheses, the value tested against lies outside the 95%-CI.
Posterior probabilities of point hypotheses assume equal prior probabilities.
Conclusions:
- the parameter (b_scalex) minus 0 is -0.069
- Evid.Ratio: the ratio of evidence for the hypothesis vs the evidence against it. In this case, the evidence ratio is 0.72. An evidence ratio of Inf would arise if the divisor was 0 (no evidence against the hypothesis).
- Post.Prob: the probability of the hypothesis is 0.419
- there is little evidence for this hypothesis
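The coefficient name b_scalex suggests that the predictor was standardised with scale(x) (an inference from the name, since the model formula is not shown in this section). In that case the slope is expressed per standard deviation of x, so it equals the slope on the raw scale multiplied by sd(x). A least-squares sketch on simulated data:

```r
set.seed(10)
x <- 1:10
y <- 2 + 0.5 * x + rnorm(10)

slope_raw <- unname(coef(lm(y ~ x))[2])         # change in y per unit of x
slope_std <- unname(coef(lm(y ~ scale(x)))[2])  # change in y per SD of x

c(slope_std = slope_std, raw_times_sd = slope_raw * sd(x))
```

This rescaling should be kept in mind when comparing the magnitude of the slope with that from a model fitted to the raw predictor.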
Alternatively, we could use gather_draws to achieve a similar outcome.
In the following, in addition to median and HPD intervals, we will calculate the probability that the slope (b_x) is greater than 0. To calculate such a probability, we could simply count up the number of posterior b_x values that are greater than zero and then divide by the total number of posterior b_x values. In R, we could do this as sum(b_x > 0)/length(b_x) (where b_x > 0 will return TRUE for each case in which it is true and FALSE otherwise; these are treated as 1 and 0 respectively, and thus summing is like counting). Dividing a sum by its length equates to a mean and thus we can obtain the probability by calculating the mean of b_x > 0.
dat1c.brm2 |>
gather_draws(`b_.*x`, regex = TRUE) |>
summarise_draws(median,
HDInterval::hdi,
P = ~mean(. > 0)
)
# A tibble: 1 × 6
# Groups: .variable [1]
.variable variable median lower upper P
<chr> <chr> <dbl> <dbl> <dbl> <dbl>
1 b_scalex .value -0.0750 -0.813 0.752 0.419
The summarise_draws() function expects a set of one or more summary or diagnostic functions (such as median etc). These can be supplied either as the name of the function (as in the case for median in the example above) or, if more arguments or information is required by the function, the function can be written out in full. In this case, the function must be preceded by a ~ and the variable is denoted by a . (such as in P = ~mean(. > 0) above).
Conclusions:
- the parameter (b_scalex) minus 0 is -0.075
- P: the probability of the hypothesis is 0.419
- there is little evidence for this hypothesis
dat1c.brm2 |>
emmeans::emmeans(~x, at = list(x = c(2.5, 5))) |>
gather_emmeans_draws() |>
ungroup() |>
group_by(.draw) |>
summarise(ES = 100 * diff(.value) / .value[1]) |>
hypothesis("ES > 50")
Hypothesis Tests for class :
Hypothesis Estimate Est.Error CI.Lower CI.Upper Evid.Ratio Post.Prob Star
1 (ES)-(50) > 0 -214.24 10928.71 -188.13 241.73 3.02 0.75
---
'CI': 90%-CI for one-sided and 95%-CI for two-sided hypotheses.
'*': For one-sided hypotheses, the posterior probability exceeds 95%;
for two-sided hypotheses, the value tested against lies outside the 95%-CI.
Posterior probabilities of point hypotheses assume equal prior probabilities.
Conclusions:
- the difference between the percentage change in estimated \(y\) as \(x\) increases from 2.5 to 5 and 50% is -214.236
- the evidence ratio in support of the hypothesis that the percentage change exceeds 50% is 3.02
- the probability that the change in \(y\) exceeds 50% is 0.751
- the evidence for such a change is very weak
dat1c.brm2 |>
emmeans::emmeans(~x, at = list(x = c(2.5, 5))) |>
gather_emmeans_draws() |>
ungroup() |>
group_by(.draw) |>
summarise(ES = 100 * diff(.value) / .value[1]) |>
summarise_draws(
mean, median,
HDInterval::hdi,
P = ~ mean(. > 50)
)
# A tibble: 1 × 6
variable mean median lower upper P
<chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 ES -164. 84.8 -312. 498. 0.751
Conclusions:
- the mean percentage change in estimated \(y\) as \(x\) increases from 2.5 to 5 is -164.236 (the median is 84.8)
- the probability that the change in \(y\) exceeds 50% is 0.751
- the evidence for such a change is very weak
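The effect size used above is calculated separately for every posterior draw and only then summarised. The sketch below mimics that calculation in base R on simulated values. The vectors y_low and y_high are hypothetical stand-ins for the predicted \(y\) at \(x = 2.5\) and \(x = 5\) and do not come from the fitted model.

```r
## Hypothetical per-draw predictions at x = 2.5 and x = 5 (illustration only)
set.seed(1)
n_draws <- 4000
y_low <- rnorm(n_draws, mean = 1, sd = 1)
y_high <- y_low + rnorm(n_draws, mean = 1, sd = 1)

## Percentage change relative to the prediction at x = 2.5, one value per draw
ES <- 100 * (y_high - y_low) / y_low

## Summarise the distribution of effect sizes
es_summary <- c(
  mean = mean(ES),
  median = median(ES),
  P = mean(ES > 50) # proportion of draws with more than a 50% increase
)
es_summary
```

When the denominator (here y_low) can be close to zero in some draws, a handful of extreme ratios can dominate the mean, which is one plausible reason for a mean and a median that differ as much as they do in the output above. In such cases the median and the exceedance probability are usually the more stable summaries.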
The procedure highlighted above for calculating exceedance probabilities evaluates the degree of evidence for an effect in a particular direction. However, there are other instances where there is a desire to evaluate the evidence that something has changed (either increased or decreased). Such purposes are similar to the Frequentist pursuit of testing a null hypothesis (e.g. effect = 0).
The Region of Practical Equivalence (ROPE) evaluates evidence that an effect is “practically equivalent” to a value (e.g. 0) by calculating the proportion of effects that are within a nominated range. Kruschke (2018) argued that for standardized parameters, the range of -0.1 to 0.1 would envelop a negligible effect based on Cohen (1988). Kruschke (2018) also suggested that this range could be extended to non-standardized parameters by multiplying by the standard deviation of the response. Accordingly, calculating the proportion of posterior density within this ROPE could act as a form of “null-hypothesis” testing in a Bayesian framework.
- if the HDI of the focal parameter falls completely outside the ROPE, there is strong evidence that there is an effect
- if the HDI of the focal parameter falls completely inside the ROPE, there is strong evidence that there is not an effect
- otherwise there is not clear evidence either way
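The three decision rules above can be written as a small function. This is a minimal base R sketch rather than the bayestestR implementation. The helper names hdi_base() and rope_decision() are hypothetical, and the HDI is approximated as the shortest interval containing the requested proportion of sorted draws.

```r
## Shortest interval containing `prob` of the draws (a simple HDI approximation)
hdi_base <- function(draws, prob = 0.95) {
  sorted <- sort(draws)
  n <- length(sorted)
  window <- ceiling(prob * n) # number of draws inside the interval
  starts <- seq_len(n - window + 1) # candidate lower end points
  widths <- sorted[starts + window - 1] - sorted[starts]
  best <- which.min(widths) # narrowest candidate interval
  c(lower = sorted[best], upper = sorted[best + window - 1])
}

## Apply the HDI + ROPE decision rule
rope_decision <- function(draws, rope, prob = 0.95) {
  interval <- hdi_base(draws, prob)
  if (interval[["lower"]] > rope[2] || interval[["upper"]] < rope[1]) {
    "Rejected" # HDI completely outside the ROPE: evidence of an effect
  } else if (interval[["lower"]] >= rope[1] && interval[["upper"]] <= rope[2]) {
    "Accepted" # HDI completely inside the ROPE: evidence of no effect
  } else {
    "Undecided" # HDI overlaps the ROPE boundary
  }
}

## Deterministic toy "posteriors" to exercise each branch
rope <- c(-0.1, 0.1)
rope_decision(seq(4, 6, length.out = 1000), rope)
rope_decision(seq(-0.05, 0.05, length.out = 1000), rope)
rope_decision(seq(-1, 1, length.out = 1000), rope)
```

The labels mirror the H0 column reported by equivalence_test(), where "Rejected" means the null hypothesis of practical equivalence is rejected.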
ROPE and equivalence tests are of most use when you decide that there is not enough evidence to support a hypothesis that there is an effect. Such a “non-significant” result may be because there genuinely is no effect OR because you do not have enough power to detect the effect. Performing an equivalence test provides a mechanism to tease these two apart.
I provide the following example purely to illustrate how such a test would be performed. In this case, as we have already demonstrated strong evidence for an effect, the equivalence test does not yield any additional insights.
## Calculate ROPE range manually
ROPE <- c(-0.1, 0.1) * with(dat, sd(y))
## OR calculate ROPE range via rope_range function
ROPE <- bayestestR::rope_range(dat1c.brm2)
dat1c.brm2 |> bayestestR::equivalence_test(parameters = "b_.*x.*", range = ROPE, ci_method = "HDI")
# Test for Practical Equivalence
ROPE: [-0.09 0.09]
Parameter | H0 | inside ROPE | 95% HDI
--------------------------------------------------
scalex | Undecided | 19.43 % | [-0.87 0.71]
dat1c.brm2 |>
bayestestR::equivalence_test(parameters = "b_.*x.*", range = ROPE, ci_method = "HDI") |>
plot()
Picking joint bandwidth of 0.0637
Conclusions:
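- the proportion of the HPD for the slope that is inside the ROPE is 0.194 (19.43%)
- as the HPD is neither completely inside nor completely outside the ROPE, the test is undecided: there is no clear evidence for or against an effect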
Or using the rope() function.
## Calculate the percentage inside the ROPE
dat1c.brm2 |> bayestestR::rope(parameters = "x", range = ROPE, ci_method = "HDI")
# Proportion of samples inside the ROPE [-0.09, 0.09]:
Parameter | inside ROPE
-----------------------
scalex | 19.42 %
The above demonstration was applied to the simple hypothesis that the slope was not equal to 0. However, it can similarly be applied to any hypothesis (although typically only if there is no evidence of an effect).
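To make the ROPE calculations concrete, the following base R sketch computes a ROPE range as 0.1 standard deviations of a response and then the proportion of draws that fall inside it. Both y and draws are simulated, hypothetical stand-ins (they are not dat$y or draws from dat1c.brm2). Note also that this sketch uses all of the draws, whereas, as I understand it, bayestestR::rope() by default reports the percentage of the 95% HDI that lies inside the ROPE, so the two numbers need not match exactly.

```r
## Hypothetical response and posterior draws of a slope (illustration only)
set.seed(1)
y <- rnorm(50, mean = 10, sd = 2)
draws <- rnorm(4000, mean = 0, sd = 0.4)

## ROPE: +/- 0.1 standard deviations of the response
rope <- c(-0.1, 0.1) * sd(y)

## Proportion of ALL draws inside the ROPE
inside <- mean(draws >= rope[1] & draws <= rope[2])

rope
inside
```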
Now that we have the full posteriors, we are free to use these to garner evidence on a range of hypotheses. To demonstrate, we will consider the following hypotheses:
- all pairwise comparisons (compare each level of \(x\) to each other)
- define a specific set of contrasts that include comparing the average of medium and high treatments to the control treatment.
contrast estimate lower.HPD upper.HPD
control - medium 0.889 -4.89 6.44
control - high 8.726 2.38 14.76
medium - high 7.980 1.84 14.10
Point estimate displayed: median
HPD interval probability: 0.95
Or if we want the full posteriors… This option allows us to calculate exceedance probabilities. That is, we can calculate the proportion of contrast posteriors that exceed a specific value (as a hypothesis). In this case, we will calculate two exceedance probabilities:
- probability that the effect is negative (e.g. proportion of posterior draws that are less than 0)
- probability that the effect is positive (e.g. proportion of posterior draws that are greater than 0)
dat2a.brm2 |>
emmeans(~x) |>
pairs() |>
gather_emmeans_draws() |>
dplyr::select(-.chain) |>
summarise_draws(median,
HDInterval::hdi,
Pl = ~ mean(.x < 0),
Pg = ~ mean(.x > 0)
)
# A tibble: 3 × 7
# Groups: contrast [3]
contrast variable median lower upper Pl Pg
<chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 control - high .value 8.73 2.38 14.8 0.00417 0.996
2 control - medium .value 0.889 -4.89 6.44 0.373 0.627
3 medium - high .value 7.98 1.84 14.1 0.0075 0.992
Conclusions:
- the difference in \(y\) between “control” and “medium” is 0.89; however, there is no evidence of this effect (exceedance probability P = 0.627)
- the difference in \(y\) between “control” and “high” is 8.73, and there is very strong evidence for this effect
- the difference in \(y\) between “medium” and “high” is 7.98, and there is very strong evidence for this effect
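What pairs() and the subsequent summaries are doing can be sketched in base R: each pairwise contrast is a per-draw subtraction of one group mean from another, and Pl and Pg are proportions of those differences below and above zero. The matrix means below is simulated and hypothetical (one column per group, one row per draw) and is not extracted from dat2a.brm2.

```r
## Hypothetical posterior draws of the three group means (illustration only)
set.seed(1)
n_draws <- 4000
means <- cbind(
  control = rnorm(n_draws, mean = 20, sd = 2),
  medium = rnorm(n_draws, mean = 19, sd = 2),
  high = rnorm(n_draws, mean = 11, sd = 2)
)

## All pairs of groups, in the same order that pairs() reports them
pairs_idx <- combn(colnames(means), 2, simplify = FALSE)

## One column of per-draw differences for each pair
contrasts <- sapply(pairs_idx, function(p) means[, p[1]] - means[, p[2]])
colnames(contrasts) <- sapply(pairs_idx, paste, collapse = " - ")

## Median effect plus the two exceedance probabilities
summary_tab <- data.frame(
  contrast = colnames(contrasts),
  median = apply(contrasts, 2, median),
  Pl = colMeans(contrasts < 0), # probability the effect is negative
  Pg = colMeans(contrasts > 0), # probability the effect is positive
  row.names = NULL
)
summary_tab
```

Because the differences are calculated draw by draw, any correlation between the group means in the posterior is automatically carried through to the contrasts.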
It is also possible to express the magnitude of effect in percentage change. The trick is to put the emmeans parameters onto a logarithmic scale so that the pairwise comparisons (which are a subtraction) effectively are treated as divisions (due to log laws).
contrast ratio lower.HPD upper.HPD
control/medium 1.04 0.755 1.34
control/high 1.72 1.078 2.55
medium/high 1.66 0.996 2.43
Point estimate displayed: median
HPD interval probability: 0.95
The estimates are expressed as fractional changes. A “ratio” of 1 indicates parity, since if you multiply something by 1, it does not change. A value of 1.5 indicates a 50% increase and a value of 0.5 indicates a 50% decline.
To calculate percentage change from a fractional value, subtract 1 and multiply the result by 100. E.g. a ratio of 1.72 corresponds to a change of 100 × (1.72 - 1) = 72%.
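The conversion from ratios to percentage change, and the log law that turns a subtraction into a division, can be sketched in base R. The ratios are the rounded medians from the table above; the two group means a and b are hypothetical values used only to demonstrate the log law.

```r
## Median ratios taken from the table above (rounded values)
ratio <- c("control/medium" = 1.04, "control/high" = 1.72, "medium/high" = 1.66)

## Percentage change: subtract 1 and multiply by 100
pct_change <- 100 * (ratio - 1)
round(pct_change, 1)

## Log law: a difference on the log scale, back-transformed, is a ratio
a <- 20 # hypothetical group mean
b <- 11.6 # hypothetical group mean
ratio_via_logs <- exp(log(a) - log(b))
all.equal(ratio_via_logs, a / b)
```

The percentages obtained from rounded ratios (4, 72 and 66) differ slightly from those quoted in the conclusions below, which are presumably calculated from the unrounded posterior medians.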
If we get the full posteriors, we can also explore whether the change exceeds some ecologically important change (such as 20%)
dat2a.brm2 |>
emmeans(~x) |>
regrid(transform = "log") |>
pairs() |>
regrid() |>
gather_emmeans_draws() |>
dplyr::select(-.chain) |>
summarise_draws(median,
HDInterval::hdi,
Pl = ~ mean(.x < 1),
Pg = ~ mean(.x > 1),
Pl20 = ~ mean(.x < 0.8), # more than a 20% decrease
Pg20 = ~ mean(.x > 1.2) # more than a 20% increase
)
# A tibble: 3 × 9
# Groups: contrast [3]
contrast variable median lower upper Pl Pg Pl20 Pg20
<chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 control/high .value 1.72 1.08 2.55 0.00417 0.996 0.000417 0.962
2 control/medium .value 1.04 0.755 1.34 0.373 0.627 0.0383 0.145
3 medium/high .value 1.66 0.996 2.43 0.0075 0.992 0.000417 0.944
Conclusions:
- the response \(y\) is 72.25% higher in the control group than in the high group
  - there is strong evidence (P = 0.996) of the above
  - there is also strong evidence (P = 0.962) that \(y\) is at least 20% higher in the control group than in the high group
- the response \(y\) is 4.378% higher in the control group than in the medium group
  - there is no evidence (P = 0.627) of the above
  - there is no evidence (P = 0.145) that \(y\) is at least 20% higher in the control group than in the medium group
- the response \(y\) is 65.861% higher in the medium group than in the high group
  - there is strong evidence (P = 0.992) of the above
  - there is also reasonably strong evidence (P = 0.944) that \(y\) is at least 20% higher in the medium group than in the high group
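## Each column defines one contrast; the weights apply to the
## control, medium and high groups (in that order) and sum to zero
cmat <- cbind(
"Contrast vs Medium/High" = c(1, -0.5, -0.5), # control vs average of medium and high
"Medium vs High" = c(0, 1, -1) # medium vs high
)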
dat2a.brm2 |>
emmeans(~x) |>
contrast(method = list(x = cmat))
contrast estimate lower.HPD upper.HPD
x.Contrast vs Medium/High 4.79 -0.215 9.82
x.Medium vs High 7.98 1.840 14.10
Point estimate displayed: median
HPD interval probability: 0.95
Or with full posteriors and exceedance probabilities…
dat2a.brm2 |>
emmeans(~x) |>
contrast(method = list(x = cmat)) |>
gather_emmeans_draws() |>
dplyr::select(-.chain) |>
summarise_draws(median,
HDInterval::hdi,
Pl = ~ mean(.x < 0),
Pg = ~ mean(.x > 0)
)
# A tibble: 2 × 7
# Groups: contrast [2]
contrast variable median lower upper Pl Pg
<chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 x.Contrast vs Medium/High .value 4.79 -0.215 9.82 0.035 0.965
2 x.Medium vs High .value 7.98 1.84 14.1 0.0075 0.992
Conclusions:
- on average, \(y\) is 4.79 higher in the “control” group than in the “medium” and “high” groups.
- the evidence for this effect is very strong (P = 0.965)
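The arithmetic behind a contrast matrix can be checked by hand. In this base R sketch, means is a hypothetical vector of group means (not estimates from dat2a.brm2) and each column of cmat holds the weights for one planned comparison. The first column is labelled "Control vs Medium/High" here for readability.

```r
## Hypothetical group means (illustration only)
means <- c(control = 20, medium = 19, high = 11)

## Contrast weights: one column per planned comparison
cmat <- cbind(
  "Control vs Medium/High" = c(1, -0.5, -0.5),
  "Medium vs High" = c(0, 1, -1)
)

## Each set of weights should sum to zero
colSums(cmat)

## A contrast estimate is the weighted sum of the group means
estimates <- drop(crossprod(cmat, means))
estimates
## Control vs Medium/High: 20 - (19 + 11) / 2 = 5
## Medium vs High:         19 - 11 = 8
```

When applied to a fitted model, the same weighted sum is calculated for every posterior draw, which is why the contrasts come with full posteriors and exceedance probabilities.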
We have already seen that there is no evidence of a difference in \(y\) between the “control” and “medium” groups. This could be because either there is not enough power to detect the difference or that the populations are not different. It would be nice to be able to gain some insights into which of these is most likely. And we can. If we establish the range of values that represent an insubstantial effect, we can then quantify the proportion of the posterior that falls inside this Region of Practical Equivalence (ROPE).
Conventionally, the ROPE spans ±0.1 standard deviations of the response (that is, 10% of a standard deviation). If the effect is smaller than this, then we might consider it insubstantial.
## Calculate ROPE range manually
ROPE <- c(-0.1, 0.1) * with(dat2, sd(y))
## OR calculate ROPE range via rope_range function
ROPE <- bayestestR::rope_range(dat2a.brm2)
dat2a.brm2 |> bayestestR::equivalence_test(parameters = "b_x", range = ROPE, ci_method = "HDI")
# Test for Practical Equivalence
ROPE: [-0.60 0.60]
Parameter | H0 | inside ROPE | 95% HDI
----------------------------------------------------
xmedium | Undecided | 17.50 % | [ -6.31 5.12]
xhigh | Rejected | 0.00 % | [-14.74 -2.34]
dat2a.brm2 |>
bayestestR::equivalence_test(parameters = "b_x", range = ROPE, ci_method = "HDI") |>
plot()
Picking joint bandwidth of 0.484
Conclusions:
- there is insufficient evidence to conclude that there is a difference in \(y\) between “control” and “medium” groups
- we cannot conclude that there is evidence of no effect
14 Summary plots
dat1a.grid <- list(x = modelr::seq_range(dat$x, n = 100))
dat1a.brm2 |>
emmeans(~x, at = dat1a.grid) |>
as.data.frame() |>
ggplot(aes(y = emmean, x = x)) +
geom_ribbon(aes(ymin = lower.HPD, ymax = upper.HPD), fill = "orange", alpha = 0.3) +
geom_point(data = dat, aes(y = y)) +
geom_line() +
theme_classic()
As a spaghetti plot
dat1b.grid <- list(x = modelr::seq_range(dat$x, n = 100))
dat1b.brm2 |>
emmeans(~x, at = dat1b.grid) |>
as.data.frame() |>
ggplot(aes(y = emmean, x = x)) +
geom_ribbon(aes(ymin = lower.HPD, ymax = upper.HPD), fill = "orange", alpha = 0.3) +
geom_point(data = dat, aes(y = y)) +
geom_line() +
theme_classic()
As a spaghetti plot
dat1c.grid <- list(x = modelr::seq_range(dat$x, n = 100))
dat1c.brm2 |>
emmeans(~x, at = dat1c.grid) |>
as.data.frame() |>
ggplot(aes(y = emmean, x = x)) +
geom_ribbon(aes(ymin = lower.HPD, ymax = upper.HPD), fill = "orange", alpha = 0.3) +
geom_point(data = dat, aes(y = y)) +
geom_line() +
theme_classic()
As a spaghetti plot