Bayesian generalised linear models

Author

Murray Logan

Published

September 15, 2024

1 Preparations

Load the necessary libraries

library(tidyverse)   #for data wrangling and plotting
library(DHARMa)      #for simulated residuals
library(performance) #for model diagnostics
library(see)         #for model diagnostics
library(brms)        #for Bayesian models
library(tidybayes)   #for exploring Bayesian models
library(rstan)       #for diagnostics plots
library(bayesplot)   #for diagnostic plots
library(patchwork)   #for arranging multiple plots
library(gridGraphics)#for arranging multiple plots - needed for some patchwork plots
library(HDInterval)  #for HPD intervals
library(bayestestR)  #for ROPE
library(emmeans)     #for estimated marginal means
library(standist)    #for plotting distributions
library(cmdstanr)    #for the backend
source("helperfunctions.R")

Many biologists and ecologists get a little twitchy and nervous around mathematical and statistical formulae and nomenclature. Whilst it is possible to perform basic statistics without too much regard for the actual equation (model) being employed, as the complexity of the analysis increases, the need to understand the underlying model becomes increasingly important. Moreover, model specification in BUGS (the language used to program Bayesian modelling) aligns very closely with the underlying formulae. Hence a good understanding of the underlying model is vital to be able to create a sensible Bayesian model. Consequently, I will always present the linear model formulae along with the analysis. If you start to feel some form of disorder developing, you might like to run through the Tutorials and Workshops twice (the first time ignoring the formulae).

Note

This tutorial will introduce the concept of Bayesian (generalised) linear models and demonstrate how to fit simple models to a set of simple fabricated data sets, each representing major data types encountered in ecological research. Subsequent tutorials will build on these fundamentals with increasingly more complex data and models.

2 A philosophical note

To introduce the philosophical and mathematical differences between classical (frequentist) and Bayesian statistics, Wade (2000) presented a provocative yet compelling trend analysis of two hypothetical populations. The temporal trend of one of the populations shows very little variability around a very subtle linear decline. By contrast, the second population appears to decline more dramatically, yet has substantially more variability.

Wade (2000) neatly illustrates the contrasting conclusions (particularly with respect to interpreting probability) that would be drawn by the frequentist and Bayesian approaches and in so doing highlights how and why the Bayesian approach provides outcomes that are more aligned with management requirements.

This tutorial will start by replicating the demonstration of Wade (2000).
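As a rough sketch of how such populations might be simulated and analysed from a frequentist perspective (the slopes, noise levels and spans below are illustrative assumptions only and will not reproduce the exact figures shown next):

set.seed(1)
popA <- data.frame(year = 1:10) |>
    mutate(y = 100 - 0.1 * year + rnorm(10, mean = 0, sd = 0.5))   # subtle decline, low noise
popB <- data.frame(year = 1:10) |>
    mutate(y = 100 - 10 * year + rnorm(10, mean = 0, sd = 40))     # steep decline, high noise
popC <- data.frame(year = seq(1, 10, length = 100)) |>
    mutate(y = 100 - 10 * year + rnorm(100, mean = 0, sd = 40))    # as B, but with n = 100
lapply(list(A = popA, B = popB, C = popC),
       function(d) round(coef(summary(lm(y ~ year, data = d)))["year", ], 4))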

The frequentist analyses of the three populations yield:

Population    n     Slope      t         p-value
A             10    -0.1022    -2.3252   0.0485
B             10    -10.2318   -2.2115   0.0579
C             100   -10.4713   -6.6457   0.0000

From a traditional frequentist perspective, we would conclude that there is a ‘significant’ relationship in Populations A and C (\(p<0.05\)), yet not in Population B (\(p>0.05\)). Note, Populations B and C were both generated from the same random distribution; it is just that Population C has a substantially higher number of observations.

The above illustrates a couple of things:

  • statistical significance does not necessarily translate into biological importance. The percentage of decline for Population A is 0.46 whereas the percentage of decline for Population B is 45.26. That is, Population B is declining at nearly 100 times the rate of Population A. That sounds rather important, yet on the basis of the hypothesis test, we would dismiss the decline in Population B.

  • a p-value is not the probability that an effect or relationship exists - it largely reflects the probability of detecting an effect, and is therefore heavily driven by whether the sample size is large enough to pick up a difference.

Let us now look at it from a Bayesian perspective. I will just provide the posterior distributions (densities scaled to 0-1 so that they can be plotted together) for the slope for each population.

Focusing on Populations A and B, we would conclude:

  • the mean (and 95% credible interval) slopes for Populations A and B are -0.1 (-0.21, 0) and -10.14 (-20.12, 0.37) respectively.

  • the Bayesian approach allows us to query the posterior distribution in many other ways in order to ask sensible biological questions. For example, we might consider that a rate of change of 5% or greater represents an important biological impact. For Populations A and B, the probability that the rate of decline is 5% or greater is 0 and 0.86 respectively (see the sketch below).
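Such a query is simply a summary over the posterior draws. A minimal sketch, assuming a hypothetical brms fit (popB.brm) of y ~ year for Population B, and assuming that (with a starting abundance near 100) a 5% annual decline corresponds to a slope of -5:

popB.brm |>
    tidybayes::spread_draws(b_year) |>              # extract posterior draws of the slope
    summarise(P_decline_5 = mean(b_year <= -5))     # proportion of draws at least as steep as -5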

3 Review of (generalised) linear models

I would highly recommend reviewing the information in the tutorial on generalised linear models, particularly the sections describing linear models, assumption checking and generalised linear models (GLMs). Whilst there are philosophical differences between frequentist and Bayesian statistics that have implications for how models are fit and interpreted, model choice and assumption checking principles are common to the two approaches. Hence, many of these topics will be assumed, and not fully described in the current tutorial.

Recall from the tutorial on generalised linear models that simple linear regression is a linear modelling process that models a single response variable against one or more predictors with a linear combination of coefficients and (in the case of a Gaussian model) can be expressed as:

\[y_i = \beta_0+ \beta_1 x_i+\epsilon_i \hspace{1cm}\epsilon\sim{}N(0,\sigma^2)\]

where:

  • \(y_i\) is the response value for each of the \(i\) observations

  • \(\beta_0\) is the y-intercept (value of \(y\) when \(x=0\))

  • \(\beta_1\) is the slope (rate of change in \(y\) per unit change in \(x\))

  • \(x_i\) is the predictor value for each of the \(i\) observations

  • \(\epsilon_i\) is the residual value of each of the \(i\) observations. A residual is the difference between the observed value and the value expected by the model.

  • \(\epsilon\sim{}N(0,\sigma^2)\) indicates that the residuals are normally distributed with a constant amount of variance

The above can be re-expressed and generalised as:

\[ \begin{align} y_i&\sim{}Dist(\mu_i, ...) \\ g(\mu_i) &= \beta_0+ \beta_1 x_i \end{align} \]

where:

  • \(Dist\) represents a distribution from the exponential family (such as Gaussian, Poisson, Binomial, etc)
  • \(...\) represents additional parameters relevant to the nominated distribution (such as \(\sigma^2\): Gaussian, \(n\): Binomial and \(\phi\): Negative Binomial, etc)
  • \(g()\) represents the link function (e.g. log: Poisson, logit: Binomial, etc)

The reliability of any model depends on the degree to which the data adheres to the model assumptions. Hence, as with frequentist models, exploratory data analysis (EDA) is a vital component of Bayesian modelling and since the model structures are similar between frequentist and Bayesian approaches, so too is EDA.

4 Bayesian (generalised) linear models

For the purpose of introduction, we will start by exploring a Gaussian model with a very simple fabricated data set representing the relationship between a response (\(y\)) and a continuous predictor (\(x = [1,2,3,4,5,6,7,8,9,10]\)). The fabricated data set will comprise 10 observations each drawn from normal distributions with a set standard deviation of 4. The means of the 10 populations will be determined by the following equation:

\[ \mu_i = 2 + 5\times x_i \]

Let us generate these data.

set.seed(234)
dat <- data.frame(x = 1:10) |>
    mutate(y = round(rnorm(n = 10, mean = 2 + (5 * x), sd = 4), digits = 2))
dat
    x     y
1   1  9.64
2   2  3.79
3   3 11.00
4   4 27.88
5   5 32.84
6   6 32.56
7   7 37.84
8   8 29.86
9   9 45.05
10 10 47.65

The model we will be fitting is:

\[ \begin{align} y_i&\sim{}N(\mu_i, \sigma^2)\\ \mu_i &= \beta_0+ \beta_1 x_i \end{align} \]

The parameters that we are going to attempt to estimate are the y-intercept (\(\beta_0\)), the slope (\(\beta_1\)) and the underlying variance (\(\sigma^2\)). Recall (from tutorials on statistical philosophies and estimation) that Bayesian models calculate posterior probabilities (\(P(H|D)\)) from the likelihood (\(P(D|H)\)) and prior expectations (\(P(H)\)). Therefore, in preparation for fitting a Bayesian model, we must consider what our prior expectations are for all parameters.

The individual responses (\(y_i\), the observed values) are each expected to have been independently drawn from normal (Gaussian) distributions (\(\mathcal{N}\)). These distributions represent all the possible values of \(y\) we could have obtained at the specific (\(i^{th}\)) level of \(x\). Hence the \(i^{th}\) \(y\) observation is expected to have been drawn from a normal distribution with a mean of \(\mu_i\).

Although each distribution is expected to come from populations that differ in their means, we assume that all of these distributions have the same variance (\(\sigma^2\)).

4.1 Priors

We need to supply priors for each of the parameters to be estimated (\(\beta_0\), \(\beta_1\) and \(\sigma\)). Whilst we want these priors to be sufficiently vague as to not influence the outcomes of the analysis (and thus be equivalent to the frequentist analysis), we do not want the priors to be so vague (wide) that they permit the MCMC sampler to drift off into parameter space that is both illogical as well as numerically awkward.

Proffering sensible priors is one of the most difficult aspects of performing Bayesian analyses. For instances where there are some previous knowledge available and a desire to incorporate those data, the difficulty is in how to ensure that the information is incorporated correctly. However, for instances where there are no previous relevant information and so a desire to have the posteriors driven entirely by the new data, the difficulty is in how to define priors that are both vague enough (not bias results in their direction) and yet not so vague as to allow the MCMC sampler to drift off into unsupported regions (and thus get stuck and yield spurious estimates).

For early implementations of MCMC sampling routines (such as Metropolis-Hastings and Gibbs), it was fairly common to see very vague priors being defined. For example, the priors on effects were typically normal priors with a mean of 0 and a variance of 1e+06 (1,000,000). These are very vague priors. Yet for some samplers (e.g. NUTS), such vague priors can encourage poor behaviour of the sampler - particularly if the posterior is complex. It is now generally advised that priors should (where possible) be somewhat weakly informative and, to some extent, represent the bounds of what are feasible and sensible estimates.

The degree to which priors influence an outcome (whether by having a pulling effect on the estimates or by encouraging the sampler to drift off into unsupported regions of the posterior) is dependent on:

  • the relative sparsity of the data - the larger the data, the less weight the priors have and thus less influence they exert.
  • the complexity of the model (and thus posterior) - the more parameters, the more sensitive the sampler is to the priors.

The sampled posterior is the product of both the likelihood and the prior - all of which are multidimensional. For most applications, it would be virtually impossible to define a sensible multidimensional prior. Hence, our only option is to define priors on individual parameters (e.g. the intercept, slope(s), variance etc) and to hope that if they are individually sensible, they will remain collectively sensible.

So having (hopefully) impressed upon you the notion that priors are an important consideration, I will now attempt to synthesise some of the approaches that can be employed to arrive at weakly informative priors, gleaned from various sources. Largely, this advice has come from the following resources:

  • https://github.com/stan-dev/stan/wiki/Prior-Choice-Recommendations
  • http://svmiller.com/blog/2021/02/thinking-about-your-priors-bayesian-analysis/

I will outline some of the current main recommendations before summarising some approaches in a table.

  • weakly informative priors should contain enough information so as to regularise (discourage unreasonable parameter estimates whilst allowing all reasonable estimates).
  • for effects parameters on scaled (standardised) data, an argument could be made for a normal distribution with a standard deviation of 1 (e.g. normal(0,1)), although some prefer a t distribution with 3 degrees of freedom and standard deviation of 1 (e.g. student_t(3,0,1)) - apparently a flatter t is a more robust prior than a normal as an uninformative prior…
  • for un-scaled data, the above priors can be scaled by using the standard deviation of the data as the prior standard deviation (e.g. student_t(3,0,sd(y)), or student_t(3,0,sd(y)/sd(x)))
  • for priors on hierarchical standard deviations, priors should encourage shrinkage towards 0 (particularly if the number of groups is small, since otherwise the sampler will tend to be more responsive to “noise”).

In this tutorial series, we will perform Bayesian analysis in the STAN language via an R interface. Two popular interfaces that greatly simplify the specification of Bayesian models are brms and rstanarm. We will exclusively focus on the former as it is far more flexible.

The default priors used by each interface are summarised below:

  • Gaussian
    • Intercept - brms: student_t(3, median(y), mad(y)); rstanarm: normal(mean(y), 2.5*sd(y))
    • ‘Population effects’ (slopes, betas) - brms: flat, improper priors; rstanarm: normal(0, 2.5*sd(y)/sd(x))
    • Sigma - brms: student_t(3, 0, mad(y)); rstanarm: exponential(1/sd(y))
    • ‘Group-level effects’ - brms: student_t(3, 0, mad(y)); rstanarm: decov(1,1,1,1)
    • Correlation on group-level effects - brms: lkj_corr_cholesky(1)
  • Poisson
    • Intercept - brms: student_t(3, median(y), mad(y)); rstanarm: normal(mean(y), 2.5*sd(y))
    • ‘Population effects’ (slopes, betas) - brms: flat, improper priors; rstanarm: normal(0, 2.5*sd(y)/sd(x))
    • ‘Group-level effects’ - brms: student_t(3, 0, mad(y)); rstanarm: decov(1,1,1,1)
    • Correlation on group-level effects - brms: lkj_corr_cholesky(1)
  • Negative binomial
    • Intercept - brms: student_t(3, median(y), mad(y)); rstanarm: normal(mean(y), 2.5*sd(y))
    • ‘Population effects’ (slopes, betas) - brms: flat, improper priors; rstanarm: normal(0, 2.5*sd(y)/sd(x))
    • Shape - brms: gamma(0.01, 0.01); rstanarm: exponential(1/sd(y))
    • ‘Group-level effects’ - brms: student_t(3, 0, mad(y)); rstanarm: decov(1,1,1,1)
    • Correlation on group-level effects - brms: lkj_corr_cholesky(1)

Notes:

brms

https://github.com/paul-buerkner/brms/blob/c2b24475d727c8afd8bfc95947c18793b8ce2892/R/priors.R

  1. In the above, for non-Gaussian families, y is first transformed according to the family link. If the family link is log, then 0.1 is first added to 0 values.
  2. in brms the minimum standard deviation for the Intercept prior is 2.5
  3. in brms the minimum standard deviation for group-level priors is 10.

rstanarm

http://mc-stan.org/rstanarm/articles/priors.html

  1. in rstanarm priors on standard deviation and correlation associated with group-level effects are packaged up into a single prior (decov which is a decomposition of the variance and covariance matrix).

In my experience, I find that the above priors tend to be a little bit too wide for many ecological applications and I often prefer to use 1.5 rather than 2.5 as the multiplier.
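As a quick check of what brms would actually assume for a given model and data set, the default priors can be inspected before fitting (here using the fabricated data generated earlier):

get_prior(y ~ x, data = dat, family = gaussian())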

Note

In Bayesian models, centering of predictors offers huge numerical advantages. So important is centering that brms automatically centers any continuous predictors for you. However, since the user has not necessarily centered the predictors themselves, the outputs from a brms model could be misinterpreted. Consequently, when fitting a model, brms also generates y-intercept estimates that are consistent with un-centered predictors, and these are the estimates returned to the user.

Nevertheless, I would recommend that you always explicitly center continuous predictors to provide more meaningful interpretations of the y-intercept. I would also highly recommend standardising continuous predictors - this will not only help speed up and stabilise the model, it will simplify the specification of priors - see the specific examples later in this tutorial.
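A minimal sketch of doing so explicitly (the columns cx and sx below are purely illustrative and are not used elsewhere in this tutorial):

dat_c <- dat |>
    mutate(
        cx = x - mean(x),            # centred: the intercept becomes the expected y at the mean x
        sx = (x - mean(x)) / sd(x)   # standardised: additionally puts x on a unit (sd) scale
    )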

Based on the above, for our fabricated data, let's assign the following priors:

  • \(\beta_0\): Normal prior centred at 31.21 with a standard deviation of 15.17
    • mean:

      dat$y |> median() |> round(2)
      [1] 31.21
    • standard deviation:

      dat$y |> mad() |> round(2)
      [1] 15.17
  • \(\beta_1\): t distribution (3 degrees of freedom) prior centred at 0 with a standard deviation of 4.09
    • mean: since effects are differences and we may not want to pre-determine the direction (polarity) of any trends, it is typical to define effects priors with means of 0

    • standard deviation:

      (mad(dat$y) / mad(dat$x)) |> round(2)
      [1] 4.09
  • \(\sigma\): (half) t distribution (3 degrees of freedom) centred at 0 with a standard deviation of 15.17
    • standard deviation:

      dat$y |> mad() |> round(2)
      [1] 15.17

Note, again, when fitting models through either rstanarm or brms, the priors assume that the predictor(s) have been centred and are applied on the link scale. In this case the link is the identity.

Similar logic can be applied for models that employ different distributions. In the following sections, we will define numerous sets of data (each of which represents a different major form of ecological data) and see how we can set appropriate priors in each case. In working through these examples, it is worth reflecting on how much simpler prior specification is if we use standardised predictors.

5 Example data

This tutorial will blend theoretical discussions with actual calculations and model fits. I believe that by bridging the divide between theory and application, we all gain better understanding. The applied components of this tutorial will be motivated by numerous fabricated data sets. The advantage of simulated data over real data is that with simulated data, we know the ‘truth’ and can therefore gauge the accuracy of estimates.

The motivating examples are:

  • Example 1 - simulated samples drawn from a Gaussian (normal) distribution reminiscent of data collected on measurements (such as body mass)
  • Example 2 - simulated Gaussian samples drawn from three different populations representing three different treatment levels (e.g. body masses of three different species)
  • Example 3 - simulated samples drawn from a Poisson distribution reminiscent of count data (such as number of individuals of a species within quadrats)
  • Example 4 - simulated samples drawn from a Negative Binomial distribution reminiscent of over-dispersed count data (such as number of individuals of a species that tends to aggregate in groups)
  • Example 5 - simulated samples drawn from a Bernoulli (binomial with \(n = 1\)) distribution reminiscent of binary data (such as the presence/absence of a species within sites)
  • Example 6 - simulated samples drawn from a Binomial distribution reminiscent of proportional data (such as counts of a particular taxa out of a total number of individuals)

Let's formally simulate the data illustrated above. The underlying process dictates that on average a one unit change in the predictor (x) will be associated with a five unit change in the response (y), and when the predictor has a value of 0, the response will typically be 2. Hence, the response (y) will be related to the predictor (x) via the following:

\[ y = 2 + 5x \]

This is a deterministic model; it has no uncertainty. In order to simulate actual data, we need to add some random noise. We will assume that the residuals are drawn from a Gaussian distribution with a mean of zero and standard deviation of 4. The predictor will comprise 10 uniformly distributed integer values between 1 and 10. We will round the response to two decimal places.

For repeatability, a seed will be employed on the random number generator. Note, the smaller the dataset, the less it is likely to represent the underlying deterministic equation, so we should keep this in mind when we look at how closely our estimated parameters approximate the ‘true’ values. Hence, the seed has been chosen to yield data that maintain a general trend that is consistent with the defining parameters.

set.seed(234)
dat <- data.frame(x = 1:10) |>
    mutate(y = round(2 + 5*x + rnorm(n = 10, mean = 0, sd = 4), digits = 2))
dat
    x     y
1   1  9.64
2   2  3.79
3   3 11.00
4   4 27.88
5   5 32.84
6   6 32.56
7   7 37.84
8   8 29.86
9   9 45.05
10 10 47.65
ggplot(data = dat) + 
geom_point(aes(y = y, x = x))

We will use these data in two ways: firstly, to estimate the mean and variance of the response (y) ignoring the predictor (x), and secondly, to estimate the relationship between the response and the predictor.

For the former, we know that the mean and variance of the response (y) can be calculated as:

\[ \begin{align} \bar{y} =& \frac{1}{n}\sum^n_{i=1}y_i\\ var(y) =& \frac{1}{n-1}\sum^n_{i=1}(y_i-\bar{y})^2\\ sd(y) =& \sqrt{var(y)} \end{align} \]

mean(dat$y)
[1] 27.811
var(dat$y)
[1] 225.9111
sd(dat$y)
[1] 15.03034

As previously described, categorical predictors are transformed into dummy codes prior to the fitting of the linear model. We will simulate a small data set with a single categorical predictor comprising a control and two treatment levels (‘medium’, ‘high’). To simplify things we will assume a Gaussian distribution, however most of the modelling steps would be the same regardless of the chosen distribution.
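To see what this dummy (treatment) coding looks like, we can inspect the model matrix R generates for such a factor (a small illustrative example):

xx <- gl(3, 2, 6, labels = c("control", "medium", "high"))
model.matrix(~xx)   # an intercept column plus one 0/1 column per non-reference level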

The data will be drawn from three Gaussian distributions with a standard deviation of 4 and means of 20, 18 and 10 (an intercept of 20 and effects of -2 and -10). We will draw a total of 12 observations, four from each of the three populations.

set.seed(123)
beta_0 <- 20
beta <- c(-2, -10)
sigma <- 4
n <- 12
x <- gl(3, 4, 12, labels = c('control', 'medium', 'high'))
y <- (model.matrix(~x) %*% c(beta_0, beta)) + rnorm(12, 0, sigma)
dat2 <- data.frame(x = x, y = y)
dat2
         x         y
1  control 17.758097
2  control 19.079290
3  control 26.234833
4  control 20.282034
5   medium 18.517151
6   medium 24.860260
7   medium 19.843665
8   medium 12.939755
9     high  7.252589
10    high  8.217352
11    high 14.896327
12    high 11.439255
ggplot(data = dat2) + 
geom_point(aes(y = y, x = x)) 

The Poisson distribution is only parameterized by a single parameter (\(\lambda\)) which represents both the mean and variance. Furthermore, Poisson data can only be positive integers.

Unlike a simple trend between two Gaussian variables, modelling against a Poisson distribution shifts the model onto a logarithmic scale. This needs to be taken into account when we simulate the data. The parameters that we use to define the underlying processes need to either be on a logarithmic scale, or else be converted to a logarithmic scale prior to using them to generate the random data.

Moreover, for any model that involves a non-identity link function (such as a logarithmic link function for Poisson models), ‘slope’ is only constant on the scale of the link function. When it is back transformed onto the natural scale (scale of the data), it takes on a different meaning and interpretation.

We will choose \(\beta_0\) to represent a value of 1 when x=0. As for the ‘effect’ of the predictor on the response, let's say that for every one unit increase in the predictor the response increases by 40% (on the natural scale). Hence, on the log scale, the slope will be \(log(1.4)=\) 0.3364722.
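To make the link-scale interpretation concrete, the following sketch shows that this slope is additive (constant) on the log scale, yet multiplicative on the natural scale:

beta <- log(c(1, 1.4))
mu <- exp(beta[1] + beta[2] * (0:3))
mu              # expected values at x = 0, 1, 2 and 3
diff(mu)        # the differences grow on the natural scale...
mu[-1] / mu[-4] # ...yet each unit step is a constant 1.4-fold (40%) increase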

set.seed(123)
beta <- c(1, 1.40)
beta <- log(beta)
n <- 10
dat3 <- data.frame(x=seq(from = 1, to = 10, len = n)) |>
    mutate(y = rpois(n, lambda = exp(beta[1] + beta[2]*x)))
dat3
    x  y
1   1  1
2   2  3
3   3  2
4   4  6
5   5  9
6   6  3
7   7 10
8   8 15
9   9 28
10 10 31
ggplot(data = dat3) + 
geom_point(aes(y = y, x = x)) 

In theory, count data should follow a Poisson distribution and therefore have properties like mean equal to variance (e.g. \(\textnormal{Dispersion}=\frac{\sigma^2}{\mu}=1\)). However, as simple linear models are low dimensional representations of a system, it is often unlikely that such a simple model can capture all the variability in the response (counts). For example, if we were modelling the abundance of a species of intertidal snail within quadrats in relation to water depth, it is highly unlikely that water depth alone drives snail abundance. There are countless other influences that the model has not accounted for. As a result, the observed data might be more variable than a Poisson (of a particular mean) would expect and in such cases, the model is over-dispersed (more variance than expected).

Over-dispersed models under-estimate the variability (and thus over-state the precision of estimates), resulting in inflated confidence in outcomes (elevated Type I errors).

There are numerous causes of over-dispersed count data (one of which is alluded to above). These are:

  • additional sources of variability not being accounted for in the model (see above)
  • when the items being counted aggregate together. Although the underlying items may have been generated by a Poisson process, the items clump together. When the items are counted, they are more likely to be in either relatively low or relatively high numbers - hence the data are more varied than would be expected from their overall mean (see the sketch after this list).
  • imperfect detection resulting in excessive zeros. Again the underlying items may have been generated by a Poisson process, however detecting and counting the items might not be completely straightforward (particularly for more cryptic items). Hence, the researcher may have recorded no individuals in a quadrat and yet there was one or more present; they were just not obvious and were not detected. That is, layered over the Poisson process is another process that determines the detectability. So while the Poisson might expect a certain proportion of zeros, the observed data might have a substantially higher proportion of zeros - and thus higher variance.
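A quick numerical sketch (with assumed, purely illustrative parameters) of how aggregation inflates variance relative to a Poisson with the same mean:

set.seed(1)
pois <- rpois(10000, lambda = 5)
nb <- rnbinom(10000, size = 2, mu = 5)
c(mean(pois), var(pois))  # Poisson: variance approximately equal to the mean
c(mean(nb), var(nb))      # negative binomial: variance approximately mu + mu^2/size = 17.5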

This example will generate data that is drawn from a negative binomial distribution so as to broadly represent any one of the above causes.

We will choose \(\beta_0\) to represent a value of 1 when x=0. As for the ‘effect’ of the predictor on the response, let's say that for every one unit increase in the predictor the response increases by 40% (on the natural scale). Hence, on the log scale, the slope will be \(log(1.4)=\) 0.3364722. Finally, the negative binomial size (dispersion) parameter will be 10 (such that the variance is \(\mu + \mu^2/10\)).

set.seed(234)
beta <- c(1, 1.40)
beta <- log(beta)
n <- 10
size <- 10
dat4 <- data.frame(x = seq(from = 1, to = 10, len = n)) |>
    mutate(
        mu = exp(beta[1] + beta[2] * x),
        y = rnbinom(n, size = size, mu =  mu)
    )
dat4
    x        mu  y
1   1  1.400000  0
2   2  1.960000  3
3   3  2.744000  7
4   4  3.841600  3
5   5  5.378240  5
6   6  7.529536  9
7   7 10.541350 13
8   8 14.757891 10
9   9 20.661047 17
10 10 28.925465 26
ggplot(data = dat4) + 
geom_point(aes(y = y, x = x)) 

Binary data (presence/absence, dead/alive, yes/no, heads/tails, etc) pose unique challenges for linear modeling. Linear regression, designed for continuous outcomes, may not be directly applicable to binary responses. The nature of binary data violates assumptions of normality and homoscedasticity, which are fundamental to linear regression. Furthermore, linear models may predict probabilities outside the [0, 1] range, leading to unrealistic predictions.

This example will generate data that is drawn from a Bernoulli distribution so as to broadly represent presence/absence data.

We will choose \(\beta_0\) to represent the odds of a value of 1 when \(x=0\), equal to \(0.02\). This is equivalent to a probability of \(y\) being 1 when \(x=0\) of \(\frac{0.02}{1+0.02}=0.0196\). E.g., at low \(x\), the response is likely to be close to 0. For every one unit increase in \(x\), we will stipulate a 2 times increase in the odds that the expected response is equal to 1.
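The conversion between odds and probabilities is handled by the logistic function; a quick sketch:

beta <- log(c(0.02, 2))
plogis(beta[1])                # P(y = 1) at x = 0: 0.02/(1 + 0.02) = 0.0196
plogis(beta[1] + beta[2] * 5)  # the probability after the odds have doubled five times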

set.seed(234)
beta <- c(0.02, 2)
beta <- log(beta)
n <- 10
dat5 <- data.frame(x = seq(from = 1, to = 10, len = n)) |>
    mutate(
        y = as.numeric(rbernoulli(n, p = plogis(beta[1] + beta[2] * x)))
    )
dat5
    x y
1   1 0
2   2 0
3   3 0
4   4 1
5   5 0
6   6 1
7   7 1
8   8 1
9   9 1
10 10 1
ggplot(data = dat5) + 
geom_point(aes(y = y, x = x)) 

Similar to binary data, proportional (binomial) data tend to violate normality and homogeneity of variance (particularly as mean proportions approach either 0% or 100%).

This example will generate data that is drawn from a binomial distribution so as to broadly represent proportion data.

We will choose \(\beta_0\) to represent the odds of a particular trial (e.g. an individual) being of a particular type (e.g. species 1) when \(x=0\), equal to \(0.02\). This is equivalent to a probability of \(y\) being of the focal type when \(x=0\) of \(\frac{0.02}{1+0.02}=0.0196\). E.g., at low \(x\), the probability that an individual is taxa 1 is likely to be close to 0. For every one unit increase in \(x\), we will stipulate a 2.5 times increase in the odds that the expected response is equal to 1.

For this example, we will also convert the counts into proportions (\(y\)) by dividing by the number of trials (\(5\)).

set.seed(123)
beta <- c(0.02, 2.5)
beta <- log(beta)
n <- 10
trials <- 5
dat6 <- data.frame(x = seq(from = 1, to = 10, len = n)) |>
    mutate(
      count = as.numeric(rbinom(n, size = trials, prob = plogis(beta[1] + beta[2] * x))),
      total = trials,
      y = count/total
    )
dat6
    x count total   y
1   1     0     5 0.0
2   2     1     5 0.2
3   3     1     5 0.2
4   4     4     5 0.8
5   5     2     5 0.4
6   6     5     5 1.0
7   7     5     5 1.0
8   8     4     5 0.8
9   9     5     5 1.0
10 10     5     5 1.0
ggplot(data = dat6) + 
geom_point(aes(y = y, x = x)) 

6 Exploratory data analysis

Statistical models utilize data and the inherent statistical properties of distributions to discern patterns, relationships, and trends, enabling the extraction of meaningful insights, predictions, or inferences about the phenomena under investigation. To do so, statistical models make assumptions about the likely distributions from which the data were collected. Consequently, the reliability and validity of any statistical model depend upon adherence to these underlying assumptions.

Exploratory Data Analysis (EDA) and assumption checking therefore play pivotal roles in the process of statistical analysis, offering essential tools to glean insights, assess the reliability of statistical methods, and ensure the validity of conclusions drawn from data. EDA involves visually and statistically examining datasets to understand their underlying patterns, distributions, and potential outliers. These initial steps provide an intuitive understanding of the data’s structure and guide subsequent analyses. By scrutinizing assumptions, such as normality, homoscedasticity, and independence, researchers can identify potential limitations or violations that may impact the accuracy and reliability of their findings.

Exploratory Data Analysis within the context of ecological statistical models usually comprises a set of targeted graphical summaries. These are not to be considered definitive diagnostics of the model assumptions, but rather a first pass to assess obvious violations prior to the fitting of models. More definitive diagnostics can only be achieved after a model has been fit.

In addition to graphical summaries, there are numerous statistical tests to help explore possible violations of various statistical assumptions. These tests are less commonly used in ecology since they are often more sensitive to departures from the ideal than are the models whose assumptions they are testing.
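As an example of such a formal test, the Shapiro-Wilk test of normality could be applied to the Gaussian example response (though note that with only 10 observations its power is very low):

shapiro.test(dat$y)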

Simple classic regression models are often the easiest models to fit and interpret and as such often represent a standard by which alternative models are gauged. As you will see later in this tutorial, such models can actually be fit using closed form (exact solution) matrix algebra that can be performed by hand. Nevertheless, and perhaps as a result, they also impose some of the strictest assumptions. Although these collective assumptions are specific to gaussian models, they do provide a good introduction to model assumptions in general, so we will use them to motivate the wider discussion.

Simple (gaussian) linear models (represented below) make the following assumptions:

The data depicted above were generated using the following R code:

set.seed(234)
x <- 1:10
y <- 2 + (5*x) + rnorm(10,0,4)

The observations represent

  • single observations drawn from 10 normal populations
  • each population had a standard deviation of 4
  • the mean of each population varied linearly according to the value of x (\(2 + 5x\))
  • normality: the residuals (and thus observations) must be drawn from normally distributed populations. The right hand figure underlays the fictitious normally distributed populations from which the observed values have been sampled.

Estimation and inference testing in linear regression assumes that the response is normally distributed in each of the populations. In this case, the populations are all possible measurements that could be collected at each level of \(x\) - hence there are 16 populations. Typically however, we only collect a single observation from each population (as is also the case here). How then can we evaluate whether each of these populations is likely to have been normal?

For a given response, the population distributions should follow much the same distribution shapes. Therefore provided the single samples from each population are unbiased representations of those populations, a boxplot of all observations should reflect the population distributions.

The two figures above show the relationships between the individual population distributions and the overall distribution. The left hand figure shows a distribution drawn from single representatives of each of the 16 populations. Since the 16 individual populations were normally distributed, the distribution of the 16 observations is also normal.

By contrast, the right hand figure shows 16 log-normally distributed populations and the resulting distribution of 16 single observations drawn from these populations. The overall boxplot mirrors each of the individual population distributions.

Whilst traditionally, non-normal data would typically be normalised via a scale transformation (such as a logarithmic transformation), these days it is arguably more appropriate to attempt to match the data to a more suitable distribution (see later in this tutorial).

You may have noticed that we have only explored the distribution of the response (y-axis). What about the distribution of the predictor (independent, x-axis) variable - does it matter? The distribution assumption applies to the residuals (which are purely in the direction of the y-axis). Indeed, technically it is assumed that there is no uncertainty associated with the predictor variable. The predictor values are assumed to be set, and thus there is no error associated with the values observed. Whilst this might not always be reasonable, it is an assumption.

Given that the predictor values are expected to be set rather than measured, we actually assume that they are uniformly distributed. In practice, the exact distribution of predictor values is not that important provided it is reasonably symmetrical and no outliers (unusually small or large values) are created as a result of the distribution.

As with exploring the distribution of the response variable, boxplots, histograms and density plots can be useful means of exploring the distribution of predictor variable(s). When such diagnostics reveal distributional issues, scale transformations (such as logarithmic transformations) are appropriate.
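A quick sketch of such checks for the predictor in the Gaussian example:

ggplot(dat, aes(x = x)) + geom_histogram(bins = 5)  # distribution of the predictor
ggplot(dat, aes(y = x)) + geom_boxplot()            # symmetry and outliers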

  • homogeneity of variance: the residuals (and thus observations) must be drawn from populations that are equally varied. The model as shown only estimates a single variance (\(\sigma^2\)) parameter - it is assumed that this is a good overall representation of all underlying populations. The right hand figure underlays the fictitious normally distributed and equally varied populations from which the observations have been sampled.

    Moreover, since the expected values (obtained by solving the deterministic component of the model) and the variance must be estimated from the same data, they need to be independent (not related to one another)

Simple linear regression also assumes that each of the populations are equally varied. Actually, it is the prospect of a relationship between the mean and variance of y-values across x-values that is of the greatest concern. Strictly the assumption is that the distribution of y values at each x value are equally varied and that there is no relationship between mean and variance.

However, as we only have a single y-value for each x-value, it is difficult to directly determine whether the assumption of homogeneity of variance is likely to have been violated (mean of one value is meaningless and variability can’t be assessed from a single value). The figure below depicts the ideal (and almost never realistic) situation in which (left hand figure) the populations are all equally varied. The middle figure simulates drawing a single observation from each of the populations. When the populations are equally varied, the spread of observed values around the trend line is fairly even - that is, there is no trend in the spread of values along the line.

If we then plot the residuals (difference between observed values and those predicted by the trendline) against the predicted values, there is a definite lack of pattern. This lack of pattern is indicative of a lack of issues with homogeneity of variance.

If we now contrast the above to a situation where the population variance is related to the mean (unequal variance), we see that the observations drawn from these populations are not evenly distributed along the trendline (they get more spread out as the mean predicted value increases). This pattern is emphasized in the residual plot, which displays a characteristic “wedge”-shaped pattern.

Hence looking at the spread of values around a trendline on a scatterplot of \(y\) against \(x\) is a useful way of identifying gross violations of homogeneity of variance. Residual plots provide an even better diagnostic. The presence of a wedge shape is indicative that the population mean and variance are related.
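A simple residual plot for the Gaussian example data, as one way of screening for such wedge-shaped patterns:

dat.lm <- lm(y ~ x, data = dat)
data.frame(fitted = fitted(dat.lm), resid = resid(dat.lm)) |>
    ggplot(aes(y = resid, x = fitted)) +
    geom_point() +
    geom_hline(yintercept = 0, linetype = "dashed")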

  • linearity: the underlying relationships must be simple linear trends, since the line of best fit through the data (of which the slope is estimated) is linear. The right hand figure depicts a linear trend through the underlying populations.

It is important to disclose the meaning of the word “linear” in the term “linear regression”. Technically, it refers to a linear combination of regression coefficients. For example, the following are examples of linear models:

  • \(y_i = \beta_0 + \beta_1 x_i\)
  • \(y_i = \beta_0 + \beta_1 x_i + \beta_2 z_i\)
  • \(y_i = \beta_0 + \beta_1 x_i + \beta_2 x^2_i\)

All the coefficients (\(\beta_0\), \(\beta_1\), \(\beta_2\)) are linear terms. Note that the last of the above examples is a linear model, however it describes a non-linear trend. Contrast the above models with the following non-linear model:

  • \(y_i = \beta_0 + x_i^{\beta_1}\)

In that case, the model is not a linear combination of the coefficients (one of them enters as a power term).
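In R, both the straight-line and the quadratic examples above are specified (and fit) as linear models, because both remain linear in their coefficients:

lm(y ~ x, data = dat)           # straight line
lm(y ~ x + I(x^2), data = dat)  # curved (quadratic) trend, yet still a linear model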

That said, a simple linear regression usually fits a straight (linear) line through the data. Therefore, prior to fitting such a model, it is necessary to establish whether this really is the most sensible way of describing the relationship. That is, does the relationship appear to be linearly related or could some other non-linear function describe the relationship better. Scatterplots and residual plots are useful diagnostics.

To see how a residual plot could be useful, consider the following. The first row of figures illustrates the residuals resulting from data drawn from a linear trend. The residuals are effectively random noise. By contrast, the second row shows the residuals resulting from data drawn from a non-linear relationship that have nevertheless been modelled as a linear trend. There is still a clear pattern remaining in the residuals.

The above might be an obvious and somewhat overly contrived example, yet it does illustrate the point - that a pattern in the residuals could point to a mis-specified model.

If non-linearity does exist (as in the second case above), then fitting a straight line through what is obviously not a straight relationship is likely to poorly represent the true nature of the relationship. There are numerous causes of non-linearity:

  1. underlying distributional issues can result in non-linearity. For example, if we are assuming a gaussian distribution and the data are non-normal, often the relationships will appear non-linear. Addressing the distributional issues can therefore resolve the linearity issues
  2. the underlying relationship might truly be non-linear in which case this should be reflected in some way by the model formula. If the model formula fails to describe the non-linear trend, then problems will persist.
  3. the model proposed is missing an important covariate that might help standardise the data in a way that results in linearity
  • independence: the residuals (and thus observations) must be independently drawn from the populations. That is, the correlation between all the observations is assumed to be 0 (off-diagonals in the covariance matrix). More practically, there should be no pattern to the correlations between observations.

    Random sampling and random treatment assignment are experimental design elements that are intended to mitigate many types of sampling biases that cause dependencies between observations. Nevertheless, there are aspects of sampling designs that are either logistically difficult to randomise or in some cases not logically possible. For example, the residuals from observations sampled closer together in space and time will likely be more similar to one another than those of observations more spaced apart. Since neither space nor time can be randomised, data collected from sampling designs that involve sampling over space and/or time need to be assessed for spatial and temporal dependencies. These concepts will be explored in the context of introducing susceptible designs in a later tutorial.
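One simple screen for temporal dependence is the autocorrelation function of the residuals; a sketch using the Gaussian example fit:

acf(resid(lm(y ~ x, data = dat)))   # spikes beyond the confidence bands suggest autocorrelation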

The above is only a very brief overview of the model assumptions that apply to just one specific model (simple linear gaussian regression). For the remainder of this section, we will graphically explore the motivating example data sets so as to gain insights into which distributional assumptions might be most valid, and thus help guide modelling choices. Similarly, for subsequent tutorials in this series (that introduce progressively more complex models), all associated assumptions will be explored and detailed.

dat |> 
  ggplot(aes(y = y)) +
  geom_boxplot()

Conclusions

  • there is no strong evidence of non-normality
  • for there to be convincing evidence of non-normality, each segment of the boxplot would need to get progressively larger
dat |> 
  ggplot(aes(y = y, x = x)) +
  geom_smooth(method = "lm") +
  geom_point()
`geom_smooth()` using formula = 'y ~ x'

Conclusions

  • the spread of values around the trendline seems fairly even (hence there is no evidence of non-homogeneity of variance)
dat |> 
  ggplot(aes(y = y, x = x)) +
  geom_smooth(method = "lm") +
  geom_point()
`geom_smooth()` using formula = 'y ~ x'

Conclusions

  • the data seem well represented by the linear trendline. Furthermore, a lowess smoother does not appear to deviate consistently from this trend.

Conclusions

  • there are no obvious violations of the linear regression model assumptions
  • we can now fit the suggested model
  • full confirmation about the model’s goodness of fit should be reserved until after exploring the additional diagnostics that are only available after fitting the model.
dat2 |> 
  ggplot(aes(y = y, x = x)) +
  geom_boxplot()

Conclusions

  • there is no consistent evidence of non-normality across all groups
  • even though the control group demonstrates some evidence of non-normality
dat2 |> 
  ggplot(aes(y = y, x = x)) +
  geom_boxplot()

Conclusions

  • the spread of noise in each group seems reasonably similar
  • more importantly, there does not seem to be a relationship between the mean (as approximated by the position of the boxplots along the y-axis) and the variance (as approximated by the spread of the boxplots).
  • that is, the size of the boxplots do not vary with the elevation of the boxplots.

Linearity is not an issue for categorical predictors since the model effectively fits separate lines between pairs of points (and a line between two points can only ever be linear).

Conclusions

  • no evidence of non-normality
  • no evidence of non-homogeneity of variance
dat3 |> 
  ggplot(aes(y = y)) +
  geom_boxplot()

Conclusions

  • there is strong evidence of non-normality
  • each segment of the boxplot gets progressively larger
dat3 |> 
  ggplot(aes(y = y, x = x)) +
  geom_smooth(method = "lm") +
  geom_point()
`geom_smooth()` using formula = 'y ~ x'

Conclusions

  • the spread of noise does not look random along the line of best fit.
  • homogeneity of variance is difficult to assess in the presence of distributional issues (such as non-normality in this case) as they can result in non-linearity (apparent here)
dat3 |> 
  ggplot(aes(y = y, x = x)) +
  geom_smooth(method = "lm") +
  geom_smooth(colour = "red") +
  geom_point()
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using method = 'loess' and formula = 'y ~ x'

Conclusions

  • the data do not appear to be linear
  • the red line is a loess smoother and it is clear that the data are not linear

Conclusions

  • there are obvious violations of the linear regression model assumptions
  • we should consider a different model that does not assume normality
dat4 |> 
  ggplot(aes(y = y)) +
  geom_boxplot()

Conclusions

  • there is strong evidence of non-normality
  • each segment of the boxplot gets progressively larger
dat4 |> 
  ggplot(aes(y = y, x = x)) +
  geom_smooth(method = "lm") +
  geom_point()
`geom_smooth()` using formula = 'y ~ x'

Conclusions

  • the spread of noise does not look random along the line of best fit.
  • homogeneity of variance is difficult to assess in the presence of distributional issues (such as non-normality in this case) as they can result in non-linearity (apparent here)
dat4 |> 
  ggplot(aes(y = y, x = x)) +
  geom_smooth(method = "lm") +
  geom_smooth(colour = "red") +
  geom_point()
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using method = 'loess' and formula = 'y ~ x'

Conclusions

  • the data do not appear to be linear
  • the red line is a loess smoother and it is clear that the data are not linear

Conclusions

  • there are obvious violations of the linear regression model assumptions
  • we should consider a different model that does not assume normality
dat5 |> 
  ggplot(aes(y = y)) +
  geom_boxplot()

Conclusions

  • clearly a set of 0s and 1s can't be normally distributed.
dat5 |> 
  ggplot(aes(y = y, x = x)) +
  geom_smooth(method = "lm") +
  geom_point()
`geom_smooth()` using formula = 'y ~ x'

Conclusions

  • the spread of noise does not look random (or equal) along the line of best fit.
dat5 |> 
  ggplot(aes(y = y, x = x)) +
  geom_smooth(method = "lm") +
  geom_smooth(colour = "red") +
  geom_point()
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using method = 'loess' and formula = 'y ~ x'

Conclusions

  • the data are clearly not linear
  • the red line is a loess smoother and it is clear that the data are not linear

Conclusions

  • there are obvious violations of the linear regression model assumptions
  • we should consider a different model that does not assume normality
dat6 |> 
  ggplot(aes(y = y)) +
  geom_boxplot()

Conclusions

  • distribution is not normal and is truncated
dat6 |> 
  ggplot(aes(y = y, x = x)) +
  geom_smooth(method = "lm") +
  geom_point()
`geom_smooth()` using formula = 'y ~ x'

Conclusions

  • the spread of noise does not look random (or equal) along the line of best fit.
dat6 |> 
  ggplot(aes(y = y, x = x)) +
  geom_smooth(method = "lm") +
  geom_smooth(colour = "red") +
  geom_point()
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using method = 'loess' and formula = 'y ~ x'

Conclusions

  • although there is no evidence of non-linearity from this small data set, it is worth noting that the line of best fit does extend outside the logical response range [0, 1] within the range of observed \(x\) values. That is, a simple linear model would predict proportions higher than 100% at high values of \(x\)
  • this is a common issue with binomial data and is often addressed by fitting a logistic regression model

Conclusions

  • there are obvious violations of the linear regression model assumptions
  • we should consider a different model that does not assume normality

7 Fitting models

One way to assess the priors is to have the MCMC sampler sample purely from the prior predictive distribution without conditioning on the observed data. Doing so provides a glimpse at the range of predictions possible under the priors. On the one hand, wide ranging predictions would ensure that the priors are unlikely to influence the actual predictions once they are conditioned on the data. On the other hand, if they are too wide, the sampler is being permitted to traverse into regions of parameter space that are not logically possible in the context of the actual underlying ecological setting. Not only could this mean that illogical parameter estimates are possible, but when the sampler traverses regions of parameter space that are not supported by the actual data, it can become unstable and have difficulty converging.

In brms, we can inform the sampler to draw from the prior predictive distribution instead of conditioning on the response, by running the model with the sample_prior = 'only' argument. Unfortunately, this cannot be applied when there are flat priors (since the posteriors will necessarily extend to negative and positive infinity). Therefore, in order to use this useful routine, we need to make sure that we have defined a proper prior for all parameters.

Earlier we suggested the following priors might be useful:

  • \(\beta_0\): Normal prior centred at 31.21 with a standard deviation of 15.17
    • mean:

      dat$y |> median() |> round(2)
      [1] 31.21
    • standard deviation:

      dat$y |> mad() |> round(2)
      [1] 15.17
  • \(\beta_1\): t distribution (3 degrees of freedom) prior centred at 0 with a standard deviation of 4.09
    • mean: since effects are differences and we may not want to pre-determine the direction (polarity) of any trends, it is typical to define effects priors with means of 0

    • standard deviation:

      (mad(dat$y) / mad(dat$x)) |> round(2)
      [1] 4.09
  • \(\sigma\): (half) t distribution (3 degrees of freedom) centred at 0 with a standard deviation of 15.17
    • standard deviation:

      dat$y |> mad() |> round(2)
      [1] 15.17

It might be useful to understand what some of these distributions look like. For example, we have used both a normal (Gaussian) distribution and a flatter t distribution for the y-intercept and slope respectively. This was a somewhat arbitrary choice. We could easily have gone with either normal or t distributions for all of the above parameters. To visualise prior distributions for the slope based on both normal and t distributions:

standist::visualize("normal(0, 4.09)", "student_t(3, 0, 4.09)", xlim = c(-20, 20))

Evidently, the t distribution (with 3 degrees of freedom) is wider than the normal distribution. The former should be more robust to data with values that are less concentrated around the mean.

priors <- prior(normal(31.21, 15.17), class = "Intercept") +
    prior(student_t(3, 0, 4.09), class = "b") +
    prior(student_t(3, 0, 15.17), class = "sigma")

\[ \begin{align} y_i \sim{}& N(\mu_i, \sigma^2)\\ \mu_i =& \beta_0 + \beta_1 x_i\\ \end{align} \]

  1. start by fitting the model and sampling from the priors only
dat <- data.frame(y = rnorm(10), x = rnorm(10))
brm(y ~ x, data = dat, backend = "rstan")
Compiling Stan program...
Start sampling

SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 1).
Chain 1: 
Chain 1: Gradient evaluation took 7e-06 seconds
Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 0.07 seconds.
Chain 1: Adjust your expectations accordingly!
Chain 1: 
Chain 1: 
Chain 1: Iteration:    1 / 2000 [  0%]  (Warmup)
Chain 1: Iteration:  200 / 2000 [ 10%]  (Warmup)
Chain 1: Iteration:  400 / 2000 [ 20%]  (Warmup)
Chain 1: Iteration:  600 / 2000 [ 30%]  (Warmup)
Chain 1: Iteration:  800 / 2000 [ 40%]  (Warmup)
Chain 1: Iteration: 1000 / 2000 [ 50%]  (Warmup)
Chain 1: Iteration: 1001 / 2000 [ 50%]  (Sampling)
Chain 1: Iteration: 1200 / 2000 [ 60%]  (Sampling)
Chain 1: Iteration: 1400 / 2000 [ 70%]  (Sampling)
Chain 1: Iteration: 1600 / 2000 [ 80%]  (Sampling)
Chain 1: Iteration: 1800 / 2000 [ 90%]  (Sampling)
Chain 1: Iteration: 2000 / 2000 [100%]  (Sampling)
Chain 1: 
Chain 1:  Elapsed Time: 0.013 seconds (Warm-up)
Chain 1:                0.013 seconds (Sampling)
Chain 1:                0.026 seconds (Total)
Chain 1: 

SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 2).
Chain 2: 
Chain 2: Gradient evaluation took 3e-06 seconds
Chain 2: 1000 transitions using 10 leapfrog steps per transition would take 0.03 seconds.
Chain 2: Adjust your expectations accordingly!
Chain 2: 
Chain 2: 
Chain 2: Iteration:    1 / 2000 [  0%]  (Warmup)
Chain 2: Iteration:  200 / 2000 [ 10%]  (Warmup)
Chain 2: Iteration:  400 / 2000 [ 20%]  (Warmup)
Chain 2: Iteration:  600 / 2000 [ 30%]  (Warmup)
Chain 2: Iteration:  800 / 2000 [ 40%]  (Warmup)
Chain 2: Iteration: 1000 / 2000 [ 50%]  (Warmup)
Chain 2: Iteration: 1001 / 2000 [ 50%]  (Sampling)
Chain 2: Iteration: 1200 / 2000 [ 60%]  (Sampling)
Chain 2: Iteration: 1400 / 2000 [ 70%]  (Sampling)
Chain 2: Iteration: 1600 / 2000 [ 80%]  (Sampling)
Chain 2: Iteration: 1800 / 2000 [ 90%]  (Sampling)
Chain 2: Iteration: 2000 / 2000 [100%]  (Sampling)
Chain 2: 
Chain 2:  Elapsed Time: 0.013 seconds (Warm-up)
Chain 2:                0.013 seconds (Sampling)
Chain 2:                0.026 seconds (Total)
Chain 2: 

SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 3).
Chain 3: 
Chain 3: Gradient evaluation took 3e-06 seconds
Chain 3: 1000 transitions using 10 leapfrog steps per transition would take 0.03 seconds.
Chain 3: Adjust your expectations accordingly!
Chain 3: 
Chain 3: 
Chain 3: Iteration:    1 / 2000 [  0%]  (Warmup)
Chain 3: Iteration:  200 / 2000 [ 10%]  (Warmup)
Chain 3: Iteration:  400 / 2000 [ 20%]  (Warmup)
Chain 3: Iteration:  600 / 2000 [ 30%]  (Warmup)
Chain 3: Iteration:  800 / 2000 [ 40%]  (Warmup)
Chain 3: Iteration: 1000 / 2000 [ 50%]  (Warmup)
Chain 3: Iteration: 1001 / 2000 [ 50%]  (Sampling)
Chain 3: Iteration: 1200 / 2000 [ 60%]  (Sampling)
Chain 3: Iteration: 1400 / 2000 [ 70%]  (Sampling)
Chain 3: Iteration: 1600 / 2000 [ 80%]  (Sampling)
Chain 3: Iteration: 1800 / 2000 [ 90%]  (Sampling)
Chain 3: Iteration: 2000 / 2000 [100%]  (Sampling)
Chain 3: 
Chain 3:  Elapsed Time: 0.012 seconds (Warm-up)
Chain 3:                0.013 seconds (Sampling)
Chain 3:                0.025 seconds (Total)
Chain 3: 

SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 4).
Chain 4: 
Chain 4: Gradient evaluation took 3e-06 seconds
Chain 4: 1000 transitions using 10 leapfrog steps per transition would take 0.03 seconds.
Chain 4: Adjust your expectations accordingly!
Chain 4: 
Chain 4: 
Chain 4: Iteration:    1 / 2000 [  0%]  (Warmup)
Chain 4: Iteration:  200 / 2000 [ 10%]  (Warmup)
Chain 4: Iteration:  400 / 2000 [ 20%]  (Warmup)
Chain 4: Iteration:  600 / 2000 [ 30%]  (Warmup)
Chain 4: Iteration:  800 / 2000 [ 40%]  (Warmup)
Chain 4: Iteration: 1000 / 2000 [ 50%]  (Warmup)
Chain 4: Iteration: 1001 / 2000 [ 50%]  (Sampling)
Chain 4: Iteration: 1200 / 2000 [ 60%]  (Sampling)
Chain 4: Iteration: 1400 / 2000 [ 70%]  (Sampling)
Chain 4: Iteration: 1600 / 2000 [ 80%]  (Sampling)
Chain 4: Iteration: 1800 / 2000 [ 90%]  (Sampling)
Chain 4: Iteration: 2000 / 2000 [100%]  (Sampling)
Chain 4: 
Chain 4:  Elapsed Time: 0.013 seconds (Warm-up)
Chain 4:                0.013 seconds (Sampling)
Chain 4:                0.026 seconds (Total)
Chain 4: 
 Family: gaussian 
  Links: mu = identity; sigma = identity 
Formula: y ~ x 
   Data: dat (Number of observations: 10) 
  Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup draws = 4000

Regression Coefficients:
          Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept     0.27      0.37    -0.45     1.01 1.00     2264     1483
x            -0.10      0.44    -0.98     0.82 1.00     2351     1997

Further Distributional Parameters:
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma     1.09      0.33     0.66     1.92 1.00     2182     2260

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
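These diagnostics can also be extracted programmatically rather than read off the printed summary. A minimal sketch (my own addition, assuming the posterior package, which brms itself depends on, and substituting the name of the fitted model object for fit):

library(posterior)  #for draws manipulation
fit |>
    as_draws_df() |>
    summarise_draws(default_convergence_measures())  #rhat, ess_bulk, ess_tail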
dat1a.brm <- brm(bf(y ~ x),
                data=dat,
                prior=priors,
                sample_prior = 'only', 
                iter = 5000,
                warmup = 1000,
                chains = 3, cores = 3,
                thin = 5,
                backend = "rstan",
                refresh = 0)
Compiling Stan program...
Start sampling
  2. explore the range of posterior predictions resulting from the priors alone
dat1a.brm |>
    conditional_effects() |>
    plot(points = TRUE)

Conclusions:

  • the grey ribbon above represents the credible range of the posterior predictions
  • this ribbon is substantially wider than the observed data (black points), suggesting that the priors are not overly narrow
  • the range of posterior predictions is not wildly unreasonable (although we could argue that negative predictions might be illogical), since the maximums are not multiple orders of magnitude above the observed data
  3. now refit the model such that it samples from both the priors and the likelihood (that is, allow the data to have an impact on the estimates)
dat1a.brm2 <- update(dat1a.brm,
    sample_prior = "yes"
)
The desired updates require recompiling the model
Compiling Stan program...
Start sampling
  4. re-explore the range of posterior predictions resulting from the fitted model
dat1a.brm2 |>
    conditional_effects() |>
    plot(points = TRUE)

Conclusions:

  • the range of the posteriors is now substantially reduced (now that the model includes both data and priors)
  • this suggests that the patterns are being driven predominantly by the data
  5. compare the priors and posteriors to further confirm that the priors are not overly influential

When we have indicated that the posterior should be informed by both the prior and the likelihood, draws from both the prior (governed by the priors alone) and the posterior (governed by both the priors and the data/likelihood) are returned. These can be compared by exploring the probabilities associated with specific hypotheses - the most obvious of which is that of no effect (that the parameter = 0).

When doing so, ideally the posterior should be distinct from the prior - that is, clearly distinguishable from it - and less variable than the prior. If this is not the case, it might suggest that the posteriors are being too strongly driven by the priors.
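One way to make this comparison concrete (a sketch of my own, not part of the original workflow) is to quantify the relative spread of the stored prior and posterior draws; a ratio well below 1 indicates that the data, not the prior, are doing the work:

dat1a.brm2 |>
    as_draws_df() |>
    summarise(
        prior_sd = sd(prior_b),      #spread of the slope's prior draws
        posterior_sd = sd(b_x),      #spread of the slope's posterior draws
        shrinkage = posterior_sd / prior_sd
    )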

dat1a.brm2 |> tidybayes::get_variables()
 [1] "b_Intercept"     "b_x"             "sigma"           "Intercept"      
 [5] "prior_Intercept" "prior_b"         "prior_sigma"     "lprior"         
 [9] "lp__"            "accept_stat__"   "stepsize__"      "treedepth__"    
[13] "n_leapfrog__"    "divergent__"     "energy__"       
dat1a.brm2 |> hypothesis("x = 0") |> plot()

dat1a.brm2 |> hypothesis("sigma = 0", class = "") |> plot()

Unfortunately, it is not possible to do this comparison sensibly for the intercept. The reason is that the prior for the intercept is applied to an intercept associated with internally centred predictors (brm centres the predictors behind the scenes), whereas the intercept that is reported (b_Intercept) is back-transformed onto the scale of the original, uncentred predictors. Hence, the prior and posterior are on different scales.
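That said, brms retains the intercept of the internal centred parameterisation as its own variable (Intercept in the variable list above, as distinct from b_Intercept), and my understanding is that this is the quantity prior_Intercept corresponds to. Assuming that interpretation, a like-for-like comparison can be sketched as:

dat1a.brm2 |>
    as_draws_df() |>
    select(Intercept, prior_Intercept) |>
    pivot_longer(everything(), names_to = "source", values_to = "value") |>
    ggplot(aes(x = value, colour = source)) +
    geom_density()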

dat1a.brm2 |> SUYR_prior_and_posterior()

Conclusions:

  • each of the priors is substantially wider than its corresponding posterior
  • the posteriors are clearly distinct from their respective priors, so we can conclude that the priors are not driving the posteriors (i.e. they are not unduly influencing the outcomes)
  • the priors are simply regularising the parameters such that they are only sampled from plausible regions

Unless you explicitly direct brm to treat the intercept as an ordinary coefficient, any prior placed on the default intercept should be specified as though the predictor(s) were centred (because brm automatically centres all continuous predictors internally).
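Alternatively, brm can be told not to centre at all via the formula. A sketch (my own, with a hypothetical object name dat1a.brm3, using the prior values derived below): with center = FALSE the intercept becomes an ordinary class "b" coefficient whose prior and posterior share a scale, although sampling can be less efficient:

dat1a.brm3 <- brm(
    bf(y ~ x, center = FALSE),   #disable internal centring of predictors
    data = dat,
    prior = prior(normal(0.61, 0.48), class = "b", coef = "Intercept") +
        prior(student_t(3, 0, 0.75), class = "b") +
        prior(student_t(3, 0, 0.48), class = "sigma"),
    sample_prior = "yes",
    iter = 5000, warmup = 1000, chains = 3, cores = 3, thin = 5,
    backend = "rstan", refresh = 0
)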

Let's try the following priors:

  • \(\beta_0\): Normal prior centred at 0.61 with a standard deviation of 0.48
    • mean:

      dat$y |> median() |> round(2)
      [1] 0.61
    • standard deviation:

      dat$y |> mad() |> round(2)
      [1] 0.48
  • \(\beta_1\): t distribution (3 degrees of freedom) prior centred at 0 with a scale of 0.75
    • mean: since effects are differences and we may not want to pre-determine the direction (polarity) of any trends, it is typical to define effects priors with means of 0

    • scale:

      (mad(dat$y) / mad(dat$x)) |> round(2)
      [1] 0.75
  • \(\sigma\): (half) t distribution (3 degrees of freedom) centred at 0 with a scale of 0.48
    • scale:

      dat$y |> mad() |> round(2)
      [1] 0.48
priors <- prior(normal(0.61, 0.48), class = "Intercept") +
    prior(student_t(3, 0, 0.75), class = "b") +
    prior(student_t(3, 0, 0.48), class = "sigma")
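As a quick check (my own addition), get_prior() lists every parameter class and coefficient for which this model accepts priors, confirming that the classes used above (Intercept, b and sigma) are the right ones:

get_prior(bf(y ~ scale(x, scale = FALSE)), data = dat)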

\[ \begin{align} y_i \sim{}& N(\mu_i, \sigma^2)\\ \mu_i =& \beta_0 + \beta_1 (x_i - \bar{x})\\ \end{align} \]

  1. start by fitting the model and sampling from the priors only
dat1b.brm <- brm(bf(y ~ scale(x, scale = FALSE)),
                data=dat,
                prior=priors,
                sample_prior = 'only', 
                iter = 5000,
                warmup = 1000,
                chains = 3, cores = 3,
                thin = 5,
                backend = "rstan",
                refresh = 0)
Compiling Stan program...
Start sampling
  2. explore the range of posterior predictions resulting from the priors alone
dat1b.brm |>
    conditional_effects() |>
    plot(points = TRUE)

Conclusions:

  • the grey ribbon above represents the credible range of the posterior predictions
  • this ribbon is substantially wider than the observed data (black points), suggesting that the priors are not overly narrow
  • the range of posterior predictions is not wildly unreasonable (although we could argue that negative predictions might be illogical), since the maximums are not multiple orders of magnitude above the observed data
  3. now refit the model such that it samples from both the priors and the likelihood (that is, allow the data to have an impact on the estimates)
dat1b.brm2 <- update(dat1b.brm,
    sample_prior = "yes"
)
The desired updates require recompiling the model
Compiling Stan program...
Start sampling
  4. re-explore the range of posterior predictions resulting from the fitted model
dat1b.brm2 |>
    conditional_effects() |>
    plot(points = TRUE) 

Conclusions:

  • the range of the posteriors is now substantially reduced (now that the model includes both data and priors)
  • this suggests that the patterns are being driven predominantly by the data
  5. compare the priors and posteriors to further confirm that the priors are not overly influential

When we have indicated that the posterior should be informed by both the prior and the likelihood, draws from both the prior (governed by the priors alone) and the posterior (governed by both the priors and the data/likelihood) are returned. These can be compared by exploring the probabilities associated with specific hypotheses - the most obvious of which is that of no effect (that the parameter = 0).

When doing so, ideally the posterior should be distinct from the prior - that is, clearly distinguishable from it - and less variable than the prior. If this is not the case, it might suggest that the posteriors are being too strongly driven by the priors.

dat1b.brm2 |> tidybayes::get_variables() 
 [1] "b_Intercept"          "b_scalexscaleEQFALSE" "sigma"               
 [4] "Intercept"            "prior_Intercept"      "prior_b"             
 [7] "prior_sigma"          "lprior"               "lp__"                
[10] "accept_stat__"        "stepsize__"           "treedepth__"         
[13] "n_leapfrog__"         "divergent__"          "energy__"            
dat1b.brm2 |> hypothesis("scalexscaleEQFALSE = 0") |> plot()

dat1b.brm2 |> hypothesis("sigma = 0", class = "") |> plot()

Unfortunately, it is not possible to do this comparison sensibly for the intercept. The reason is that the prior for the intercept is applied to an intercept associated with internally centred predictors (brm centres the predictors behind the scenes), whereas the intercept that is reported (b_Intercept) is back-transformed onto the scale of the original, uncentred predictors. Hence, the prior and posterior are on different scales.

dat1b.brm2 |> SUYR_prior_and_posterior()

Conclusions:

  • each of the priors is substantially wider than its corresponding posterior
  • the posteriors are clearly distinct from their respective priors, so we can conclude that the priors are not driving the posteriors (i.e. they are not unduly influencing the outcomes)
  • the priors are simply regularising the parameters such that they are only sampled from plausible regions

When the predictor is standardised, it simplifies prior definition because we no longer need to consider the scale of the predictor.
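To see why, note that a standardised predictor has, by construction, a mean of 0 and a standard deviation of 1, so the slope is expressed in units of y per standard deviation of x. A quick check (added here for illustration):

dat |>
    mutate(x_s = as.numeric(scale(x))) |>
    summarise(mean_x_s = mean(x_s), sd_x_s = sd(x_s))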

Let's try the following priors:

  • \(\beta_0\): Normal prior centred at 0.61 with a standard deviation of 0.48
    • mean:

      dat$y |> median() |> round(2)
      [1] 0.61
    • standard deviation:

      dat$y |> mad() |> round(2)
      [1] 0.48
  • \(\beta_1\): t distribution (3 degrees of freedom) prior centred at 0 with a scale of 0.48
    • mean: since effects are differences and we may not want to pre-determine the direction (polarity) of any trends, it is typical to define effects priors with means of 0

    • scale:

      mad(dat$y) |> round(2)
      [1] 0.48
  • \(\sigma\): (half) t distribution (3 degrees of freedom) centred at 0 with a scale of 0.48
    • scale:

      dat$y |> mad() |> round(2)
      [1] 0.48
priors <- prior(normal(0.61, 0.48), class = "Intercept") +
    prior(student_t(3, 0, 0.48), class = "b") +
    prior(student_t(3, 0, 0.48), class = "sigma")

\[ \begin{align} y_i \sim{}& N(\mu_i, \sigma^2)\\ \mu_i =& \beta_0 + \beta_1 (x_i - \bar{x})/\sigma_{x}\\ \end{align} \]

  1. start by fitting the model and sampling from the priors only
dat1c.brm <- brm(bf(y ~ scale(x)),
                data=dat,
                prior=priors,
                sample_prior = 'only', 
                iter = 5000,
                warmup = 1000,
                chains = 3, cores = 3,
                thin = 5,
                backend = "rstan",
                refresh = 0)
Compiling Stan program...
Start sampling
  2. explore the range of posterior predictions resulting from the priors alone
dat1c.brm |>
    conditional_effects() |>
    plot(points = TRUE)

Conclusions:

  • the grey ribbon above represents the credible range of the posterior predictions
  • this ribbon is substantially wider than the observed data (black points), suggesting that the priors are not overly narrow
  • the range of posterior predictions is not wildly unreasonable (although we could argue that negative predictions might be illogical), since the maximums are not multiple orders of magnitude above the observed data
  3. now refit the model such that it samples from both the priors and the likelihood (that is, allow the data to have an impact on the estimates)
dat1c.brm2 <- update(dat1c.brm,
    sample_prior = "yes"
)
The desired updates require recompiling the model
Compiling Stan program...
Start sampling
  4. re-explore the range of posterior predictions resulting from the fitted model
dat1c.brm2 |>
    conditional_effects() |>
    plot(points = TRUE)

Conclusions:

  • the range of the posteriors is now substantially reduced (now that the model includes both data and priors)
  • this suggests that the patterns are being driven predominantly by the data
  5. compare the priors and posteriors to further confirm that the priors are not overly influential

When we have indicated that the posterior should be informed by both the prior and the likelihood, draws from both the prior (governed by the priors alone) and the posterior (governed by both the priors and the data/likelihood) are returned. These can be compared by exploring the probabilities associated with specific hypotheses - the most obvious of which is that of no effect (that the parameter = 0).

When doing so, ideally the posterior should be distinct from the prior - that is, clearly distinguishable from it - and less variable than the prior. If this is not the case, it might suggest that the posteriors are being too strongly driven by the priors.

dat1c.brm2 |> tidybayes::get_variables()
 [1] "b_Intercept"     "b_scalex"        "sigma"           "Intercept"      
 [5] "prior_Intercept" "prior_b"         "prior_sigma"     "lprior"         
 [9] "lp__"            "accept_stat__"   "stepsize__"      "treedepth__"    
[13] "n_leapfrog__"    "divergent__"     "energy__"       
dat1c.brm2 |> hypothesis("scalex = 0") |> plot()

dat1c.brm2 |> hypothesis("sigma = 0", class = "") |> plot()

Unfortunately, it is not possible to do this comparison sensibly for the intercept. The reason is that the prior for the intercept is applied to an intercept associated with internally centred predictors (brm centres the predictors behind the scenes), whereas the intercept that is reported (b_Intercept) is back-transformed onto the scale of the original, uncentred predictors. Hence, the prior and posterior are on different scales.

dat1c.brm2 |> SUYR_prior_and_posterior()

Conclusions:

  • each of the priors is substantially wider than its corresponding posterior
  • the posteriors are clearly distinct from their respective priors, so we can conclude that the priors are not driving the posteriors (i.e. they are not unduly influencing the outcomes)
  • the priors are simply regularising the parameters such that they are only sampled from plausible regions

Let's try the following priors:

  • \(\beta_0\): Normal prior centred at 19.68 with a standard deviation of 1.87
    • mean: the median of the first (control) group

      dat2 |>
        group_by(x) |>
          summarise(across(y, list(med = median, sd = sd, mad = mad)))
      # A tibble: 3 × 4
        x       y_med  y_sd y_mad
        <fct>   <dbl> <dbl> <dbl>
      1 control 19.7   3.74  1.87
      2 medium  19.2   4.90  4.70
      3 high     9.83  3.46  3.10
    • standard deviation: the MAD of the first (control) group (see the table above)
  • \(\beta_1\): t distribution (3 degrees of freedom) prior centred at 0 with a scale of 9.35
    • mean: since effects are differences and we may not want to pre-determine the direction (polarity) of any trends, it is typical to define effects priors with means of 0

    • scale: the largest absolute difference between successive group medians

      dat2 |>
        group_by(x) |>
        summarise(across(y, median)) |>
        pull(y) |>
        diff() |>
        abs() |>
        max()
      [1] 9.352104
  • \(\sigma\): (half) t distribution (3 degrees of freedom) centred at 0 with a scale of 4.7
    • scale: the largest within-group MAD

      dat2 |>
      group_by(x) |>
      summarise(across(y, mad)) |>
      pull(y) |>
      max()
      [1] 4.702147
priors <- prior(normal(19.68, 1.87), class = "Intercept") +
    prior(student_t(3, 0, 9.35), class = "b") +
    prior(student_t(3, 0, 4.7), class = "sigma")

\[ \begin{align} y_i \sim{}& N(\mu_i, \sigma^2)\\ \mu_i =& \beta_0 + \sum{\beta_j x_{ij}}\\ \end{align} \]
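To make the design matrix implied by this formula concrete (an illustrative aside): under the default treatment contrasts, \(\beta_0\) is the mean of the first group (control) and the remaining coefficients are differences from it.

model.matrix(~ x, data = dat2) |> head()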

  1. start by fitting the model and sampling from the priors only
dat2a.brm <- brm(bf(y ~ x),
                data=dat2,
                prior=priors,
                sample_prior = 'only', 
                iter = 5000,
                warmup = 1000,
                chains = 3, cores = 3,
                thin = 5,
                backend = "rstan",
                refresh = 0) 
Compiling Stan program...
Start sampling
  2. explore the range of posterior predictions resulting from the priors alone
dat2a.brm |>
    conditional_effects() |>
    plot(points = TRUE) 

Conclusions:

  • the grey ribbon above represents the credible range of the posterior predictions
  • this ribbon is substantially wider than the observed data (black points), suggesting that the priors are not overly narrow
  • the range of posterior predictions is not wildly unreasonable (although we could argue that negative predictions might be illogical), since the maximums are not multiple orders of magnitude above the observed data
  3. now refit the model such that it samples from both the priors and the likelihood (that is, allow the data to have an impact on the estimates)
dat2a.brm2 <- update(dat2a.brm,
    sample_prior = "yes"
) 
The desired updates require recompiling the model
Compiling Stan program...
Start sampling
  4. re-explore the range of posterior predictions resulting from the fitted model
dat2a.brm2 |>
    conditional_effects() |>
    plot(points = TRUE) 

Conclusions:

  • the range of the posteriors is now substantially reduced (now that the model includes both data and priors)
  • this suggests that the patterns are being driven predominantly by the data
  5. compare the priors and posteriors to further confirm that the priors are not overly influential

When we have indicated that the posterior should be informed by both the prior and the likelihood, draws from both the prior (governed by the priors alone) and the posterior (governed by both the priors and the data/likelihood) are returned. These can be compared by exploring the probabilities associated with specific hypotheses - the most obvious of which is that of no effect (that the parameter = 0).

When doing so, ideally the posterior should be distinct from the prior - that is, clearly distinguishable from it - and less variable than the prior. If this is not the case, it might suggest that the posteriors are being too strongly driven by the priors.

dat2a.brm2 |> tidybayes::get_variables()
 [1] "b_Intercept"     "b_xmedium"       "b_xhigh"         "sigma"          
 [5] "Intercept"       "prior_Intercept" "prior_b"         "prior_sigma"    
 [9] "lprior"          "lp__"            "accept_stat__"   "stepsize__"     
[13] "treedepth__"     "n_leapfrog__"    "divergent__"     "energy__"       
dat2a.brm2 |> hypothesis("xmedium = 0") |> plot()

dat2a.brm2 |> hypothesis("sigma = 0", class = "") |> plot() 

Unfortunately, it is not possible to do this comparison sensibly for the intercept. The reason is that the prior for the intercept is applied to an intercept associated with internally centred predictors (brm centres the predictors behind the scenes), whereas the intercept that is reported (b_Intercept) is back-transformed onto the scale of the original, uncentred predictors. Hence, the prior and posterior are on different scales.

dat2a.brm2 |> SUYR_prior_and_posterior() 

Conclusions:

  • each of the priors is substantially wider than its corresponding posterior
  • the posteriors are clearly distinct from their respective priors, so we can conclude that the priors are not driving the posteriors (i.e. they are not unduly influencing the outcomes)
  • the priors are simply regularising the parameters such that they are only sampled from plausible regions

Let's try the following priors:

  • \(\beta\): t distribution (3 degrees of freedom) prior centred at 18.14 with a scale of 6.26
    • mean: since each group's mean is being estimated separately, they could either all have different priors or, more commonly, share the same prior; here we use the overall median.

    • scale: the overall MAD

      dat2 |>
          summarise(across(y, list(med = median, sd = sd, mad = mad)))
           y_med     y_sd    y_mad
      1 18.13762 6.003828 6.255954
  • \(\sigma\): (half) t distribution (3 degrees of freedom) centred at 0 with a scale of 4.7
    • scale: the largest within-group MAD

      dat2 |>
      group_by(x) |>
      summarise(across(y, mad)) |>
      pull(y) |>
      max()
      [1] 4.702147
priors <- prior(student_t(3, 18.14, 6.26), class = "b") +
    prior(student_t(3, 0, 4.7), class = "sigma")

\[ \begin{align} y_i \sim{}& N(\mu_i, \sigma^2)\\ \mu_i =& \sum{\beta_j x_{ij}}\\ \end{align} \]
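For contrast with the previous model (an illustrative aside), removing the intercept produces one indicator column per group, so each \(\beta_j\) is itself a group mean rather than a difference, which is why all of them can sensibly share the same location prior:

model.matrix(~ -1 + x, data = dat2) |> head()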

  1. start by fitting the model and sampling from the priors only
dat2b.brm <- brm(bf(y ~ -1 + x),
                data=dat2,
                prior=priors,
                sample_prior = 'only', 
                iter = 5000,
                warmup = 1000,
                chains = 3, cores = 3,
                thin = 5,
                backend = "rstan",
                refresh = 0) 
Compiling Stan program...
Start sampling
  2. explore the range of posterior predictions resulting from the priors alone
dat2b.brm |>
    conditional_effects() |>
    plot(points = TRUE) 

Conclusions:

  • the grey ribbon above represents the credible range of the posterior predictions
  • this ribbon is substantially wider than the observed data (black points), suggesting that the priors are not overly narrow
  • the range of posterior predictions is not wildly unreasonable (although we could argue that negative predictions might be illogical), since the maximums are not multiple orders of magnitude above the observed data
  3. now refit the model such that it samples from both the priors and the likelihood (that is, allow the data to have an impact on the estimates)
dat2b.brm2 <- update(dat2b.brm,
    sample_prior = "yes"
) 
The desired updates require recompiling the model
Compiling Stan program...
Start sampling
  4. re-explore the range of posterior predictions resulting from the fitted model
dat2b.brm2 |>
    conditional_effects() |>
    plot(points = TRUE) 

Conclusions:

  • the range of the posteriors is now substantially reduced (now that the model includes both data and priors)
  • this suggests that the patterns are being driven predominantly by the data
  5. compare the priors and posteriors to further confirm that the priors are not overly influential

When we have indicated that the posterior should be informed by both the prior and the likelihood, draws from both the prior (governed by the priors alone) and the posterior (governed by both the priors and the data/likelihood) are returned. These can be compared by exploring the probabilities associated with specific hypotheses - the most obvious of which is that of no effect (that the parameter = 0).

When doing so, ideally the posterior should be distinct from the prior - that is, clearly distinguishable from it - and less variable than the prior. If this is not the case, it might suggest that the posteriors are being too strongly driven by the priors.

dat2b.brm2 |> tidybayes::get_variables()
 [1] "b_xcontrol"    "b_xmedium"     "b_xhigh"       "sigma"        
 [5] "prior_b"       "prior_sigma"   "lprior"        "lp__"         
 [9] "accept_stat__" "stepsize__"    "treedepth__"   "n_leapfrog__" 
[13] "divergent__"   "energy__"     
dat2b.brm2 |> hypothesis("xcontrol = 0") |> plot()

dat2b.brm2 |> hypothesis("xmedium = 0") |> plot()

dat2b.brm2 |> hypothesis("sigma = 0", class = "") |> plot() 

Note that this model was parameterised without an overall intercept (each group mean is estimated directly as a class "b" coefficient), so the intercept-scale caveat described for the previous models does not arise here.

#dat2b.brm2 |> SUYR_prior_and_posterior() 

Conclusions:

  • each of the priors is substantially wider than its corresponding posterior
  • the posteriors are clearly distinct from their respective priors, so we can conclude that the priors are not driving the posteriors (i.e. they are not unduly influencing the outcomes)
  • the priors are simply regularising the parameters such that they are only sampled from plausible regions

\[ \begin{align} y_i \sim{}& Pois(\lambda_i)\\ log(\lambda_i) =& \beta_0 + \beta_1 x_i\\ \end{align} \]

When considering priors, it is important to remember that they apply to parameters on the link scale. For Poisson models, the link scale is log. So the following priors might be appropriate:

  • \(\beta_0\): Normal prior centred at 1.99 with a standard deviation of 1.33
    • mean:

      dat3$y |>
          log() |> 
          median() |>
          round(2)
      [1] 1.99
    • standard deviation:

      dat3$y |>
          log() |> 
          mad() |>
          round(2)
      [1] 1.33
  • \(\beta_1\): t distribution (3 degrees of freedom) prior centred at 0 with a scale of 2.05
    • mean: since effects are differences and we may not want to pre-determine the direction (polarity) of any trends, it is typical to define effects priors with means of 0

    • scale:

      dat3 |>
          mutate(across(c(y, x), log)) |>
          summarise(across(c(y, x), mad)) |>
          mutate(round(y / x, 2))
               y        x round(y/x, 2)
      1 1.328231 0.648985          2.05
priors <- prior(normal(2.00, 1.33), class = "Intercept") +
    prior(student_t(3, 0, 2.00), class = "b")
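Because these priors act on the log link scale, it is worth back-transforming them to check what they imply on the response scale (a quick sanity check of my own):

exp(qnorm(c(0.025, 0.5, 0.975), mean = 2, sd = 1.33)) |> round(1)
[1]   0.5   7.4 100.2

The central 95% of the intercept prior thus spans roughly 0.5 to 100 counts, comfortably covering the observed counts without being absurdly wide.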
  1. start by fitting the model and sampling from the priors only
dat3a.form <- bf(y ~ x, family = poisson(link = "log"))
dat3a.brm <- brm(dat3a.form,
                data=dat3,
                prior=priors,
                sample_prior = 'only', 
                iter = 5000,
                warmup = 1000,
                chains = 3, cores = 3,
                thin = 5,
                backend = "rstan",
                refresh = 0)
Compiling Stan program...
Start sampling
  2. explore the range of posterior predictions resulting from the priors alone
dat3a.brm |>
  conditional_effects() |>
  plot(points = TRUE) |> _[[1]] +
  scale_y_log10()

Conclusions:

  • the grey ribbon above represents the credible range of the posterior predictions
  • this ribbon is substantially wider than the observed data (black points), suggesting that the priors are not overly narrow
  • the range of posterior predictions is not wildly unreasonable (although we could argue that negative predictions might be illogical), since the maximums are not multiple orders of magnitude above the observed data
  3. now refit the model such that it samples from both the priors and the likelihood (that is, allow the data to have an impact on the estimates)
dat3a.brm2 <- update(dat3a.brm,
    sample_prior = "yes"
)
The desired updates require recompiling the model
Compiling Stan program...
Start sampling
  4. re-explore the range of posterior predictions resulting from the fitted model
dat3a.brm2 |>
    conditional_effects() |>
    plot(points = TRUE) 

Conclusions:

  • the range of the posteriors is now substantially reduced (now that the model includes both data and priors)
  • this suggests that the patterns are being driven predominantly by the data
  5. compare the priors and posteriors to further confirm that the priors are not overly influential

When we have indicated that the posterior should be informed by both the prior and the likelihood, draws from both the prior (governed by the priors alone) and the posterior (governed by both the priors and the data/likelihood) are returned. These can be compared by exploring the probabilities associated with specific hypotheses - the most obvious of which is that of no effect (that the parameter = 0).

When doing so, ideally the posterior should be distinct from the prior - that is, clearly distinguishable from it - and less variable than the prior. If this is not the case, it might suggest that the posteriors are being too strongly driven by the priors.

dat3a.brm2 |> tidybayes::get_variables() 
 [1] "b_Intercept"     "b_x"             "Intercept"       "prior_Intercept"
 [5] "prior_b"         "lprior"          "lp__"            "accept_stat__"  
 [9] "stepsize__"      "treedepth__"     "n_leapfrog__"    "divergent__"    
[13] "energy__"       
dat3a.brm2 |> hypothesis("x = 0") |> plot()

Unfortunately, it is not possible to do this comparison sensibly for the intercept. The reason is that the prior for the intercept is applied to an intercept associated with internally centred predictors (brm centres the predictors behind the scenes), whereas the intercept that is reported (b_Intercept) is back-transformed onto the scale of the original, uncentred predictors. Hence, the prior and posterior are on different scales.

dat3a.brm2 |> SUYR_prior_and_posterior()

Conclusions:

  • each of the priors is substantially wider than its corresponding posterior
  • the posteriors are clearly distinct from their respective priors, so we can conclude that the priors are not driving the posteriors (i.e. they are not unduly influencing the outcomes)
  • the priors are simply regularising the parameters such that they are only sampled from plausible regions

\[ \begin{align} y_i \sim{}& Pois(\lambda_i)\\ log(\lambda_i) =& \beta_0 + \beta_1 (x_i - \bar{x})\\ \end{align} \]

When considering priors, it is important to remember that they apply to parameters on the link scale. For Poisson models, the link scale is log. So the following priors might be appropriate:

  • \(\beta_0\): Normal prior centred at 1.99 with a standard deviation of 1.33
    • mean:

      dat3$y |>
          log() |> 
          median() |>
          round(2)
      [1] 1.99
    • standard deviation:

      dat3$y |>
          log() |> 
          mad() |>
          round(2)
      [1] 1.33
  • \(\beta_1\): t distribution (3 degrees of freedom) prior centred at 0 with a scale of 2.05
    • mean: since effects are differences and we may not want to pre-determine the direction (polarity) of any trends, it is typical to define effects priors with means of 0

    • scale:

      dat3 |>
          mutate(across(c(y, x), log)) |>
          summarise(across(c(y, x), mad)) |>
          mutate(round(y / x, 2))
               y        x round(y/x, 2)
      1 1.328231 0.648985          2.05
priors <- prior(normal(2.00, 1.33), class = "Intercept") +
    prior(student_t(3, 0, 2.00), class = "b")
  1. start by fitting the model and sampling from the priors only
dat3b.form <- bf(y ~ scale(x, scale = FALSE), family = poisson(link = "log"))
dat3b.brm <- brm(dat3b.form,
                data=dat3,
                prior=priors,
                sample_prior = 'only', 
                iter = 5000,
                warmup = 1000,
                chains = 3, cores = 3,
                thin = 5,
                backend = "rstan",
                refresh = 0)
Compiling Stan program...
Start sampling
  2. explore the range of posterior predictions resulting from the priors alone
dat3b.brm |>
  conditional_effects() |>
  plot(points = TRUE) |> _[[1]] +
  scale_y_log10()

Conclusions:

  • the grey ribbon above represents the credible range of the posterior predictions
  • this ribbon is substantially wider than the observed data (black points), suggesting that the priors are not overly narrow
  • the range of posterior predictions is not wildly unreasonable (although we could argue that negative predictions might be illogical), since the maximums are not multiple orders of magnitude above the observed data
  3. now refit the model such that it samples from both the priors and the likelihood (that is, allow the data to have an impact on the estimates)
dat3b.brm2 <- update(dat3b.brm,
    sample_prior = "yes"
)
The desired updates require recompiling the model
Compiling Stan program...
Start sampling
  4. re-explore the range of posterior predictions resulting from the fitted model
dat3b.brm2 |>
    conditional_effects() |>
    plot(points = TRUE) 

Conclusions:

  • the range of the posteriors is now substantially reduced (now that the model includes both data and priors)
  • this suggests that the patterns are being driven predominantly by the data
  5. compare the priors and posteriors to further confirm that the priors are not overly influential

When we have indicated that the posterior should be informed by both the prior and the likelihood, draws from both the prior (governed by the priors alone) and the posterior (governed by both the priors and the data/likelihood) are returned. These can be compared by exploring the probabilities associated with specific hypotheses - the most obvious of which is that of no effect (that the parameter = 0).

When doing so, ideally the posterior should be distinct from the prior - that is, clearly distinguishable from it - and less variable than the prior. If this is not the case, it might suggest that the posteriors are being too strongly driven by the priors.

dat3b.brm2 |> tidybayes::get_variables() 
 [1] "b_Intercept"          "b_scalexscaleEQFALSE" "Intercept"           
 [4] "prior_Intercept"      "prior_b"              "lprior"              
 [7] "lp__"                 "accept_stat__"        "stepsize__"          
[10] "treedepth__"          "n_leapfrog__"         "divergent__"         
[13] "energy__"            
dat3b.brm2 |> hypothesis("scalexscaleEQFALSE = 0") |> plot()

Unfortunately, it is not possible to do this comparison sensibly for the intercept. The reason is that the prior for the intercept is applied to an intercept associated with internally centred predictors (brm centres the predictors behind the scenes), whereas the intercept that is reported (b_Intercept) is back-transformed onto the scale of the original, uncentred predictors. Hence, the prior and posterior are on different scales.

dat3b.brm2 |> SUYR_prior_and_posterior()

Conclusions:

  • each of the priors is substantially wider than its corresponding posterior
  • the posteriors are clearly distinct from their respective priors, so we can conclude that the priors are not driving the posteriors (i.e. they are not unduly influencing the outcomes)
  • the priors are simply regularising the parameters such that they are only sampled from plausible regions

\[ \begin{align} y_i \sim{}& Pois(\lambda_i)\\ log(\lambda_i) =& \beta_0 + \beta_1 (x_i - \bar{x})/\sigma_x\\ \end{align} \]

When considering priors, it is important to remember that they apply to parameters on the link scale. For Poisson models, the link scale is log. So the following priors might be appropriate:

  • \(\beta_0\): Normal prior centred at 1.99 with a standard deviation of 1.33
    • mean:

      dat3$y |>
          log() |> 
          median() |>
          round(2)
      [1] 1.99
    • standard deviation:

      dat3$y |>
          log() |> 
          mad() |>
          round(2)
      [1] 1.33
  • \(\beta_1\): t distribution (3 degrees of freedom) prior centred at 0 with a scale of 1.33
    • mean: since effects are differences and we may not want to pre-determine the direction (polarity) of any trends, it is typical to define effects priors with means of 0

    • scale:

      dat3$y |> log() |> mad() |> round(2)
      [1] 1.33
priors <- prior(normal(2.00, 1.33), class = "Intercept") +
    prior(student_t(3, 0, 1.33), class = "b")
  1. start by fitting the model and sampling from the priors only
dat3c.form <- bf(y ~ scale(x), family = poisson(link = "log"))
dat3c.brm <- brm(dat3c.form,
                data=dat3,
                prior=priors,
                sample_prior = 'only', 
                iter = 5000,
                warmup = 1000,
                chains = 3, cores = 3,
                thin = 5,
                backend = "rstan",
                refresh = 0)
Compiling Stan program...
Start sampling
  2. explore the range of posterior predictions resulting from the priors alone
dat3c.brm |>
  conditional_effects() |>
  plot(points = TRUE) |> _[[1]] +
  scale_y_log10()

Conclusions:

  • the grey ribbon above represents the credible range of the posterior predictions
  • this ribbon is substantially wider than the observed data (black points), suggesting that the priors are not overly narrow
  • the range of posterior predictions is not wildly unreasonable (although we could argue that negative predictions might be illogical), since the maximums are not multiple orders of magnitude above the observed data
  3. now refit the model such that it samples from both the priors and the likelihood (that is, allow the data to have an impact on the estimates)
dat3c.brm2 <- update(dat3c.brm,
    sample_prior = "yes"
)
The desired updates require recompiling the model
Compiling Stan program...
Start sampling
  4. re-explore the range of posterior predictions resulting from the fitted model
dat3c.brm2 |>
    conditional_effects() |>
    plot(points = TRUE) 

Conclusions:

  • the range of the posteriors is now substantially reduced (now that the model includes both data and priors)
  • this suggests that the patterns are being driven predominantly by the data
  5. compare the priors and posteriors to further confirm that the priors are not overly influential

When we have indicated that the posterior should be informed by both the prior and the likelihood, draws from both the prior (governed by the priors alone) and the posterior (governed by both the priors and the data/likelihood) are returned. These can be compared by exploring the probabilities associated with specific hypotheses - the most obvious of which is that of no effect (that the parameter = 0).

When doing so, ideally the posterior should be distinct from the prior - that is, clearly distinguishable from it - and less variable than the prior. If this is not the case, it might suggest that the posteriors are being too strongly driven by the priors.

dat3c.brm2 |> tidybayes::get_variables() 
 [1] "b_Intercept"     "b_scalex"        "Intercept"       "prior_Intercept"
 [5] "prior_b"         "lprior"          "lp__"            "accept_stat__"  
 [9] "stepsize__"      "treedepth__"     "n_leapfrog__"    "divergent__"    
[13] "energy__"       
dat3c.brm2 |> hypothesis("scalex = 0") |> plot()

Unfortunately, it is not possible to do this comparison sensibly for the intercept. The reason is that the prior for the intercept is applied to an intercept associated with internally centred predictors (brm centres the predictors behind the scenes), whereas the intercept that is reported (b_Intercept) is back-transformed onto the scale of the original, uncentred predictors. Hence, the prior and posterior are on different scales.

dat3c.brm2 |> SUYR_prior_and_posterior()

Conclusions:

  • each of the priors is substantially wider than its corresponding posterior
  • the posteriors are clearly distinct from their respective priors, so we can conclude that the priors are not driving the posteriors (i.e. they are not unduly influencing the outcomes)
  • the priors are simply regularising the parameters such that they are only sampled from plausible regions

\[ \begin{align} y_i \sim{}& NB(\lambda_i, \phi)\\ log(\lambda_i) =& \beta_0 + \beta_1 x_i\\ \end{align} \]

When considering priors, it is important to remember that they apply to parameters on the link scale. For Negative Binomial models, the link scale is log. So the following priors might be appropriate:

  • \(\beta_0\): Normal prior centred at 2.07 with a standard deviation of 0.93
    • mean:

      dat4$y |>
          log() |> 
          median() |>
          round(2)
      [1] 2.07
    • standard deviation:

      dat4$y |>
          log() |> 
          mad() |>
          round(2)
      [1] 0.93
  • \(\beta_1\): t distribution (3 degrees of freedom) prior centred at 0 with a scale of 1.43
    • mean: since effects are differences and we may not want to pre-determine the direction (polarity) of any trends, it is typical to define effects priors with means of 0

    • scale:

      dat4 |>
          mutate(across(c(y, x), log)) |>
          summarise(across(c(y, x), mad)) |>
          mutate(round(y / x, 2))
                y        x round(y/x, 2)
      1 0.9303522 0.648985          1.43
priors <- prior(normal(2.00, 1.00), class = "Intercept") +
    prior(student_t(3, 0, 1.5), class = "b") +
    prior(gamma(0.01, 0.01), class = "shape")
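A note on the shape prior (my own aside): in the brms parameterisation, the negative binomial variance is \(\mu + \mu^2/\phi\), so small shape values imply strong overdispersion while large values approach the Poisson; the vague gamma(0.01, 0.01) prior therefore spans both regimes.

mu <- 8  # an arbitrary mean, purely for illustration
tibble(shape = c(0.1, 1, 10, 100)) |>
    mutate(variance = mu + mu^2 / shape)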
  1. start by fitting the model and sampling from the priors only
dat4a.form <- bf(y ~ x, family = negbinomial(link = "log"))
dat4a.brm <- brm(dat4a.form,
                data=dat4,
                prior=priors,
                sample_prior = 'only', 
                iter = 5000,
                warmup = 1000,
                chains = 3, cores = 3,
                thin = 5,
                backend = "rstan",
                refresh = 0) 
Compiling Stan program...
Start sampling
Warning: There were 1517 divergent transitions after warmup. See
https://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
to find out why this is a problem and how to eliminate them.
Warning: Examine the pairs() plot to diagnose sampling problems

Note: these divergences arise while sampling from the very diffuse priors alone; they largely disappear once the likelihood is included below, although in a data-informed fit they would need to be investigated.
  2. explore the range of posterior predictions resulting from the priors alone
dat4a.brm |>
  conditional_effects() |>
  plot(points = TRUE) |> _[[1]] +
  scale_y_log10()

Warning in scale_y_log10(): log-10 transformation introduced infinite values.

Conclusions:

  • the grey ribbon above represents the credible range of the posterior predictions
  • this ribbon is substantially wider than the observed data (black points), suggesting that the priors are not overly narrow
  • the range of posterior predictions is not wildly unreasonable (although we could argue that negative predictions might be illogical), since the maximums are not multiple orders of magnitude above the observed data
  3. now refit the model such that it samples from both the priors and the likelihood (that is, allow the data to have an impact on the estimates)
dat4a.brm2 <- update(dat4a.brm,
    sample_prior = "yes"
)
The desired updates require recompiling the model
Compiling Stan program...
Start sampling
  4. re-explore the range of posterior predictions resulting from the fitted model
dat4a.brm2 |>
    conditional_effects() |>
    plot(points = TRUE) 

Conclusions:

  • the range of the posteriors is now substantially reduced (now that the model includes both data and priors)
  • this suggests that the patterns are being driven predominantly by the data
  5. compare the priors and posteriors to further confirm that the priors are not overly influential

When we have indicated that the posterior should be informed by both the prior and the likelihood, draws from both the prior (governed by the priors alone) and the posterior (governed by both the priors and the data/likelihood) are returned. These can be compared by exploring the probabilities associated with specific hypotheses - the most obvious of which is that of no effect (that the parameter = 0).

When doing so, ideally the posterior should be distinct from the prior - that is, clearly distinguishable from it - and less variable than the prior. If this is not the case, it might suggest that the posteriors are being too strongly driven by the priors.

dat4a.brm2 |> tidybayes::get_variables() 
 [1] "b_Intercept"     "b_x"             "shape"           "Intercept"      
 [5] "prior_Intercept" "prior_b"         "prior_shape"     "lprior"         
 [9] "lp__"            "accept_stat__"   "stepsize__"      "treedepth__"    
[13] "n_leapfrog__"    "divergent__"     "energy__"       
dat4a.brm2 |> hypothesis("x = 0") |> plot()

Unfortunately, it is not possible to do this comparison sensibly for the intercept. The reason is that the prior for the intercept is applied to an intercept associated with internally centred predictors (brm centres the predictors behind the scenes), whereas the intercept that is reported (b_Intercept) is back-transformed onto the scale of the original, uncentred predictors. Hence, the prior and posterior are on different scales.

dat4a.brm2 |> SUYR_prior_and_posterior()

Conclusions:

  • each of the priors is substantially wider than its corresponding posterior
  • the posteriors are clearly distinct from their respective priors, so we can conclude that the priors are not driving the posteriors (i.e. they are not unduly influencing the outcomes)
  • the priors are simply regularising the parameters such that they are only sampled from plausible regions

\[ \begin{align} y_i \sim{}& NB(\lambda_i, \phi)\\ log(\lambda_i) =& \beta_0 + \beta_1 (x_i - \bar{x})\\ \end{align} \]

When considering priors, it is important to remember that they apply to parameters on the link scale. For Negative Binomial models, the link scale is log. So the following priors might be appropriate:

  • \(\beta_0\): Normal prior centred at 2.07 with a standard deviation of 0.93
    • mean:

      dat4$y |>
          log() |> 
          median() |>
          round(2)
      [1] 2.07
    • standard deviation:

      dat4$y |>
          log() |> 
          mad() |>
          round(2)
      [1] 0.93
  • \(\beta_1\): t distribution (3 degrees of freedom) prior centred at 0 with a scale of 1.43
    • mean: since effects are differences and we may not want to pre-determine the direction (polarity) of any trends, it is typical to define effects priors with means of 0

    • scale:

      dat4 |>
          mutate(across(c(y, x), log)) |>
          summarise(across(c(y, x), mad)) |>
          mutate(round(y / x, 2))
                y        x round(y/x, 2)
      1 0.9303522 0.648985          1.43
priors <- prior(normal(2.07, 0.93), class = "Intercept") +
    prior(student_t(3, 0, 1.43), class = "b") +
    prior(gamma(0.01, 0.01), class = "shape")
  1. start by fitting the model and sampling from the priors only
dat4b.form <- bf(y ~ scale(x, scale = FALSE), family = negbinomial(link = "log"))
dat4b.brm <- brm(dat4b.form,
                data=dat4,
                prior=priors,
                sample_prior = 'only', 
                iter = 5000,
                warmup = 1000,
                chains = 3, cores = 3,
                thin = 5,
                backend = "rstan",
                refresh = 0)
Compiling Stan program...
Start sampling
Warning: There were 1641 divergent transitions after warmup. See
https://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
to find out why this is a problem and how to eliminate them.
Warning: Examine the pairs() plot to diagnose sampling problems
  2. explore the range of posterior predictions resulting from the priors alone
dat4b.brm |>
  conditional_effects() |>
  plot(points = TRUE) |> _[[1]] +
  scale_y_log10()

Warning in scale_y_log10(): log-10 transformation introduced infinite values.

Conclusions:

  • the grey ribbon above represents the credible range of the posterior predictions
  • this ribbon is substantially wider than the observed data (black points), suggesting that the priors are not overly narrow
  • the range of posterior predictions is not wildly unreasonable (although we could argue that negative predictions might be illogical), since the maximums are not multiple orders of magnitude above the observed data
  3. now refit the model such that it samples from both the priors and the likelihood (that is, allow the data to have an impact on the estimates)
dat4b.brm2 <- update(dat4b.brm,
    sample_prior = "yes"
)
The desired updates require recompiling the model
Compiling Stan program...
Start sampling
  4. re-explore the range of posterior predictions resulting from the fitted model
dat4b.brm2 |>
    conditional_effects() |>
    plot(points = TRUE) 

Conclusions:

  • the range of the posteriors is now substantially reduced (now that the model includes both data and priors)
  • this suggests that the patterns are being driven predominantly by the data
  5. compare the priors and posteriors to further confirm that the priors are not overly influential

When we have indicated that the posterior should be informed by both the prior and the likelihood, draws from both the prior (governed by the priors alone) and the posterior (governed by both the priors and the data/likelihood) are returned. These can be compared by exploring the probabilities associated with specific hypotheses - the most obvious of which is that of no effect (that the parameter = 0).

When doing so, ideally the posterior should be distinct from the prior - that is, clearly distinguishable from it - and less variable than the prior. If this is not the case, it might suggest that the posteriors are being too strongly driven by the priors.

dat4b.brm2 |> tidybayes::get_variables() 
 [1] "b_Intercept"          "b_scalexscaleEQFALSE" "shape"               
 [4] "Intercept"            "prior_Intercept"      "prior_b"             
 [7] "prior_shape"          "lprior"               "lp__"                
[10] "accept_stat__"        "stepsize__"           "treedepth__"         
[13] "n_leapfrog__"         "divergent__"          "energy__"            
dat4b.brm2 |> hypothesis("scalexscaleEQFALSE = 0") |> plot()

Unfortunately, it is not possible to do this comparison sensibly for the intercept. The reason is that the prior for the intercept is applied to an intercept associated with internally centred predictors (brm centres the predictors behind the scenes), whereas the intercept that is reported (b_Intercept) is back-transformed onto the scale of the original, uncentred predictors. Hence, the prior and posterior are on different scales.

dat4b.brm2 |> SUYR_prior_and_posterior()

Conclusions:

  • each of the priors is substantially wider than its corresponding posterior
  • the posteriors are clearly distinct from their respective priors, so we can conclude that the priors are not driving the posteriors (i.e. they are not unduly influencing the outcomes)
  • the priors are simply regularising the parameters such that they are only sampled from plausible regions

\[ \begin{align} y_i \sim{}& NB(\lambda_i, \phi)\\ log(\lambda_i) =& \beta_0 + \beta_1 (x_i - \bar{x})/\sigma_x\\ \end{align} \]

When considering priors, it is important to remember that they apply to parameters on the link scale. For Negative Binomial models, the link scale is log. So the following priors might be appropriate:

  • \(\beta_0\): Normal prior centred at 2.07 with a standard deviation of 0.93
    • mean:

      dat4$y |>
          log() |> 
          median() |>
          round(2)
      [1] 2.07
    • standard deviation:

      dat4$y |>
          log() |> 
          mad() |>
          round(2)
      [1] 0.93
  • \(\beta_1\): t distribution (3 degrees of freedom) prior centred at 0 with a scale of 0.93
    • mean: since effects are differences and we may not want to pre-determine the direction (polarity) of any trends, it is typical to define effects priors with means of 0

    • scale:

      dat4$y |> log() |> mad() |> round(2)
      [1] 0.93
priors <- prior(normal(2.07, 0.93), class = "Intercept") +
    prior(student_t(3, 0, 0.93), class = "b") +
    prior(gamma(0.01, 0.01), class = "shape")
  1. start by fitting the model and sampling from the priors only
dat4c.form <- bf(y ~ scale(x), family = negbinomial(link = "log"))
dat4c.brm <- brm(dat4c.form,
                data=dat4,
                prior=priors,
                sample_prior = 'only', 
                iter = 5000,
                warmup = 1000,
                chains = 3, cores = 3,
                thin = 5,
                backend = "rstan",
                refresh = 0)
Compiling Stan program...
Start sampling
Warning: There were 1537 divergent transitions after warmup. See
https://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
to find out why this is a problem and how to eliminate them.
Warning: Examine the pairs() plot to diagnose sampling problems
  2. explore the range of posterior predictions resulting from the priors alone
dat4c.brm |>
  conditional_effects() |>
  plot(points = TRUE) |> _[[1]] +
  scale_y_log10()

Warning in scale_y_log10(): log-10 transformation introduced infinite values.

Conclusions:

  • the grey ribbon above represents the credible range of the posterior predictions
  • this ribbon is substantially wider than the observed data (black points), suggesting that the priors are not overly narrow
  • the range of posterior predictions is not wildly unreasonable (although we could argue that negative predictions might be illogical), since the maximums are not multiple orders of magnitude above the observed data
  3. now refit the model such that it samples from both the priors and the likelihood (that is, allow the data to have an impact on the estimates)
dat4c.brm2 <- update(dat4c.brm,
    sample_prior = "yes"
)
The desired updates require recompiling the model
Compiling Stan program...
Start sampling
  4. re-explore the range of posterior predictions resulting from the fitted model
dat4c.brm2 |>
    conditional_effects() |>
    plot(points = TRUE) 

Conclusions:

  • the range of the posteriors are now substantially reduced (now that the model includes both data and priors)
  • this suggests that the patterns are being driven predominantly by the data
  1. compare the priors and posteriors to further confirm that the priors are not overly influential

When we have indicated that the posterior should be informed by both the prior and the posterior, both prior (governed by priors alone) and posterior (governed by both priors and data/likelihood) draws are returned. These can be compared by exploring the probabilities associated with specific hypotheses - the most obvious of which is that of no effect (that the parameter = 0).

When doing so, ideally the posterior should be distinct from the prior. By distinct, I mean that the posteriors should be clearly distinguishable from the priors. Ideally, the posteriors should also be less variable than the priors. If this is not the case, it might suggest that the posteriors are being too strongly driven by the priors.

dat4c.brm2 |> tidybayes::get_variables() 
 [1] "b_Intercept"     "b_scalex"        "shape"           "Intercept"      
 [5] "prior_Intercept" "prior_b"         "prior_shape"     "lprior"         
 [9] "lp__"            "accept_stat__"   "stepsize__"      "treedepth__"    
[13] "n_leapfrog__"    "divergent__"     "energy__"       
dat4c.brm2 |> hypothesis("scalex = 0") |> plot()

Unfortunately, it is not possible to perform this comparison sensibly for the intercept. The reason for this is that the prior for the intercept was applied to an intercept associated with centred continuous predictors (the predictors are centred behind the scenes). Since we did not centre the predictor ourselves, the intercept that is returned is on the uncentred scale. Hence, the prior and posterior are on different scales.

dat4c.brm2 |> tidybayes::get_variables() 
 [1] "b_Intercept"     "b_scalex"        "shape"           "Intercept"      
 [5] "prior_Intercept" "prior_b"         "prior_shape"     "lprior"         
 [9] "lp__"            "accept_stat__"   "stepsize__"      "treedepth__"    
[13] "n_leapfrog__"    "divergent__"     "energy__"       
dat4c.brm2 |> SUYR_prior_and_posterior()

Conclusions:

  • each of the priors is substantially wider than its corresponding posterior
  • the posteriors are clearly distinct from their respective priors and thus we can conclude that the priors are not driving the posteriors (i.e. they are not unduly influencing the outcomes)
  • the priors are simply regularising the parameters such that they are only sampled from plausible regions
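
The visual impression can be backed up numerically. The sketch below (using the variable names listed by get_variables() above) compares the spread of the prior and posterior draws for the slope:

dat4c.brm2 |>
    as_draws_df() |>
    summarise(prior_sd = sd(prior_b), posterior_sd = sd(b_scalex))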

\[ \begin{align} y_i \sim{}& Bin(\pi_i, 1)\\ log(\frac{\pi_i}{1-\pi_i}) =& \beta_0 + \beta_1 x_i\\ \end{align} \]

When considering priors, it is important to remember that they apply to parameters on the link scale. For Binary models, the link scale is logit. Binomial data is notoriously difficult to define priors for. Nevertheless, the following considerations are useful:

  • the observed response values are only ever either 0 or 1
  • a linear model is exploring whether the probability of a 1 changes from high to low or low to high according to the linear predictor
  • the switch in probability is likely to be somewhere near the middle of the \(x\) range
  • with a centered predictor, the mean response is expected to be approximately 0.5
  • on a logit (log odds) scale, this corresponds to a value of 0.
  • on a logit (log odds) scale, values of -3 and 3 are considered very wide
  • on a logit scale, values between -1 and 1 are reasonable.
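
To ground these rules of thumb, logit values can be back-transformed onto the probability scale with plogis():

plogis(c(-3, -1, 0, 1, 3)) |> round(2)
[1] 0.05 0.27 0.50 0.73 0.95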

So the following priors might be appropriate:

  • \(\beta_0\): Normal prior centred at 0 with a standard deviation of 1
  • \(\beta_1\): t distribution (3 degrees of freedom) prior centred at 0 with a standard deviation of 1
priors <- prior(normal(0, 1), class = "Intercept") +
    prior(student_t(3, 0, 1), class = "b")
  1. start by fitting the model and sampling from the priors only
dat5a.form <- bf(y | trials(1) ~ x, family = binomial(link = "logit"))
dat5a.brm <- brm(dat5a.form,
                data=dat5,
                prior=priors,
                sample_prior = 'only', 
                iter = 5000,
                warmup = 1000,
                chains = 3, cores = 3,
                thin = 5,
                backend = "rstan",
                refresh = 0)
Only 2 levels detected so that family 'bernoulli' might be a more efficient choice.
Compiling Stan program...
Start sampling
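
Before exploring the prior predictions, the priors actually applied to the fitted model can be confirmed with prior_summary():

dat5a.brm |> prior_summary()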
  2. explore the range of posterior predictions resulting from the priors alone

For Binary data, it is often more useful to explore the predictions on the link scale. Ribbons that extend much beyond -3 and 3 would definitely be considered wide enough.
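
As a quick numerical version of this check, the sketch below extracts the prior-only draws on the logit scale via posterior_linpred() and summarises their extremes:

dat5a.brm |>
    posterior_linpred() |>
    quantile(c(0.01, 0.99))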

dat5a.brm |>
  conditional_effects(method = "posterior_linpred") 

dat5a.brm |>
  conditional_effects() |>
  plot(points = TRUE) 

Conclusions:

  • the grey ribbon above represents the credible range of the posterior predictions
  • this ribbon is substantially wider than the observed data (black points), suggesting that the priors are not overly narrow
  • the range of posterior predictions is not wildly unreasonable (although we could argue that negative predictions might be illogical) since the maximum values are not multiple orders of magnitude above the observed data.
  3. now refit the model so that it samples from both the priors and the likelihood (that is, allow the data to have an impact on the estimates)
dat5a.brm2 <- update(dat5a.brm,
    sample_prior = "yes"
)
The desired updates require recompiling the model
Only 2 levels detected so that family 'bernoulli' might be a more efficient choice.
Compiling Stan program...
Start sampling
  4. re-explore the range of posterior predictions resulting from the fitted model
dat5a.brm2 |>
    conditional_effects() |>
    plot(points = TRUE) 

Conclusions:

  • the range of the posteriors is now substantially reduced (now that the model includes both data and priors)
  • this suggests that the patterns are being driven predominantly by the data
  5. compare the priors and posteriors to further confirm that the priors are not overly influential

When we have indicated that the posterior should be informed by both the prior and the likelihood, both prior (governed by priors alone) and posterior (governed by both priors and data/likelihood) draws are returned. These can be compared by exploring the probabilities associated with specific hypotheses - the most obvious of which is that of no effect (that the parameter = 0).

When doing so, ideally the posterior should be distinct from the prior. By distinct, I mean that the posteriors should be clearly distinguishable from the priors. Ideally, the posteriors should also be less variable than the priors. If this is not the case, it might suggest that the posteriors are being too strongly driven by the priors.

dat5a.brm2 |> tidybayes::get_variables() 
 [1] "b_Intercept"     "b_x"             "Intercept"       "prior_Intercept"
 [5] "prior_b"         "lprior"          "lp__"            "accept_stat__"  
 [9] "stepsize__"      "treedepth__"     "n_leapfrog__"    "divergent__"    
[13] "energy__"       
dat5a.brm2 |> hypothesis("x = 0") |> plot()

Unfortunately, it is not possible to perform this comparison sensibly for the intercept. The reason for this is that the prior for the intercept was applied to an intercept associated with centred continuous predictors (the predictors are centred behind the scenes). Since we did not centre the predictor ourselves, the intercept that is returned is on the uncentred scale. Hence, the prior and posterior are on different scales.

dat5a.brm2 |> tidybayes::get_variables() 
 [1] "b_Intercept"     "b_x"             "Intercept"       "prior_Intercept"
 [5] "prior_b"         "lprior"          "lp__"            "accept_stat__"  
 [9] "stepsize__"      "treedepth__"     "n_leapfrog__"    "divergent__"    
[13] "energy__"       
dat5a.brm2 |> SUYR_prior_and_posterior()

Conclusions:

  • each of the priors is substantially wider than its corresponding posterior
  • the posteriors are clearly distinct from their respective priors and thus we can conclude that the priors are not driving the posteriors (i.e. they are not unduly influencing the outcomes)
  • the priors are simply regularising the parameters such that they are only sampled from plausible regions

\[ \begin{align} y_i \sim{}& Bin(\pi_i, 1)\\ log(\frac{\pi_i}{1-\pi_i}) =& \beta_0 + \beta_1 (x_i - \bar{x})\\ \end{align} \]

  • \(\beta_0\): Normal prior centred at 0 with a standard deviation of 1
  • \(\beta_1\): t distribution (3 degrees of freedom) prior centred at 0 with a standard deviation of 1
priors <- prior(normal(0, 1), class = "Intercept") +
    prior(student_t(3, 0, 1), class = "b")
  1. start by fitting the model and sampling from the priors only
dat5b.form <- bf(y | trials(1) ~ scale(x, scale = FALSE), family = binomial(link = "logit"))
dat5b.brm <- brm(dat5b.form,
                data=dat5,
                prior=priors,
                sample_prior = 'only', 
                iter = 5000,
                warmup = 1000,
                chains = 3, cores = 3,
                thin = 5,
                backend = "rstan",
                refresh = 0)
Only 2 levels detected so that family 'bernoulli' might be a more efficient choice.
Compiling Stan program...
Start sampling
  2. explore the range of posterior predictions resulting from the priors alone
dat5b.brm |>
  conditional_effects(method = "posterior_linpred") 

Conclusions:

  • the grey ribbon above represents the credible range of the posterior predictions
  • this ribbon is substantially wider than the observed data (black points), suggesting that the priors are not overly narrow
  • the range of posterior predictions is not wildly unreasonable (although we could argue that negative predictions might be illogical) since the maximum values are not multiple orders of magnitude above the observed data.
  3. now refit the model so that it samples from both the priors and the likelihood (that is, allow the data to have an impact on the estimates)
dat5b.brm2 <- update(dat5b.brm,
    sample_prior = "yes"
)
The desired updates require recompiling the model
Only 2 levels detected so that family 'bernoulli' might be a more efficient choice.
Compiling Stan program...
Start sampling
  4. re-explore the range of posterior predictions resulting from the fitted model
dat5b.brm2 |>
    conditional_effects() |>
    plot(points = TRUE) 

Conclusions:

  • the range of the posteriors is now substantially reduced (now that the model includes both data and priors)
  • this suggests that the patterns are being driven predominantly by the data
  5. compare the priors and posteriors to further confirm that the priors are not overly influential

When we have indicated that the posterior should be informed by both the prior and the likelihood, both prior (governed by priors alone) and posterior (governed by both priors and data/likelihood) draws are returned. These can be compared by exploring the probabilities associated with specific hypotheses - the most obvious of which is that of no effect (that the parameter = 0).

When doing so, ideally the posterior should be distinct from the prior. By distinct, I mean that the posteriors should be clearly distinguishable from the priors. Ideally, the posteriors should also be less variable than the priors. If this is not the case, it might suggest that the posteriors are being too strongly driven by the priors.

dat5b.brm2 |> tidybayes::get_variables() 
 [1] "b_Intercept"          "b_scalexscaleEQFALSE" "Intercept"           
 [4] "prior_Intercept"      "prior_b"              "lprior"              
 [7] "lp__"                 "accept_stat__"        "stepsize__"          
[10] "treedepth__"          "n_leapfrog__"         "divergent__"         
[13] "energy__"            
dat5b.brm2 |> hypothesis("scalexscaleEQFALSE = 0") |> plot()

Unfortunately, it is not possible to perform this comparison sensibly for the intercept. The reason for this is that the prior for the intercept was applied to an intercept associated with centred continuous predictors (the predictors are centred behind the scenes). Since we did not centre the predictor ourselves, the intercept that is returned is on the uncentred scale. Hence, the prior and posterior are on different scales.

dat5b.brm2 |> tidybayes::get_variables() 
 [1] "b_Intercept"          "b_scalexscaleEQFALSE" "Intercept"           
 [4] "prior_Intercept"      "prior_b"              "lprior"              
 [7] "lp__"                 "accept_stat__"        "stepsize__"          
[10] "treedepth__"          "n_leapfrog__"         "divergent__"         
[13] "energy__"            
dat5b.brm2 |> SUYR_prior_and_posterior()

Conclusions:

  • each of the priors is substantially wider than its corresponding posterior
  • the posteriors are clearly distinct from their respective priors and thus we can conclude that the priors are not driving the posteriors (i.e. they are not unduly influencing the outcomes)
  • the priors are simply regularising the parameters such that they are only sampled from plausible regions

\[ \begin{align} y_i \sim{}& Bin(\pi_i, 1)\\ log(\frac{\pi_i}{1-\pi_i}) =& \beta_0 + \beta_1 (x_i - \bar{x})/\sigma_x\\ \end{align} \]

  • \(\beta_0\): Normal prior centred at 0 with a standard deviation of 1
  • \(\beta_1\): t distribution (3 degrees of freedom) prior centred at 0 with a standard deviation of 1
priors <- prior(normal(0, 1), class = "Intercept") +
    prior(student_t(3, 0, 1), class = "b")
  1. start by fitting the model and sampling from the priors only
dat5c.form <- bf(y | trials(1) ~ scale(x), family = binomial(link = "logit"))
dat5c.brm <- brm(dat5c.form,
                data=dat5,
                prior=priors,
                sample_prior = 'only', 
                iter = 5000,
                warmup = 1000,
                chains = 3, cores = 3,
                thin = 5,
                backend = "rstan",
                refresh = 0)
Only 2 levels detected so that family 'bernoulli' might be a more efficient choice.
Compiling Stan program...
Start sampling
  2. explore the range of posterior predictions resulting from the priors alone
dat5c.brm |>
  conditional_effects(method = "posterior_linpred") 

Conclusions:

  • the grey ribbon above represents the credible range of the posterior predictions
  • this ribbon is substantially wider than the observed data (black points), suggesting that the priors are not overly narrow
  • the range of posterior predictions is not wildly unreasonable (although we could argue that negative predictions might be illogical) since the maximum values are not multiple orders of magnitude above the observed data.
  3. now refit the model so that it samples from both the priors and the likelihood (that is, allow the data to have an impact on the estimates)
dat5c.brm2 <- update(dat5c.brm,
    sample_prior = "yes"
)
The desired updates require recompiling the model
Only 2 levels detected so that family 'bernoulli' might be a more efficient choice.
Compiling Stan program...
Start sampling
  4. re-explore the range of posterior predictions resulting from the fitted model
dat5c.brm2 |>
    conditional_effects() |>
    plot(points = TRUE) 

Conclusions:

  • the range of the posteriors is now substantially reduced (now that the model includes both data and priors)
  • this suggests that the patterns are being driven predominantly by the data
  5. compare the priors and posteriors to further confirm that the priors are not overly influential

When we have indicated that the posterior should be informed by both the prior and the likelihood, both prior (governed by priors alone) and posterior (governed by both priors and data/likelihood) draws are returned. These can be compared by exploring the probabilities associated with specific hypotheses - the most obvious of which is that of no effect (that the parameter = 0).

When doing so, ideally the posterior should be distinct from the prior. By distinct, I mean that the posteriors should be clearly distinguishable from the priors. Ideally, the posteriors should also be less variable than the priors. If this is not the case, it might suggest that the posteriors are being too strongly driven by the priors.

dat5c.brm2 |> tidybayes::get_variables() 
 [1] "b_Intercept"     "b_scalex"        "Intercept"       "prior_Intercept"
 [5] "prior_b"         "lprior"          "lp__"            "accept_stat__"  
 [9] "stepsize__"      "treedepth__"     "n_leapfrog__"    "divergent__"    
[13] "energy__"       
dat5c.brm2 |> hypothesis("scalex = 0") |> plot()

Unfortunately, it is not possible to perform this comparison sensibly for the intercept. The reason for this is that the prior for the intercept was applied to an intercept associated with centred continuous predictors (the predictors are centred behind the scenes). Since we did not centre the predictor ourselves, the intercept that is returned is on the uncentred scale. Hence, the prior and posterior are on different scales.

dat5c.brm2 |> tidybayes::get_variables() 
 [1] "b_Intercept"     "b_scalex"        "Intercept"       "prior_Intercept"
 [5] "prior_b"         "lprior"          "lp__"            "accept_stat__"  
 [9] "stepsize__"      "treedepth__"     "n_leapfrog__"    "divergent__"    
[13] "energy__"       
dat5c.brm2 |> SUYR_prior_and_posterior()

Conclusions:

  • each of the priors is substantially wider than its corresponding posterior
  • the posteriors are clearly distinct from their respective priors and thus we can conclude that the priors are not driving the posteriors (i.e. they are not unduly influencing the outcomes)
  • the priors are simply regularising the parameters such that they are only sampled from plausible regions

\[ \begin{align} y_i \sim{}& Bin(\pi_i, n_i)\\ log(\frac{\pi_i}{1-\pi_i}) =& \beta_0 + \beta_1 x_i\\ \end{align} \]

When considering priors, it is important to remember that they apply to parameters on the link scale. For Binomial models, the link scale is logit. Binomial data is notoriously difficult to define priors for. Nevertheless, the following considerations are useful:

  • the expected \(\pi\) values are only ever between 0 and 1
  • on a logit (log odds) scale, values of -3 and 3 are considered very wide
  • on a logit scale, values between -1 and 1 are reasonable.

So the following priors might be appropriate:

  • \(\beta_0\): Normal prior centred at -0.22 with a standard deviation of 0.33
    • mean:

      (dat6$count/dat6$total) |>
          log() |>
          median() |>
          round(2)
      [1] -0.22
    • standard deviation:

      (dat6$count/dat6$total) |>
          log() |> 
          mad() |>
          round(2)
      [1] 0.33
  • \(\beta_1\): t distribution (3 degrees of freedom) prior centred at 0 with a standard deviation of 0.51
    • mean: since effects are differences and we may not want to pre-determine the direction (polarity) of any trends, it is typical to define effects priors with means of 0

    • standard deviation:

      dat6 |>
          mutate(across(c(y, x), log)) |>
          summarise(across(c(y, x), mad)) |>
          mutate(round(y / x, 2))
                y        x round(y/x, 2)
      1 0.3308326 0.648985          0.51
priors <- prior(normal(-0.22, 0.33), class = "Intercept") +
    prior(student_t(3, 0, 0.51), class = "b") 
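
As an aside, the summaries above were computed on a log scale. Since the model's link is logit, they could equally be computed directly on the logit scale with qlogis() - a sketch (not the values used above):

(dat6$count / dat6$total) |>
    qlogis() |>
    median() |>
    round(2)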
  1. start by fitting the model and sampling from the priors only
dat6a.form <- bf(count | trials(total) ~ x, family = binomial(link = "logit"))
dat6a.brm <- brm(dat6a.form,
                data=dat6,
                prior=priors,
                sample_prior = 'only', 
                iter = 5000,
                warmup = 1000,
                chains = 3, cores = 3,
                thin = 5,
                backend = "rstan",
                refresh = 0)
Compiling Stan program...
Start sampling
  2. explore the range of posterior predictions resulting from the priors alone

For Binomial data, it is often more useful to explore the predictions on the link scale. Ribbons that extend much beyond -3 and 3 would definitely be considered wide enough.

dat6a.brm |>
  conditional_effects(method = "posterior_linpred") 
Setting all 'trials' variables to 1 by default if not specified otherwise.

dat6a.brm |>
  conditional_effects(conditions = data.frame(total = dat6$total)) |>
  plot(points = TRUE) 

Conclusions:

  • the grey ribbon above represents the credible range of the posterior predictions
  • this ribbon is substantially wider than the observed data (black points), suggesting that the priors are not overly narrow
  • the range of posterior predictions is not wildly unreasonable (although we could argue that negative predictions might be illogical) since the maximum values are not multiple orders of magnitude above the observed data.
  3. now refit the model so that it samples from both the priors and the likelihood (that is, allow the data to have an impact on the estimates)
dat6a.brm2 <- update(dat6a.brm,
    sample_prior = "yes"
)
The desired updates require recompiling the model
Compiling Stan program...
Start sampling
  4. re-explore the range of posterior predictions resulting from the fitted model
dat6a.brm2 |>
    conditional_effects(conditions = data.frame(total = dat6$total)) |>
    plot(points = TRUE) 

Conclusions:

  • the range of the posteriors is now substantially reduced (now that the model includes both data and priors)
  • this suggests that the patterns are being driven predominantly by the data
  5. compare the priors and posteriors to further confirm that the priors are not overly influential

When we have indicated that the posterior should be informed by both the prior and the likelihood, both prior (governed by priors alone) and posterior (governed by both priors and data/likelihood) draws are returned. These can be compared by exploring the probabilities associated with specific hypotheses - the most obvious of which is that of no effect (that the parameter = 0).

When doing so, ideally the posterior should be distinct from the prior. By distinct, I mean that the posteriors should be clearly distinguishable from the priors. Ideally, the posteriors should also be less variable than the priors. If this is not the case, it might suggest that the posteriors are being too strongly driven by the priors.

dat6a.brm2 |> tidybayes::get_variables() 
 [1] "b_Intercept"     "b_x"             "Intercept"       "prior_Intercept"
 [5] "prior_b"         "lprior"          "lp__"            "accept_stat__"  
 [9] "stepsize__"      "treedepth__"     "n_leapfrog__"    "divergent__"    
[13] "energy__"       
dat6a.brm2 |> hypothesis("x = 0") |> plot()

Unfortunately, it is not possible to perform this comparison sensibly for the intercept. The reason for this is that the prior for the intercept was applied to an intercept associated with centred continuous predictors (the predictors are centred behind the scenes). Since we did not centre the predictor ourselves, the intercept that is returned is on the uncentred scale. Hence, the prior and posterior are on different scales.

dat6a.brm2 |> tidybayes::get_variables() 
 [1] "b_Intercept"     "b_x"             "Intercept"       "prior_Intercept"
 [5] "prior_b"         "lprior"          "lp__"            "accept_stat__"  
 [9] "stepsize__"      "treedepth__"     "n_leapfrog__"    "divergent__"    
[13] "energy__"       
dat6a.brm2 |> SUYR_prior_and_posterior()

Conclusions:

  • each of the priors is substantially wider than its corresponding posterior
  • the posteriors are clearly distinct from their respective priors and thus we can conclude that the priors are not driving the posteriors (i.e. they are not unduly influencing the outcomes)
  • the priors are simply regularising the parameters such that they are only sampled from plausible regions

\[ \begin{align} y_i \sim{}& Bin(\pi_i, n_i)\\ log(\frac{\pi_i}{1-\pi_i}) =& \beta_0 + \beta_1 (x_i - \bar{x})\\ \end{align} \]

When considering priors, it is important to remember that they apply to parameters on the link scale. For Binomial models, the link scale is logit. Binomial data is notoriously difficult to define priors for. Nevertheless, the following considerations are useful:

  • the expected \(\pi\) values are only ever between 0 and 1
  • on a logit (log odds) scale, values of -3 and 3 are considered very wide
  • on a logit scale, values between -1 and 1 are reasonable.

So the following priors might be appropriate:

  • \(\beta_0\): Normal prior centred at -0.22 with a standard deviation of 0.33
    • mean:

      (dat6$count/dat6$total) |>
          log() |>
          median() |>
          round(2)
      [1] -0.22
    • standard deviation:

      (dat6$count/dat6$total) |>
          log() |> 
          mad() |>
          round(2)
      [1] 0.33
  • \(\beta_1\): t distribution (3 degrees of freedom) prior centred at 0 with a standard deviation of 0.51
    • mean: since effects are differences and we may not want to pre-determine the direction (polarity) of any trends, it is typical to define effects priors with means of 0

    • standard deviation:

      dat6 |>
          mutate(across(c(y, x), log)) |>
          summarise(across(c(y, x), mad)) |>
          mutate(round(y / x, 2))
                y        x round(y/x, 2)
      1 0.3308326 0.648985          0.51
priors <- prior(normal(-0.22, 0.33), class = "Intercept") +
    prior(student_t(3, 0, 0.51), class = "b") 
  1. start by fitting the model and sampling from the priors only
dat6b.form <- bf(count | trials(total) ~ scale(x, scale = FALSE), 
  family = binomial(link = "logit"))
dat6b.brm <- brm(dat6b.form,
                data=dat6,
                prior=priors,
                sample_prior = 'only', 
                iter = 5000,
                warmup = 1000,
                chains = 3, cores = 3,
                thin = 5,
                backend = "rstan",
                refresh = 0)
Compiling Stan program...
Start sampling
  2. explore the range of posterior predictions resulting from the priors alone

For Binomial data, it is often more useful to explore the predictions on the link scale. Ribbons that extend much beyond -3 and 3 would definitely be considered wide enough.

dat6b.brm |>
  conditional_effects(method = "posterior_linpred") 
Setting all 'trials' variables to 1 by default if not specified otherwise.

dat6b.brm |>
  conditional_effects(conditions = data.frame(total = dat6$total)) |>
  plot(points = TRUE) 

Conclusions:

  • the grey ribbon above represents the credible range of the posterior predictions
  • this ribbon is substantially wider than the observed data (black points), suggesting that the priors are not overly narrow
  • the range of posterior predictions is not wildly unreasonable (although we could argue that negative predictions might be illogical) since the maximum values are not multiple orders of magnitude above the observed data.
  3. now refit the model so that it samples from both the priors and the likelihood (that is, allow the data to have an impact on the estimates)
dat6b.brm2 <- update(dat6b.brm,
    sample_prior = "yes"
)
The desired updates require recompiling the model
Compiling Stan program...
Start sampling
  4. re-explore the range of posterior predictions resulting from the fitted model
dat6b.brm2 |>
    conditional_effects(conditions = data.frame(total = dat6$total)) |>
    plot(points = TRUE) 

Conclusions:

  • the range of the posteriors is now substantially reduced (now that the model includes both data and priors)
  • this suggests that the patterns are being driven predominantly by the data
  5. compare the priors and posteriors to further confirm that the priors are not overly influential

When we have indicated that the posterior should be informed by both the prior and the likelihood, both prior (governed by priors alone) and posterior (governed by both priors and data/likelihood) draws are returned. These can be compared by exploring the probabilities associated with specific hypotheses - the most obvious of which is that of no effect (that the parameter = 0).

When doing so, ideally the posterior should be distinct from the prior. By distinct, I mean that the posteriors should be clearly distinguishable from the priors. Ideally, the posteriors should also be less variable than the priors. If this is not the case, it might suggest that the posteriors are being too strongly driven by the priors.

dat6b.brm2 |> tidybayes::get_variables() 
 [1] "b_Intercept"          "b_scalexscaleEQFALSE" "Intercept"           
 [4] "prior_Intercept"      "prior_b"              "lprior"              
 [7] "lp__"                 "accept_stat__"        "stepsize__"          
[10] "treedepth__"          "n_leapfrog__"         "divergent__"         
[13] "energy__"            
dat6b.brm2 |> hypothesis("scalexscaleEQFALSE = 0") |> plot()

Unfortunately, it is not possible to perform this comparison sensibly for the intercept. The reason for this is that the prior for the intercept was applied to an intercept associated with centred continuous predictors (the predictors are centred behind the scenes). Since we did not centre the predictor ourselves, the intercept that is returned is on the uncentred scale. Hence, the prior and posterior are on different scales.

dat6b.brm2 |> tidybayes::get_variables() 
 [1] "b_Intercept"          "b_scalexscaleEQFALSE" "Intercept"           
 [4] "prior_Intercept"      "prior_b"              "lprior"              
 [7] "lp__"                 "accept_stat__"        "stepsize__"          
[10] "treedepth__"          "n_leapfrog__"         "divergent__"         
[13] "energy__"            
dat6b.brm2 |> SUYR_prior_and_posterior()

Conclusions:

  • each of the priors is substantially wider than its corresponding posterior
  • the posteriors are clearly distinct from their respective priors and thus we can conclude that the priors are not driving the posteriors (i.e. they are not unduly influencing the outcomes)
  • the priors are simply regularising the parameters such that they are only sampled from plausible regions

\[ \begin{align} y_i \sim{}& Bin(\pi_i, n_i)\\ log(\frac{\pi_i}{1-\pi_i}) =& \beta_0 + \beta_1 (x_i - \bar{x})/\sigma_x\\ \end{align} \]

When considering priors, it is important to remember that they apply to parameters on the link scale. For Binomial models, the link scale is logit. Binomial data is notoriously difficult to define priors for. Nevertheless, the following considerations are useful:

  • the expected \(\pi\) values are only ever between 0 and 1
  • on a logit (log odds) scale, values of -3 and 3 are considered very wide
  • on a logit scale, values between -1 and 1 are reasonable.

So the following priors might be appropriate:

  • \(\beta_0\): Normal prior centred at -0.22 with a standard deviation of 0.33
    • mean:

      (dat6$count/dat6$total) |>
          log() |>
          median() |>
          round(2)
      [1] -0.22
    • standard deviation:

      (dat6$count/dat6$total) |>
          log() |> 
          mad() |>
          round(2)
      [1] 0.33
  • \(\beta_1\): t distribution (3 degrees of freedom) prior centred at 0 with a standard deviation of 0.33
    • mean: since effects are differences and we may not want to pre-determine the direction (polarity) of any trends, it is typical to define effects priors with means of 0

    • standard deviation:

      (dat6$count / dat6$total) |>
          log() |>
          mad() |>
          round(2)
      [1] 0.33
priors <- prior(normal(-0.22, 0.33), class = "Intercept") +
    prior(student_t(3, 0, 0.33), class = "b") 
  1. start by fitting the model and sampling from the priors only
dat6c.form <- bf(count | trials(total) ~ scale(x), 
  family = binomial(link = "logit"))
dat6c.brm <- brm(dat6c.form,
                data=dat6,
                prior=priors,
                sample_prior = 'only', 
                iter = 5000,
                warmup = 1000,
                chains = 3, cores = 3,
                thin = 5,
                backend = "rstan",
                refresh = 0)
Compiling Stan program...
Start sampling
  2. explore the range of posterior predictions resulting from the priors alone

For Binomial data, it is often more useful to explore the predictions on the link scale. Ribbons that extend much beyond -3 and 3 would definitely be considered wide enough.

dat6c.brm |>
  conditional_effects(method = "posterior_linpred") 
Setting all 'trials' variables to 1 by default if not specified otherwise.

dat6c.brm |>
  conditional_effects(conditions = data.frame(total = dat6$total)) |>
  plot(points = TRUE) 

Conclusions:

  • the grey ribbon above represents the credible range of the posterior predictions
  • this ribbon is substantially wider than the observed data (black points), suggesting that the priors are not overly narrow
  • the range of posterior predictions is not wildly unreasonable (although we could argue that negative predictions might be illogical) since the maximum values are not multiple orders of magnitude above the observed data.
  3. now refit the model so that it samples from both the priors and the likelihood (that is, allow the data to have an impact on the estimates)
dat6c.brm2 <- update(dat6c.brm,
    sample_prior = "yes"
) 
The desired updates require recompiling the model
Compiling Stan program...
Start sampling
  4. re-explore the range of posterior predictions resulting from the fitted model
dat6c.brm2 |>
    conditional_effects(conditions = data.frame(total = dat6$total)) |>
    plot(points = TRUE)  

Conclusions:

  • the range of the posteriors is now substantially reduced (now that the model includes both data and priors)
  • this suggests that the patterns are being driven predominantly by the data
  5. compare the priors and posteriors to further confirm that the priors are not overly influential

When we have indicated that the posterior should be informed by both the prior and the likelihood, both prior (governed by priors alone) and posterior (governed by both priors and data/likelihood) draws are returned. These can be compared by exploring the probabilities associated with specific hypotheses - the most obvious of which is that of no effect (that the parameter = 0).

When doing so, ideally the posterior should be distinct from the prior. By distinct, I mean that the posteriors should be clearly distinguishable from the priors. Ideally, the posteriors should also be less variable than the priors. If this is not the case, it might suggest that the posteriors are being too strongly driven by the priors.

dat6c.brm2 |> tidybayes::get_variables() 
 [1] "b_Intercept"     "b_scalex"        "Intercept"       "prior_Intercept"
 [5] "prior_b"         "lprior"          "lp__"            "accept_stat__"  
 [9] "stepsize__"      "treedepth__"     "n_leapfrog__"    "divergent__"    
[13] "energy__"       
dat6c.brm2 |> hypothesis("scalex = 0") |> plot() 

Unfortunately, it is not possible to perform this comparison sensibly for the intercept. The reason for this is that the prior for the intercept was applied to an intercept associated with centred continuous predictors (the predictors are centred behind the scenes). Since we did not centre the predictor ourselves, the intercept that is returned is on the uncentred scale. Hence, the prior and posterior are on different scales.

dat6c.brm2 |> tidybayes::get_variables() 
 [1] "b_Intercept"     "b_scalex"        "Intercept"       "prior_Intercept"
 [5] "prior_b"         "lprior"          "lp__"            "accept_stat__"  
 [9] "stepsize__"      "treedepth__"     "n_leapfrog__"    "divergent__"    
[13] "energy__"       
dat6c.brm2 |> SUYR_prior_and_posterior() 

Conclusions:

  • each of the priors is substantially wider than its corresponding posterior
  • the posteriors are clearly distinct from their respective priors and thus we can conclude that the priors are not driving the posteriors (i.e. they are not unduly influencing the outcomes)
  • the priors are simply regularising the parameters such that they are only sampled from plausible regions

8 MCMC sampling diagnostics

MCMC sampling behaviour

Since the purpose of MCMC sampling is to approximate an unknown joint posterior distribution, it is important that we explore a range of diagnostics designed to help identify when the resulting samples might not accurately represent that posterior.

  • traceplots - plots of the individual draws in sequence. Traces that resemble noise suggest that all features of the posterior are likely to have been traversed. Obvious steps or blocks of noise are likely to represent distinct features and could imply that there are yet other features that have not been traversed - necessitating additional iterations. Furthermore, each chain should be indistinguishable from the others

  • autocorrelation function - plots of the degree of correlation between pairs of draws for a range of lags (distances along the chains). High levels of correlation (after a lag of 0, which correlates each draw with itself) suggest a lack of independence between the draws, and therefore that summaries such as the mean and median will be estimated with less precision than the nominal number of draws implies. Ideally, all non-zero lag correlations should be less than 0.2. The left hand figure below demonstrates a clear pattern of autocorrelation, whereas the right hand figure shows no autocorrelation.

  • convergence diagnostics - there are a range of diagnostics aimed at exploring whether the multiple chains are likely to have converged upon similar posteriors
    • R hat - this metric compares between and within chain model parameter estimates, with the expectation that if the chains have converged, the between and within rank normalised estimates should be very similar (and Rhat should be close to 1). The more one chain deviates from the others, the higher the Rhat value. Values less than 1.05 are considered evidence of convergence.
    • Bulk ESS - this is a measure of the effective sample size from the whole (bulk) of the posterior and is a good measure of the sampling efficiency of draws across the entire posterior
    • Tail ESS - this is a measure of the effective sample size from the 5% and 95% quantiles (tails) of the posterior and is a good measure of the sampling efficiency of draws from the tail (areas of the posterior with least support and where samplers can get stuck).
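
These diagnostics can also be extracted numerically. A minimal sketch for a fitted brms model (here dat1a.brm2, fitted in the earlier sections):

dat1a.brm2 |> rhat() |> round(3)        # Rhat per parameter
dat1a.brm2 |> neff_ratio() |> round(2)  # effective sample size ratios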

There are numerous packages in R that support MCMC diagnostics. Popular packages include:

  • bayesplot
  • rstan
  • ggmcmc

Some of the most useful diagnostics are presented in the following table.

Package     Description         Function                rstanarm                          brms
bayesplot   Traceplot           mcmc_trace              plot(mod, plotfun='trace')        mcmc_plot(mod, type='trace')
            Density plot        mcmc_dens               plot(mod, plotfun='dens')         mcmc_plot(mod, type='dens')
            Density & Trace     mcmc_combo              plot(mod, plotfun='combo')        mcmc_plot(mod, type='combo')
            ACF                 mcmc_acf_bar            plot(mod, plotfun='acf_bar')      mcmc_plot(mod, type='acf_bar')
            Rhat hist           mcmc_rhat_hist          plot(mod, plotfun='rhat_hist')    mcmc_plot(mod, type='rhat_hist')
            No. Effective       mcmc_neff_hist          plot(mod, plotfun='neff_hist')    mcmc_plot(mod, type='neff_hist')
rstan       Traceplot           stan_trace              stan_trace(mod)                   stan_trace(mod)
            ACF                 stan_ac                 stan_ac(mod)                      stan_ac(mod)
            Rhat                stan_rhat               stan_rhat(mod)                    stan_rhat(mod)
            No. Effective       stan_ess                stan_ess(mod)                     stan_ess(mod)
            Density plot        stan_dens               stan_dens(mod)                    stan_dens(mod)
ggmcmc      Traceplot           ggs_traceplot           ggs_traceplot(ggs(mod))           ggs_traceplot(ggs(mod))
            ACF                 ggs_autocorrelation     ggs_autocorrelation(ggs(mod))     ggs_autocorrelation(ggs(mod))
            Rhat                ggs_Rhat                ggs_Rhat(ggs(mod))                ggs_Rhat(ggs(mod))
            No. Effective       ggs_effective           ggs_effective(ggs(mod))           ggs_effective(ggs(mod))
            Cross correlation   ggs_crosscorrelation    ggs_crosscorrelation(ggs(mod))    ggs_crosscorrelation(ggs(mod))
            Scale reduction     ggs_grb                 ggs_grb(ggs(mod))                 ggs_grb(ggs(mod))

I personally prefer the rstan version of plots and thus these are the ones I will showcase.

Note

Bayesian samplers involve many calls to randomisation functions. As a result, the estimates will vary slightly each time the routines are run. You should expect that the outputs that you obtain will differ slightly from those that I am displaying. Nevertheless, the main conclusions should remain robust across subsequent runs.
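
If exact reproducibility is required, a seed can be passed through to the sampler (brm() and update() both accept a seed argument; the value and object name below are arbitrary):

dat1a.brm3 <- update(dat1a.brm2, seed = 123, refresh = 0)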

Plots of the estimates of each parameter over the post-warmup length of each MCMC chain. Each chain is plotted in a different colour, with each parameter in its own facet. Ideally, each trace should just look like noise without any discernible drift and each of the traces for a specific parameter should look the same (i.e., should not be displaced above or below any other trace for that parameter).

dat1a.brm2$fit |> stan_trace()

dat1a.brm2$fit |> stan_trace(inc_warmup = TRUE)

Conclusions:

  • the chains appear well mixed and very similar

dat1a.brm2$fit |> stan_ac()

Conclusions:

  • there is no evidence of autocorrelation in the MCMC samples

Rhat is a scale reduction factor measure of convergence between the chains. The closer the values are to 1, the more the chains have converged. Values greater than 1.05 indicate a lack of convergence. There will be an Rhat value for each parameter estimated.

dat1a.brm2$fit |> stan_rhat()
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Conclusions:

  • all Rhat values are below 1.05, suggesting the chains have converged.

The ratio of the number of effective samples (the number of draws once autocorrelation between successive draws is accounted for) to the total number of draws provides an indication of the effectiveness (and efficiency) of the MCMC sampler. Ratios less than 0.5 for a parameter suggest that the draws are highly autocorrelated and that the sampler traversed that part of the posterior inefficiently.

If the ratios are low, tightening the priors may help.

dat1a.brm2$fit |> stan_ess()
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Conclusions:

  • ratios are all very high

Conclusions:

  • all the diagnostics appear reasonable
  • we can conclude that the chains are all well mixed and have converged on a stable posterior.
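
Beyond these plots, rstan offers numerical summaries of the sampler's behaviour; a brief sketch (both helpers are part of rstan):

dat1a.brm2$fit |> rstan::get_num_divergent()     # divergent transitions after warmup
dat1a.brm2$fit |> rstan::get_num_max_treedepth() # iterations that hit the maximum treedepth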

Plots of the estimates of each parameter over the post-warmup length of each MCMC chain. Each chain is plotted in a different colour, with each parameter in its own facet. Ideally, each trace should just look like noise without any discernible drift and each of the traces for a specific parameter should look the same (i.e., should not be displaced above or below any other trace for that parameter).

dat1b.brm2$fit |> stan_trace()

dat1b.brm2$fit |> stan_trace(inc_warmup = TRUE)

Conclusions:

  • the chains appear well mixed and very similar

dat1b.brm2$fit |> stan_ac()

Conclusions:

  • there is no evidence of autocorrelation in the MCMC samples

Rhat is a scale reduction factor measure of convergence between the chains. The closer the values are to 1, the more the chains have converged. Values greater than 1.05 indicate a lack of convergence. There will be an Rhat value for each parameter estimated.

dat1b.brm2$fit |> stan_rhat()
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Conclusions:

  • all Rhat values are below 1.05, suggesting the chains have converged.

The ratio of the number of effective samples (the number of draws once autocorrelation between successive draws is accounted for) to the total number of draws provides an indication of the effectiveness (and efficiency) of the MCMC sampler. Ratios less than 0.5 for a parameter suggest that the draws are highly autocorrelated and that the sampler traversed that part of the posterior inefficiently.

If the ratios are low, tightening the priors may help.

dat1b.brm2$fit |> stan_ess()
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Conclusions:

  • ratios are all very high

Conclusions:

  • all the diagnostics appear reasonable
  • we can conclude that the chains are all well mixed and have converged on a stable posterior.

Plots of the estimates of each parameter over the post-warmup length of each MCMC chain. Each chain is plotted in a different colour, with each parameter in its own facet. Ideally, each trace should just look like noise without any discernible drift and each of the traces for a specific parameter should look the same (i.e., should not be displaced above or below any other trace for that parameter).

dat1c.brm2$fit |> stan_trace()

dat1c.brm2$fit |> stan_trace(inc_warmup = TRUE)

Conclusions:

  • the chains appear well mixed and very similar

dat1c.brm2$fit |> stan_ac()

Conclusions:

  • there is no evidence of autocorrelation in the MCMC samples

Rhat is a scale reduction factor measure of convergence between the chains. The closer the values are to 1, the more the chains have converged. Values greater than 1.05 indicate a lack of convergence. There will be an Rhat value for each parameter estimated.

dat1c.brm2$fit |> stan_rhat()
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Conclusions:

  • all Rhat values are below 1.05, suggesting the chains have converged.

The ratio of the number of effective samples (the number of draws once autocorrelation between successive draws is accounted for) to the total number of draws provides an indication of the effectiveness (and efficiency) of the MCMC sampler. Ratios less than 0.5 for a parameter suggest that the draws are highly autocorrelated and that the sampler traversed that part of the posterior inefficiently.

If the ratios are low, tightening the priors may help.

dat1c.brm2$fit |> stan_ess()
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Conclusions:

  • ratios are all very high

Conclusions:

  • all the diagnostics appear reasonable
  • we can conclude that the chains are all well mixed and have converged on a stable posterior.

Plots of the estimates of each parameter over the post-warmup length of each MCMC chain. Each chain is plotted in a different colour, with each parameter in its own facet. Ideally, each trace should just look like noise without any discernible drift and each of the traces for a specific parameter should look the same (i.e., should not be displaced above or below any other trace for that parameter).

dat2a.brm2$fit |> stan_trace()

dat2a.brm2$fit |> stan_trace(inc_warmup = TRUE)

Conclusions:

  • the chains appear well mixed and very similar

dat2a.brm2$fit |> stan_ac()

Conclusions:

  • there is no evidence of autocorrelation in the MCMC samples

Rhat is a scale reduction factor measure of convergence between the chains. The closer the values are to 1, the more the chains have converged. Values greater than 1.05 indicate a lack of convergence. There will be an Rhat value for each parameter estimated.

dat2a.brm2$fit |> stan_rhat()
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Conclusions:

  • all Rhat values are below 1.05, suggesting the chains have converged.

The ratio of the number of effective samples (the number of draws once autocorrelation between successive draws is accounted for) to the total number of draws provides an indication of the effectiveness (and efficiency) of the MCMC sampler. Ratios less than 0.5 for a parameter suggest that the draws are highly autocorrelated and that the sampler traversed that part of the posterior inefficiently.

If the ratios are low, tightening the priors may help.

dat2a.brm2$fit |> stan_ess()
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Conclusions:

  • ratios are all very high

Conclusions:

  • all the diagnostics appear reasonable
  • we can conclude that the chains are all well mixed and have converged on a stable posterior.

Plots of the estimates of each parameter over the post-warmup length of each MCMC chain. Each chain is plotted in a different colour, with each parameter in its own facet. Ideally, each trace should just look like noise without any discernible drift and each of the traces for a specific parameter should look the same (i.e., should not be displaced above or below any other trace for that parameter).

dat2b.brm2$fit |> stan_trace()

dat2b.brm2$fit |> stan_trace(inc_warmup = TRUE)

Conclusions:

  • the chains appear well mixed and very similar

dat2b.brm2$fit |> stan_ac()

Conclusions:

  • there is no evidence of autocorrelation in the MCMC samples

Rhat is a scale reduction factor measure of convergence between the chains. The closer the values are to 1, the more the chains have converged. Values greater than 1.05 indicate a lack of convergence. There will be an Rhat value for each parameter estimated.

dat2b.brm2$fit |> stan_rhat()
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Conclusions:

  • all Rhat values are below 1.05, suggesting the chains have converged.

The ratio of the number of effective samples (the number of draws once autocorrelation between successive draws is accounted for) to the total number of draws provides an indication of the effectiveness (and efficiency) of the MCMC sampler. Ratios less than 0.5 for a parameter suggest that the draws are highly autocorrelated and that the sampler traversed that part of the posterior inefficiently.

If the ratios are low, tightening the priors may help.

dat2b.brm2$fit |> stan_ess()
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Conclusions:

  • ratios are all very high

Conclusions:

  • all the diagnostics appear reasonable
  • we can conclude that the chains are all well mixed and have converged on a stable posterior.

Plots of the estimates of each parameter over the post-warmup length of each MCMC chain. Each chain is plotted in a different colour, with each parameter in its own facet. Ideally, each trace should just look like noise without any discernible drift and each of the traces for a specific parameter should look the same (i.e., should not be displaced above or below any other trace for that parameter).

dat3a.brm2$fit |> stan_trace()

dat3a.brm2$fit |> stan_trace(inc_warmup = TRUE)

Conclusions:

  • the chains appear well mixed and very similar

dat3a.brm2$fit |> stan_ac()

Conclusions:

  • there is no evidence of autocorrelation in the MCMC samples

Rhat is a scale reduction factor measure of convergence between the chains. The closer the values are to 1, the more the chains have converged. Values greater than 1.05 indicate a lack of convergence. There will be an Rhat value for each parameter estimated.

dat3a.brm2$fit |> stan_rhat()
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Conclusions:

  • all Rhat values are below 1.05, suggesting the chains have converged.

The ratio of the number of effective samples (the number of draws once autocorrelation between successive draws is accounted for) to the total number of draws provides an indication of the effectiveness (and efficiency) of the MCMC sampler. Ratios less than 0.5 for a parameter suggest that the draws are highly autocorrelated and that the sampler traversed that part of the posterior inefficiently.

If the ratios are low, tightening the priors may help.

dat3a.brm2$fit |> stan_ess()
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Conclusions:

  • ratios are all very high

Conclusions:

  • all the diagnostics appear reasonable
  • we can conclude that the chains are all well mixed and have converged on a stable posterior.

Plots of the estimates of each parameter over the post-warmup length of each MCMC chain. Each chain is plotted in a different colour, with each parameter in its own facet. Ideally, each trace should just look like noise without any discernible drift and each of the traces for a specific parameter should look the same (i.e., should not be displaced above or below any other trace for that parameter).

dat3b.brm2$fit |> stan_trace()

dat3b.brm2$fit |> stan_trace(inc_warmup = TRUE)

Conclusions:

  • the chains appear well mixed and very similar

dat3b.brm2$fit |> stan_ac()

Conclusions:

  • there is no evidence of autocorrelation in the MCMC samples

Rhat is a scale reduction factor measure of convergence between the chains. The closer the values are to 1, the more the chains have converged. Values greater than 1.05 indicate a lack of convergence. There will be an Rhat value for each parameter estimated.

dat3b.brm2$fit |> stan_rhat()
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Conclusions:

  • all Rhat values are below 1.05, suggesting the chains have converged.

The ratio of the number of effective samples (the number of draws once autocorrelation between successive draws is accounted for) to the total number of draws provides an indication of the effectiveness (and efficiency) of the MCMC sampler. Ratios less than 0.5 for a parameter suggest that the draws are highly autocorrelated and that the sampler traversed that part of the posterior inefficiently.

If the ratios are low, tightening the priors may help.

dat3b.brm2$fit |> stan_ess()
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Conclusions:

  • ratios are all very high

Conclusions:

  • all the diagnostics appear reasonable
  • we can conclude that the chains are all well mixed and have converged on a stable posterior.

Plots of the estimates of each parameter over the post-warmup length of each MCMC chain. Each chain is plotted in a different colour, with each parameter in its own facet. Ideally, each trace should just look like noise without any discernible drift and each of the traces for a specific parameter should look the same (i.e., should not be displaced above or below any other trace for that parameter).

dat3c.brm2$fit |> stan_trace()

dat3c.brm2$fit |> stan_trace(inc_warmup = TRUE)

Conclusions:

  • the chains appear well mixed and very similar

dat3c.brm2$fit |> stan_ac()

Conclusions:

  • there is no evidence of autocorrelation in the MCMC samples

Rhat is a scale reduction factor measure of convergence between the chains. The closer the values are to 1, the more the chains have converged. Values greater than 1.05 indicate a lack of convergence. There will be an Rhat value for each parameter estimated.

dat3c.brm2$fit |> stan_rhat()
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Conclusions:

  • all Rhat values are below 1.05, suggesting the chains have converged.

The ratio of the number of effective samples (the number of draws once autocorrelation between successive draws is accounted for) to the total number of draws provides an indication of the effectiveness (and efficiency) of the MCMC sampler. Ratios less than 0.5 for a parameter suggest that the draws are highly autocorrelated and that the sampler traversed that part of the posterior inefficiently.

If the ratios are low, tightening the priors may help.

dat3c.brm2$fit |> stan_ess()
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Conclusions:

  • ratios are all very high

Conclusions:

  • all the diagnostics appear reasonable
  • we can conclude that the chains are all well mixed and have converged on a stable posterior.

Plots of the estimates of each parameter over the post-warmup length of each MCMC chain. Each chain is plotted in a different colour, with each parameter in its own facet. Ideally, each trace should just look like noise without any discernible drift and each of the traces for a specific parameter should look the same (i.e., should not be displaced above or below any other trace for that parameter).

dat4a.brm2$fit |> stan_trace()

dat4a.brm2$fit |> stan_trace(inc_warmup = TRUE)

Conclusions:

  • the chains appear well mixed and very similar

dat4a.brm2$fit |> stan_ac()

Conclusions:

  • there is no evidence of autocorrelation in the MCMC samples

Rhat is a scale reduction factor measure of convergence between the chains. The closer the values are to 1, the more the chains have converged. Values greater than 1.05 indicate a lack of convergence. There will be an Rhat value for each parameter estimated.

dat4a.brm2$fit |> stan_rhat()
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Conclusions:

  • all Rhat values are below 1.05, suggesting the chains have converged.

The ratio of the number of effective samples (the number of draws once autocorrelation between successive draws is accounted for) to the total number of draws provides an indication of the effectiveness (and efficiency) of the MCMC sampler. Ratios less than 0.5 for a parameter suggest that the draws are highly autocorrelated and that the sampler traversed that part of the posterior inefficiently.

If the ratios are low, tightening the priors may help.

dat4a.brm2$fit |> stan_ess()
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Conclusions:

  • ratios are all very high

Conclusions:

  • all the diagnostics appear reasonable
  • we can conclude that the chains are all well mixed and have converged on a stable posterior.

Plots of the estimates of each parameter over the post-warmup length of each MCMC chain. Each chain is plotted in a different colour, with each parameter in its own facet. Ideally, each trace should just look like noise without any discernible drift and each of the traces for a specific parameter should look the same (i.e., should not be displaced above or below any other trace for that parameter).

dat4b.brm2$fit |> stan_trace()

dat4b.brm2$fit |> stan_trace(inc_warmup = TRUE)

Conclusions:

  • the chains appear well mixed and very similar

dat4b.brm2$fit |> stan_ac()

Conclusions:

  • there is no evidence of autocorrelation in the MCMC samples

Rhat is a scale reduction factor measure of convergence between the chains. The closer the values are to 1, the more the chains have converged. Values greater than 1.05 indicate a lack of convergence. There will be an Rhat value for each parameter estimated.

dat4b.brm2$fit |> stan_rhat()
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Conclusions:

  • all Rhat values are below 1.05, suggesting the chains have converged.

The number of effective samples - the ratio of the number of effective samples (those not rejected by the sampler) to the number of samples provides an indication of the effectiveness (and efficiency) of the MCMC sampler. Ratios that are less than 0.5 for a parameter suggest that the sampler spent considerable time in difficult areas of the sampling domain and rejected more than half of the samples (replacing them with the previous effective sample).

If the ratios are low, tightening the priors may help.

dat4b.brm2$fit |> stan_ess()
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Conclusions:

  • ratios are all very high

Conclusions:

  • all the diagnostics appear reasonable
  • we can conclude that the chains are all well mixed and have converged on a stable posterior.

Plots of the estimates of each parameter over the post-warmup length of each MCMC chain. Each chain is plotted in a different colour, with each parameter in its own facet. Ideally, each trace should just look like noise without any discernible drift and each of the traces for a specific parameter should look the same (i.e, should not be displaced above or below any other trace for that parameter).

dat4c.brm2$fit |> stan_trace()

dat4c.brm2$fit |> stan_trace(inc_warmup = TRUE)

Conclusions:

- the chains appear well mixed and very similar
dat4c.brm2$fit |> stan_ac()

Conclusions:

  • there is no evidence of autocorrelation in the MCMC samples

Rhat is a scale reduction factor measure of convergence between the chains. The closer the values are to 1, the more the chains have converged. Values greater than 1.05 indicate a lack of convergence. There will be an Rhat value for each parameter estimated.

dat4c.brm2$fit |> stan_rhat()
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Conclusions:

  • all Rhat values are below 1.05, suggesting the chains have converged.

The number of effective samples - the ratio of the number of effective samples (those not rejected by the sampler) to the number of samples provides an indication of the effectiveness (and efficiency) of the MCMC sampler. Ratios that are less than 0.5 for a parameter suggest that the sampler spent considerable time in difficult areas of the sampling domain and rejected more than half of the samples (replacing them with the previous effective sample).

If the ratios are low, tightening the priors may help.

dat4c.brm2$fit |> stan_ess()
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Conclusions:

  • ratios are all very high

Conclusions:

  • all the diagnostics appear reasonable
  • we can conclude that the chains are all well mixed and have converged on a stable posterior.

Plots of the estimates of each parameter over the post-warmup length of each MCMC chain. Each chain is plotted in a different colour, with each parameter in its own facet. Ideally, each trace should just look like noise without any discernible drift and each of the traces for a specific parameter should look the same (i.e, should not be displaced above or below any other trace for that parameter).

dat5a.brm2$fit |> stan_trace()

dat5a.brm2$fit |> stan_trace(inc_warmup = TRUE)

Conclusions:

- the chains appear well mixed and very similar
dat5a.brm2$fit |> stan_ac()

Conclusions:

  • there is no evidence of autocorrelation in the MCMC samples

Rhat is a scale reduction factor measure of convergence between the chains. The closer the values are to 1, the more the chains have converged. Values greater than 1.05 indicate a lack of convergence. There will be an Rhat value for each parameter estimated.

dat5a.brm2$fit |> stan_rhat()
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Conclusions:

  • all Rhat values are below 1.05, suggesting the chains have converged.

The number of effective samples - the ratio of the number of effective samples (those not rejected by the sampler) to the number of samples provides an indication of the effectiveness (and efficiency) of the MCMC sampler. Ratios that are less than 0.5 for a parameter suggest that the sampler spent considerable time in difficult areas of the sampling domain and rejected more than half of the samples (replacing them with the previous effective sample).

If the ratios are low, tightening the priors may help.

dat5a.brm2$fit |> stan_ess()
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Conclusions:

  • ratios are all very high

Conclusions:

  • all the diagnostics appear reasonable
  • we can conclude that the chains are all well mixed and have converged on a stable posterior.

Plots of the estimates of each parameter over the post-warmup length of each MCMC chain. Each chain is plotted in a different colour, with each parameter in its own facet. Ideally, each trace should just look like noise without any discernible drift and each of the traces for a specific parameter should look the same (i.e, should not be displaced above or below any other trace for that parameter).

dat5b.brm2$fit |> stan_trace()

dat5b.brm2$fit |> stan_trace(inc_warmup = TRUE)

Conclusions:

- the chains appear well mixed and very similar
dat5b.brm2$fit |> stan_ac()

Conclusions:

  • there is no evidence of autocorrelation in the MCMC samples

Rhat is a scale reduction factor measure of convergence between the chains. The closer the values are to 1, the more the chains have converged. Values greater than 1.05 indicate a lack of convergence. There will be an Rhat value for each parameter estimated.

dat5b.brm2$fit |> stan_rhat()
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Conclusions:

  • all Rhat values are below 1.05, suggesting the chains have converged.

The number of effective samples - the ratio of the number of effective samples (those not rejected by the sampler) to the number of samples provides an indication of the effectiveness (and efficiency) of the MCMC sampler. Ratios that are less than 0.5 for a parameter suggest that the sampler spent considerable time in difficult areas of the sampling domain and rejected more than half of the samples (replacing them with the previous effective sample).

If the ratios are low, tightening the priors may help.

dat5b.brm2$fit |> stan_ess()
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Conclusions:

  • ratios are all very high

Conclusions:

  • all the diagnostics appear reasonable
  • we can conclude that the chains are all well mixed and have converged on a stable posterior.

Plots of the estimates of each parameter over the post-warmup length of each MCMC chain. Each chain is plotted in a different colour, with each parameter in its own facet. Ideally, each trace should just look like noise without any discernible drift and each of the traces for a specific parameter should look the same (i.e, should not be displaced above or below any other trace for that parameter).

dat5c.brm2$fit |> stan_trace()

dat5c.brm2$fit |> stan_trace(inc_warmup = TRUE)

Conclusions:

- the chains appear well mixed and very similar
dat5c.brm2$fit |> stan_ac()

Conclusions:

  • there is no evidence of autocorrelation in the MCMC samples

Rhat is a scale reduction factor measure of convergence between the chains. The closer the values are to 1, the more the chains have converged. Values greater than 1.05 indicate a lack of convergence. There will be an Rhat value for each parameter estimated.

dat5c.brm2$fit |> stan_rhat()
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Conclusions:

  • all Rhat values are below 1.05, suggesting the chains have converged.

The number of effective samples - the ratio of the number of effective samples (those not rejected by the sampler) to the number of samples provides an indication of the effectiveness (and efficiency) of the MCMC sampler. Ratios that are less than 0.5 for a parameter suggest that the sampler spent considerable time in difficult areas of the sampling domain and rejected more than half of the samples (replacing them with the previous effective sample).

If the ratios are low, tightening the priors may help.

dat5c.brm2$fit |> stan_ess()
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Conclusions:

  • ratios are all very high

Conclusions:

  • all the diagnostics appear reasonable
  • we can conclude that the chains are all well mixed and have converged on a stable posterior.

Plots of the estimates of each parameter over the post-warmup length of each MCMC chain. Each chain is plotted in a different colour, with each parameter in its own facet. Ideally, each trace should just look like noise without any discernible drift and each of the traces for a specific parameter should look the same (i.e, should not be displaced above or below any other trace for that parameter).

dat6a.brm2$fit |> stan_trace()

dat6a.brm2$fit |> stan_trace(inc_warmup = TRUE)

Conclusions:

- the chains appear well mixed and very similar
dat6a.brm2$fit |> stan_ac()

Conclusions:

  • there is no evidence of autocorrelation in the MCMC samples

Rhat is a scale reduction factor measure of convergence between the chains. The closer the values are to 1, the more the chains have converged. Values greater than 1.05 indicate a lack of convergence. There will be an Rhat value for each parameter estimated.

dat6a.brm2$fit |> stan_rhat()
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Conclusions:

  • all Rhat values are below 1.05, suggesting the chains have converged.

The number of effective samples - the ratio of the number of effective samples (those not rejected by the sampler) to the number of samples provides an indication of the effectiveness (and efficiency) of the MCMC sampler. Ratios that are less than 0.5 for a parameter suggest that the sampler spent considerable time in difficult areas of the sampling domain and rejected more than half of the samples (replacing them with the previous effective sample).

If the ratios are low, tightening the priors may help.

dat6a.brm2$fit |> stan_ess()
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Conclusions:

  • ratios are all very high

Conclusions:

  • all the diagnostics appear reasonable
  • we can conclude that the chains are all well mixed and have converged on a stable posterior.

Plots of the estimates of each parameter over the post-warmup length of each MCMC chain. Each chain is plotted in a different colour, with each parameter in its own facet. Ideally, each trace should just look like noise without any discernible drift and each of the traces for a specific parameter should look the same (i.e, should not be displaced above or below any other trace for that parameter).

dat6b.brm2$fit |> stan_trace()

dat6b.brm2$fit |> stan_trace(inc_warmup = TRUE)

Conclusions:

- the chains appear well mixed and very similar
dat6b.brm2$fit |> stan_ac()

Conclusions:

  • there is no evidence of autocorrelation in the MCMC samples

Rhat is a scale reduction factor measure of convergence between the chains. The closer the values are to 1, the more the chains have converged. Values greater than 1.05 indicate a lack of convergence. There will be an Rhat value for each parameter estimated.

dat6b.brm2$fit |> stan_rhat()
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Conclusions:

  • all Rhat values are below 1.05, suggesting the chains have converged.

The number of effective samples - the ratio of the number of effective samples (those not rejected by the sampler) to the number of samples provides an indication of the effectiveness (and efficiency) of the MCMC sampler. Ratios that are less than 0.5 for a parameter suggest that the sampler spent considerable time in difficult areas of the sampling domain and rejected more than half of the samples (replacing them with the previous effective sample).

If the ratios are low, tightening the priors may help.

dat6b.brm2$fit |> stan_ess()
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Conclusions:

  • ratios are all very high

Conclusions:

  • all the diagnostics appear reasonable
  • we can conclude that the chains are all well mixed and have converged on a stable posterior.

Plots of the estimates of each parameter over the post-warmup length of each MCMC chain. Each chain is plotted in a different colour, with each parameter in its own facet. Ideally, each trace should just look like noise without any discernible drift and each of the traces for a specific parameter should look the same (i.e, should not be displaced above or below any other trace for that parameter).

dat6c.brm2$fit |> stan_trace()

dat6c.brm2$fit |> stan_trace(inc_warmup = TRUE)

Conclusions:

- the chains appear well mixed and very similar
dat6c.brm2$fit |> stan_ac()

Conclusions:

  • there is no evidence of autocorrelation in the MCMC samples

Rhat is a scale reduction factor measure of convergence between the chains. The closer the values are to 1, the more the chains have converged. Values greater than 1.05 indicate a lack of convergence. There will be an Rhat value for each parameter estimated.

dat6c.brm2$fit |> stan_rhat()
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Conclusions:

  • all Rhat values are below 1.05, suggesting the chains have converged.

The number of effective samples - the ratio of the number of effective samples (those not rejected by the sampler) to the number of samples provides an indication of the effectiveness (and efficiency) of the MCMC sampler. Ratios that are less than 0.5 for a parameter suggest that the sampler spent considerable time in difficult areas of the sampling domain and rejected more than half of the samples (replacing them with the previous effective sample).

If the ratios are low, tightening the priors may help.

dat6c.brm2$fit |> stan_ess()
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Conclusions:

  • ratios are all very high

Conclusions:

  • all the diagnostics appear reasonable
  • we can conclude that the chains are all well mixed and have converged on a stable posterior.

9 Model validation

Model validation involves exploring the model diagnostics and fit to ensure that the model is broadly appropriate for the data. As such, exploration of the residuals should be routine.

For more complex models (those that contain multiple effects, it is also advisable to plot the residuals against each of the individual predictors. For sampling designs that involve sample collection over space or time, it is also a good idea to explore whether there are any temporal or spatial patterns in the residuals.

There are numerous situations (e.g. when applying specific variance-covariance structures to a model) where raw residuals do not reflect the interior workings of the model. Typically, this is because they do not take into account the variance-covariance matrix or assume a very simple variance-covariance matrix. Since the purpose of exploring residuals is to evaluate the model, for these cases, it is arguably better to draw conclusions based on standardized (or studentized) residuals.

Unfortunately the definitions of standardised and studentised residuals appears to vary and the two terms get used interchangeably. I will adopt the following definitions:

Standardized residuals
the raw residuals divided by the true standard deviation of the residuals (which of course is rarely known).
Studentized residuals
the raw residuals divided by the standard deviation of the residuals. Note that externally studentised residuals are calculated by dividing the raw residuals by a unique standard deviation for each observation that is calculated from regressions having left each successive observation out.
Pearson residuals
the raw residuals divided by the standard deviation of the response variable.

The mark of a good model is being able to predict well. In an ideal world, we would have sufficiently large sample size as to permit us to hold a fraction (such as 25%) back thereby allowing us to train the model on 75% of the data and then see how well the model can predict the withheld 25%. Unfortunately, such a luxury is still rare in ecology.

The next best option is to see how well the model can predict the observed data. Models tend to struggle most with the extremes of trends and have particular issues when the extremes approach logical boundaries (such as zero for count data and standard deviations). We can use the fitted model to generate random predicted observations and then explore some properties of these compared to the actual observed data.

Package Description function rstanarm brms
bayesplot Density overlay ppc_dens_overlay pp_check(mod, plotfun='dens_overlay') pp_check(mod, type='dens_overlay')
Obs vs Pred error ppc_error_scatter_avg pp_check(mod, plotfun='error_scatter_avg') pp_check(mod, type='error_scatter_avg')
Pred error vs x ppc_error_scatter_avg_vs_x pp_check(mod, x=, plotfun='error_scatter_avg_vs_x') pp_check(mod, x=, type='error_scatter_avg_vs_x')
Preds vs x ppc_intervals pp_check(mod, x=, plotfun='intervals') pp_check(mod, x=, type='intervals')
Partial plot ppc_ribbon pp_check(mod, x=, plotfun='ribbon') pp_check(mod, x=, type='ribbon')
available_ppc()
bayesplot PPC module:
  ppc_bars
  ppc_bars_grouped
  ppc_boxplot
  ppc_dens
  ppc_dens_overlay
  ppc_dens_overlay_grouped
  ppc_ecdf_overlay
  ppc_ecdf_overlay_grouped
  ppc_error_binned
  ppc_error_hist
  ppc_error_hist_grouped
  ppc_error_scatter
  ppc_error_scatter_avg
  ppc_error_scatter_avg_grouped
  ppc_error_scatter_avg_vs_x
  ppc_freqpoly
  ppc_freqpoly_grouped
  ppc_hist
  ppc_intervals
  ppc_intervals_grouped
  ppc_km_overlay
  ppc_km_overlay_grouped
  ppc_loo_intervals
  ppc_loo_pit
  ppc_loo_pit_overlay
  ppc_loo_pit_qq
  ppc_loo_ribbon
  ppc_pit_ecdf
  ppc_pit_ecdf_grouped
  ppc_ribbon
  ppc_ribbon_grouped
  ppc_rootogram
  ppc_scatter
  ppc_scatter_avg
  ppc_scatter_avg_grouped
  ppc_stat
  ppc_stat_2d
  ppc_stat_freqpoly
  ppc_stat_freqpoly_grouped
  ppc_stat_grouped
  ppc_violin_grouped
Note

Bayesian samplers involve many calls to randomisation functions. As a result, the estimates will vary slightly each time the routines are run. You should expect that the outputs that you obtain will differ slightly from those that I am displaying. Nevertheless, the main conclusions should remain robust across subsequent runs.

resid <- resid(dat1a.brm2)[, "Estimate"]
fit <- fitted(dat1a.brm2)[, "Estimate"]
ggplot() +
  geom_point(data = NULL, aes(y = resid, x = fit))

We should also plot the residuals against each of the predictor variables (as well as any other important unmodelled predictors - particularly time and space if relevant and available).

resid <- resid(dat1a.brm2)[, "Estimate"]
ggplot() +
  geom_point(data = NULL, aes(y = resid, x = dat$x))

Conclusions:

  • there does not appear to be any pattern in the residuals

Post predictive checks provide additional diagnostics about the fit of the model. Specifically, they provide a comparison between predictions drawn from the model and the observed data used to train the model.

Density overlay

These are plots of the density distribution of the observed data (black line) overlayed on top of 50 density distributions generated from draws from the model (light blue). Ideally, the 50 realisations should be roughly consistent with the observed data.

dat1a.brm2 |> pp_check(type = 'dens_overlay', ndraws = 100)

Conclusions:

  • the model draws appear to be consistent with the observed data

Error scatter

These are plots of the observed values against the average residuals. Similar to a residual plot, we do not want to see any patterns in this plot.

dat1a.brm2 |> pp_check(type = 'error_scatter_avg')
Using all posterior draws for ppc type 'error_scatter_avg' by default.

Conclusions:

  • there is no obvious pattern in the residuals.

Intervals

These are plots of the observed data overlayed on top of posterior predictions associated with each level of the predictor. Ideally, the observed data should all fall within the predictive intervals.

dat1a.brm2 |> pp_check(type = 'intervals', x = "x")
Using all posterior draws for ppc type 'intervals' by default.

Conclusions:

  • the posterior predictions are not inconsistent with the observed data

Ribbon

These are just an alternative way of expressing the interval plot.

dat1a.brm2 |> pp_check(type = 'ribbon', x = "x")
Using all posterior draws for ppc type 'ribbon' by default.

Conclusions:

  • the posterior predictions are not inconsistent with the observed data

The shinystan package allows the full suite of MCMC diagnostics and posterior predictive checks to be accessed via a web interface.

library(shinystan)
launch_shinystan(dat1a.brm2)

DHARMa residuals provide very useful diagnostics. Unfortunately, we cannot directly use the simulateResiduals() function to generate the simulated residuals. However, if we are willing to calculate some of the components yourself, we can still obtain the simulated residuals from the fitted stan model.

We need to supply:

  • simulated (predicted) responses associated with each observation.
  • observed values
  • fitted (predicted) responses (averaged) associated with each observation
dat1a.resids <- make_brms_dharma_res(dat1a.brm2, integerResponse = FALSE)
wrap_elements(~ testUniformity(dat1a.resids)) +
    wrap_elements(~ plotResiduals(dat1a.resids)) +
    wrap_elements(~ testDispersion(dat1a.resids)) +
    plot_layout(nrow = 1)

Note

If you are using Rstudio (particularly with a quarto document), the above code may produce a graphic that is too large for the currently available device. The document should still render without problems, it is just the live display within Rstudio that cannot accommodate the figure.

To address this, you can either:

  • break the multi-panel figure up into four separate figures by removing the outer wrap_elements() and plot_layout() functions.
  • copy the above code into the console and view in a larger graphics device

Conclusions:

  • the Q-Q plot looks reasonable (points broadly follow the angled line)
  • there are no flagged issues with the
    • KS test: conformity to the nominated distribution (family)
    • Dispersion test: would not normally expect this to be an issue for a Gaussian family unless there are other issues with the residuals
    • Outlier test: the influence of each observation
  • there does not appear to be any patterns in the residuals - each of the three quantile trends are considered flat and centered around 1/3, 1/2 and 2/3
  • the observed dispersion is well within the simulated range indicating that there is no issue with dispersion (again this is expected for a Gaussian model)

Conclusions: - there is no evidence of a lack of fit - the model is likely to be reliable

resid <- resid(dat1b.brm2)[, "Estimate"]
fit <- fitted(dat1b.brm2)[, "Estimate"]
ggplot() +
  geom_point(data = NULL, aes(y = resid, x = fit))

We should also plot the residuals against each of the predictor variables (as well as any other important unmodelled predictors - particularly time and space if relevant and available).

resid <- resid(dat1b.brm2)[, "Estimate"]
ggplot() +
  geom_point(data = NULL, aes(y = resid, x = dat$x))

Conclusions:

  • there does not appear to be any pattern in the residuals

Post predictive checks provide additional diagnostics about the fit of the model. Specifically, they provide a comparison between predictions drawn from the model and the observed data used to train the model.

Density overlay

These are plots of the density distribution of the observed data (black line) overlayed on top of 50 density distributions generated from draws from the model (light blue). Ideally, the 50 realisations should be roughly consistent with the observed data.

dat1b.brm2 |> pp_check(type = 'dens_overlay', ndraws = 100)

Conclusions:

  • the model draws appear to be consistent with the observed data

Error scatter

These are plots of the observed values against the average residuals. Similar to a residual plot, we do not want to see any patterns in this plot.

dat1b.brm2 |> pp_check(type = 'error_scatter_avg')
Using all posterior draws for ppc type 'error_scatter_avg' by default.

Conclusions:

  • there is no obvious pattern in the residuals.

Intervals

These are plots of the observed data overlayed on top of posterior predictions associated with each level of the predictor. Ideally, the observed data should all fall within the predictive intervals.

dat1b.brm2 |> pp_check(type = 'intervals', x = "x")
Using all posterior draws for ppc type 'intervals' by default.

Conclusions:

  • the posterior predictions are not inconsistent with the observed data

Ribbon

These are just an alternative way of expressing the interval plot.

dat1b.brm2 |> pp_check(type = 'ribbon', x = "x")
Using all posterior draws for ppc type 'ribbon' by default.

Conclusions:

  • the posterior predictions are not inconsistent with the observed data

The shinystan package allows the full suite of MCMC diagnostics and posterior predictive checks to be accessed via a web interface.

library(shinystan)
launch_shinystan(dat1b.brm2)

DHARMa residuals provide very useful diagnostics. Unfortunately, we cannot directly use the simulateResiduals() function to generate the simulated residuals. However, if we are willing to calculate some of the components yourself, we can still obtain the simulated residuals from the fitted stan model.

We need to supply:

  • simulated (predicted) responses associated with each observation.
  • observed values
  • fitted (predicted) responses (averaged) associated with each observation
dat1b.resids <- make_brms_dharma_res(dat1b.brm2, integerResponse = FALSE)
wrap_elements(~ testUniformity(dat1b.resids)) +
    wrap_elements(~ plotResiduals(dat1b.resids)) +
    wrap_elements(~ testDispersion(dat1b.resids)) +
    plot_layout(nrow = 1)

Note

If you are using Rstudio (particularly with a quarto document), the above code may produce a graphic that is too large for the currently available device. The document should still render without problems, it is just the live display within Rstudio that cannot accommodate the figure.

To address this, you can either:

  • break the multi-panel figure up into four separate figures by removing the outer wrap_elements() and plot_layout() functions.
  • copy the above code into the console and view in a larger graphics device

Conclusions:

  • the Q-Q plot looks reasonable (points broadly follow the angled line)
  • there are no flagged issues with the
    • KS test: conformity to the nominated distribution (family)
    • Dispersion test: would not normally expect this to be an issue for a Gaussian family unless there are other issues with the residuals
    • Outlier test: the influence of each observation
  • there does not appear to be any patterns in the residuals - each of the three quantile trends are considered flat and centered around 1/3, 1/2 and 2/3
  • the observed dispersion is well within the simulated range indicating that there is no issue with dispersion (again this is expected for a Gaussian model)

Conclusions: - there is no evidence of a lack of fit - the model is likely to be reliable

resid <- resid(dat1c.brm2)[, "Estimate"]
fit <- fitted(dat1c.brm2)[, "Estimate"]
ggplot() +
  geom_point(data = NULL, aes(y = resid, x = fit))

We should also plot the residuals against each of the predictor variables (as well as any other important unmodelled predictors - particularly time and space if relevant and available).

resid <- resid(dat1c.brm2)[, "Estimate"]
ggplot() +
  geom_point(data = NULL, aes(y = resid, x = dat$x))

Conclusions:

  • there does not appear to be any pattern in the residuals

Post predictive checks provide additional diagnostics about the fit of the model. Specifically, they provide a comparison between predictions drawn from the model and the observed data used to train the model.

Density overlay

These are plots of the density distribution of the observed data (black line) overlayed on top of 50 density distributions generated from draws from the model (light blue). Ideally, the 50 realisations should be roughly consistent with the observed data.

dat1c.brm2 |> pp_check(type = 'dens_overlay', ndraws = 100)

Conclusions:

  • the model draws appear to be consistent with the observed data

Error scatter

These are plots of the observed values against the average residuals. Similar to a residual plot, we do not want to see any patterns in this plot.

dat1c.brm2 |> pp_check(type = 'error_scatter_avg')
Using all posterior draws for ppc type 'error_scatter_avg' by default.

Conclusions:

  • there is no obvious pattern in the residuals.

Intervals

These are plots of the observed data overlayed on top of posterior predictions associated with each level of the predictor. Ideally, the observed data should all fall within the predictive intervals.

dat1c.brm2 |> pp_check(type = 'intervals', x = "x")
Using all posterior draws for ppc type 'intervals' by default.

Conclusions:

  • the posterior predictions are not inconsistent with the observed data

Ribbon

These are just an alternative way of expressing the interval plot.

dat1c.brm2 |> pp_check(type = 'ribbon', x = "x")
Using all posterior draws for ppc type 'ribbon' by default.

Conclusions:

  • the posterior predictions are not inconsistent with the observed data

The shinystan package allows the full suite of MCMC diagnostics and posterior predictive checks to be accessed via a web interface.

library(shinystan)
launch_shinystan(dat1c.brm2)

DHARMa residuals provide very useful diagnostics. Unfortunately, we cannot directly use the simulateResiduals() function to generate the simulated residuals. However, if we are willing to calculate some of the components yourself, we can still obtain the simulated residuals from the fitted stan model.

We need to supply:

  • simulated (predicted) responses associated with each observation.
  • observed values
  • fitted (predicted) responses (averaged) associated with each observation
dat1c.resids <- make_brms_dharma_res(dat1c.brm2, integerResponse = FALSE)
wrap_elements(~ testUniformity(dat1c.resids)) +
    wrap_elements(~ plotResiduals(dat1c.resids)) +
    wrap_elements(~ testDispersion(dat1c.resids)) +
    plot_layout(nrow = 1)

Note

If you are using Rstudio (particularly with a quarto document), the above code may produce a graphic that is too large for the currently available device. The document should still render without problems, it is just the live display within Rstudio that cannot accommodate the figure.

To address this, you can either:

  • break the multi-panel figure up into four separate figures by removing the outer wrap_elements() and plot_layout() functions.
  • copy the above code into the console and view in a larger graphics device

Conclusions:

  • the Q-Q plot looks reasonable (points broadly follow the angled line)
  • there are no flagged issues with the
    • KS test: conformity to the nominated distribution (family)
    • Dispersion test: would not normally expect this to be an issue for a Gaussian family unless there are other issues with the residuals
    • Outlier test: the influence of each observation
  • there does not appear to be any patterns in the residuals - each of the three quantile trends are considered flat and centered around 1/3, 1/2 and 2/3
  • the observed dispersion is well within the simulated range indicating that there is no issue with dispersion (again this is expected for a Gaussian model)

Conclusions: - there is no evidence of a lack of fit - the model is likely to be reliable

resid <- resid(dat2a.brm2)[, "Estimate"]
fit <- fitted(dat2a.brm2)[, "Estimate"]
ggplot() +
  geom_point(data = NULL, aes(y = resid, x = fit))

We should also plot the residuals against each of the predictor variables (as well as any other important unmodelled predictors - particularly time and space if relevant and available).

resid <- resid(dat2a.brm2)[, "Estimate"]
ggplot() +
  geom_point(data = NULL, aes(y = resid, x = dat2$x))

Conclusions:

  • there does not appear to be any pattern in the residuals

Post predictive checks provide additional diagnostics about the fit of the model. Specifically, they provide a comparison between predictions drawn from the model and the observed data used to train the model.

Density overlay

These are plots of the density distribution of the observed data (black line) overlayed on top of 50 density distributions generated from draws from the model (light blue). Ideally, the 50 realisations should be roughly consistent with the observed data.

dat2a.brm2 |> pp_check(type = 'dens_overlay', ndraws = 100)

Conclusions:

  • the model draws appear to be consistent with the observed data

Error scatter

These are plots of the observed values against the average residuals. Similar to a residual plot, we do not want to see any patterns in this plot.

dat2a.brm2 |> pp_check(type = 'error_scatter_avg')
Using all posterior draws for ppc type 'error_scatter_avg' by default.

Conclusions:

  • there is no obvious pattern in the residuals.

The shinystan package allows the full suite of MCMC diagnostics and posterior predictive checks to be accessed via a web interface.

library(shinystan)
launch_shinystan(dat2a.brm2)

DHARMa residuals provide very useful diagnostics. Unfortunately, we cannot directly use the simulateResiduals() function to generate the simulated residuals. However, if we are willing to calculate some of the components yourself, we can still obtain the simulated residuals from the fitted stan model.

We need to supply:

  • simulated (predicted) responses associated with each observation.
  • observed values
  • fitted (predicted) responses (averaged) associated with each observation
dat2a.resids <- make_brms_dharma_res(dat2a.brm2, integerResponse = FALSE)
wrap_elements(~ testUniformity(dat2a.resids)) +
    wrap_elements(~ plotResiduals(dat2a.resids)) +
    wrap_elements(~ testDispersion(dat2a.resids)) +
    plot_layout(nrow = 1)

Note

If you are using Rstudio (particularly with a quarto document), the above code may produce a graphic that is too large for the currently available device. The document should still render without problems, it is just the live display within Rstudio that cannot accommodate the figure.

To address this, you can either:

  • break the multi-panel figure up into four separate figures by removing the outer wrap_elements() and plot_layout() functions.
  • copy the above code into the console and view in a larger graphics device

Conclusions:

  • the Q-Q plot looks reasonable (points broadly follow the angled line)
  • there are no flagged issues with the
    • KS test: conformity to the nominated distribution (family)
    • Dispersion test: would not normally expect this to be an issue for a Gaussian family unless there are other issues with the residuals
    • Outlier test: the influence of each observation
  • there does not appear to be any patterns in the residuals - each of the three quantile trends are considered flat and centered around 1/3, 1/2 and 2/3
  • the observed dispersion is well within the simulated range indicating that there is no issue with dispersion (again this is expected for a Gaussian model)

Conclusions: - there is no evidence of a lack of fit - the model is likely to be reliable

resid <- resid(dat2b.brm2)[, "Estimate"]
fit <- fitted(dat2b.brm2)[, "Estimate"]
ggplot() +
  geom_point(data = NULL, aes(y = resid, x = fit))

We should also plot the residuals against each of the predictor variables (as well as any other important unmodelled predictors - particularly time and space if relevant and available).

resid <- resid(dat2b.brm2)[, "Estimate"]
ggplot() +
  geom_point(data = NULL, aes(y = resid, x = dat2$x))

Conclusions:

  • there does not appear to be any pattern in the residuals

Post predictive checks provide additional diagnostics about the fit of the model. Specifically, they provide a comparison between predictions drawn from the model and the observed data used to train the model.

Density overlay

These are plots of the density distribution of the observed data (black line) overlayed on top of 50 density distributions generated from draws from the model (light blue). Ideally, the 50 realisations should be roughly consistent with the observed data.

dat2b.brm2 |> pp_check(type = 'dens_overlay', ndraws = 100)

Conclusions:

  • the model draws appear to be consistent with the observed data

Error scatter

These are plots of the observed values against the average residuals. Similar to a residual plot, we do not want to see any patterns in this plot.

dat2b.brm2 |> pp_check(type = 'error_scatter_avg')
Using all posterior draws for ppc type 'error_scatter_avg' by default.

Conclusions:

  • there is no obvious pattern in the residuals.

The shinystan package allows the full suite of MCMC diagnostics and posterior predictive checks to be accessed via a web interface.

library(shinystan)
launch_shinystan(dat2b.brm2)

DHARMa residuals provide very useful diagnostics. Unfortunately, we cannot directly use the simulateResiduals() function to generate the simulated residuals. However, if we are willing to calculate some of the components yourself, we can still obtain the simulated residuals from the fitted stan model.

We need to supply:

  • simulated (predicted) responses associated with each observation.
  • observed values
  • fitted (predicted) responses (averaged) associated with each observation
dat2b.resids <- make_brms_dharma_res(dat2b.brm2, integerResponse = FALSE)
wrap_elements(~ testUniformity(dat2b.resids)) +
    wrap_elements(~ plotResiduals(dat2b.resids)) +
    wrap_elements(~ testDispersion(dat2b.resids)) +
    plot_layout(nrow = 1)

Note

If you are using Rstudio (particularly with a quarto document), the above code may produce a graphic that is too large for the currently available device. The document should still render without problems, it is just the live display within Rstudio that cannot accommodate the figure.

To address this, you can either:

  • break the multi-panel figure up into four separate figures by removing the outer wrap_elements() and plot_layout() functions.
  • copy the above code into the console and view in a larger graphics device

Conclusions:

  • the Q-Q plot looks reasonable (points broadly follow the angled line)
  • there are no flagged issues with the
    • KS test: conformity to the nominated distribution (family)
    • Dispersion test: would not normally expect this to be an issue for a Gaussian family unless there are other issues with the residuals
    • Outlier test: the influence of each observation
  • there does not appear to be any patterns in the residuals - each of the three quantile trends are considered flat and centered around 1/3, 1/2 and 2/3
  • the observed dispersion is well within the simulated range indicating that there is no issue with dispersion (again this is expected for a Gaussian model)

Conclusions: - there is no evidence of a lack of fit - the model is likely to be reliable

resid <- resid(dat3a.brm2)[, "Estimate"]
fit <- fitted(dat3a.brm2)[, "Estimate"]
ggplot() +
  geom_point(data = NULL, aes(y = resid, x = fit))

We should also plot the residuals against each of the predictor variables (as well as any other important unmodelled predictors - particularly time and space if relevant and available).

resid <- resid(dat3a.brm2)[, "Estimate"]
ggplot() +
  geom_point(data = NULL, aes(y = resid, x = dat3$x))

conclusions:

  • there does not appear to be any pattern in the residuals

density overlay

dat3a.brm2 |> pp_check(type = 'dens_overlay', ndraws = 100)

Conclusions:

  • the model draws appear to be consistent with the observed data

Error scatter

These are plots of the observed values against the average residuals. Similar to a residual plot, we do not want to see any patterns in this plot.

dat3a.brm2 |> pp_check(type = 'error_scatter_avg')
Using all posterior draws for ppc type 'error_scatter_avg' by default.

Conclusions:

  • there is no obvious pattern in the residuals.

Intervals

These are plots of the observed data overlayed on top of posterior predictions associated with each level of the predictor. Ideally, the observed data should all fall within the predictive intervals.

dat3a.brm2 |> pp_check(type = 'intervals', x = "x")
Using all posterior draws for ppc type 'intervals' by default.

Conclusions:

  • the posterior predictions are not inconsistent with the observed data

Ribbon

These are just an alternative way of expressing the interval plot.

dat3a.brm2 |> pp_check(type = 'ribbon', x = "x")
Using all posterior draws for ppc type 'ribbon' by default.

Conclusions:

  • the posterior predictions are not inconsistent with the observed data

The shinystan package allows the full suite of MCMC diagnostics and posterior predictive checks to be accessed via a web interface.

library(shinystan)
launch_shinystan(dat3a.brm2)

DHARMa residuals provide very useful diagnostics. Unfortunately, we cannot directly use the simulateResiduals() function to generate the simulated residuals. However, if we are willing to calculate some of the components yourself, we can still obtain the simulated residuals from the fitted stan model.

We need to supply:

  • simulated (predicted) responses associated with each observation.
  • observed values
  • fitted (predicted) responses (averaged) associated with each observation
dat3a.resids <- make_brms_dharma_res(dat3a.brm2, integerResponse = FALSE)
wrap_elements(~ testUniformity(dat3a.resids)) +
    wrap_elements(~ plotResiduals(dat3a.resids)) +
    wrap_elements(~ testDispersion(dat3a.resids)) +
    plot_layout(nrow = 1)

Note

If you are using Rstudio (particularly with a quarto document), the above code may produce a graphic that is too large for the currently available device. The document should still render without problems, it is just the live display within Rstudio that cannot accommodate the figure.

To address this, you can either:

  • break the multi-panel figure up into four separate figures by removing the outer wrap_elements() and plot_layout() functions.
  • copy the above code into the console and view in a larger graphics device

Conclusions:

  • the Q-Q plot looks reasonable (points broadly follow the angled line)
  • there are no flagged issues with the
    • KS test: conformity to the nominated distribution (family)
    • Dispersion test: would not normally expect this to be an issue for a Gaussian family unless there are other issues with the residuals
    • Outlier test: the influence of each observation
  • there does not appear to be any patterns in the residuals - each of the three quantile trends are considered flat and centered around 1/3, 1/2 and 2/3
  • the observed dispersion is well within the simulated range indicating that there is no issue with dispersion (again this is expected for a Gaussian model)

Conclusions: - there is no evidence of a lack of fit - the model is likely to be reliable

resid <- resid(dat3b.brm2)[, "Estimate"]
fit <- fitted(dat3b.brm2)[, "Estimate"]
ggplot() +
  geom_point(data = NULL, aes(y = resid, x = fit))

We should also plot the residuals against each of the predictor variables (as well as any other important unmodelled predictors - particularly time and space if relevant and available).

resid <- resid(dat3b.brm2)[, "Estimate"]
ggplot() +
  geom_point(data = NULL, aes(y = resid, x = dat3$x))

conclusions:

  • there does not appear to be any pattern in the residuals

density overlay

dat3b.brm2 |> pp_check(type = 'dens_overlay', ndraws = 100)

Conclusions:

  • the model draws appear to be consistent with the observed data

Error scatter

These are plots of the observed values against the average residuals. Similar to a residual plot, we do not want to see any patterns in this plot.

dat3b.brm2 |> pp_check(type = 'error_scatter_avg')
Using all posterior draws for ppc type 'error_scatter_avg' by default.

Conclusions:

  • there is no obvious pattern in the residuals.

Intervals

These are plots of the observed data overlayed on top of posterior predictions associated with each level of the predictor. Ideally, the observed data should all fall within the predictive intervals.

dat3b.brm2 |> pp_check(type = 'intervals', x = "x")
Using all posterior draws for ppc type 'intervals' by default.

Conclusions:

  • the posterior predictions are not inconsistent with the observed data

Ribbon

These are just an alternative way of expressing the interval plot.

dat3b.brm2 |> pp_check(type = 'ribbon', x = "x")
Using all posterior draws for ppc type 'ribbon' by default.

Conclusions:

  • the posterior predictions are not inconsistent with the observed data

The shinystan package allows the full suite of MCMC diagnostics and posterior predictive checks to be accessed via a web interface.

library(shinystan)
launch_shinystan(dat3b.brm2)

DHARMa residuals provide very useful diagnostics. Unfortunately, we cannot directly use the simulateResiduals() function to generate the simulated residuals. However, if we are willing to calculate some of the components yourself, we can still obtain the simulated residuals from the fitted stan model.

We need to supply:

  • simulated (predicted) responses associated with each observation.
  • observed values
  • fitted (predicted) responses (averaged) associated with each observation
dat3b.resids <- make_brms_dharma_res(dat3b.brm2, integerResponse = FALSE)
wrap_elements(~ testUniformity(dat3b.resids)) +
    wrap_elements(~ plotResiduals(dat3b.resids)) +
    wrap_elements(~ testDispersion(dat3b.resids)) +
    plot_layout(nrow = 1)

Note

If you are using Rstudio (particularly with a quarto document), the above code may produce a graphic that is too large for the currently available device. The document should still render without problems, it is just the live display within Rstudio that cannot accommodate the figure.

To address this, you can either:

  • break the multi-panel figure up into four separate figures by removing the outer wrap_elements() and plot_layout() functions.
  • copy the above code into the console and view in a larger graphics device

Conclusions:

  • the Q-Q plot looks reasonable (points broadly follow the angled line)
  • there are no flagged issues with the
    • KS test: conformity to the nominated distribution (family)
    • Dispersion test: would not normally expect this to be an issue for a Gaussian family unless there are other issues with the residuals
    • Outlier test: the influence of each observation
  • there does not appear to be any patterns in the residuals - each of the three quantile trends are considered flat and centered around 1/3, 1/2 and 2/3
  • the observed dispersion is well within the simulated range indicating that there is no issue with dispersion (again this is expected for a Gaussian model)

Conclusions: - there is no evidence of a lack of fit - the model is likely to be reliable

resid <- resid(dat3c.brm2)[, "Estimate"]
fit <- fitted(dat3c.brm2)[, "Estimate"]
ggplot() +
  geom_point(data = NULL, aes(y = resid, x = fit))

We should also plot the residuals against each of the predictor variables (as well as any other important unmodelled predictors - particularly time and space if relevant and available).

resid <- resid(dat3c.brm2)[, "Estimate"]
ggplot() +
  geom_point(data = NULL, aes(y = resid, x = dat3$x))

conclusions:

  • there does not appear to be any pattern in the residuals

density overlay

dat3c.brm2 |> pp_check(type = 'dens_overlay', ndraws = 100)

Conclusions:

  • the model draws appear to be consistent with the observed data

Error scatter

These are plots of the observed values against the average residuals. Similar to a residual plot, we do not want to see any patterns in this plot.

dat3c.brm2 |> pp_check(type = 'error_scatter_avg')
Using all posterior draws for ppc type 'error_scatter_avg' by default.

Conclusions:

  • there is no obvious pattern in the residuals.

Intervals

These are plots of the observed data overlayed on top of posterior predictions associated with each level of the predictor. Ideally, the observed data should all fall within the predictive intervals.

dat3c.brm2 |> pp_check(type = 'intervals', x = "x")
Using all posterior draws for ppc type 'intervals' by default.

Conclusions:

  • the posterior predictions are not inconsistent with the observed data

Ribbon

These are just an alternative way of expressing the interval plot.

dat3c.brm2 |> pp_check(type = 'ribbon', x = "x")
Using all posterior draws for ppc type 'ribbon' by default.

Conclusions:

  • the posterior predictions are not inconsistent with the observed data

The shinystan package allows the full suite of MCMC diagnostics and posterior predictive checks to be accessed via a web interface.

library(shinystan)
launch_shinystan(dat3c.brm2)

DHARMa residuals provide very useful diagnostics. Unfortunately, we cannot directly use the simulateResiduals() function to generate the simulated residuals. However, if we are willing to calculate some of the components yourself, we can still obtain the simulated residuals from the fitted stan model.

We need to supply:

  • simulated (predicted) responses associated with each observation.
  • observed values
  • fitted (predicted) responses (averaged) associated with each observation
dat3c.resids <- make_brms_dharma_res(dat3c.brm2, integerResponse = FALSE)
wrap_elements(~ testUniformity(dat3c.resids)) +
    wrap_elements(~ plotResiduals(dat3c.resids)) +
    wrap_elements(~ testDispersion(dat3c.resids)) +
    plot_layout(nrow = 1)

Note

If you are using Rstudio (particularly with a quarto document), the above code may produce a graphic that is too large for the currently available device. The document should still render without problems, it is just the live display within Rstudio that cannot accommodate the figure.

To address this, you can either:

  • break the multi-panel figure up into four separate figures by removing the outer wrap_elements() and plot_layout() functions.
  • copy the above code into the console and view in a larger graphics device

Conclusions:

  • the Q-Q plot looks reasonable (points broadly follow the angled line)
  • there are no flagged issues with the
    • KS test: conformity to the nominated distribution (family)
    • Dispersion test: would not normally expect this to be an issue for a Gaussian family unless there are other issues with the residuals
    • Outlier test: the influence of each observation
  • there does not appear to be any patterns in the residuals - each of the three quantile trends are considered flat and centered around 1/3, 1/2 and 2/3
  • the observed dispersion is well within the simulated range indicating that there is no issue with dispersion (again this is expected for a Gaussian model)

Conclusions: - there is no evidence of a lack of fit - the model is likely to be reliable

resid <- resid(dat4a.brm2)[, "Estimate"]
fit <- fitted(dat4a.brm2)[, "Estimate"]
ggplot() +
  geom_point(data = NULL, aes(y = resid, x = fit))

We should also plot the residuals against each of the predictor variables (as well as any other important unmodelled predictors - particularly time and space if relevant and available).

resid <- resid(dat4a.brm2)[, "Estimate"]
ggplot() +
  geom_point(data = NULL, aes(y = resid, x = dat4$x))

conclusions:

  • there does not appear to be any pattern in the residuals

density overlay

dat4a.brm2 |> pp_check(type = 'dens_overlay', ndraws = 100)

Conclusions:

  • the model draws appear to be consistent with the observed data

Error scatter

These are plots of the observed values against the average residuals. Similar to a residual plot, we do not want to see any patterns in this plot.

dat4a.brm2 |> pp_check(type = 'error_scatter_avg')
Using all posterior draws for ppc type 'error_scatter_avg' by default.

Conclusions:

  • there is no obvious pattern in the residuals.

Intervals

These are plots of the observed data overlayed on top of posterior predictions associated with each level of the predictor. Ideally, the observed data should all fall within the predictive intervals.

dat4a.brm2 |> pp_check(type = 'intervals', x = "x")
Using all posterior draws for ppc type 'intervals' by default.

Conclusions:

  • the posterior predictions are not inconsistent with the observed data

Ribbon

These are just an alternative way of expressing the interval plot.

dat4a.brm2 |> pp_check(type = 'ribbon', x = "x")
Using all posterior draws for ppc type 'ribbon' by default.

Conclusions:

  • the posterior predictions are not inconsistent with the observed data

The shinystan package allows the full suite of MCMC diagnostics and posterior predictive checks to be accessed via a web interface.

library(shinystan)
launch_shinystan(dat4a.brm2)

DHARMa residuals provide very useful diagnostics. Unfortunately, we cannot directly use the simulateResiduals() function to generate the simulated residuals. However, if we are willing to calculate some of the components yourself, we can still obtain the simulated residuals from the fitted stan model.

We need to supply:

  • simulated (predicted) responses associated with each observation.
  • observed values
  • fitted (predicted) responses (averaged) associated with each observation
dat4a.resids <- make_brms_dharma_res(dat4a.brm2, integerResponse = FALSE)
wrap_elements(~ testUniformity(dat4a.resids)) +
    wrap_elements(~ plotResiduals(dat4a.resids)) +
    wrap_elements(~ testDispersion(dat4a.resids)) +
    plot_layout(nrow = 1)

Note

If you are using Rstudio (particularly with a quarto document), the above code may produce a graphic that is too large for the currently available device. The document should still render without problems, it is just the live display within Rstudio that cannot accommodate the figure.

To address this, you can either:

  • break the multi-panel figure up into four separate figures by removing the outer wrap_elements() and plot_layout() functions.
  • copy the above code into the console and view in a larger graphics device

Conclusions:

  • the Q-Q plot looks reasonable (points broadly follow the angled line)
  • there are no flagged issues with the
    • KS test: conformity to the nominated distribution (family)
    • Dispersion test: would not normally expect this to be an issue for a Gaussian family unless there are other issues with the residuals
    • Outlier test: the influence of each observation
  • there does not appear to be any patterns in the residuals - each of the three quantile trends are considered flat and centered around 1/3, 1/2 and 2/3
  • the observed dispersion is well within the simulated range indicating that there is no issue with dispersion (again this is expected for a Gaussian model)

Conclusions: - there is no evidence of a lack of fit - the model is likely to be reliable

resid <- resid(dat4b.brm2)[, "Estimate"]
fit <- fitted(dat4b.brm2)[, "Estimate"]
ggplot() +
  geom_point(data = NULL, aes(y = resid, x = fit))

We should also plot the residuals against each of the predictor variables (as well as any other important unmodelled predictors - particularly time and space if relevant and available).

resid <- resid(dat4b.brm2)[, "Estimate"]
ggplot() +
  geom_point(data = NULL, aes(y = resid, x = dat4$x))

conclusions:

  • there does not appear to be any pattern in the residuals

density overlay

dat4b.brm2 |> pp_check(type = 'dens_overlay', ndraws = 100)

Conclusions:

  • the model draws appear to be consistent with the observed data

Error scatter

These are plots of the observed values against the average residuals. Similar to a residual plot, we do not want to see any patterns in this plot.

dat4b.brm2 |> pp_check(type = 'error_scatter_avg')
Using all posterior draws for ppc type 'error_scatter_avg' by default.

Conclusions:

  • there is no obvious pattern in the residuals.

Intervals

These are plots of the observed data overlayed on top of posterior predictions associated with each level of the predictor. Ideally, the observed data should all fall within the predictive intervals.

dat4b.brm2 |> pp_check(type = 'intervals', x = "x")
Using all posterior draws for ppc type 'intervals' by default.

Conclusions:

  • the posterior predictions are not inconsistent with the observed data

Ribbon

These are just an alternative way of expressing the interval plot.

dat4b.brm2 |> pp_check(type = 'ribbon', x = "x")
Using all posterior draws for ppc type 'ribbon' by default.

Conclusions:

  • the posterior predictions are not inconsistent with the observed data

The shinystan package allows the full suite of MCMC diagnostics and posterior predictive checks to be accessed via a web interface.

library(shinystan)
launch_shinystan(dat4b.brm2)

DHARMa residuals provide very useful diagnostics. Unfortunately, we cannot use the simulateResiduals() function directly to generate the simulated residuals. However, if we are willing to calculate some of the components ourselves, we can still obtain simulated residuals from the fitted Stan model.

We need to supply:

  • simulated (predicted) responses associated with each observation
  • observed values
  • fitted (predicted) responses (averaged) associated with each observation

dat4b.resids <- make_brms_dharma_res(dat4b.brm2, integerResponse = FALSE)
wrap_elements(~ testUniformity(dat4b.resids)) +
    wrap_elements(~ plotResiduals(dat4b.resids)) +
    wrap_elements(~ testDispersion(dat4b.resids)) +
    plot_layout(nrow = 1)

Note

If you are using RStudio (particularly with a Quarto document), the above code may produce a graphic that is too large for the currently available device. The document should still render without problems; it is just the live display within RStudio that cannot accommodate the figure.

To address this, you can either:

  • break the multi-panel figure up into four separate figures by removing the outer wrap_elements() and plot_layout() functions.
  • copy the above code into the console and view in a larger graphics device

Conclusions:

  • the Q-Q plot looks reasonable (points broadly follow the angled line)
  • there are no flagged issues with the:
    • KS test: conformity to the nominated distribution (family)
    • Dispersion test: we would not normally expect this to be an issue for a Gaussian family unless there are other issues with the residuals
    • Outlier test: the influence of each observation
  • there do not appear to be any patterns in the residuals - each of the three quantile trends is approximately flat and centered around 0.25, 0.5 and 0.75 respectively
  • the observed dispersion is well within the simulated range, indicating that there is no issue with dispersion (again, this is expected for a Gaussian model)

Conclusions:

  • there is no evidence of a lack of fit
  • the model is likely to be reliable

resid <- resid(dat4c.brm2)[, "Estimate"]
fit <- fitted(dat4c.brm2)[, "Estimate"]
ggplot() +
  geom_point(data = NULL, aes(y = resid, x = fit))

We should also plot the residuals against each of the predictor variables (as well as any other important unmodelled predictors - particularly time and space if relevant and available).

resid <- resid(dat4c.brm2)[, "Estimate"]
ggplot() +
  geom_point(data = NULL, aes(y = resid, x = dat4$x))

Conclusions:

  • there does not appear to be any pattern in the residuals

density overlay

dat4c.brm2 |> pp_check(type = 'dens_overlay', ndraws = 100)

Conclusions:

  • the model draws appear to be consistent with the observed data

Error scatter

These are plots of the observed values against the average residuals. As with a residual plot, we do not want to see any pattern in this plot.

dat4c.brm2 |> pp_check(type = 'error_scatter_avg')
Using all posterior draws for ppc type 'error_scatter_avg' by default.

Conclusions:

  • there is no obvious pattern in the residuals.

Intervals

These are plots of the observed data overlaid on the posterior predictions associated with each level of the predictor. Ideally, the observed data should all fall within the predictive intervals.

dat4c.brm2 |> pp_check(type = 'intervals', x = "x")
Using all posterior draws for ppc type 'intervals' by default.

Conclusions:

  • the posterior predictions are not inconsistent with the observed data

Ribbon

Ribbon plots are just an alternative way of presenting the information in the interval plot.

dat4c.brm2 |> pp_check(type = 'ribbon', x = "x")
Using all posterior draws for ppc type 'ribbon' by default.

Conclusions:

  • the posterior predictions are not inconsistent with the observed data

The shinystan package allows the full suite of MCMC diagnostics and posterior predictive checks to be accessed via a web interface.

library(shinystan)
launch_shinystan(dat4c.brm2)

DHARMa residuals provide very useful diagnostics. Unfortunately, we cannot use the simulateResiduals() function directly to generate the simulated residuals. However, if we are willing to calculate some of the components ourselves, we can still obtain simulated residuals from the fitted Stan model.

We need to supply:

  • simulated (predicted) responses associated with each observation
  • observed values
  • fitted (predicted) responses (averaged) associated with each observation

dat4c.resids <- make_brms_dharma_res(dat4c.brm2, integerResponse = FALSE)
wrap_elements(~ testUniformity(dat4c.resids)) +
    wrap_elements(~ plotResiduals(dat4c.resids)) +
    wrap_elements(~ testDispersion(dat4c.resids)) +
    plot_layout(nrow = 1)

Note

If you are using RStudio (particularly with a Quarto document), the above code may produce a graphic that is too large for the currently available device. The document should still render without problems; it is just the live display within RStudio that cannot accommodate the figure.

To address this, you can either:

  • break the multi-panel figure up into four separate figures by removing the outer wrap_elements() and plot_layout() functions.
  • copy the above code into the console and view in a larger graphics device

Conclusions:

  • the Q-Q plot looks reasonable (points broadly follow the angled line)
  • there are no flagged issues with the:
    • KS test: conformity to the nominated distribution (family)
    • Dispersion test: we would not normally expect this to be an issue for a Gaussian family unless there are other issues with the residuals
    • Outlier test: the influence of each observation
  • there do not appear to be any patterns in the residuals - each of the three quantile trends is approximately flat and centered around 0.25, 0.5 and 0.75 respectively
  • the observed dispersion is well within the simulated range, indicating that there is no issue with dispersion (again, this is expected for a Gaussian model)

Conclusions:

  • there is no evidence of a lack of fit
  • the model is likely to be reliable

resid <- resid(dat5a.brm2)[, "Estimate"]
fit <- fitted(dat5a.brm2)[, "Estimate"]
ggplot() +
  geom_point(data = NULL, aes(y = resid, x = fit))

We should also plot the residuals against each of the predictor variables (as well as any other important unmodelled predictors - particularly time and space if relevant and available).

resid <- resid(dat5a.brm2)[, "Estimate"]
ggplot() +
  geom_point(data = NULL, aes(y = resid, x = dat5$x))

Conclusions:

  • the above plots are almost impossible to interpret for binary data
  • they will always feature two curved lines (one for the zeros, the other for the ones)
  • it is virtually impossible to diagnose any issues from such plots (one workaround is sketched below)
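
A sketch of that workaround (the choice of 5 bins is entirely arbitrary) is to average the residuals within bins of the fitted values, which can make systematic misfit easier to spot for binary data:

## average the residuals within bins of the fitted values
data.frame(
    resid = resid(dat5a.brm2)[, "Estimate"],
    fit = fitted(dat5a.brm2)[, "Estimate"]
) |>
    mutate(bin = cut_number(fit, n = 5)) |>
    group_by(bin) |>
    summarise(mean_fit = mean(fit), mean_resid = mean(resid)) |>
    ggplot() +
    geom_hline(yintercept = 0, linetype = "dashed") +
    geom_point(aes(y = mean_resid, x = mean_fit))
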

density overlay

dat5a.brm2 |> pp_check(type = 'dens_overlay', ndraws = 100)

Conclusions:

  • the model draws appear to be consistent with the observed data
  • note that these density plots are going to be too crude to be completely useful
  • all the mass should be at either 0 or 1

Error scatter

These are plots of the observed values against the average residuals. As with a residual plot, we do not want to see any pattern in this plot.

dat5a.brm2 |> pp_check(type = 'error_scatter_avg')
Using all posterior draws for ppc type 'error_scatter_avg' by default.

Conclusions:

  • there is no obvious pattern in the residuals.
  • this sort of plot is of very little value for binary data

Intervals

These are plots of the observed data overlaid on the posterior predictions associated with each level of the predictor. Ideally, the observed data should all fall within the predictive intervals.

dat5a.brm2 |> pp_check(type = 'intervals', x = "x")
Using all posterior draws for ppc type 'intervals' by default.

Conclusions:

  • the posterior predictions are not inconsistent with the observed data
  • this sort of plot is of very little value for binary data

Ribbon

Ribbon plots are just an alternative way of presenting the information in the interval plot.

dat5a.brm2 |> pp_check(type = 'ribbon', x = "x")
Using all posterior draws for ppc type 'ribbon' by default.

Conclusions:

  • the posterior predictions are not inconsistent with the observed data
  • this sort of plot is of very little value for binary data

The shinystan package allows the full suite of MCMC diagnostics and posterior predictive checks to be accessed via a web interface.

library(shinystan)
launch_shinystan(dat5a.brm2)

DHARMa residuals provide very useful diagnostics. Unfortunately, we cannot use the simulateResiduals() function directly to generate the simulated residuals. However, if we are willing to calculate some of the components ourselves, we can still obtain simulated residuals from the fitted Stan model.

We need to supply:

  • simulated (predicted) responses associated with each observation
  • observed values
  • fitted (predicted) responses (averaged) associated with each observation

In the code below, I have instructed the residual plot not to apply quantile regression to the residuals, due to the small number of unique values.

dat5a.resids <- make_brms_dharma_res(dat5a.brm2, integerResponse = FALSE)
Only 2 levels detected so that family 'bernoulli' might be a more efficient choice.
wrap_elements(~ testUniformity(dat5a.resids)) +
    wrap_elements(~ plotResiduals(dat5a.resids, quantreg = FALSE)) +
    wrap_elements(~ testDispersion(dat5a.resids)) +
    plot_layout(nrow = 1)

Note

If you are using RStudio (particularly with a Quarto document), the above code may produce a graphic that is too large for the currently available device. The document should still render without problems; it is just the live display within RStudio that cannot accommodate the figure.

To address this, you can either:

  • break the multi-panel figure up into four separate figures by removing the outer wrap_elements() and plot_layout() functions.
  • copy the above code into the console and view in a larger graphics device

Conclusions:

  • the Q-Q plot looks reasonable (points broadly follow the angled line)
  • there are no flagged issues with the:
    • KS test: conformity to the nominated distribution (family)
    • Dispersion test: over-dispersion generally cannot be detected for binary (Bernoulli) data, so this is not expected to flag
    • Outlier test: the influence of each observation
  • there does not appear to be any pattern in the residuals
  • the observed dispersion is well within the simulated range, indicating that there is no issue with dispersion (as expected for binary data)

Conclusions:

  • there is no evidence of a lack of fit
  • the model is likely to be reliable

resid <- resid(dat5b.brm2)[, "Estimate"]
fit <- fitted(dat5b.brm2)[, "Estimate"]
ggplot() +
  geom_point(data = NULL, aes(y = resid, x = fit))

We should also plot the residuals against each of the predictor variables (as well as any other important unmodelled predictors - particularly time and space if relevant and available).

resid <- resid(dat5b.brm2)[, "Estimate"]
ggplot() +
  geom_point(data = NULL, aes(y = resid, x = dat5$x))

Conclusions:

  • the above plots are almost impossible to interpret for binary data
  • they will always feature two curved lines (one for the zeros, the other for the ones)
  • it is virtually impossible to diagnose any issues from such plots

density overlay

dat5b.brm2 |> pp_check(type = 'dens_overlay', ndraws = 100)

Conclusions:

  • the model draws appear to be consistent with the observed data
  • note that these density plots are going to be too crude to be completely useful
  • all the mass should be at either 0 or 1

Error scatter

These are plots of the observed values against the average residuals. As with a residual plot, we do not want to see any pattern in this plot.

dat5b.brm2 |> pp_check(type = 'error_scatter_avg')
Using all posterior draws for ppc type 'error_scatter_avg' by default.

Conclusions:

  • there is no obvious pattern in the residuals.
  • this sort of plot is of very little value for binary data

Intervals

These are plots of the observed data overlaid on the posterior predictions associated with each level of the predictor. Ideally, the observed data should all fall within the predictive intervals.

dat5b.brm2 |> pp_check(type = 'intervals', x = "x")
Using all posterior draws for ppc type 'intervals' by default.

Conclusions:

  • the posterior predictions are not inconsistent with the observed data
  • this sort of plot is of very little value for binary data

Ribbon

Ribbon plots are just an alternative way of presenting the information in the interval plot.

dat5b.brm2 |> pp_check(type = 'ribbon', x = "x")
Using all posterior draws for ppc type 'ribbon' by default.

Conclusions:

  • the posterior predictions are not inconsistent with the observed data
  • this sort of plot is of very little value for binary data

The shinystan package allows the full suite of MCMC diagnostics and posterior predictive checks to be accessed via a web interface.

library(shinystan)
launch_shinystan(dat5b.brm2)

DHARMa residuals provide very useful diagnostics. Unfortunately, we cannot use the simulateResiduals() function directly to generate the simulated residuals. However, if we are willing to calculate some of the components ourselves, we can still obtain simulated residuals from the fitted Stan model.

We need to supply:

  • simulated (predicted) responses associated with each observation
  • observed values
  • fitted (predicted) responses (averaged) associated with each observation

In the code below, I have instructed the residual plot not to apply quantile regression to the residuals, due to the small number of unique values.

dat5b.resids <- make_brms_dharma_res(dat5b.brm2, integerResponse = FALSE)
Only 2 levels detected so that family 'bernoulli' might be a more efficient choice.
wrap_elements(~ testUniformity(dat5b.resids)) +
    wrap_elements(~ plotResiduals(dat5b.resids, quantreg = FALSE)) +
    wrap_elements(~ testDispersion(dat5b.resids)) +
    plot_layout(nrow = 1)

Note

If you are using RStudio (particularly with a Quarto document), the above code may produce a graphic that is too large for the currently available device. The document should still render without problems; it is just the live display within RStudio that cannot accommodate the figure.

To address this, you can either:

  • break the multi-panel figure up into four separate figures by removing the outer wrap_elements() and plot_layout() functions.
  • copy the above code into the console and view in a larger graphics device

Conclusions:

  • the Q-Q plot looks reasonable (points broadly follow the angled line)
  • there are no flagged issues with the:
    • KS test: conformity to the nominated distribution (family)
    • Dispersion test: over-dispersion generally cannot be detected for binary (Bernoulli) data, so this is not expected to flag
    • Outlier test: the influence of each observation
  • there does not appear to be any pattern in the residuals
  • the observed dispersion is well within the simulated range, indicating that there is no issue with dispersion (as expected for binary data)

Conclusions:

  • there is no evidence of a lack of fit
  • the model is likely to be reliable

resid <- resid(dat5c.brm2)[, "Estimate"]
fit <- fitted(dat5c.brm2)[, "Estimate"]
ggplot() +
  geom_point(data = NULL, aes(y = resid, x = fit))

We should also plot the residuals against each of the predictor variables (as well as any other important unmodelled predictors - particularly time and space if relevant and available).

resid <- resid(dat5c.brm2)[, "Estimate"]
ggplot() +
  geom_point(data = NULL, aes(y = resid, x = dat5$x))

Conclusions:

  • the above plots are almost impossible to interpret for binary data
  • they will always feature two curved lines (one for the zeros, the other for the ones)
  • it is virtually impossible to diagnose any issues from such plots

density overlay

dat5c.brm2 |> pp_check(type = 'dens_overlay', ndraws = 100)

Conclusions:

  • the model draws appear to be consistent with the observed data
  • note that these density plots are going to be too crude to be completely useful
  • all the mass should be at either 0 or 1

Error scatter

These are plots of the observed values against the average residuals. As with a residual plot, we do not want to see any pattern in this plot.

dat5c.brm2 |> pp_check(type = 'error_scatter_avg')
Using all posterior draws for ppc type 'error_scatter_avg' by default.

Conclusions:

  • there is no obvious pattern in the residuals.
  • this sort of plot is of very little value for binary data

Intervals

These are plots of the observed data overlaid on the posterior predictions associated with each level of the predictor. Ideally, the observed data should all fall within the predictive intervals.

dat5c.brm2 |> pp_check(type = 'intervals', x = "x")
Using all posterior draws for ppc type 'intervals' by default.

Conclusions:

  • the posterior predictions are not inconsistent with the observed data
  • this sort of plot is of very little value for binary data

Ribbon

Ribbon plots are just an alternative way of presenting the information in the interval plot.

dat5c.brm2 |> pp_check(type = 'ribbon', x = "x")
Using all posterior draws for ppc type 'ribbon' by default.

Conclusions:

  • the posterior predictions are not inconsistent with the observed data
  • this sort of plot is of very little value for binary data

The shinystan package allows the full suite of MCMC diagnostics and posterior predictive checks to be accessed via a web interface.

library(shinystan)
launch_shinystan(dat5c.brm2)

DHARMa residuals provide very useful diagnostics. Unfortunately, we cannot use the simulateResiduals() function directly to generate the simulated residuals. However, if we are willing to calculate some of the components ourselves, we can still obtain simulated residuals from the fitted Stan model.

We need to supply:

  • simulated (predicted) responses associated with each observation
  • observed values
  • fitted (predicted) responses (averaged) associated with each observation

In the code below, I have instructed the residual plot not to apply quantile regression to the residuals, due to the small number of unique values.

dat5c.resids <- make_brms_dharma_res(dat5c.brm2, integerResponse = FALSE)
Only 2 levels detected so that family 'bernoulli' might be a more efficient choice.
wrap_elements(~ testUniformity(dat5c.resids)) +
    wrap_elements(~ plotResiduals(dat5c.resids, quantreg = FALSE)) +
    wrap_elements(~ testDispersion(dat5c.resids)) +
    plot_layout(nrow = 1)

Note

If you are using RStudio (particularly with a Quarto document), the above code may produce a graphic that is too large for the currently available device. The document should still render without problems; it is just the live display within RStudio that cannot accommodate the figure.

To address this, you can either:

  • break the multi-panel figure up into four separate figures by removing the outer wrap_elements() and plot_layout() functions.
  • copy the above code into the console and view in a larger graphics device

Conclusions:

  • the Q-Q plot looks reasonable (points broadly follow the angled line)
  • there are no flagged issues with the:
    • KS test: conformity to the nominated distribution (family)
    • Dispersion test: over-dispersion generally cannot be detected for binary (Bernoulli) data, so this is not expected to flag
    • Outlier test: the influence of each observation
  • there does not appear to be any pattern in the residuals
  • the observed dispersion is well within the simulated range, indicating that there is no issue with dispersion (as expected for binary data)

Conclusions:

  • there is no evidence of a lack of fit
  • the model is likely to be reliable

resid <- resid(dat6a.brm2)[, "Estimate"]
fit <- fitted(dat6a.brm2)[, "Estimate"]
ggplot() +
  geom_point(data = NULL, aes(y = resid, x = fit))

We should also plot the residuals against each of the predictor variables (as well as any other important unmodelled predictors - particularly time and space if relevant and available).

resid <- resid(dat6a.brm2)[, "Estimate"]
ggplot() +
  geom_point(data = NULL, aes(y = resid, x = dat6$x))

Conclusions:

  • there does not appear to be any pattern in the residuals

density overlay

dat6a.brm2 |> pp_check(type = 'dens_overlay', ndraws = 100)

Conclusions:

  • the model draws appear to be consistent with the observed data

Error scatter

These are plots of the observed values against the average residuals. As with a residual plot, we do not want to see any pattern in this plot.

dat6a.brm2 |> pp_check(type = 'error_scatter_avg')
Using all posterior draws for ppc type 'error_scatter_avg' by default.

Conclusions:

  • there is no obvious pattern in the residuals.

Intervals

These are plots of the observed data overlaid on the posterior predictions associated with each level of the predictor. Ideally, the observed data should all fall within the predictive intervals.

dat6a.brm2 |> pp_check(type = 'intervals', x = "x")
Using all posterior draws for ppc type 'intervals' by default.

Conclusions:

  • the posterior predictions are not inconsistent with the observed data

Ribbon

Ribbon plots are just an alternative way of presenting the information in the interval plot.

dat6a.brm2 |> pp_check(type = 'ribbon', x = "x")
Using all posterior draws for ppc type 'ribbon' by default.

Conclusions:

  • the posterior predictions are not inconsistent with the observed data

The shinystan package allows the full suite of MCMC diagnostics and posterior predictive checks to be accessed via a web interface.

library(shinystan)
launch_shinystan(dat6a.brm2)

DHARMa residuals provide very useful diagnostics. Unfortunately, we cannot use the simulateResiduals() function directly to generate the simulated residuals. However, if we are willing to calculate some of the components ourselves, we can still obtain simulated residuals from the fitted Stan model.

We need to supply:

  • simulated (predicted) responses associated with each observation
  • observed values
  • fitted (predicted) responses (averaged) associated with each observation

dat6a.resids <- make_brms_dharma_res(dat6a.brm2, integerResponse = FALSE)
wrap_elements(~ testUniformity(dat6a.resids)) +
    wrap_elements(~ plotResiduals(dat6a.resids)) +
    wrap_elements(~ testDispersion(dat6a.resids)) +
    plot_layout(nrow = 1)

Note

If you are using RStudio (particularly with a Quarto document), the above code may produce a graphic that is too large for the currently available device. The document should still render without problems; it is just the live display within RStudio that cannot accommodate the figure.

To address this, you can either:

  • break the multi-panel figure up into four separate figures by removing the outer wrap_elements() and plot_layout() functions.
  • copy the above code into the console and view in a larger graphics device

Conclusions:

  • the Q-Q plot looks reasonable (points broadly follow the angled line)
  • there are no flagged issues with the:
    • KS test: conformity to the nominated distribution (family)
    • Dispersion test: over- or under-dispersion relative to the nominated family
    • Outlier test: the influence of each observation
  • there do not appear to be any patterns in the residuals - each of the three quantile trends is approximately flat and centered around 0.25, 0.5 and 0.75 respectively
  • the observed dispersion is well within the simulated range, indicating that there is no issue with dispersion

Conclusions:

  • there is no evidence of a lack of fit
  • the model is likely to be reliable

resid <- resid(dat6b.brm2)[, "Estimate"]
fit <- fitted(dat6b.brm2)[, "Estimate"]
ggplot() +
  geom_point(data = NULL, aes(y = resid, x = fit))

We should also plot the residuals against each of the predictor variables (as well as any other important unmodelled predictors - particularly time and space if relevant and available).

resid <- resid(dat6b.brm2)[, "Estimate"]
ggplot() +
  geom_point(data = NULL, aes(y = resid, x = dat6$x))

Conclusions:

  • there does not appear to be any pattern in the residuals

density overlay

dat6b.brm2 |> pp_check(type = 'dens_overlay', ndraws = 100)

Conclusions:

  • the model draws appear to be consistent with the observed data

Error scatter

These are plots of the observed values against the average residuals. As with a residual plot, we do not want to see any pattern in this plot.

dat6b.brm2 |> pp_check(type = 'error_scatter_avg')
Using all posterior draws for ppc type 'error_scatter_avg' by default.

Conclusions:

  • there is no obvious pattern in the residuals.

Intervals

These are plots of the observed data overlaid on the posterior predictions associated with each level of the predictor. Ideally, the observed data should all fall within the predictive intervals.

dat6b.brm2 |> pp_check(type = 'intervals', x = "x")
Using all posterior draws for ppc type 'intervals' by default.

Conclusions:

  • the posterior predictions are not inconsistent with the observed data

Ribbon

Ribbon plots are just an alternative way of presenting the information in the interval plot.

dat6b.brm2 |> pp_check(type = 'ribbon', x = "x")
Using all posterior draws for ppc type 'ribbon' by default.

Conclusions:

  • the posterior predictions are not inconsistent with the observed data

The shinystan package allows the full suite of MCMC diagnostics and posterior predictive checks to be accessed via a web interface.

library(shinystan)
launch_shinystan(dat6b.brm2)

DHARMa residuals provide very useful diagnostics. Unfortunately, we cannot use the simulateResiduals() function directly to generate the simulated residuals. However, if we are willing to calculate some of the components ourselves, we can still obtain simulated residuals from the fitted Stan model.

We need to supply:

  • simulated (predicted) responses associated with each observation
  • observed values
  • fitted (predicted) responses (averaged) associated with each observation

dat6b.resids <- make_brms_dharma_res(dat6b.brm2, integerResponse = FALSE)
wrap_elements(~ testUniformity(dat6b.resids)) +
    wrap_elements(~ plotResiduals(dat6b.resids)) +
    wrap_elements(~ testDispersion(dat6b.resids)) +
    plot_layout(nrow = 1)

Note

If you are using RStudio (particularly with a Quarto document), the above code may produce a graphic that is too large for the currently available device. The document should still render without problems; it is just the live display within RStudio that cannot accommodate the figure.

To address this, you can either:

  • break the multi-panel figure up into four separate figures by removing the outer wrap_elements() and plot_layout() functions.
  • copy the above code into the console and view in a larger graphics device

Conclusions:

  • the Q-Q plot looks reasonable (points broadly follow the angled line)
  • there are no flagged issues with the:
    • KS test: conformity to the nominated distribution (family)
    • Dispersion test: over- or under-dispersion relative to the nominated family
    • Outlier test: the influence of each observation
  • there do not appear to be any patterns in the residuals - each of the three quantile trends is approximately flat and centered around 0.25, 0.5 and 0.75 respectively
  • the observed dispersion is well within the simulated range, indicating that there is no issue with dispersion

Conclusions:

  • there is no evidence of a lack of fit
  • the model is likely to be reliable

resid <- resid(dat6c.brm2)[, "Estimate"]
fit <- fitted(dat6c.brm2)[, "Estimate"]
ggplot() +
  geom_point(data = NULL, aes(y = resid, x = fit))

We should also plot the residuals against each of the predictor variables (as well as any other important unmodelled predictors - particularly time and space if relevant and available).

resid <- resid(dat6c.brm2)[, "Estimate"]
ggplot() +
  geom_point(data = NULL, aes(y = resid, x = dat6$x))

Conclusions:

  • there does not appear to be any pattern in the residuals

density overlay

dat6c.brm2 |> pp_check(type = 'dens_overlay', ndraws = 100)

Conclusions:

  • the model draws appear to be consistent with the observed data

Error scatter

These are plots of the observed values against the average residuals. As with a residual plot, we do not want to see any pattern in this plot.

dat6c.brm2 |> pp_check(type = 'error_scatter_avg')
Using all posterior draws for ppc type 'error_scatter_avg' by default.

Conclusions:

  • there is no obvious pattern in the residuals.

Intervals

These are plots of the observed data overlaid on the posterior predictions associated with each level of the predictor. Ideally, the observed data should all fall within the predictive intervals.

dat6c.brm2 |> pp_check(type = 'intervals', x = "x")
Using all posterior draws for ppc type 'intervals' by default.

Conclusions:

  • the posterior predictions are not inconsistent with the observed data

Ribbon

Ribbon plots are just an alternative way of presenting the information in the interval plot.

dat6c.brm2 |> pp_check(type = 'ribbon', x = "x")
Using all posterior draws for ppc type 'ribbon' by default.

Conclusions:

  • the posterior predictions are not inconsistent with the observed data

The shinystan package allows the full suite of MCMC diagnostics and posterior predictive checks to be accessed via a web interface.

library(shinystan)
launch_shinystan(dat6c.brm2)

DHARMa residuals provide very useful diagnostics. Unfortunately, we cannot use the simulateResiduals() function directly to generate the simulated residuals. However, if we are willing to calculate some of the components ourselves, we can still obtain simulated residuals from the fitted Stan model.

We need to supply:

  • simulated (predicted) responses associated with each observation
  • observed values
  • fitted (predicted) responses (averaged) associated with each observation

dat6c.resids <- make_brms_dharma_res(dat6c.brm2, integerResponse = FALSE)
wrap_elements(~ testUniformity(dat6c.resids)) +
    wrap_elements(~ plotResiduals(dat6c.resids)) +
    wrap_elements(~ testDispersion(dat6c.resids)) +
    plot_layout(nrow = 1)

Note

If you are using RStudio (particularly with a Quarto document), the above code may produce a graphic that is too large for the currently available device. The document should still render without problems; it is just the live display within RStudio that cannot accommodate the figure.

To address this, you can either:

  • break the multi-panel figure up into four separate figures by removing the outer wrap_elements() and plot_layout() functions.
  • copy the above code into the console and view in a larger graphics device

Conclusions:

  • the Q-Q plot looks reasonable (points broadly follow the angled line)
  • there are no flagged issues with the:
    • KS test: conformity to the nominated distribution (family)
    • Dispersion test: over- or under-dispersion relative to the nominated family
    • Outlier test: the influence of each observation
  • there do not appear to be any patterns in the residuals - each of the three quantile trends is approximately flat and centered around 0.25, 0.5 and 0.75 respectively
  • the observed dispersion is well within the simulated range, indicating that there is no issue with dispersion

Conclusions:

  • there is no evidence of a lack of fit
  • the model is likely to be reliable

10 Partial effects plots

Prior to exploring the modelled numerical estimates, it is worth reviewing simple plots of the predicted trends associated with each predictor. Importantly, these plots typically express the trends on the scale of the response, although for some it is possible to force the trends to be expressed on the link scale. Such plots provide a final visual check of whether the model has yielded sensible outcomes, and they usually assist in the interpretation of the major estimated parameters.
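
Note also that conditional_effects() returns the underlying prediction data, so where more control over the presentation is required, the trends can be extracted and plotted manually. As a sketch (the element name "x" here simply reflects the single predictor in these models):

ce <- conditional_effects(dat1a.brm2)
## each element is a data frame containing the predictor grid along with
## estimate__, lower__ and upper__ columns describing the trend and its interval
ce$x |>
    ggplot(aes(x = x, y = estimate__)) +
    geom_ribbon(aes(ymin = lower__, ymax = upper__), alpha = 0.3) +
    geom_line()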

dat1a.brm2 |>
  conditional_effects() |>
  plot(points = TRUE)

#OR
dat1a.brm2 |>
    conditional_effects(spaghetti = TRUE, ndraws = 200) |>
    plot(points = TRUE) 

dat1b.brm2 |>
  conditional_effects() |>
  plot(points = TRUE)

#OR
dat1b.brm2 |>
    conditional_effects(spaghetti = TRUE, ndraws = 200) |>
    plot(points = TRUE) 

Notice that although we centered the predictor, because the centering was performed within the model formula, conditional_effects() is able to back-transform \(x\) onto the original scale when producing the partial plot.

dat1c.brm2 |>
  conditional_effects() |>
  plot(points = TRUE)

#OR
dat1c.brm2 |>
    conditional_effects(spaghetti = TRUE, ndraws = 200) |>
    plot(points = TRUE) 

Notice that although we centered and scaled (standardised) the predictor, because this was done within the model formula, conditional_effects() is able to back-transform \(x\) onto the original scale when producing the partial plot.

dat2a.brm2 |>
  conditional_effects() |>
  plot(points = TRUE)

#OR
dat2a.brm2 |>
    conditional_effects(spaghetti = TRUE, ndraws = 200) |>
    plot(points = TRUE) 

dat2b.brm2 |>
  conditional_effects() |>
  plot(points = TRUE)

#OR
dat2b.brm2 |>
    conditional_effects(spaghetti = TRUE, ndraws = 200) |>
    plot(points = TRUE) 

dat3a.brm2 |>
  conditional_effects() |>
  plot(points = TRUE)

#OR
dat3a.brm2 |>
    conditional_effects(spaghetti = TRUE, ndraws = 200) |>
    plot(points = TRUE) 

dat3b.brm2 |>
  conditional_effects() |>
  plot(points = TRUE)

#OR
dat3b.brm2 |>
    conditional_effects(spaghetti = TRUE, ndraws = 200) |>
    plot(points = TRUE) 

Notice that although we centered the predictor, because the centering was performed within the model formula, conditional_effects() is able to back-transform \(x\) onto the original scale when producing the partial plot.

dat3c.brm2 |>
  conditional_effects() |>
  plot(points = TRUE)

#OR
dat3c.brm2 |>
    conditional_effects(spaghetti = TRUE, ndraws = 200) |>
    plot(points = TRUE) 

Notice that although we centered and scaled (standardised) the predictor, because this was done within the model formula, conditional_effects() is able to back-transform \(x\) onto the original scale when producing the partial plot.

dat4a.brm2 |>
  conditional_effects() |>
  plot(points = TRUE)

#OR
dat4a.brm2 |>
    conditional_effects(spaghetti = TRUE, ndraws = 200) |>
    plot(points = TRUE) 

dat4b.brm2 |>
  conditional_effects() |>
  plot(points = TRUE)

#OR
dat4b.brm2 |>
    conditional_effects(spaghetti = TRUE, ndraws = 200) |>
    plot(points = TRUE) 

Notice that although we centered the predictor, because the centering was performed within the model formula, conditional_effects() is able to back-transform \(x\) onto the original scale when producing the partial plot.

dat4c.brm2 |>
  conditional_effects() |>
  plot(points = TRUE)

#OR
dat4c.brm2 |>
    conditional_effects(spaghetti = TRUE, ndraws = 200) |>
    plot(points = TRUE) 

Notice that although we centered and scaled (standardised) the predictor, because this was done within the model formula, conditional_effects() is able to back-transform \(x\) onto the original scale when producing the partial plot.

dat5a.brm2 |>
  conditional_effects() |>
  plot(points = TRUE)

#OR
dat5a.brm2 |>
    conditional_effects(spaghetti = TRUE, ndraws = 200) |>
    plot(points = TRUE) 

dat5b.brm2 |>
  conditional_effects() |>
  plot(points = TRUE)

#OR
dat5b.brm2 |>
    conditional_effects(spaghetti = TRUE, ndraws = 200) |>
    plot(points = TRUE) 

Notice that although we centered the predictor, because the centering was performed within the model formula, conditional_effects() is able to back-transform \(x\) onto the original scale when producing the partial plot.

dat5c.brm2 |>
  conditional_effects() |>
  plot(points = TRUE)

#OR
dat5c.brm2 |>
    conditional_effects(spaghetti = TRUE, ndraws = 200) |>
    plot(points = TRUE) 

Notice that although we centered and scaled (standardised) the predictor, because this was done within the model formula, conditional_effects() is able to back-transform \(x\) onto the original scale when producing the partial plot.

dat6a.brm2 |>
  conditional_effects(conditions = data.frame(total = dat6$total)) |>
  plot(points = TRUE)

#OR
dat6a.brm2 |>
    conditional_effects(conditions = data.frame(total = dat6$total), 
    spaghetti = TRUE, ndraws = 200) |>
    plot(points = TRUE) 

dat6b.brm2 |>
  conditional_effects(conditions = data.frame(total = dat6$total)) |>
  plot(points = TRUE)

#OR
dat6b.brm2 |>
    conditional_effects(conditions = data.frame(total = dat6$total), 
    spaghetti = TRUE, ndraws = 200) |>
    plot(points = TRUE) 

Notice that although we centered the predictor, because the centering was performed within the model formula, conditional_effects() is able to back-transform \(x\) onto the original scale when producing the partial plot.

dat6c.brm2 |>
  conditional_effects(conditions = data.frame(total = dat6$total)) |>
  plot(points = TRUE)

#OR
dat6c.brm2 |>
    conditional_effects(conditions = data.frame(total = dat6$total), 
    spaghetti = TRUE, ndraws = 200) |>
    plot(points = TRUE) 

Notice that although we centered and scaled (standardised) the predictor, because this was done within the model formula, conditional_effects() is able to back-transform \(x\) onto the original scale when producing the partial plot.

11 Model investigation

Rather than simply return point estimates of each of the model parameters, Bayesian analyses capture the full posterior of each parameter. These are typically stored within the list structure of the output object.

As with most statistical routines, the overloaded summary() function provides an overall summary of the model parameters. Typically, the summaries will include the means / medians along with credibility intervals and perhaps convergence diagnostics (such as Rhat). However, more thorough investigation and analysis of the parameter posteriors requires access to the full posteriors.

There is currently a plethora of functions for extracting the full posteriors from models. In part, this is a reflection of a rapidly evolving space in which numerous packages provide near-equivalent functionality (it should also be noted that, over time, many of the functions have been deprecated due to inconsistencies in their names). Broadly speaking, the functions focus on draws from the posterior of either the parameters (intercept, slope, standard deviation etc.), the linear predictor, the expected values or the predicted values. The distinction between the latter three is highlighted in the following table.

Property            Description
------------------  ------------------------------------------------------------
linear predictors   values predicted on the link scale
expected values     predictions (on the response scale) without residual error
                    (predicting the expected mean outcome(s))
predicted values    predictions (on the response scale) that incorporate residual
                    error
fitted values       predictions on the response scale

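For orientation, each of these quantities can be obtained from a brms model as follows (each function returns a draws-by-observations matrix, except fitted(), which returns posterior summaries by default):

## draws on the link scale (linear predictors)
linpred <- posterior_linpred(dat1a.brm2)
## expected values: response scale, without residual error
epred <- posterior_epred(dat1a.brm2)
## predicted values: response scale, incorporating residual error
pred <- posterior_predict(dat1a.brm2)
## fitted values: posterior summaries of the expected values
fit <- fitted(dat1a.brm2)
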
dat1a.brm2 |> summary()
 Family: gaussian 
  Links: mu = identity; sigma = identity 
Formula: y ~ x 
   Data: dat (Number of observations: 10) 
  Draws: 3 chains, each with iter = 5000; warmup = 1000; thin = 5;
         total post-warmup draws = 2400

Regression Coefficients:
          Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept     0.27      0.38    -0.46     1.06 1.00     2134     2352
x            -0.08      0.46    -0.97     0.82 1.00     2459     2330

Further Distributional Parameters:
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma     1.12      0.36     0.66     2.07 1.00     2402     2248

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

Conclusions:

  • in the initial block of information, we are reminded of the formula as well as the chain fitting characteristics. We are also informed that the total number of post-warmup MCMC samples is 2400.
  • the Rhat values for each parameter are all less than 1.01, confirming convergence of the parameter estimates across the chains
  • Bulk_ESS: the effective sample sizes for the bulk of the posterior are a high fraction (> 0.5) of the total number of draws. This suggests that the sampling was reasonably efficient and therefore likely to be reliable
  • Tail_ESS: the effective sample sizes of estimates in the tails of the posterior. The tails (more extreme regions of the posterior) are where the sampler is most likely to get stuck (have divergent transitions etc). The fraction of effective samples from this region is relatively high, implying that the sampler did not get stuck in areas that are unsupported by data.
  • Intercept: when \(x=0\), the expected value of \(y\) is 0.266 and we are 95% confident that the true value is between -0.461 and 1.057. Since \(x=0\) is outside the observed range of \(x\), this y-intercept is of very limited value.
  • x: (the slope) - the rate of change in \(y\) per unit (=1) change in \(x\). So for every one unit change in \(x\), \(y\) changes by (on average) -0.08 units and we are 95% confident that this change is between -0.967 and 0.816
  • sigma is estimated to be 1.12

Note that the point estimates are posterior means and the intervals are based on quantiles.

As an alternative to the regular summary (which is designed to resemble that traditionally provided by frequentist analyses - and thus be instantly familiar), it is possible to define exactly how each of the parameters is summarised.

In the following, I am nominating that I want to summarise each parameter posterior by:

  • the median
  • the 95% highest probability density interval (credibility interval)
  • Rhat
  • total number of draws
  • bulk and tail effective sample sizes
dat1a.brm2 |>
  posterior::summarise_draws(
    median,
    HDInterval::hdi,
    rhat,
    length,
    ess_bulk, ess_tail
)
# A tibble: 9 × 8
  variable          median     lower   upper  rhat length ess_bulk ess_tail
  <chr>              <dbl>     <dbl>   <dbl> <dbl>  <dbl>    <dbl>    <dbl>
1 b_Intercept       0.262   -0.501     1.01   1.00   2400    2134.    2352.
2 b_x              -0.0803  -0.973     0.806  1.00   2400    2459.    2330.
3 sigma             1.03     0.588     1.84   1.00   2400    2402.    2248.
4 Intercept         0.277   -0.443     1.04   1.00   2400    2085.    2348.
5 prior_Intercept  30.9     -0.201    59.8    1.00   2400    2231.    2286.
6 prior_b           0.0169 -13.1      11.0    1.00   2400    1968.    2220.
7 prior_sigma      12.0      0.00537  46.8    1.00   2400    2399.    2330.
8 lprior          -11.2    -11.3     -11.1    1.00   2400    2148.    2103.
9 lp__            -25.1    -28.6     -23.7    1.00   2400    2241.    2233.

Conclusions:

  • in the initial block of information, we are reminded of the formula as well as the chain fitting characteristics. We are also informed that the total number of post-warmup MCMC samples is 2400.
  • the Rhat values for each parameter are all less than 1.01, confirming convergence of the parameter estimates across the chains
  • Bulk_ESS: the effective sample sizes for the bulk of the posterior are a high fraction (> 0.5) of the total number of draws. This suggests that the sampling was reasonably efficient and therefore likely to be reliable
  • Tail_ESS: the effective sample sizes of estimates in the tails of the posterior. The tails (more extreme regions of the posterior) are where the sampler is most likely to get stuck (have divergent transitions etc). The fraction of effective samples from this region is relatively high, implying that the sampler did not get stuck in areas that are unsupported by data.
  • b_Intercept: when \(x=0\), the expected value of \(y\) is 0.262 and we are 95% confident that the true value is between -0.501 and 1.013. Since \(x=0\) is outside the observed range of \(x\), this y-intercept is of very limited value.
  • b_x: (the slope) - the rate of change in \(y\) per unit (=1) change in \(x\). So for every one unit change in \(x\), \(y\) changes by (on average) -0.08 units and we are 95% confident that this change is between -0.973 and 0.806
  • sigma is estimated to be 1.03

As a yet more flexible alternative, if we first extract the full posterior draws (with as_draws_df()), we can then use the various tidyverse functions to focus on just the most ecologically interpretable parameters before summarising.

In the following, I will use select() with a regex (regular expression) to match only the columns that:

  • start with (^) “b_” followed by any number (*) of any characters (.)
  • start with (^) “sigma”
dat1a.brm2 |>
    brms::as_draws_df() |>
    dplyr::select(matches("^b_.*|^sigma")) |>
    posterior::summarise_draws(
        median,
        HDInterval::hdi,
        rhat,
        length,
        ess_bulk, ess_tail
    )
Warning: Dropping 'draws_df' class as required metadata was removed.
# A tibble: 3 × 8
  variable     median  lower upper  rhat length ess_bulk ess_tail
  <chr>         <dbl>  <dbl> <dbl> <dbl>  <dbl>    <dbl>    <dbl>
1 b_Intercept  0.262  -0.501 1.01   1.00   2400    2100.    2344.
2 b_x         -0.0803 -0.973 0.806  1.00   2400    2454.    2322.
3 sigma        1.03    0.588 1.84   1.00   2400    2402.    2236.

Conclusions:

  • in the initial block of information, we are reminded of the formula as well as the chain fitting characteristics. We are also informed that the total number of post-warmup MCMC samples is 2400.
  • the Rhat values for each parameter are all less than 1.01, confirming convergence of the parameter estimates across the chains
  • Bulk_ESS: the effective sample sizes for the bulk of the posterior are a high fraction (> 0.5) of the total number of draws. This suggests that the sampling was reasonably efficient and therefore likely to be reliable
  • Tail_ESS: the effective sample sizes of estimates in the tails of the posterior. The tails (more extreme regions of the posterior) are where the sampler is most likely to get stuck (have divergent transitions etc). The fraction of effective samples from this region is relatively high, implying that the sampler did not get stuck in areas that are unsupported by data.
  • b_Intercept: when \(x=0\), the expected value of \(y\) is 0.262 and we are 95% confident that the true value is between -0.501 and 1.013. Since \(x=0\) is outside the observed range of \(x\), this y-intercept is of very limited value.
  • b_x: (the slope) - the rate of change in \(y\) per unit (=1) change in \(x\). So for every one unit change in \(x\), \(y\) changes by (on average) -0.08 units and we are 95% confident that this change is between -0.973 and 0.806
  • sigma is estimated to be 1.03

The gather_draws() function performs the equivalent of an as_draws_df() followed by a pivot_longer() in order to return the full posteriors in long format where they are more suitable for graphing.
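
To make that equivalence concrete, the following sketch reproduces (approximately) what gather_draws() does for the three named parameters (the column selection drops the draws_df metadata, hence the warning seen earlier):

dat1a.brm2 |>
    brms::as_draws_df() |>
    dplyr::select(b_Intercept, b_x, sigma) |>
    tidyr::pivot_longer(
        cols = everything(),
        names_to = ".variable",
        values_to = ".value"
    )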

dat1a.brm2 |>
    gather_draws(b_Intercept, b_x, sigma) |>
    ggplot() +
    geom_histogram(aes(x = .value)) +
    facet_wrap(~.variable, scales = "free")
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Alternatively, there are various other representations supported by the ggdist package.

dat1a.brm2 |>
  gather_draws(b_Intercept, b_x, sigma) |>
  ggplot() +
  stat_halfeye(aes(x = .value, y = .variable))

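The Bayesian \(R^2\) provides an analogue of the proportion of variance explained; requesting the full posterior (summary = FALSE) allows it to be summarised by its median and highest density interval.
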
dat1a.brm2 |>
    bayes_R2(summary = FALSE) |>
    median_hdci()
           y         ymin     ymax .width .point .interval
1 0.06517101 2.242159e-09 0.313515   0.95 median      hdci

Conclusions:

  • 6.517% of the total variability in \(y\) can be explained by its relationship to \(x\)
  • we are 95% confident that the strength of this relationship is between 0% and 31.352%
dat1b.brm2 |> summary()
 Family: gaussian 
  Links: mu = identity; sigma = identity 
Formula: y ~ scale(x, scale = FALSE) 
   Data: dat (Number of observations: 10) 
  Draws: 3 chains, each with iter = 5000; warmup = 1000; thin = 5;
         total post-warmup draws = 2400

Regression Coefficients:
                   Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept              0.28      0.36    -0.40     0.98 1.00     2309     2290
scalexscaleEQFALSE    -0.09      0.46    -0.97     0.85 1.00     2380     2251

Further Distributional Parameters:
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma     1.10      0.34     0.66     1.94 1.00     2363     2409

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

Conclusions:

  • in the initial block of information, we are reminded of the formula as well as the chain fitting characteristics. We are also informed that the total number of post-warmup MCMC samples is 2400.
  • the Rhat values for each parameter are all less than 1.01, confirming convergence of the parameter estimates across the chains
  • Bulk_ESS: the effective sample sizes for the bulk of the posterior are a high fraction (> 0.5) of the total number of draws. This suggests that the sampling was reasonably efficient and therefore likely to be reliable
  • Tail_ESS: the effective sample sizes of estimates in the tails of the posterior. The tails (more extreme regions of the posterior) are where the sampler is most likely to get stuck (have divergent transitions etc). The fraction of effective samples from this region is relatively high, implying that the sampler did not get stuck in areas that are unsupported by data.
  • Intercept: when \(x=0\) (its average, since it is centered), the expected value of \(y\) is 0.283 and we are 95% confident that the true value is between -0.404 and 0.981. So \(y\) is expected to be 0.283 at the average \(x\).
  • x: (the slope) - the rate of change in \(y\) per unit (=1) change in \(x\). So for every one unit change in \(x\), \(y\) changes by (on average) -0.089 units and we are 95% confident that this change is between -0.971 and 0.849
  • sigma is estimated to be 1.1

Note that the point estimates are posterior means and the intervals are based on quantiles.

As an alternative to the regular summary (which is designed to resemble that traditionally provided by frequentist analyses - and thus be instantly familiar), it is possible to define exactly how each of the parameters is summarised.

In the following, I am nominating that I want to summarise each parameter posterior by:

  • the median
  • the 95% highest probability density interval (credibility interval)
  • Rhat
  • total number of draws
  • bulk and tail effective sample sizes
dat1b.brm2 |>
  posterior::summarise_draws(
    median,
    HDInterval::hdi,
    rhat,
    length,
    ess_bulk, ess_tail
)
# A tibble: 9 × 8
  variable               median     lower   upper  rhat length ess_bulk ess_tail
  <chr>                   <dbl>     <dbl>   <dbl> <dbl>  <dbl>    <dbl>    <dbl>
1 b_Intercept            0.277   -0.437     0.938  1.00   2400    2309.    2290.
2 b_scalexscaleEQFALSE  -0.0841  -0.997     0.803  1.00   2400    2381.    2251.
3 sigma                  1.03     0.596     1.76   1.00   2400    2364.    2409.
4 Intercept              0.277   -0.437     0.938  1.00   2400    2309.    2290.
5 prior_Intercept       31.7      1.66     60.9    1.00   2400    2219.    2150.
6 prior_b               -0.0664 -12.9      13.2    1.00   2400    2408.    2328.
7 prior_sigma           11.2      0.00997  49.3    1.00   2400    2373.    2459.
8 lprior               -11.2    -11.3     -11.1    1.00   2400    2447.    2293.
9 lp__                 -25.0    -28.2     -23.7    1.00   2400    2069.    2035.

Conclusions:

  • in the initial block of information, we are reminded of the formula as well as the chain fitting characteristics. We are also informed that the total number of post-warmup MCMC samples is 2400.
  • the Rhat values for each parameter are all less than 1.01, confirming convergence of the parameter estimates across the chains
  • Bulk_ESS: the effective sample sizes for the bulk of the posterior are a high fraction (> 0.5) of the total number of draws. This suggests that the sampling was reasonably efficient and therefore likely to be reliable
  • Tail_ESS: the effective sample sizes of estimates in the tails of the posterior. The tails (more extreme regions of the posterior) are where the sampler is most likely to get stuck (have divergent transitions etc). The fraction of effective samples from this region is relatively high, implying that the sampler did not get stuck in areas that are unsupported by data.
  • b_Intercept: when \(x=0\) (its average, since it is centered), the expected value of \(y\) is 0.277 and we are 95% confident that the true value is between -0.437 and 0.938. So \(y\) is expected to be 0.277 at the average \(x\).
  • b_x: (the slope) - the rate of change in \(y\) per unit (=1) change in \(x\). So for every one unit change in \(x\), \(y\) changes by (on average) -0.084 units and we are 95% confident that this change is between -0.997 and 0.803
  • sigma is estimated to be 1.03

As a yet more flexible alternative, if we first extract the full posterior draws (with as_draws_df()), we can then use the various tidyverse functions to focus on just the most ecologically interpretable parameters before summarising.

In the following, I will use select() with a regex (regular expression) to match only the columns that:

  • start with (^) “b_” followed by any number (*) of any characters (.)
  • start with (^) “sigma”
dat1b.brm2 |>
    brms::as_draws_df() |>
    dplyr::select(matches("^b_.*|^sigma")) |>
    posterior::summarise_draws(
        median,
        HDInterval::hdi,
        rhat,
        length,
        ess_bulk, ess_tail
    )
Warning: Dropping 'draws_df' class as required metadata was removed.
# A tibble: 3 × 8
  variable              median  lower upper  rhat length ess_bulk ess_tail
  <chr>                  <dbl>  <dbl> <dbl> <dbl>  <dbl>    <dbl>    <dbl>
1 b_Intercept           0.277  -0.437 0.938  1.00   2400    2270.    2282.
2 b_scalexscaleEQFALSE -0.0841 -0.997 0.803  1.00   2400    2371.    2243.
3 sigma                 1.03    0.596 1.76   1.00   2400    2362.    2402.

Conclusions:

  • in the initial block of information, we are reminded of the formula as well as the chain fitting characteristics. We are also informed that the total number of post-warmup MCMC samples is 2400.
  • the Rhat values for each parameter are all less than 1.01, confirming convergence of the parameter estimates across the chains
  • Bulk_ESS: the effective sample sizes for the bulk of the posterior are a high fraction (> 0.5) of the total number of draws. This suggests that the sampling was reasonably efficient and therefore likely to be reliable
  • Tail_ESS: the effective sample sizes of estimates in the tails of the posterior. The tails (more extreme regions of the posterior) are where the sampler is most likely to get stuck (have divergent transitions etc). The fraction of effective samples from this region is relatively high, implying that the sampler did not get stuck in areas that are unsupported by data.
  • b_Intercept: when \(x=0\) (its average, since it is centered), the expected value of \(y\) is 0.277 and we are 95% confident that the true value is between -0.437 and 0.938. So \(y\) is expected to be 0.277 at the average \(x\).
  • b_x: (the slope) - the rate of change in \(y\) per unit (=1) change in \(x\). So for every one unit change in \(x\), \(y\) changes by (on average) -0.084 units and we are 95% confident that this change is between -0.997 and 0.803
  • sigma is estimated to be 1.03

The gather_draws() function performs the equivalent of an as_draws_df() followed by a pivot_longer() in order to return the full posteriors in long format, where they are more suitable for graphing. Note that the name of the slope parameter becomes very awkward when \(x\) is centered within the model formula, so it is more convenient to refer to this parameter via a regular expression.

dat1b.brm2 |> get_variables()
 [1] "b_Intercept"          "b_scalexscaleEQFALSE" "sigma"               
 [4] "Intercept"            "prior_Intercept"      "prior_b"             
 [7] "prior_sigma"          "lprior"               "lp__"                
[10] "accept_stat__"        "stepsize__"           "treedepth__"         
[13] "n_leapfrog__"         "divergent__"          "energy__"            
dat1b.brm2 |>
  gather_draws(`b_Intercept`,`b_.*x.*`, `sigma`, regex = TRUE) |> 
  ggplot() +
  geom_histogram(aes(x = .value)) +
  facet_wrap(~.variable, scales = "free")
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Alternatively, there are various other representations supported by the ggdist package.

dat1b.brm2 |>
  gather_draws(`b_Intercept`,`b_.*x.*`, `sigma`, regex = TRUE) |> 
  ggplot() +
  stat_halfeye(aes(x = .value, y = .variable))

dat1b.brm2 |>
    bayes_R2(summary = FALSE) |>
    median_hdci()
           y         ymin      ymax .width .point .interval
1 0.06623776 2.645785e-07 0.3103406   0.95 median      hdci

Conclusions:

  • 6.624% of the total variability in \(y\) can be explained by its relationship to \(x\)
  • we are 95% confident that the strength of this relationship is between 0% and 31.034%
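
Because bayes_R2(summary = FALSE) returns the full posterior of \(R^2\), we are not restricted to interval summaries. As a quick sketch, we could estimate the posterior probability that the model explains more than some nominated fraction of the variance (the 10% threshold below is arbitrary, purely for illustration).

r2 <- dat1b.brm2 |> bayes_R2(summary = FALSE)
# proportion of posterior draws for which R-squared exceeds 0.1
mean(r2 > 0.1)
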
dat1c.brm2 |> summary()
 Family: gaussian 
  Links: mu = identity; sigma = identity 
Formula: y ~ scale(x) 
   Data: dat (Number of observations: 10) 
  Draws: 3 chains, each with iter = 5000; warmup = 1000; thin = 5;
         total post-warmup draws = 2400

Regression Coefficients:
          Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept     0.28      0.37    -0.43     1.02 1.00     2230     1989
scalex       -0.07      0.41    -0.87     0.71 1.00     2201     2292

Further Distributional Parameters:
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma     1.14      0.37     0.65     2.01 1.00     2300     2224

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

Conclusions:

  • in the initial block of information, we are reminded of the formula as well as the chain fitting characteristics. We are also informed that the total number of post-warmup MCMC samples is 2400.
  • the Rhat values for each parameter are <1.01, confirming convergence of the parameter estimates across the chains
  • Bulk_ESS: effective sample sizes for the bulk of the posterior range are a high fraction (> 0.5) of the total number of draws. This suggests that the sampling was reasonably efficient and therefore likely to be reliable
  • Tail_ESS: effective sample sizes of estimates in the tails of the posterior. The tails (more extreme regions of the posterior) are where the sampler is most likely to get stuck (e.g. have divergent transitions). The fraction of effective samples from this region is relatively high, implying that the sampler did not get stuck in areas that are poorly supported by data.
  • Intercept: when \(x=0\) (its average since it is standardised), the expected value of \(y\) is 0.284 and we are 95% confident that the true value is between -0.428 and 1.017. Since \(x=0\) corresponds to the mean of the observed \(x\) values, this y-intercept is directly interpretable.
  • x: (the slope) - the rate of change in \(y\) per unit (=1) change in \(x\). Recall that since \(x\) is standardised, 1 unit represents a span of 1 standard deviation of \(x\). So for every one standard deviation change in \(x\), \(y\) changes by (on average) -0.069 units and we are 95% confident that this change is between -0.87 and 0.712
  • sigma is estimated to be 1.14

Note, the estimates are means and quantiles.
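
To see why this distinction can matter, the following standalone sketch (simulated, deliberately skewed values standing in for a posterior) contrasts an equal-tailed quantile interval with a highest posterior density (HPD) interval; for asymmetric posteriors the two can differ noticeably.

set.seed(1)
draws <- rlnorm(2400)                    # a deliberately skewed 'posterior'
quantile(draws, c(0.025, 0.975))         # equal-tailed (quantile) interval
HDInterval::hdi(draws, credMass = 0.95)  # highest posterior density interval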

As an alternative to the regular summary (which is designed to resemble that traditionally provided by frequentist analyses - and thus be instantly familiar), it is possible to define exactly how each of the parameters are summarised.

In the following, I am nominating that I want to summarise each parameter posterior by:

  • the median
  • the 95% highest probability density interval (credibility interval)
  • Rhat
  • total number of draws
  • bulk and tail effective sample sizes
dat1c.brm2 |>
  posterior::summarise_draws(
    median,
    HDInterval::hdi,
    rhat,
    length,
    ess_bulk, ess_tail
)
# A tibble: 9 × 8
  variable          median       lower   upper  rhat length ess_bulk ess_tail
  <chr>              <dbl>       <dbl>   <dbl> <dbl>  <dbl>    <dbl>    <dbl>
1 b_Intercept       0.279   -0.426       1.02  1.00    2400    2230.    1989.
2 b_scalex         -0.0750  -0.813       0.752 1.00    2400    2201.    2292.
3 sigma             1.06     0.550       1.80  0.999   2400    2300.    2224.
4 Intercept         0.279   -0.426       1.02  1.00    2400    2230.    1989.
5 prior_Intercept  31.8      3.60       62.5   1.00    2400    2343.    2023.
6 prior_b          -0.359  -44.6        46.7   1.00    2400    2334.    2368.
7 prior_sigma      11.6      0.0000123  47.0   1.00    2400    2421.    2290.
8 lprior          -12.5    -12.6       -12.4   1.00    2400    2226.    1977.
9 lp__            -26.5    -29.6       -25.0   1.00    2400    2390.    2328.

Conclusions:

  • the length column confirms that the total number of post-warmup MCMC draws is 2400.
  • the Rhat values for each parameter are <1.01, confirming convergence of the parameter estimates across the chains
  • Bulk_ESS: effective sample sizes for the bulk of the posterior range are a high fraction (> 0.5) of the total number of draws. This suggests that the sampling was reasonably efficient and therefore likely to be reliable
  • Tail_ESS: effective sample sizes of estimates in the tails of the posterior. The tails (more extreme regions of the posterior) are where the sampler is most likely to get stuck (e.g. have divergent transitions). The fraction of effective samples from this region is relatively high, implying that the sampler did not get stuck in areas that are poorly supported by data.
  • b_Intercept: when \(x=0\) (its average since it is standardised), the expected value of \(y\) is 0.279 and we are 95% confident that the true value is between -0.426 and 1.019. Since \(x=0\) corresponds to the mean of the observed \(x\) values, this y-intercept is directly interpretable.
  • b_x: (the slope) - the rate of change in \(y\) per unit (=1) change in \(x\). Recall that since \(x\) is standardised, 1 unit represents a span of 1 standard deviation of \(x\). So for every one standard deviation change in \(x\), \(y\) changes by (on average) -0.075 units and we are 95% confident that this change is between -0.813 and 0.752
  • sigma is estimated to be 1.06

As a yet more flexible alternative, we can first extract the full posterior draws (with as_draws_df()) and then use tidyverse functions to focus on just the most ecologically interpretable parameters before summarising.

In the following, I will use select() with a regex (regular expression) to match only the columns that:

  • start with (^) “b_” followed by any number (*) of any characters (.)
  • start with (^) “sigma”
dat1c.brm2 |>
    brms::as_draws_df() |>
    dplyr::select(matches("^b_.*|^sigma")) |>
    posterior::summarise_draws(
        median,
        HDInterval::hdi,
        rhat,
        length,
        ess_bulk, ess_tail
    )
Warning: Dropping 'draws_df' class as required metadata was removed.
# A tibble: 3 × 8
  variable     median  lower upper  rhat length ess_bulk ess_tail
  <chr>         <dbl>  <dbl> <dbl> <dbl>  <dbl>    <dbl>    <dbl>
1 b_Intercept  0.279  -0.426 1.02   1.00   2400    2235.    1985.
2 b_scalex    -0.0750 -0.813 0.752  1.00   2400    2193.    2280.
3 sigma        1.06    0.550 1.80   1.00   2400    2290.    2197.

Conclusions:

  • the length column confirms that the total number of post-warmup MCMC draws is 2400.
  • the Rhat values for each parameter are <1.01, confirming convergence of the parameter estimates across the chains
  • Bulk_ESS: effective sample sizes for the bulk of the posterior range are a high fraction (> 0.5) of the total number of draws. This suggests that the sampling was reasonably efficient and therefore likely to be reliable
  • Tail_ESS: effective sample sizes of estimates in the tails of the posterior. The tails (more extreme regions of the posterior) are where the sampler is most likely to get stuck (e.g. have divergent transitions). The fraction of effective samples from this region is relatively high, implying that the sampler did not get stuck in areas that are poorly supported by data.
  • b_Intercept: when \(x=0\) (its average since it is standardised), the expected value of \(y\) is 0.279 and we are 95% confident that the true value is between -0.426 and 1.019. Since \(x=0\) corresponds to the mean of the observed \(x\) values, this y-intercept is directly interpretable.
  • b_x: (the slope) - the rate of change in \(y\) per unit (=1) change in \(x\). Recall that since \(x\) is standardised, 1 unit represents a span of 1 standard deviation of \(x\). So for every one standard deviation change in \(x\), \(y\) changes by (on average) -0.075 units and we are 95% confident that this change is between -0.813 and 0.752
  • sigma is estimated to be 1.06

The gather_draws() function performs the equivalent of an as_draws_df() followed by a pivot_longer() in order to return the full posteriors in long format, where they are more suitable for graphing. Note that the name of the slope parameter gets very awkward when \(x\) is centered, so it is more convenient to refer to this parameter via a regular expression.

dat1c.brm2 |> get_variables()
 [1] "b_Intercept"     "b_scalex"        "sigma"           "Intercept"      
 [5] "prior_Intercept" "prior_b"         "prior_sigma"     "lprior"         
 [9] "lp__"            "accept_stat__"   "stepsize__"      "treedepth__"    
[13] "n_leapfrog__"    "divergent__"     "energy__"       
dat1c.brm2 |>
  gather_draws(`b_Intercept`,`b_.*x`, `sigma`, regex = TRUE) |> 
  ggplot() +
  geom_histogram(aes(x = .value)) +
  facet_wrap(~.variable, scales = "free")
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Alternatively, there are various other representations supported by the ggdist package.

dat1c.brm2 |>
  gather_draws(`b_Intercept`,`b_.*x`, `sigma`, regex = TRUE) |> 
  ggplot() +
  stat_halfeye(aes(x = .value, y = .variable))

dat1c.brm2 |>
    bayes_R2(summary = FALSE) |>
    median_hdci()
           y         ymin      ymax .width .point .interval
1 0.06728658 1.499005e-09 0.3197585   0.95 median      hdci

Conclusions:

  • 6.729% of the total variability in \(y\) can be explained by its relationship to \(x\)
  • we are 95% confident that the strength of this relationship is between 0% and 31.976%
dat2a.brm2 |> summary()
 Family: gaussian 
  Links: mu = identity; sigma = identity 
Formula: y ~ x 
   Data: dat2 (Number of observations: 12) 
  Draws: 3 chains, each with iter = 5000; warmup = 1000; thin = 5;
         total post-warmup draws = 2400

Regression Coefficients:
          Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept    20.89      2.03    16.62    24.89 1.00     2291     2382
xmedium      -0.83      2.85    -6.31     5.12 1.00     2347     2180
xhigh        -8.72      3.09   -14.74    -2.34 1.00     2321     2345

Further Distributional Parameters:
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma     4.48      1.11     2.86     7.18 1.00     2605     2498

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

Conclusions:

  • in the initial block of information, we are reminded of the formula as well as the chain fitting characteristics. We are also informed that the total number of post-warmup MCMC samples is 2400.
  • the Rhat values for each parameter are <1.01, confirming convergence of the parameter estimates across the chains
  • Bulk_ESS: effective sample sizes for the bulk of the posterior range are a high fraction (> 0.5) of the total number of draws. This suggests that the sampling was reasonably efficient and therefore likely to be reliable
  • Tail_ESS: effective sample sizes of estimates in the tails of the posterior. The tails (more extreme regions of the posterior) are where the sampler is most likely to get stuck (e.g. have divergent transitions). The fraction of effective samples from this region is relatively high, implying that the sampler did not get stuck in areas that are poorly supported by data.
  • Intercept: when \(x\) is “control”, the expected value of \(y\) is 20.89 and we are 95% confident that the true value is between 16.616 and 24.891.
  • x*: (the slopes) - the change (effect) in \(y\) between the first (“control”) group and each of the other levels of \(x\) (the group means themselves can be recovered via emmeans - see the sketch below).
    • xmedium: \(y\) is (on average) 0.829 units lower in the “medium” group than in the “control” group, and we are 95% confident that this difference is between -6.312 and 5.116
    • xhigh: \(y\) is (on average) 8.715 units lower in the “high” group than in the “control” group, and we are 95% confident that this difference is between -14.736 and -2.344
  • sigma is estimated to be 4.48

Note, the estimates are means and quantiles.
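
Since the treatment contrasts report differences from the “control” group rather than the group means themselves, it can be useful to also recover the posterior group means. A sketch (the emmeans package loaded earlier supports brms fits):

dat2a.brm2 |>
    emmeans(~x)

This should yield posterior summaries (medians and HPD intervals) for each level of \(x\), closely matching the cell-means parameterisation fitted as dat2b.brm2 below.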

As an alternative to the regular summary (which is designed to resemble that traditionally provided by frequentist analyses - and thus be instantly familiar), it is possible to define exactly how each of the parameters are summarised.

In the following, I am nominating that I want to summarise each parameter posterior by:

  • the median
  • the 95% highest probability density interval (credibility interval)
  • Rhat
  • total number of draws
  • bulk and tail effective sample sizes
dat2a.brm2 |>
  posterior::summarise_draws(
    median,
    HDInterval::hdi,
    rhat,
    length,
    ess_bulk, ess_tail
)
# A tibble: 10 × 8
   variable          median     lower  upper  rhat length ess_bulk ess_tail
   <chr>              <dbl>     <dbl>  <dbl> <dbl>  <dbl>    <dbl>    <dbl>
 1 b_Intercept      20.9     17.0      25.1   1.00   2400    2290.    2382.
 2 b_xmedium        -0.889   -6.44      4.89  1.00   2400    2347.    2180.
 3 b_xhigh          -8.73   -14.8      -2.38  1.00   2400    2320.    2345.
 4 sigma             4.30     2.67      6.71  1.00   2400    2605.    2498.
 5 Intercept        17.7     15.7      20.0   1.00   2400    2485.    2204.
 6 prior_Intercept  19.7     16.1      23.2   1.00   2400    2516.    2283.
 7 prior_b           0.0306 -18.9      20.7   1.00   2400    2384.    2381.
 8 prior_sigma       3.55     0.00499  14.0   1.00   2400    2374.    2369.
 9 lprior          -11.4    -13.1     -10.1   1.00   2400    2361.    2414.
10 lp__            -44.3    -47.8     -42.5   1.00   2400    2231.    2367.

Conclusions:

  • the length column confirms that the total number of post-warmup MCMC draws is 2400.
  • the Rhat values for each parameter are <1.01, confirming convergence of the parameter estimates across the chains
  • Bulk_ESS: effective sample sizes for the bulk of the posterior range are a high fraction (> 0.5) of the total number of draws. This suggests that the sampling was reasonably efficient and therefore likely to be reliable
  • Tail_ESS: effective sample sizes of estimates in the tails of the posterior. The tails (more extreme regions of the posterior) are where the sampler is most likely to get stuck (e.g. have divergent transitions). The fraction of effective samples from this region is relatively high, implying that the sampler did not get stuck in areas that are poorly supported by data.
  • Intercept: when \(x\) is “control”, the expected value of \(y\) is 20.875 and we are 95% confident that the true value is between 16.993 and 25.064.
  • x*: (the slopes) - the change (effect) in \(y\) between the first (“control”) group and each of the other levels of \(x\).
    • xmedium: \(y\) is (on average) 0.889 units lower in the “medium” group than in the “control” group, and we are 95% confident that this difference is between -6.445 and 4.886
    • xhigh: \(y\) is (on average) 8.726 units lower in the “high” group than in the “control” group, and we are 95% confident that this difference is between -14.756 and -2.383
  • sigma is estimated to be 4.30 (the posterior median)

As a yet more flexible alternative, we can first extract the full posterior draws (with as_draws_df()) and then use tidyverse functions to focus on just the most ecologically interpretable parameters before summarising.

In the following, I will use select() with a regex (regular expression) to match only the columns that:

  • start with (^) “b_” followed by any number (*) of any characters (.)
  • start with (^) “sigma”
dat2a.brm2 |>
    brms::as_draws_df() |>
    dplyr::select(matches("^b_.*|^sigma")) |>
    posterior::summarise_draws(
        median,
        HDInterval::hdi,
        rhat,
        length,
        ess_bulk, ess_tail
    )
Warning: Dropping 'draws_df' class as required metadata was removed.
# A tibble: 4 × 8
  variable    median  lower upper  rhat length ess_bulk ess_tail
  <chr>        <dbl>  <dbl> <dbl> <dbl>  <dbl>    <dbl>    <dbl>
1 b_Intercept 20.9    17.0  25.1   1.00   2400    2270.    2367.
2 b_xmedium   -0.889  -6.44  4.89  1.00   2400    2329.    2158.
3 b_xhigh     -8.73  -14.8  -2.38  1.00   2400    2294.    2322.
4 sigma        4.30    2.67  6.71  1.00   2400    2597.    2492.

Conclusions:

  • the length column confirms that the total number of post-warmup MCMC draws is 2400.
  • the Rhat values for each parameter are <1.01, confirming convergence of the parameter estimates across the chains
  • Bulk_ESS: effective sample sizes for the bulk of the posterior range are a high fraction (> 0.5) of the total number of draws. This suggests that the sampling was reasonably efficient and therefore likely to be reliable
  • Tail_ESS: effective sample sizes of estimates in the tails of the posterior. The tails (more extreme regions of the posterior) are where the sampler is most likely to get stuck (e.g. have divergent transitions). The fraction of effective samples from this region is relatively high, implying that the sampler did not get stuck in areas that are poorly supported by data.
  • Intercept: when \(x\) is “control”, the expected value of \(y\) is 20.875 and we are 95% confident that the true value is between 16.993 and 25.064.
  • x*: (the slopes) - the change (effect) in \(y\) between the first (“control”) group and each of the other levels of \(x\).
    • xmedium: \(y\) is (on average) 0.889 units lower in the “medium” group than in the “control” group, and we are 95% confident that this difference is between -6.445 and 4.886
    • xhigh: \(y\) is (on average) 8.726 units lower in the “high” group than in the “control” group, and we are 95% confident that this difference is between -14.756 and -2.383
  • sigma is estimated to be 4.30 (the posterior median)

The gather_draws() function performs the equivalent of an as_draws_df() followed by a pivot_longer() in order to return the full posteriors in long format where they are more suitable for graphing.

dat2a.brm2 |> get_variables()
 [1] "b_Intercept"     "b_xmedium"       "b_xhigh"         "sigma"          
 [5] "Intercept"       "prior_Intercept" "prior_b"         "prior_sigma"    
 [9] "lprior"          "lp__"            "accept_stat__"   "stepsize__"     
[13] "treedepth__"     "n_leapfrog__"    "divergent__"     "energy__"       
dat2a.brm2 |>
    gather_draws(`b_Intercept`, `b_x.*`, `sigma`, regex = TRUE) |>
    ggplot() +
    geom_histogram(aes(x = .value)) +
    facet_wrap(~.variable, scales = "free")
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Alternatively, there are various other representations supported by the ggdist package.

dat2a.brm2 |>
  gather_draws(`b_x.*`, regex = TRUE) |>
  ggplot() +
  stat_halfeye(aes(x = .value, y = .variable))

dat2a.brm2 |>
    bayes_R2(summary = FALSE) |>
    median_hdci()
          y      ymin      ymax .width .point .interval
1 0.5424268 0.1636153 0.7253359   0.95 median      hdci

Conclusions:

  • 54.243% of the total variability in \(y\) can be explained by its relationship to \(x\)
  • we are 95% confident that the strength of this relationship is between 16.362% and 72.534%
dat2b.brm2 |> summary()
 Family: gaussian 
  Links: mu = identity; sigma = identity 
Formula: y ~ -1 + x 
   Data: dat2 (Number of observations: 12) 
  Draws: 3 chains, each with iter = 5000; warmup = 1000; thin = 5;
         total post-warmup draws = 2400

Regression Coefficients:
         Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
xcontrol    20.26      2.17    15.79    24.43 1.00     2277     2411
xmedium     18.68      2.15    14.38    23.03 1.00     2406     2232
xhigh       11.10      2.23     6.67    15.83 1.00     2547     2347

Further Distributional Parameters:
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma     4.45      1.14     2.82     7.24 1.00     2385     2312

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

Conclusions:

  • in the initial block of information, we are reminded of the formula as well as the chain fitting characteristics. We are also informed that the total number of post-warmup MCMC samples is 2400.
  • the Rhat values for each parameter are <1.01, confirming convergence of the parameter estimates across the chains
  • Bulk_ESS: effective sample sizes for the bulk of the posterior range are a high fraction (> 0.5) of the total number of draws. This suggests that the sampling was reasonably efficient and therefore likely to be reliable
  • Tail_ESS: effective sample sizes of estimates in the tails of the posterior. The tails (more extreme regions of the posterior) are where the sampler is most likely to get stuck (e.g. have divergent transitions). The fraction of effective samples from this region is relatively high, implying that the sampler did not get stuck in areas that are poorly supported by data.
  • x*: (the group means; pairwise differences between groups can be derived from the posterior draws - see the sketch below)
    • xcontrol: the expected value of \(y\) in the “control” group is 20.26 (95% credibility interval is between 15.788 and 24.425)
    • xmedium: the expected value of \(y\) in the “medium” group is 18.68 (95% credibility interval is between 14.383 and 23.026)
    • xhigh: the expected value of \(y\) in the “high” group is 11.10 (95% credibility interval is between 6.669 and 15.832)
  • sigma is estimated to be 4.45

Note, the estimates are means and quantiles.
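
Conversely, under this means parameterisation the pairwise differences between groups are no longer model parameters, yet they are easily derived from the posterior draws. A sketch for the “high” vs “control” comparison (diff_high_control is just an illustrative name):

dat2b.brm2 |>
    brms::as_draws_df() |>
    # derive the contrast draw-by-draw
    dplyr::mutate(diff_high_control = b_xhigh - b_xcontrol) |>
    dplyr::pull(diff_high_control) |>
    median_hdci()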

As an alternative to the regular summary (which is designed to resemble that traditionally provided by frequentist analyses - and thus be instantly familiar), it is possible to define exactly how each of the parameters are summarised.

In the following, I am nominating that I want to summarise each parameter posterior by:

  • the median
  • the 95% highest probability density interval (credibility interval)
  • Rhat
  • total number of draws
  • bulk and tail effective sample sizes
dat2b.brm2 |>
  posterior::summarise_draws(
    median,
    HDInterval::hdi,
    rhat,
    length,
    ess_bulk, ess_tail
)
# A tibble: 8 × 8
  variable    median      lower  upper  rhat length ess_bulk ess_tail
  <chr>        <dbl>      <dbl>  <dbl> <dbl>  <dbl>    <dbl>    <dbl>
1 b_xcontrol   20.3   15.8       24.5  0.999   2400    2278.    2411.
2 b_xmedium    18.6   14.7       23.4  1.00    2400    2406.    2232.
3 b_xhigh      11.0    6.49      15.6  1.00    2400    2547.    2347.
4 sigma         4.26   2.52       6.64 1.00    2400    2385.    2312.
5 prior_b      16.2   -6.13      36.0  0.999   2400    2363.    2497.
6 prior_sigma   3.53   0.000386  13.9  1.00    2400    2463.    2124.
7 lprior      -11.8  -12.8      -11.1  1.00    2400    2174.    2293.
8 lp__        -44.6  -48.3      -42.7  1.00    2400    2347.    2301.

Conclusions:

  • the length column confirms that the total number of post-warmup MCMC draws is 2400.
  • the Rhat values for each parameter are <1.01, confirming convergence of the parameter estimates across the chains
  • Bulk_ESS: effective sample sizes for the bulk of the posterior range are a high fraction (> 0.5) of the total number of draws. This suggests that the sampling was reasonably efficient and therefore likely to be reliable
  • Tail_ESS: effective sample sizes of estimates in the tails of the posterior. The tails (more extreme regions of the posterior) are where the sampler is most likely to get stuck (e.g. have divergent transitions). The fraction of effective samples from this region is relatively high, implying that the sampler did not get stuck in areas that are poorly supported by data.
  • x*: (the means of each group)
    • xcontrol: the expected value of \(y\) in the “control” group is 20.276 (95% credibility interval is between 15.847 and 24.478)
    • xmedium: the expected value of \(y\) in the “medium” group is 18.635 (95% credibility interval is between 14.745 and 23.371)
    • xhigh: the expected value of \(y\) in the “high” group is 11.028 (95% credibility interval is between 6.491 and 15.58)
  • sigma is estimated to be 4.26 (the posterior median)

As a yet more flexible alternative, we can first extract the full posterior draws (with as_draws_df()) and then use tidyverse functions to focus on just the most ecologically interpretable parameters before summarising.

In the following, I will use select() with a regex (regular expression) to match only the columns that:

  • start with (^) “b_” followed by any number (*) of any characters (.)
  • start with (^) “sigma”
dat2b.brm2 |>
    brms::as_draws_df() |>
    dplyr::select(matches("^b_.*|^sigma")) |>
    posterior::summarise_draws(
        median,
        HDInterval::hdi,
        rhat,
        length,
        ess_bulk, ess_tail
    )
Warning: Dropping 'draws_df' class as required metadata was removed.
# A tibble: 4 × 8
  variable   median lower upper  rhat length ess_bulk ess_tail
  <chr>       <dbl> <dbl> <dbl> <dbl>  <dbl>    <dbl>    <dbl>
1 b_xcontrol  20.3  15.8  24.5   1.00   2400    2266.    2404.
2 b_xmedium   18.6  14.7  23.4   1.00   2400    2387.    2190.
3 b_xhigh     11.0   6.49 15.6   1.00   2400    2539.    2305.
4 sigma        4.26  2.52  6.64  1.00   2400    2380.    2290.

Conclusions:

  • the length column confirms that the total number of post-warmup MCMC draws is 2400.
  • the Rhat values for each parameter are <1.01, confirming convergence of the parameter estimates across the chains
  • Bulk_ESS: effective sample sizes for the bulk of the posterior range are a high fraction (> 0.5) of the total number of draws. This suggests that the sampling was reasonably efficient and therefore likely to be reliable
  • Tail_ESS: effective sample sizes of estimates in the tails of the posterior. The tails (more extreme regions of the posterior) are where the sampler is most likely to get stuck (e.g. have divergent transitions). The fraction of effective samples from this region is relatively high, implying that the sampler did not get stuck in areas that are poorly supported by data.
  • x*: (the means of each group)
    • xcontrol: the expected value of \(y\) in the “control” group is 20.276 (95% credibility interval is between 15.847 and 24.478)
    • xmedium: the expected value of \(y\) in the “medium” group is 18.635 (95% credibility interval is between 14.745 and 23.371)
    • xhigh: the expected value of \(y\) in the “high” group is 11.028 (95% credibility interval is between 6.491 and 15.58)
  • sigma is estimated to be 4.26 (the posterior median)

The gather_draws() function performs the equivalent of an as_draws_df() followed by a pivot_longer() in order to return the full posteriors in long format where they are more suitable for graphing.

dat2b.brm2 |> get_variables()
 [1] "b_xcontrol"    "b_xmedium"     "b_xhigh"       "sigma"        
 [5] "prior_b"       "prior_sigma"   "lprior"        "lp__"         
 [9] "accept_stat__" "stepsize__"    "treedepth__"   "n_leapfrog__" 
[13] "divergent__"   "energy__"     
dat2b.brm2 |>
    gather_draws(`b_x.*`, `sigma`, regex = TRUE) |>
    ggplot() +
    geom_histogram(aes(x = .value)) +
    facet_wrap(~.variable, scales = "free")
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Alternatively, there are various other representations supported by the ggdist package.

dat2b.brm2 |>
  gather_draws(`b_x.*`, regex = TRUE) |>
  ggplot() +
  stat_halfeye(aes(x = .value, y = .variable))

dat2b.brm2 |>
    bayes_R2(summary = FALSE) |>
    median_hdci()
          y      ymin      ymax .width .point .interval
1 0.5540797 0.1893543 0.7253041   0.95 median      hdci

Conclusions:

  • 55.408% of the total variability in \(y\) can be explained by its relationship to \(x\)
  • we are 95% confident that the strength of this relationship is between 18.935% and 72.53%
dat3a.brm2 |> summary()
 Family: poisson 
  Links: mu = log 
Formula: y ~ x 
   Data: dat3 (Number of observations: 10) 
  Draws: 3 chains, each with iter = 5000; warmup = 1000; thin = 5;
         total post-warmup draws = 2400

Regression Coefficients:
          Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept     0.02      0.35    -0.69     0.67 1.00     2108     1861
x             0.35      0.04     0.27     0.43 1.00     2142     2040

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
Important

The results presented by the summary() function are on the link scale (in this case, the log scale). As such, they can be awkward to interpret and are particularly punishing for your audience.

Conclusions:

  • in the initial block of information, we are reminded of the formula as well as the chain fitting characteristics. We are also informed that the total number of post-warmup MCMC samples is 2400.
  • the Rhat values for each parameter are <1.01, confirming convergence of the parameter estimates across the chains
  • Bulk_ESS: effective sample sizes for the bulk of the posterior range are a high fraction (> 0.5) of the total number of draws. This suggests that the sampling was reasonably efficient and therefore likely to be reliable
  • Tail_ESS: effective sample sizes of estimates in the tails of the posterior. The tails (more extreme regions of the posterior) are where the sampler is most likely to get stuck (e.g. have divergent transitions). The fraction of effective samples from this region is relatively high, implying that the sampler did not get stuck in areas that are poorly supported by data.
  • Intercept: when \(x=0\), the expected value of \(y\) on the log scale is 0.021 and we are 95% confident that the true value is between -0.693 and 0.672. Since \(x=0\) is outside the observed range of \(x\), this y-intercept is of very limited value.
  • x: (the slope) - the rate of change in \(y\) per unit (=1) change in \(x\). So for every one unit change in \(x\), \(y\) increases by (on average) 0.345 units on the log scale and we are 95% confident that this change is between 0.266 and 0.432

Note, the estimates are means and quantiles.
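
One convenient way to sidestep the link scale when communicating results is to plot the effects on the natural response scale; as a sketch, brms::conditional_effects() back-transforms the fitted trend for us.

dat3a.brm2 |>
    brms::conditional_effects() |>
    plot()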

As an alternative to the regular summary (which is designed to resemble that traditionally provided by frequentist analyses - and thus be instantly familiar), it is possible to define exactly how each of the parameters are summarised.

In the following, I am nominating that I want to summarise each parameter posterior by:

  • the median
  • the 95% highest probability density interval (credibility interval)
  • Rhat
  • total number of draws
  • bulk and tail effective sample sizes
dat3a.brm2 |>
  posterior::summarise_draws(
    median,
    HDInterval::hdi,
    rhat,
    length,
    ess_bulk, ess_tail
)
# A tibble: 7 × 8
  variable          median   lower   upper  rhat length ess_bulk ess_tail
  <chr>              <dbl>   <dbl>   <dbl> <dbl>  <dbl>    <dbl>    <dbl>
1 b_Intercept       0.0368  -0.693   0.672  1.00   2400    2108.    1861.
2 b_x               0.345    0.265   0.430  1.00   2400    2141.    2040.
3 Intercept         1.93     1.63    2.18   1.00   2400    2160.    2129.
4 prior_Intercept   2.01    -0.701   4.48   1.00   2400    1982.    1809.
5 prior_b           0.0274  -6.08    6.64   1.00   2400    2399.    2209.
6 lprior           -2.92    -2.96   -2.91   1.00   2400    2116.    2069.
7 lp__            -26.8    -29.1   -26.1    1.00   2400    2158.    2326.
Important

The results presented by the above function are on the link scale (in this case, the log scale). As such, they can be awkward to interpret and are particularly punishing for your audience.

Conclusions:

  • the length column confirms that the total number of post-warmup MCMC draws is 2400.
  • the Rhat values for each parameter are <1.01, confirming convergence of the parameter estimates across the chains
  • Bulk_ESS: effective sample sizes for the bulk of the posterior range are a high fraction (> 0.5) of the total number of draws. This suggests that the sampling was reasonably efficient and therefore likely to be reliable
  • Tail_ESS: effective sample sizes of estimates in the tails of the posterior. The tails (more extreme regions of the posterior) are where the sampler is most likely to get stuck (e.g. have divergent transitions). The fraction of effective samples from this region is relatively high, implying that the sampler did not get stuck in areas that are poorly supported by data.
  • b_Intercept: when \(x=0\), the expected value of \(y\) on the log scale is 0.037 and we are 95% confident that the true value is between -0.693 and 0.672. Since \(x=0\) is outside the observed range of \(x\), this y-intercept is of very limited value.
  • b_x: (the slope) - the rate of change in \(y\) per unit (=1) change in \(x\). So for every one unit change in \(x\), \(y\) increases by (on average) 0.345 units on the log scale and we are 95% confident that this change is between 0.265 and 0.43

As a yet more flexible alternative, we can first extract the full posterior draws (with as_draws_df()) and then use tidyverse functions to focus on just the most ecologically interpretable parameters before summarising.

In the following, I will use select() with a regex (regular expression) to match only the columns that:

  • start with (^) “b_” followed by any number (*) of any characters (.)
  • start with (^) “sigma”

It will also use mutate() to transform the results back onto the scale of the response (by exponentiating the posteriors) before summarising.

dat3a.brm2 |>
    brms::as_draws_df() |>
    dplyr::select(matches("^b_.*|^sigma")) |>
    mutate(across(everything(), exp)) |>
    posterior::summarise_draws(
        median,
        HDInterval::hdi,
        rhat,
        length,
        ess_bulk, ess_tail
    )
Warning: Dropping 'draws_df' class as required metadata was removed.
# A tibble: 2 × 8
  variable    median lower upper  rhat length ess_bulk ess_tail
  <chr>        <dbl> <dbl> <dbl> <dbl>  <dbl>    <dbl>    <dbl>
1 b_Intercept   1.04 0.437  1.84  1.00   2400    2103.    1798.
2 b_x           1.41 1.30   1.53  1.00   2400    2136.    1970.

Conclusions:

  • the length column confirms that the total number of post-warmup MCMC draws is 2400.
  • the Rhat values for each parameter are <1.01, confirming convergence of the parameter estimates across the chains
  • Bulk_ESS: effective sample sizes for the bulk of the posterior range are a high fraction (> 0.5) of the total number of draws. This suggests that the sampling was reasonably efficient and therefore likely to be reliable
  • Tail_ESS: effective sample sizes of estimates in the tails of the posterior. The tails (more extreme regions of the posterior) are where the sampler is most likely to get stuck (e.g. have divergent transitions). The fraction of effective samples from this region is relatively high, implying that the sampler did not get stuck in areas that are poorly supported by data.
  • b_Intercept: when \(x=0\), the expected value of \(y\) is 1.038 and we are 95% confident that the true value is between 0.437 and 1.84. Since \(x=0\) is outside the observed range of \(x\), this y-intercept is of very limited value.
  • b_x: (the slope) - the rate of change in \(y\) per unit (=1) change in \(x\). So for every one unit change in \(x\), \(y\) increases by (on average) a factor of 1.411 and we are 95% confident that this change is between 1.30 and 1.53. This represents a ((1.411 - 1) * 100 =) 41.1% increase in \(y\) per unit increase in \(x\) (see the arithmetic sketch below).
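
As a quick arithmetic sketch of that interpretation (using the exponentiated median slope from above):

slope_factor <- 1.411     # exp() of the median log-scale slope
(slope_factor - 1) * 100  # percentage change in y per unit of x (~41.1%)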

The gather_draws() function performs the equivalent of an as_draws_df() followed by a pivot_longer() in order to return the full posteriors in long format where they are more suitable for graphing.

dat3a.brm2 |>
    gather_draws(b_Intercept, b_x) |>
    mutate(.value = exp(.value)) |> # back-transform just the .value column
    ggplot() +
    geom_histogram(aes(x = .value)) +
    facet_wrap(~.variable, scales = "free")
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Alternatively, there are various other representations supported by the ggdist package.

dat3a.brm2 |>
  gather_draws(b_Intercept, b_x) |>
  mutate(.value = exp(.value)) |>
  ggplot() +
  stat_halfeye(aes(x = .value, y = .variable))

dat3a.brm2 |>
    bayes_R2(summary = FALSE) |>
    median_hdci()
         y      ymin      ymax .width .point .interval
1 0.922404 0.8093995 0.9343621   0.95 median      hdci

Conclusions:

  • 92.24% of the total variability in \(y\) can be explained by its relationship to \(x\)
  • we are 95% confident that the strength of this relationship is between 80.94% and 93.436%
dat3b.brm2 |> summary()
 Family: poisson 
  Links: mu = log 
Formula: y ~ scale(x, scale = FALSE) 
   Data: dat3 (Number of observations: 10) 
  Draws: 3 chains, each with iter = 5000; warmup = 1000; thin = 5;
         total post-warmup draws = 2400

Regression Coefficients:
                   Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept              1.93      0.14     1.65     2.19 1.00     2222     2142
scalexscaleEQFALSE     0.34      0.04     0.26     0.43 1.00     2132     1947

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
Important

The results presented by the summary() function are on the link scale (in this case, the log scale). As such, they can be awkward to interpret and are particularly punishing for your audience.

Conclusions:

  • in the initial block of information, we are reminded of the formula as well as the chain fitting characteristics. We are also informed that the total number of post-warmup MCMC samples is 2400.
  • the Rhat values for each parameter are <1.01, confirming convergence of the parameter estimates across the chains
  • Bulk_ESS: effective sample sizes for the bulk of the posterior range are a high fraction (> 0.5) of the total number of draws. This suggests that the sampling was reasonably efficient and therefore likely to be reliable
  • Tail_ESS: effective sample sizes of estimates in the tails of the posterior. The tails (more extreme regions of the posterior) are where the sampler is most likely to get stuck (e.g. have divergent transitions). The fraction of effective samples from this region is relatively high, implying that the sampler did not get stuck in areas that are poorly supported by data.
  • Intercept: when \(x=0\) (its average since it is centered), the expected value of \(y\) on the log scale is 1.926 and we are 95% confident that the true value is between 1.649 and 2.191. Since \(x=0\) corresponds to the mean of \(x\), this intercept is directly interpretable.
  • x: (the slope) - the rate of change in \(y\) per unit (=1) change in \(x\). So for every one unit change in \(x\), \(y\) increases by (on average) 0.342 units on the log scale and we are 95% confident that this change is between 0.257 and 0.433

Note, the estimates are means and quantiles.

As an alternative to the regular summary (which is designed to resemble that traditionally provided by frequentist analyses - and thus be instantly familiar), it is possible to define exactly how each of the parameters are summarised.

In the following, I am nominating that I want to summarise each parameter posterior by:

  • the median
  • the 95% highest probability density interval (credibility interval)
  • Rhat
  • total number of draws
  • bulk and tail effective sample sizes
dat3b.brm2 |>
  posterior::summarise_draws(
    median,
    HDInterval::hdi,
    rhat,
    length,
    ess_bulk, ess_tail
)
# A tibble: 7 × 8
  variable               median   lower   upper  rhat length ess_bulk ess_tail
  <chr>                   <dbl>   <dbl>   <dbl> <dbl>  <dbl>    <dbl>    <dbl>
1 b_Intercept            1.93     1.65    2.19   1.00   2400    2222.    2142.
2 b_scalexscaleEQFALSE   0.342    0.263   0.438  1.00   2400    2132.    1947.
3 Intercept              1.93     1.65    2.19   1.00   2400    2222.    2142.
4 prior_Intercept        2.00    -0.452   4.56   1.00   2400    2307.    2368.
5 prior_b               -0.0445  -6.25    5.77   1.00   2400    2196.    2306.
6 lprior                -2.92    -2.95   -2.91   1.00   2400    2217.    2187.
7 lp__                 -26.8    -29.2   -26.1    1.00   2400    2241.    2284.
Important

The results presented by the above function are on the link scale (in this case, the log scale). As such, they can be awkward to interpret and are particularly punishing for your audience.

Conclusions:

  • the length column confirms that the total number of post-warmup MCMC draws is 2400.
  • the Rhat values for each parameter are <1.01, confirming convergence of the parameter estimates across the chains
  • Bulk_ESS: effective sample sizes for the bulk of the posterior range are a high fraction (> 0.5) of the total number of draws. This suggests that the sampling was reasonably efficient and therefore likely to be reliable
  • Tail_ESS: effective sample sizes of estimates in the tails of the posterior. The tails (more extreme regions of the posterior) are where the sampler is most likely to get stuck (e.g. have divergent transitions). The fraction of effective samples from this region is relatively high, implying that the sampler did not get stuck in areas that are poorly supported by data.
  • b_Intercept: when \(x=0\) (its average since it is centered), the expected value of \(y\) on the log scale is 1.928 and we are 95% confident that the true value is between 1.646 and 2.187. Since \(x=0\) corresponds to the mean of \(x\), this intercept is directly interpretable.
  • b_x: (the slope) - the rate of change in \(y\) per unit (=1) change in \(x\). So for every one unit change in \(x\), \(y\) increases by (on average) 0.342 units on the log scale and we are 95% confident that this change is between 0.263 and 0.438

As a yet more flexible alternative, we can first extract the full posterior draws (with as_draws_df()) and then use tidyverse functions to focus on just the most ecologically interpretable parameters before summarising.

In the following, I will use select() with a regex (regular expression) to match only the columns that:

  • start with (^) “b_” followed by any number (*) of any characters (.)
  • start with (^) “sigma”

It will also use mutate() to transform the results back onto the scale of the response (by exponentiating the posteriors) before summarising.

dat3b.brm2 |>
    brms::as_draws_df() |>
    dplyr::select(matches("^b_.*|^sigma")) |>
    mutate(across(everything(), exp)) |>
    posterior::summarise_draws(
        median,
        HDInterval::hdi,
        rhat,
        length,
        ess_bulk, ess_tail
    )
Warning: Dropping 'draws_df' class as required metadata was removed.
# A tibble: 2 × 8
  variable             median lower upper  rhat length ess_bulk ess_tail
  <chr>                 <dbl> <dbl> <dbl> <dbl>  <dbl>    <dbl>    <dbl>
1 b_Intercept            6.87  5.11  8.83  1.00   2400    2211.    2121.
2 b_scalexscaleEQFALSE   1.41  1.29  1.54  1.00   2400    2123.    1929.

Conclusions:

  • the length column confirms that the total number of post-warmup MCMC draws is 2400.
  • the Rhat values for each parameter are <1.01, confirming convergence of the parameter estimates across the chains
  • Bulk_ESS: effective sample sizes for the bulk of the posterior range are a high fraction (> 0.5) of the total number of draws. This suggests that the sampling was reasonably efficient and therefore likely to be reliable
  • Tail_ESS: effective sample sizes of estimates in the tails of the posterior. The tails (more extreme regions of the posterior) are where the sampler is most likely to get stuck (e.g. have divergent transitions). The fraction of effective samples from this region is relatively high, implying that the sampler did not get stuck in areas that are poorly supported by data.
  • b_Intercept: when \(x=0\) (its average since it is centered), the expected value of \(y\) is 6.873 and we are 95% confident that the true value is between 5.11 and 8.83. Since \(x=0\) corresponds to the mean of \(x\), this intercept represents the expected count at the average \(x\).
  • b_x: (the slope) - the rate of change in \(y\) per unit (=1) change in \(x\). So for every one unit change in \(x\), \(y\) increases by (on average) a factor of 1.408 and we are 95% confident that this change is between 1.29 and 1.54. This represents a ((1.408 - 1) * 100 =) 40.8% increase in \(y\) per unit increase in \(x\).

The gather_draws() function performs the equivalent of an as_draws_df() followed by a pivot_longer() in order to return the full posteriors in long format where they are more suitable for graphing.

dat3b.brm2 |>
    gather_draws(`b_Intercept`, `b_.*x.*`, regex = TRUE) |>
    mutate(.value = exp(.value)) |>
    ggplot() +
    geom_histogram(aes(x = .value)) +
    facet_wrap(~.variable, scales = "free")
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Alternatively, there are various other representations supported by the ggdist package.

dat3b.brm2 |>
    gather_draws(`b_Intercept`, `b_.*x.*`, regex = TRUE) |>
    mutate(.value = exp(.value)) |>
  ggplot() +
  stat_halfeye(aes(x = .value, y = .variable))

dat3b.brm2 |>
    bayes_R2(summary = FALSE) |>
    median_hdci()
          y      ymin      ymax .width .point .interval
1 0.9215583 0.8041622 0.9343709   0.95 median      hdci

Conclusions:

  • 92.156% of the total variability in \(y\) can be explained by its relationship to \(x\)
  • we are 95% confident that the strength of this relationship is between 80.416% and 93.437%
dat3c.brm2 |> summary()
 Family: poisson 
  Links: mu = log 
Formula: y ~ scale(x) 
   Data: dat3 (Number of observations: 10) 
  Draws: 3 chains, each with iter = 5000; warmup = 1000; thin = 5;
         total post-warmup draws = 2400

Regression Coefficients:
          Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept     1.93      0.14     1.64     2.20 1.00     2142     2163
scalex        1.03      0.13     0.78     1.30 1.00     2155     2327

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
Important

The results presented by the summary() function are on the link scale (in this case, the log scale). As such, they can be awkward to interpret and are particularly punishing for your audience.

Conclusions:

  • in the initial block of information, we are reminded of the formula as well as the chain fitting characteristics. We are also informed that the total number of post-warmup MCMC samples is 2400.
  • the Rhat values for each parameter are <1.01, confirming convergence of the parameter estimates across the chains
  • Bulk_ESS: effective sample sizes for the bulk of the posterior range are a high fraction (> 0.5) of the total number of draws. This suggests that the sampling was reasonably efficient and therefore likely to be reliable
  • Tail_ESS: effective sample sizes of estimates in the tails of the posterior. The tails (more extreme regions of the posterior) are where the sampler is most likely to get stuck (e.g. have divergent transitions). The fraction of effective samples from this region is relatively high, implying that the sampler did not get stuck in areas that are poorly supported by data.
  • Intercept: when \(x=0\) (its average since it is standardised), the expected value of \(y\) on the log scale is 1.929 and we are 95% confident that the true value is between 1.643 and 2.195. Since \(x=0\) corresponds to the mean of \(x\), this intercept is directly interpretable.
  • x: (the slope) - the rate of change in \(y\) per unit (=1) change in \(x\). Recall that since \(x\) is standardised, 1 unit represents a span of 1 standard deviation of \(x\). So for every one standard deviation change in \(x\), \(y\) increases by (on average) 1.035 units on the log scale and we are 95% confident that this change is between 0.782 and 1.305

Note, the estimates are means and quantiles.

As an alternative to the regular summary (which is designed to resemble that traditionally provided by frequentist analyses - and thus be instantly familiar), it is possible to define exactly how each of the parameters are summarised.

In the following, I am nominating that I want to summarise each parameter posterior by:

  • the median
  • the 95% highest probability density interval (credibility interval)
  • Rhat
  • total number of draws
  • bulk and tail effective sample sizes
dat3c.brm2 |>
  posterior::summarise_draws(
    median,
    HDInterval::hdi,
    rhat,
    length,
    ess_bulk, ess_tail
)
# A tibble: 7 × 8
  variable           median   lower  upper  rhat length ess_bulk ess_tail
  <chr>               <dbl>   <dbl>  <dbl> <dbl>  <dbl>    <dbl>    <dbl>
1 b_Intercept       1.93      1.66    2.22  1.00   2400    2141.    2163.
2 b_scalex          1.03      0.768   1.28  1.00   2400    2155.    2327.
3 Intercept         1.93      1.66    2.22  1.00   2400    2141.    2163.
4 prior_Intercept   2.00     -0.830   4.47  1.00   2400    2402.    2329.
5 prior_b           0.00164  -4.98    3.99  1.00   2400    2195.    2105.
6 lprior           -2.86     -3.04   -2.70  1.00   2400    2139.    2328.
7 lp__            -26.8     -28.9   -26.0   1.00   2400    2190.    2273.

Conclusions:

  • the length column confirms that the total number of post-warmup MCMC draws is 2400.
  • the Rhat values for each parameter are <1.01, confirming convergence of the parameter estimates across the chains
  • Bulk_ESS: effective sample sizes for the bulk of the posterior range are a high fraction (> 0.5) of the total number of draws. This suggests that the sampling was reasonably efficient and therefore likely to be reliable
  • Tail_ESS: effective sample sizes of estimates in the tails of the posterior. The tails (more extreme regions of the posterior) are where the sampler is most likely to get stuck (e.g. have divergent transitions). The fraction of effective samples from this region is relatively high, implying that the sampler did not get stuck in areas that are poorly supported by data.
  • b_Intercept: when \(x=0\) (its average since it is standardised), the expected value of \(y\) on the log scale is 1.932 and we are 95% confident that the true value is between 1.665 and 2.218. Since \(x=0\) corresponds to the mean of \(x\), this intercept is directly interpretable.
  • b_x: (the slope) - the rate of change in \(y\) per unit (=1) change in \(x\). Recall that since \(x\) is standardised, 1 unit represents a span of 1 standard deviation of \(x\). So for every one standard deviation change in \(x\), \(y\) increases by (on average) 1.034 units on the log scale and we are 95% confident that this change is between 0.768 and 1.279

As a yet more flexible alternative, we can first extract the full posterior draws (with as_draws_df()) and then use tidyverse functions to focus on just the most ecologically interpretable parameters before summarising.

In the following, I will use select() with a regex (regular expression) to match only the columns that:

  • start with (^) “b_” followed by any number (*) of any characters (.)
  • start with (^) “sigma”

It will also use mutate() to transform the results back onto the scale of the response (by exponentiating the posteriors) before summarising.

dat3c.brm2 |>
    brms::as_draws_df() |>
    dplyr::select(matches("^b_.*|^sigma")) |>
    mutate(across(everything(), exp)) |>
    posterior::summarise_draws(
        median,
        HDInterval::hdi,
        rhat,
        length,
        ess_bulk, ess_tail
    )
Warning: Dropping 'draws_df' class as required metadata was removed.
# A tibble: 2 × 8
  variable    median lower upper  rhat length ess_bulk ess_tail
  <chr>        <dbl> <dbl> <dbl> <dbl>  <dbl>    <dbl>    <dbl>
1 b_Intercept   6.90  5.05  8.85  1.00   2400    2126.    2156.
2 b_scalex      2.81  2.13  3.55  1.00   2400    2121.    2322.

Conclusions:

  • the length column confirms that the total number of post-warmup MCMC draws is 2400.
  • the Rhat values for each parameter are <1.01, confirming convergence of the parameter estimates across the chains
  • Bulk_ESS: effective sample sizes for the bulk of the posterior range are a high fraction (> 0.5) of the total number of draws. This suggests that the sampling was reasonably efficient and therefore likely to be reliable
  • Tail_ESS: effective sample sizes of estimates in the tails of the posterior. The tails (more extreme regions of the posterior) are where the sampler is most likely to get stuck (e.g. have divergent transitions). The fraction of effective samples from this region is relatively high, implying that the sampler did not get stuck in areas that are poorly supported by data.
  • b_Intercept: when \(x=0\) (its average since it is standardised), the expected value of \(y\) is 6.903 and we are 95% confident that the true value is between 5.05 and 8.85. Since \(x=0\) corresponds to the mean of \(x\), this intercept represents the expected count at the average \(x\).
  • b_x: (the slope) - the rate of change in \(y\) per unit (=1) change in \(x\). Recall that since \(x\) is standardised, 1 unit represents a span of 1 standard deviation of \(x\). So for every one standard deviation change in \(x\), \(y\) increases by (on average) a factor of 2.812 and we are 95% confident that this change is between 2.13 and 3.55. This represents a ((2.812 - 1) * 100 =) 181.2% increase in \(y\) per standard deviation increase in \(x\).

The gather_draws() function performs the equivalent of an as_draws_df() followed by a pivot_longer() in order to return the full posteriors in long format, where they are more suitable for graphing. Note that the name of the slope parameter gets very awkward when \(x\) is centered, so it is more convenient to refer to this parameter via a regular expression.

dat3c.brm2 |> get_variables()
 [1] "b_Intercept"     "b_scalex"        "Intercept"       "prior_Intercept"
 [5] "prior_b"         "lprior"          "lp__"            "accept_stat__"  
 [9] "stepsize__"      "treedepth__"     "n_leapfrog__"    "divergent__"    
[13] "energy__"       
dat3c.brm2 |>
  gather_draws(`b_Intercept`,`b_.*x`, regex = TRUE) |> 
  mutate(.value = exp(.value)) |>
  ggplot() +
  geom_histogram(aes(x = .value)) +
  facet_wrap(~.variable, scales = "free")
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Alternatively, there are various other representations supported by the ggdist package.

dat3c.brm2 |>
  gather_draws(`b_Intercept`,`b_.*x`, regex = TRUE) |> 
  mutate(.value = exp(.value)) |>
  ggplot() +
  stat_halfeye(aes(x = .value, y = .variable))

dat3c.brm2 |>
    bayes_R2(summary = FALSE) |>
    median_hdci()
          y      ymin      ymax .width .point .interval
1 0.9211877 0.8045365 0.9343754   0.95 median      hdci

Conclusions:

  • 92.119% of the total variability in \(y\) can be explained by its relationship to \(x\)
  • we are 95% confident that the strength of this relationship is between 80.454% and 93.438%
dat4a.brm2 |> summary()
 Family: negbinomial 
  Links: mu = log; shape = identity 
Formula: y ~ x 
   Data: dat4 (Number of observations: 10) 
  Draws: 3 chains, each with iter = 5000; warmup = 1000; thin = 5;
         total post-warmup draws = 2400

Regression Coefficients:
          Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept     0.36      0.40    -0.43     1.11 1.00     2545     2355
x             0.28      0.05     0.18     0.39 1.00     2474     2236

Further Distributional Parameters:
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
shape    52.46     64.80     3.26   229.15 1.00     2142     2369

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
Important

The results presented by the summary() function are on the link scale (in this case, the log scale). As such, they can be awkward to interpret and are particularly punishing for your audience.

Conclusions:

  • in the initial block of information, we are reminded of the formula as well as the chain fitting characteristics. We are also informed that the total number of post-warmup MCMC samples is 2400.
  • the Rhat values for each parameter are all below 1.01, confirming convergence of the parameter estimates across the chains
  • Bulk_ESS: effective sample sizes for the bulk of the posterior range are a high fraction (> 0.5) of the total number of draws. This suggests that the sampling was reasonably efficient and therefore likely to be reliable
  • Tail_ESS: effective sample sizes of estimates in the tails of the posterior. The tails (more extreme regions of the posterior) are where the sampler is most likely to get stuck (have divergent transitions etc.). The fraction of effective samples from this region is relatively high, implying that the sampler did not get stuck in regions that are poorly supported by the data.
  • Intercept: when \(x=0\), the expected value of \(y\) is 0.357 and we are 95% confident that the true value is between -0.434 and 1.11. Since \(x=0\) is outside the observed range of \(x\), this y-intercept is of very limited value.
  • x: (the slope) - the rate of change in \(y\) per unit (=1) change in \(x\). So for every one unit change in \(x\), \(y\) increases by (on average) 0.284 units and we are 95% confident that this change is between 0.183 and 0.393

Note, the point estimates above are posterior means and the intervals are quantile-based credible intervals.
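
A minimal sketch verifying this: the same means and 95% quantile-based bounds reported by summary() can be reproduced directly from the draws (the purrr-style lambda simply supplies the quantile probabilities):

dat4a.brm2 |>
  ## mean and 2.5/97.5 percentiles, matching the summary() table
  posterior::summarise_draws(mean, ~ quantile(.x, probs = c(0.025, 0.975)))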

As an alternative to the regular summary (which is designed to resemble that traditionally provided by frequentist analyses - and thus be instantly familiar), it is possible to define exactly how each of the parameters is summarised.

In the following, I am nominating that I want to summarise each parameter posterior by:

  • the median
  • the 95% highest probability density interval (credibility interval)
  • Rhat
  • total number of draws
  • bulk and tail effective sample sizes
dat4a.brm2 |>
  posterior::summarise_draws(
    median,
    HDInterval::hdi,
    rhat,
    length,
    ess_bulk, ess_tail
)
# A tibble: 9 × 8
  variable           median      lower   upper  rhat length ess_bulk ess_tail
  <chr>               <dbl>      <dbl>   <dbl> <dbl>  <dbl>    <dbl>    <dbl>
1 b_Intercept      3.66e- 1 -3.65e-  1   1.16  1.00    2400    2545.    2355.
2 b_x              2.83e- 1  1.82e-  1   0.390 1.00    2400    2475.    2236.
3 shape            3.04e+ 1  1.09e+  0 179.    1.00    2400    2143.    2369.
4 Intercept        1.92e+ 0  1.62e+  0   2.25  1.00    2400    2559.    2144.
5 prior_Intercept  1.97e+ 0 -1.75e-  1   3.74  1.00    2400    2326.    2298.
6 prior_b          2.13e- 2 -4.67e+  0   4.70  1.00    2400    2275.    2180.
7 prior_shape      2.15e-28  1.56e-305   0.250 1.00    2400    2329.    2291.
8 lprior          -1.07e+ 1 -1.42e+  1  -8.01  1.00    2400    2137.    2369.
9 lp__            -3.18e+ 1 -3.49e+  1 -30.6   0.999   2400    2105.    2325.
Important

The results presented by the above function are on the link scale (in this case, the log scale). As such, they can be awkward to interpret and are particularly punishing for your audience.

Conclusions:

  • in the initial block of information, we are reminded of the formula as well as the chain fitting characteristics. We are also informed that the total number of post-warmup MCMC samples is 2400.
  • the Rhat values for each parameter are all below 1.01, confirming convergence of the parameter estimates across the chains
  • Bulk_ESS: effective sample sizes for the bulk of the posterior range are a high fraction (> 0.5) of the total number of draws. This suggests that the sampling was reasonably efficient and therefore likely to be reliable
  • Tail_ESS: effective sample sizes of estimates in the tails of the posterior. The tails (more extreme regions of the posterior) are where the sampler is most likely to get stuck (have divergent transitions etc.). The fraction of effective samples from this region is relatively high, implying that the sampler did not get stuck in regions that are poorly supported by the data.
  • b_Intercept: when \(x=0\), the expected value of \(y\) is 0.366 and we are 95% confident that the true value is between -0.365 and 1.162. Since \(x=0\) is outside the observed range of \(x\), this y-intercept is of very limited value.
  • b_x: (the slope) - the rate of change in \(y\) per unit (=1) change in \(x\). So for every one unit change in \(x\), \(y\) increases by (on average) 0.283 units and we are 95% confident that this change is between 0.182 and 0.39

As a yet more flexible alternative, if we first extract the full posterior draws (with as_draws_df()), we can then use various tidyverse functions to focus on just the most ecologically interpretable parameters before summarising.

In the following, I will use select() with a regex (regular expression) to match only the columns that:

  • start with (^) “b_” followed by any amount of (*) any character (.)
  • start with (^) “sigma”

It will also use mutate() to transform the results back onto the scale of the response (by exponentiating the posteriors) before summarising.

dat4a.brm2 |>
    brms::as_draws_df() |>
    dplyr::select(matches("^b_.*|^sigma")) |>
    mutate(across(everything(), exp)) |>
    posterior::summarise_draws(
        median,
        HDInterval::hdi,
        rhat,
        length,
        ess_bulk, ess_tail
    )
Warning: Dropping 'draws_df' class as required metadata was removed.
# A tibble: 2 × 8
  variable    median lower upper  rhat length ess_bulk ess_tail
  <chr>        <dbl> <dbl> <dbl> <dbl>  <dbl>    <dbl>    <dbl>
1 b_Intercept   1.44 0.568  2.83  1.00   2400    2533.    2330.
2 b_x           1.33 1.20   1.48  1.00   2400    2465.    2218.

Conclusions:

  • in the initial block of information, we are reminded of the formula as well as the chain fitting characteristics. We are also informed that the total number of post-warmup MCMC samples is 2400.
  • the Rhat values for each parameter are all below 1.01, confirming convergence of the parameter estimates across the chains
  • Bulk_ESS: effective sample sizes for the bulk of the posterior range are a high fraction (> 0.5) of the total number of draws. This suggests that the sampling was reasonably efficient and therefore likely to be reliable
  • Tail_ESS: effective sample sizes of estimates in the tails of the posterior. The tails (more extreme regions of the posterior) are where the sampler is most likely to get stuck (have divergent transitions etc.). The fraction of effective samples from this region is relatively high, implying that the sampler did not get stuck in regions that are poorly supported by the data.
  • b_Intercept: when \(x=0\), the expected value of \(y\) is 1.442 and we are 95% confident that the true value is between 0.568 and 2.83. Since \(x=0\) is outside the observed range of \(x\), this y-intercept is of very limited value.
  • b_x: (the slope) - the rate of change in \(y\) per unit (=1) change in \(x\). So for every one unit change in \(x\), \(y\) increases by (on average) a factor of 1.327 and we are 95% confident that this change is between 1.2 and 1.477. This represents a \((1.327 - 1) \times 100 \approx 32.7\)% increase in \(y\) per unit increase in \(x\) (the sketch below checks this multiplicative interpretation against the model's own predictions).
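
As a quick check of that multiplicative interpretation, the following minimal sketch compares the model's expected counts at two \(x\) values one unit apart (5 and 6 are arbitrary illustrative values); the posterior of their ratio should closely match the exponentiated slope above:

dat4a.brm2 |>
  tidybayes::epred_draws(newdata = data.frame(x = c(5, 6))) |>
  ungroup() |>
  dplyr::select(.draw, x, .epred) |>
  tidyr::pivot_wider(names_from = x, values_from = .epred) |>
  ## the ratio of expected values one x-unit apart is the multiplicative effect of x
  mutate(ratio = `6` / `5`) |>
  median_hdci(ratio)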

The gather_draws() function performs the equivalent of an as_draws_df() followed by a pivot_longer() in order to return the full posteriors in long format where they are more suitable for graphing.

dat4a.brm2 |>
    gather_draws(b_Intercept, b_x) |>
    mutate(across(everything(), exp)) |>
    ggplot() +
    geom_histogram(aes(x = .value)) +
    facet_wrap(~.variable, scales = "free")
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Alternatively, there are various other representations supported by the ggdist package.

dat4a.brm2 |>
  gather_draws(b_Intercept, b_x) |>
  mutate(across(everything(), exp)) |>
  ggplot() +
  stat_halfeye(aes(x = .value, y = .variable))

dat4a.brm2 |>
    bayes_R2(summary = FALSE) |>
    median_hdci()
          y      ymin      ymax .width .point .interval
1 0.9003317 0.6621915 0.9214525   0.95 median      hdci

Conclusions:

  • 90.033% of the total variability in \(y\) can be explained by its relationship to \(x\)
  • we are 95% confident that the strength of this relationship is between 66.219% and 92.145%
dat4b.brm2 |> summary()
 Family: negbinomial 
  Links: mu = log; shape = identity 
Formula: y ~ scale(x, scale = FALSE) 
   Data: dat4 (Number of observations: 10) 
  Draws: 3 chains, each with iter = 5000; warmup = 1000; thin = 5;
         total post-warmup draws = 2400

Regression Coefficients:
                   Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept              1.92      0.15     1.61     2.21 1.00     2358     2422
scalexscaleEQFALSE     0.28      0.05     0.18     0.40 1.00     2376     2327

Further Distributional Parameters:
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
shape    49.96     59.36     3.36   203.02 1.00     2163     1994

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
Important

The results presented by the summary() function are on the link scale (in this case, the log scale). As such, they can be awkward to interpret and are particularly punishing for your audience.

Conclusions:

  • in the initial block of information, we are reminded of the formula as well as the chain fitting characteristics. We are also informed that the total number of post-warmup MCMC samples is 2400.
  • the Rhat values for each parameter are all below 1.01, confirming convergence of the parameter estimates across the chains
  • Bulk_ESS: effective sample sizes for the bulk of the posterior range are a high fraction (> 0.5) of the total number of draws. This suggests that the sampling was reasonably efficient and therefore likely to be reliable
  • Tail_ESS: effective sample sizes of estimates in the tails of the posterior. The tails (more extreme regions of the posterior) are where the sampler is most likely to get stuck (have divergent transitions etc.). The fraction of effective samples from this region is relatively high, implying that the sampler did not get stuck in regions that are poorly supported by the data.
  • Intercept: when \(x=0\) (the average of \(x\), since it is centred), the expected value of \(y\) is 1.918 and we are 95% confident that the true value is between 1.608 and 2.214. Since \(x=0\) corresponds to the mean of the observed \(x\) values, this y-intercept is meaningful (unlike when \(x\) is on its raw scale).
  • x: (the slope) - the rate of change in \(y\) per unit (=1) change in \(x\). So for every one unit change in \(x\), \(y\) increases by (on average) 0.285 units and we are 95% confident that this change is between 0.18 and 0.396

Note, the point estimates above are posterior means and the intervals are quantile-based credible intervals.

As an alternative to the regular summary (which is designed to resemble that traditionally provided by frequentist analyses - and thus be instantly familiar), it is possible to define exactly how each of the parameters is summarised.

In the following, I am nominating that I want to summarise each parameter posterior by:

  • the median
  • the 95% highest probability density interval (credibility interval)
  • Rhat
  • total number of draws
  • bulk and tail effective sample sizes
dat4b.brm2 |>
  posterior::summarise_draws(
    median,
    HDInterval::hdi,
    rhat,
    length,
    ess_bulk, ess_tail
)
# A tibble: 9 × 8
  variable                median   lower   upper  rhat length ess_bulk ess_tail
  <chr>                    <dbl>   <dbl>   <dbl> <dbl>  <dbl>    <dbl>    <dbl>
1 b_Intercept           1.92e+ 0   1.62    2.22  1.00    2400    2358.    2422.
2 b_scalexscaleEQFALSE  2.84e- 1   0.176   0.391 1.00    2400    2376.    2327.
3 shape                 2.93e+ 1   0.992 164.    0.999   2400    2163.    1994.
4 Intercept             1.92e+ 0   1.62    2.22  1.00    2400    2358.    2422.
5 prior_Intercept       2.06e+ 0   0.190   3.91  1.00    2400    2310.    2313.
6 prior_b               1.04e- 2  -4.52    4.81  1.00    2400    2340.    2167.
7 prior_shape           1.07e-28   0       0.325 1.00    2400    1967.    2130.
8 lprior               -1.05e+ 1 -13.9    -7.90  0.999   2400    2151.    1995.
9 lp__                 -3.17e+ 1 -34.6   -30.5   1.00    2400    2377.    2328.
Important

The results presented by the above function are on the link scale (in this case, the log scale). As such, they can be awkward to interpret and are particularly punishing for your audience.

Conclusions:

  • in the initial block of information, we are reminded of the formula as well as the chain fitting characteristics. We are also informed that the total number of post-warmup MCMC samples is 2400.
  • the Rhat values for each parameter are all below 1.01, confirming convergence of the parameter estimates across the chains
  • Bulk_ESS: effective sample sizes for the bulk of the posterior range are a high fraction (> 0.5) of the total number of draws. This suggests that the sampling was reasonably efficient and therefore likely to be reliable
  • Tail_ESS: effective sample sizes of estimates in the tails of the posterior. The tails (more extreme regions of the posterior) are where the sampler is most likely to get stuck (have divergent transitions etc.). The fraction of effective samples from this region is relatively high, implying that the sampler did not get stuck in regions that are poorly supported by the data.
  • b_Intercept: when \(x=0\) (the average of \(x\), since it is centred), the expected value of \(y\) is 1.92 and we are 95% confident that the true value is between 1.617 and 2.221. Since \(x=0\) corresponds to the mean of the observed \(x\) values, this y-intercept is meaningful.
  • b_x: (the slope) - the rate of change in \(y\) per unit (=1) change in \(x\). So for every one unit change in \(x\), \(y\) increases by (on average) 0.284 units and we are 95% confident that this change is between 0.176 and 0.391

As a yet more flexible alternative, if we first extract the full posterior draws (with as_draws_df()), we can then use various tidyverse functions to focus on just the most ecologically interpretable parameters before summarising.

In the following, I will use select() with a regex (regular expression) to match only the columns that:

  • start with (^) “b_” followed by any amount of (*) any character (.)
  • start with (^) “sigma”

It will also use mutate() to transform the results back onto the scale of the response (by exponentiating the posteriors) before summarising.

dat4b.brm2 |>
    brms::as_draws_df() |>
    dplyr::select(matches("^b_.*|^sigma")) |>
    mutate(across(everything(), exp)) |>
    posterior::summarise_draws(
        median,
        HDInterval::hdi,
        rhat,
        length,
        ess_bulk, ess_tail
    )
Warning: Dropping 'draws_df' class as required metadata was removed.
# A tibble: 2 × 8
  variable             median lower upper  rhat length ess_bulk ess_tail
  <chr>                 <dbl> <dbl> <dbl> <dbl>  <dbl>    <dbl>    <dbl>
1 b_Intercept            6.84  4.87  8.96  1.00   2400    2338.    2403.
2 b_scalexscaleEQFALSE   1.33  1.19  1.48  1.00   2400    2366.    2320.

Conclusions:

  • in the initial block of information, we are reminded of the formula as well as the chain fitting characteristics. We are also informed that the total number of post-warmup MCMC samples is 2400.
  • the Rhat values for each parameter are all below 1.01, confirming convergence of the parameter estimates across the chains
  • Bulk_ESS: effective sample sizes for the bulk of the posterior range are a high fraction (> 0.5) of the total number of draws. This suggests that the sampling was reasonably efficient and therefore likely to be reliable
  • Tail_ESS: effective sample sizes of estimates in the tails of the posterior. The tails (more extreme regions of the posterior) are where the sampler is most likely to get stuck (have divergent transitions etc.). The fraction of effective samples from this region is relatively high, implying that the sampler did not get stuck in regions that are poorly supported by the data.
  • b_Intercept: when \(x=0\) (the average of \(x\), since it is centred), the expected value of \(y\) is 6.839 and we are 95% confident that the true value is between 4.87 and 8.96. Since \(x=0\) corresponds to the mean of the observed \(x\) values, this y-intercept is meaningful.
  • b_x: (the slope) - the rate of change in \(y\) per unit (=1) change in \(x\). So for every one unit change in \(x\), \(y\) increases by (on average) a factor of 1.328 and we are 95% confident that this change is between 1.193 and 1.479. This represents a \((1.328 - 1) \times 100 \approx 32.8\)% increase in \(y\) per unit increase in \(x\).

The gather_draws() function performs the equivalent of an as_draws_df() followed by a pivot_longer() in order to return the full posteriors in long format where they are more suitable for graphing.

dat4b.brm2 |>
    gather_draws(`b_Intercept`, `b_.*x.*`, regex = TRUE) |>
    mutate(across(everything(), exp)) |>
    ggplot() +
    geom_histogram(aes(x = .value)) +
    facet_wrap(~.variable, scales = "free")
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Alternatively, there are various other representations supported by the ggdist package.

dat4b.brm2 |>
    gather_draws(`b_Intercept`, `b_.*x.*`, regex = TRUE) |>
    mutate(across(everything(), exp)) |>
  ggplot() +
  stat_halfeye(aes(x = .value, y = .variable))

dat4b.brm2 |>
    bayes_R2(summary = FALSE) |>
    median_hdci()
          y     ymin      ymax .width .point .interval
1 0.8982169 0.670107 0.9214525   0.95 median      hdci

Conclusions:

  • 89.822% of the total variability in \(y\) can be explained by its relationship to \(x\)
  • we are 95% confident that the strength of this relationship is between 67.011% and 92.145%
dat4c.brm2 |> summary()
 Family: negbinomial 
  Links: mu = log; shape = identity 
Formula: y ~ scale(x) 
   Data: dat4 (Number of observations: 10) 
  Draws: 3 chains, each with iter = 5000; warmup = 1000; thin = 5;
         total post-warmup draws = 2400

Regression Coefficients:
          Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept     1.93      0.16     1.60     2.24 1.00     2358     2298
scalex        0.84      0.16     0.53     1.16 1.00     2470     2238

Further Distributional Parameters:
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
shape    49.05     56.26     3.31   207.81 1.00     2285     2062

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
Important

The results presented by the summary() function are on the link scale (in this case, the log scale). As such, they can be awkward to interpret and are particularly punishing for your audience.

Conclusions:

  • in the initial block of information, we are reminded of the formula as well as the chain fitting characteristics. We are also informed that the total number of post-warmup MCMC samples is 2400.
  • the Rhat values for each parameter are all below 1.01, confirming convergence of the parameter estimates across the chains
  • Bulk_ESS: effective sample sizes for the bulk of the posterior range are a high fraction (> 0.5) of the total number of draws. This suggests that the sampling was reasonably efficient and therefore likely to be reliable
  • Tail_ESS: effective sample sizes of estimates in the tails of the posterior. The tails (more extreme regions of the posterior) are where the sampler is most likely to get stuck (have divergent transitions etc.). The fraction of effective samples from this region is relatively high, implying that the sampler did not get stuck in regions that are poorly supported by the data.
  • Intercept: when \(x=0\) (its average, since it is standardised), the expected value of \(y\) is 1.927 and we are 95% confident that the true value is between 1.601 and 2.238. Since \(x=0\) corresponds to the mean of the observed \(x\) values, this y-intercept is meaningful.
  • x: (the slope) - the rate of change in \(y\) per unit (=1) change in \(x\). Recall that since \(x\) is standardised, 1 unit represents a span of 1 standard deviation of \(x\). So for every one standard deviation change in \(x\), \(y\) increases by (on average) 0.842 units and we are 95% confident that this change is between 0.531 and 1.161

Note, the estimates are means and quantiles.

As an alternative to the regular summary (which is designed to resemble that traditionally provided by frequentist analyses - and thus be instantly familiar), it is possible to define exactly how each of the parameters is summarised.

In the following, I am nominating that I want to summarise each parameter posterior by:

  • the median
  • the 95% highest probability density interval (credibility interval)
  • Rhat
  • total number of draws
  • bulk and tail effective sample sizes
dat4c.brm2 |>
  posterior::summarise_draws(
    median,
    HDInterval::hdi,
    rhat,
    length,
    ess_bulk, ess_tail
)
# A tibble: 9 × 8
  variable           median   lower   upper  rhat length ess_bulk ess_tail
  <chr>               <dbl>   <dbl>   <dbl> <dbl>  <dbl>    <dbl>    <dbl>
1 b_Intercept      1.93e+ 0   1.58    2.21   1.00   2400    2359.    2298.
2 b_scalex         8.39e- 1   0.521   1.14   1.00   2400    2471.    2238.
3 shape            2.99e+ 1   0.934 161.     1.00   2400    2285.    2062.
4 Intercept        1.93e+ 0   1.58    2.21   1.00   2400    2359.    2298.
5 prior_Intercept  2.07e+ 0   0.280   3.79   1.00   2400    2361.    2244.
6 prior_b         -2.50e- 2  -4.69    4.45   1.00   2400    2327.    2298.
7 prior_shape      6.04e-29   0       0.246  1.00   2400    2451.    2350.
8 lprior          -1.08e+ 1 -13.9    -7.91   1.00   2400    2276.    2123.
9 lp__            -3.18e+ 1 -34.9   -30.7    1.00   2400    1955.    2287.

Conclusions:

  • in the initial block of information, we are reminded of the formula as well as the chain fitting characteristics. We are also informed that the total number of post-warmup MCMC samples is 2400.
  • the Rhat values for each parameter are all below 1.01, confirming convergence of the parameter estimates across the chains
  • Bulk_ESS: effective sample sizes for the bulk of the posterior range are a high fraction (> 0.5) of the total number of draws. This suggests that the sampling was reasonably efficient and therefore likely to be reliable
  • Tail_ESS: effective sample sizes of estimates in the tails of the posterior. The tails (more extreme regions of the posterior) are where the sampler is most likely to get stuck (have divergent transitions etc.). The fraction of effective samples from this region is relatively high, implying that the sampler did not get stuck in regions that are poorly supported by the data.
  • b_Intercept: when \(x=0\) (its average, since it is standardised), the expected value of \(y\) is 1.927 and we are 95% confident that the true value is between 1.584 and 2.208. Since \(x=0\) corresponds to the mean of the observed \(x\) values, this y-intercept is meaningful.
  • b_x: (the slope) - the rate of change in \(y\) per unit (=1) change in \(x\). Recall that since \(x\) is standardised, 1 unit represents a span of 1 standard deviation of \(x\). So for every one standard deviation change in \(x\), \(y\) increases by (on average) 0.839 units and we are 95% confident that this change is between 0.521 and 1.143

As a yet more flexible alternative, if we first extract the full posterior draws (with as_draws_df()), we can then use various tidyverse functions to focus on just the most ecologically interpretable parameters before summarising.

In the following, I will use select() with a regex (regular expression) to match only the columns that:

  • start with (^) “b_” followed by any amount of (*) any character (.)
  • start with (^) “sigma”

It will also use mutate() to transform the results back onto the scale of the response (by exponentiating the posteriors) before summarising.

dat4c.brm2 |>
    brms::as_draws_df() |>
    dplyr::select(matches("^b_.*|^sigma")) |>
    mutate(across(everything(), exp)) |>
    posterior::summarise_draws(
        median,
        HDInterval::hdi,
        rhat,
        length,
        ess_bulk, ess_tail
    )
Warning: Dropping 'draws_df' class as required metadata was removed.
# A tibble: 2 × 8
  variable    median lower upper  rhat length ess_bulk ess_tail
  <chr>        <dbl> <dbl> <dbl> <dbl>  <dbl>    <dbl>    <dbl>
1 b_Intercept   6.87  4.84  9.05  1.00   2400    2353.    2230.
2 b_scalex      2.31  1.67  3.12  1.00   2400    2466.    2173.

Conclusions:

  • in the initial block of information, we are reminded of the formula as well as the chain fitting characteristics. We are also informed that the total number of post-warmup MCMC samples is 2400.
  • the Rhat values for each parameter are all below 1.01, confirming convergence of the parameter estimates across the chains
  • Bulk_ESS: effective sample sizes for the bulk of the posterior range are a high fraction (> 0.5) of the total number of draws. This suggests that the sampling was reasonably efficient and therefore likely to be reliable
  • Tail_ESS: effective sample sizes of estimates in the tails of the posterior. The tails (more extreme regions of the posterior) are where the sampler is most likely to get stuck (have divergent transitions etc.). The fraction of effective samples from this region is relatively high, implying that the sampler did not get stuck in regions that are poorly supported by the data.
  • b_Intercept: when \(x=0\) (its average, since it is standardised), the expected value of \(y\) is 6.868 and we are 95% confident that the true value is between 4.84 and 9.05. Since \(x=0\) corresponds to the mean of the observed \(x\) values, this y-intercept is meaningful.
  • b_x: (the slope) - the rate of change in \(y\) per unit (=1) change in \(x\). Recall that since \(x\) is standardised, 1 unit represents a span of 1 standard deviation of \(x\). So for every one standard deviation change in \(x\), \(y\) increases by (on average) a factor of 2.31 and we are 95% confident that this change is between 1.67 and 3.12. This represents a \((2.31 - 1) \times 100 = 131\)% increase in \(y\) per standard deviation increase in \(x\).

The gather_draws() function performs the equivalent of an as_draws_df() followed by a pivot_longer() in order to return the full posteriors in long format, where they are more suitable for graphing. Note that the name of the slope parameter gets very awkward when \(x\) is scaled, so it is more convenient to refer to this parameter via a regular expression.

dat4c.brm2 |> get_variables()
 [1] "b_Intercept"     "b_scalex"        "shape"           "Intercept"      
 [5] "prior_Intercept" "prior_b"         "prior_shape"     "lprior"         
 [9] "lp__"            "accept_stat__"   "stepsize__"      "treedepth__"    
[13] "n_leapfrog__"    "divergent__"     "energy__"       
dat4c.brm2 |>
  gather_draws(`b_Intercept`,`b_.*x`, regex = TRUE) |> 
  mutate(across(everything(), exp)) |>
  ggplot() +
  geom_histogram(aes(x = .value)) +
  facet_wrap(~.variable, scales = "free")
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Alternatively, there are various other representations supported by the ggdist package.

dat4c.brm2 |>
  gather_draws(`b_Intercept`,`b_.*x`, regex = TRUE) |> 
  mutate(across(everything(), exp)) |>
  ggplot() +
  stat_halfeye(aes(x = .value, y = .variable))

dat4c.brm2 |>
    bayes_R2(summary = FALSE) |>
    median_hdci()
          y      ymin      ymax .width .point .interval
1 0.8982864 0.6326737 0.9214458   0.95 median      hdci

Conclusions:

  • 89.829% of the total variability in \(y\) can be explained by its relationship to \(x\)
  • we are 95% confident that the strength of this relationship is between 63.267% and 92.145%
dat5a.brm2 |> summary()
 Family: binomial 
  Links: mu = logit 
Formula: y | trials(1) ~ x 
   Data: dat5 (Number of observations: 10) 
  Draws: 3 chains, each with iter = 5000; warmup = 1000; thin = 5;
         total post-warmup draws = 2400

Regression Coefficients:
          Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept    -5.60      3.07   -13.48    -0.98 1.00     2231     2230
x             1.11      0.56     0.26     2.50 1.00     2230     2157

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
Important

The results presented by the summary() function are on the link scale (in this case, the logit, or log-odds scale). As such, they can be awkward to interpret and are particularly punishing for your audience.

Conclusions:

  • in the initial block of information, we are reminded of the formula as well as the chain fitting characteristics. We are also informed that the total number of post-warmup MCMC samples is 2400.
  • the Rhat values for each parameter are all below 1.01, confirming convergence of the parameter estimates across the chains
  • Bulk_ESS: effective sample sizes for the bulk of the posterior range are a high fraction (> 0.5) of the total number of draws. This suggests that the sampling was reasonably efficient and therefore likely to be reliable
  • Tail_ESS: effective sample sizes of estimates in the tails of the posterior. The tails (more extreme regions of the posterior) are where the sampler is most likely to get stuck (have divergent transitions etc.). The fraction of effective samples from this region is relatively high, implying that the sampler did not get stuck in regions that are poorly supported by the data.
  • Intercept: when \(x=0\), the expected value of \(y\) is -5.602 and we are 95% confident that the true value is between -13.478 and -0.977. Since \(x=0\) is outside the observed range of \(x\), this y-intercept is of very limited value.
  • x: (the slope) - the rate of change in \(y\) per unit (=1) change in \(x\). So for every one unit change in \(x\), \(y\) increases by (on average) 1.114 units and we are 95% confident that this change is between 0.256 and 2.499

Note, the point estimates above are posterior means and the intervals are quantile-based credible intervals.

As an alternative to the regular summary (which is designed to resemble that traditionally provided by frequentist analyses - and thus be instantly familiar), it is possible to define exactly how each of the parameters is summarised.

In the following, I am nominating that I want to summarise each parameter posterior by:

  • the median
  • the 95% highest probability density interval (credibility interval)
  • Rhat
  • total number of draws
  • bulk and tail effective sample sizes
dat5a.brm2 |>
  posterior::summarise_draws(
    median,
    HDInterval::hdi,
    rhat,
    length,
    ess_bulk, ess_tail
)
# A tibble: 7 × 8
  variable         median   lower   upper  rhat length ess_bulk ess_tail
  <chr>             <dbl>   <dbl>   <dbl> <dbl>  <dbl>    <dbl>    <dbl>
1 b_Intercept     -5.16   -11.7   -0.0398  1.00   2400    2231.    2230.
2 b_x              1.02     0.180  2.33    1.00   2400    2230.    2157.
3 Intercept        0.518   -0.814  2.00    1.00   2400    2385.    2404.
4 prior_Intercept  0.0137  -1.91   1.96    1.00   2400    2296.    2256.
5 prior_b         -0.0190  -3.17   2.75    1.00   2400    2424.    2410.
6 lprior          -2.84    -4.85  -1.92    1.00   2400    2241.    2190.
7 lp__            -6.01    -8.56  -5.26    1.00   2400    2125.    2339.
Important

The results presented by the above function are on the link scale (in this case, the logit, or log-odds scale). As such, they can be awkward to interpret and are particularly punishing for your audience.

Conclusions:

  • in the initial block of information, we are reminded of the formula as well as the chain fitting characteristics. We are also informed that the total number of post-warmup MCMC samples is 2400.
  • the Rhat values for each parameter are all below 1.01, confirming convergence of the parameter estimates across the chains
  • Bulk_ESS: effective sample sizes for the bulk of the posterior range are a high fraction (> 0.5) of the total number of draws. This suggests that the sampling was reasonably efficient and therefore likely to be reliable
  • Tail_ESS: effective sample sizes of estimates in the tails of the posterior. The tails (more extreme regions of the posterior) are where the sampler is most likely to get stuck (have divergent transitions etc.). The fraction of effective samples from this region is relatively high, implying that the sampler did not get stuck in regions that are poorly supported by the data.
  • b_Intercept: when \(x=0\), the expected value of \(y\) is -5.16 and we are 95% confident that the true value is between -11.682 and -0.04. Since \(x=0\) is outside the observed range of \(x\), this y-intercept is of very limited value.
  • b_x: (the slope) - the rate of change in \(y\) per unit (=1) change in \(x\). So for every one unit change in \(x\), \(y\) increases by (on average) 1.02 units and we are 95% confident that this change is between 0.18 and 2.335

As a yet more flexible alternative, if we first extract the full posterior draws (with as_draws_df()), we can then use various tidyverse functions to focus on just the most ecologically interpretable parameters before summarising.

In the following, I will use select() with a regex (regular expression) to match only the columns that:

  • start with (^) “b_” followed by any amount of (*) any character (.)
  • start with (^) “sigma”

It will also use mutate() to transform the results (by exponentiating the posteriors) before summarising; note that for a logit link, exponentiation yields odds and odds ratios rather than values on the response scale.

dat5a.brm2 |>
    brms::as_draws_df() |>
    dplyr::select(matches("^b_.*|^sigma")) |>
    mutate(across(everything(), exp)) |>
    posterior::summarise_draws(
        median,
        HDInterval::hdi,
        rhat,
        length,
        ess_bulk, ess_tail
    )
Warning: Dropping 'draws_df' class as required metadata was removed.
# A tibble: 2 × 8
  variable     median         lower upper  rhat length ess_bulk ess_tail
  <chr>         <dbl>         <dbl> <dbl> <dbl>  <dbl>    <dbl>    <dbl>
1 b_Intercept 0.00572 0.00000000196 0.251  1.00   2400    2214.    2206.
2 b_x         2.77    0.959         8.85   1.00   2400    2218.    2145.

Conclusions:

  • in the initial block of information, we are reminded of the formula as well as the chain fitting characteristics. We are also informed that the total number of post-warmup MCMC samples is 2400.
  • the Rhat values for each parameter are all below 1.01, confirming convergence of the parameter estimates across the chains
  • Bulk_ESS: effective sample sizes for the bulk of the posterior range are a high fraction (> 0.5) of the total number of draws. This suggests that the sampling was reasonably efficient and therefore likely to be reliable
  • Tail_ESS: effective sample sizes of estimates in the tails of the posterior. The tails (more extreme regions of the posterior) are where the sampler is most likely to get stuck (have divergent transitions etc.). The fraction of effective samples from this region is relatively high, implying that the sampler did not get stuck in regions that are poorly supported by the data.
  • b_Intercept: when \(x=0\), the expected odds of success are 0.006 and we are 95% confident that the true value is between effectively 0 and 0.251. Since \(x=0\) is outside the observed range of \(x\), this y-intercept is of very limited value.
  • b_x: (the slope) - the rate of change in the odds per unit (=1) change in \(x\). So for every one unit change in \(x\), the odds of success increase by (on average) a factor of 2.772 and we are 95% confident that this factor is between 0.959 and 8.85 (an interval whose lower bound only just dips below 1, i.e. it marginally includes 'no change'). This represents a \((2.772 - 1) \times 100 \approx 177.2\)% increase in the odds per unit increase in \(x\) (the sketch below converts these odds onto the probability scale).
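
Because the logit link means exponentiated coefficients are odds (ratios), it is often more audience-friendly to push the link-scale draws through the inverse-logit and report probabilities at a particular \(x\). A minimal sketch (the value \(x = 6\) is an arbitrary illustrative choice):

dat5a.brm2 |>
  brms::as_draws_df() |>
  ## inverse-logit of the linear predictor gives Pr(y = 1) at x = 6
  mutate(p = plogis(b_Intercept + b_x * 6)) |>
  median_hdci(p)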

The gather_draws() function performs the equivalent of an as_draws_df() followed by a pivot_longer() in order to return the full posteriors in long format where they are more suitable for graphing.

dat5a.brm2 |>
    gather_draws(b_Intercept, b_x) |>
    mutate(across(everything(), exp)) |>
    ggplot() +
    geom_histogram(aes(x = .value)) +
    facet_wrap(~.variable, scales = "free")
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Alternatively, there are various other representations supported by the ggdist package.

dat5a.brm2 |>
  gather_draws(b_Intercept, b_x) |>
  mutate(across(everything(), exp)) |>
  ggplot() +
  stat_halfeye(aes(x = .value, y = .variable))

dat5a.brm2 |>
    bayes_R2(summary = FALSE) |>
    median_hdci()
          y      ymin      ymax .width .point .interval
1 0.6241225 0.2563045 0.6997611   0.95 median      hdci

Conclusions:

  • 62.412% of the total variability in \(y\) can be explained by its relationship to \(x\)
  • we are 95% confident that the strength of this relationship is between 25.63% and 69.976%
dat5b.brm2 |> summary()
 Family: binomial 
  Links: mu = logit 
Formula: y | trials(1) ~ scale(x, scale = FALSE) 
   Data: dat5 (Number of observations: 10) 
  Draws: 3 chains, each with iter = 5000; warmup = 1000; thin = 5;
         total post-warmup draws = 2400

Regression Coefficients:
                   Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept              0.52      0.73    -0.83     1.97 1.00     2356     2269
scalexscaleEQFALSE     1.13      0.59     0.24     2.57 1.00     2010     1941

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
Important

The results presented by the summary() function are on the link scale (in this case, the logit, or log-odds scale). As such, they can be awkward to interpret and are particularly punishing for your audience.

Conclusions:

  • in the initial block of information, we are reminded of the formula as well as the chain fitting characteristics. We are also informed that the total number of post-warmup MCMC samples is 2400.
  • the Rhat values for each parameter are all below 1.01, confirming convergence of the parameter estimates across the chains
  • Bulk_ESS: effective sample sizes for the bulk of the posterior range are a high fraction (> 0.5) of the total number of draws. This suggests that the sampling was reasonably efficient and therefore likely to be reliable
  • Tail_ESS: effective sample sizes of estimates in the tails of the posterior. The tails (more extreme regions of the posterior) are where the sampler is most likely to get stuck (have divergent transitions etc.). The fraction of effective samples from this region is relatively high, implying that the sampler did not get stuck in regions that are poorly supported by the data.
  • Intercept: when \(x=0\) (the average of \(x\), since it is centred), the expected value of \(y\) is 0.523 and we are 95% confident that the true value is between -0.834 and 1.971. Since \(x=0\) corresponds to the mean of the observed \(x\) values, this y-intercept is meaningful.
  • x: (the slope) - the rate of change in \(y\) per unit (=1) change in \(x\). So for every one unit change in \(x\), \(y\) increases by (on average) 1.13 units and we are 95% confident that this change is between 0.242 and 2.575

Note, the point estimates above are posterior means and the intervals are quantile-based credible intervals.

As an alternative to the regular summary (which is designed to resemble that traditionally provided by frequentist analyses - and thus be instantly familiar), it is possible to define exactly how each of the parameters is summarised.

In the following, I am nominating that I want to summarise each parameter posterior by:

  • the median
  • the 95% highest probability density interval (credibility interval)
  • Rhat
  • total number of draws
  • bulk and tail effective sample sizes
dat5b.brm2 |>
  posterior::summarise_draws(
    median,
    HDInterval::hdi,
    rhat,
    length,
    ess_bulk, ess_tail
)
# A tibble: 7 × 8
  variable                 median  lower upper  rhat length ess_bulk ess_tail
  <chr>                     <dbl>  <dbl> <dbl> <dbl>  <dbl>    <dbl>    <dbl>
1 b_Intercept           0.515     -0.825  1.98 1.00    2400    2356.    2269.
2 b_scalexscaleEQFALSE  1.03       0.150  2.34 1.00    2400    2010.    1941.
3 Intercept             0.515     -0.825  1.98 1.00    2400    2356.    2269.
4 prior_Intercept       0.00929   -1.91   1.95 1.00    2400    2481.    2152.
5 prior_b              -0.0000282 -2.88   3.74 0.999   2400    2327.    2192.
6 lprior               -2.86      -4.78  -1.93 1.00    2400    1983.    2062.
7 lp__                 -6.04      -8.77  -5.26 1.00    2400    2053.    2202.
Important

The results presented by the above function are on the link scale (in this case, the logit, or log-odds scale). As such, they can be awkward to interpret and are particularly punishing for your audience.

Conclusions:

  • in the initial block of information, we are reminded of the formula as well as the chain fitting characteristics. We are also informed that the total number of post-warmup MCMC samples is 2400.
  • the Rhat values for each parameter are all below 1.01, confirming convergence of the parameter estimates across the chains
  • Bulk_ESS: effective sample sizes for the bulk of the posterior range are a high fraction (> 0.5) of the total number of draws. This suggests that the sampling was reasonably efficient and therefore likely to be reliable
  • Tail_ESS: effective sample sizes of estimates in the tails of the posterior. The tails (more extreme regions of the posterior) are where the sampler is most likely to get stuck (have divergent transitions etc.). The fraction of effective samples from this region is relatively high, implying that the sampler did not get stuck in regions that are poorly supported by the data.
  • b_Intercept: when \(x=0\) (the average of \(x\), since it is centred), the expected value of \(y\) is 0.515 and we are 95% confident that the true value is between -0.825 and 1.977. Since \(x=0\) corresponds to the mean of the observed \(x\) values, this y-intercept is meaningful.
  • b_x: (the slope) - the rate of change in \(y\) per unit (=1) change in \(x\). So for every one unit change in \(x\), \(y\) increases by (on average) 1.026 units and we are 95% confident that this change is between 0.15 and 2.337

As a yet more flexible alternative, if we first extract the full posterior draws (with as_draws_df()), we can then use various tidyverse functions to focus on just the most ecologically interpretable parameters before summarising.

In the following, I will use select() with a regex (regular expression) to match only the columns that:

  • start with (^) “b_” followed by any amount of (*) any character (.)
  • start with (^) “sigma”

It will also use mutate() to transform the results (by exponentiating the posteriors) before summarising; note that for a logit link, exponentiation yields odds and odds ratios rather than values on the response scale.

dat5b.brm2 |>
    brms::as_draws_df() |>
    dplyr::select(matches("^b_.*|^sigma")) |>
    mutate(across(everything(), exp)) |>
    posterior::summarise_draws(
        median,
        HDInterval::hdi,
        rhat,
        length,
        ess_bulk, ess_tail
    )
Warning: Dropping 'draws_df' class as required metadata was removed.
# A tibble: 2 × 8
  variable             median lower upper  rhat length ess_bulk ess_tail
  <chr>                 <dbl> <dbl> <dbl> <dbl>  <dbl>    <dbl>    <dbl>
1 b_Intercept            1.67 0.226  5.70  1.00   2400    2345.    2253.
2 b_scalexscaleEQFALSE   2.79 0.869  9.24  1.00   2400    1997.    1918.

Conclusions:

  • in the initial block of information, we are reminded of the formula as well as the chain fitting characteristics. We are also informed that the total number of post-warmup MCMC samples is 2400.
  • the Rhat values for each parameter are all below 1.01, confirming convergence of the parameter estimates across the chains
  • Bulk_ESS: effective sample sizes for the bulk of the posterior range are a high fraction (> 0.5) of the total number of draws. This suggests that the sampling was reasonably efficient and therefore likely to be reliable
  • Tail_ESS: effective sample sizes of estimates in the tails of the posterior. The tails (more extreme regions of the posterior) are where the sampler is most likely to get stuck (have divergent transitions etc.). The fraction of effective samples from this region is relatively high, implying that the sampler did not get stuck in regions that are poorly supported by the data.
  • b_Intercept: when \(x=0\) (the average of \(x\), since it is centred), the expected odds of success are 1.674 and we are 95% confident that the true value is between 0.226 and 5.70. Since \(x=0\) corresponds to the mean of the observed \(x\) values, this y-intercept is meaningful.
  • b_x: (the slope) - the rate of change in the odds per unit (=1) change in \(x\). So for every one unit change in \(x\), the odds of success increase by (on average) a factor of 2.789 and we are 95% confident that this factor is between 0.869 and 9.24 (an interval that includes 1, i.e. no change). This represents a \((2.789 - 1) \times 100 \approx 178.9\)% increase in the odds per unit increase in \(x\).

The gather_draws() function performs the equivalent of an as_draws_df() followed by a pivot_longer() in order to return the full posteriors in long format where they are more suitable for graphing.

dat5b.brm2 |>
    gather_draws(`b_Intercept`, `b_.*x.*`, regex = TRUE) |>
    mutate(across(everything(), exp)) |>
    ggplot() +
    geom_histogram(aes(x = .value)) +
    facet_wrap(~.variable, scales = "free")
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Alternatively, there are various other representations supported by the ggdist package.

dat5b.brm2 |>
    gather_draws(`b_Intercept`, `b_.*x.*`, regex = TRUE) |>
    mutate(across(everything(), exp)) |>
  ggplot() +
  stat_halfeye(aes(x = .value, y = .variable))

dat5b.brm2 |>
    bayes_R2(summary = FALSE) |>
    median_hdci()
          y      ymin      ymax .width .point .interval
1 0.6238714 0.2277985 0.7055103   0.95 median      hdci

Conclusions:

  • 62.387% of the total variability in \(y\) can be explained by its relationship to \(x\)
  • we are 95% confident that the strength of this relationship is between 22.78% and 70.551%
dat5c.brm2 |> summary()
 Family: binomial 
  Links: mu = logit 
Formula: y | trials(1) ~ scale(x) 
   Data: dat5 (Number of observations: 10) 
  Draws: 3 chains, each with iter = 5000; warmup = 1000; thin = 5;
         total post-warmup draws = 2400

Regression Coefficients:
          Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept     0.42      0.68    -0.83     1.82 1.00     2297     2232
scalex        2.08      1.28     0.22     5.17 1.00     2331     2289

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
Important

The results presented by the summary() function are on the link scale (in this case, the logit, or log-odds scale). As such, they can be awkward to interpret and are particularly punishing for your audience.

Conclusions:

  • in the initial block of information, we are reminded of the formula as well as the chain fitting characteristics. We are also informed that the total number of post-warmup MCMC samples is 2400.
  • the Rhat values for each parameter are all below 1.01, confirming convergence of the parameter estimates across the chains
  • Bulk_ESS: effective sample sizes for the bulk of the posterior range are a high fraction (> 0.5) of the total number of draws. This suggests that the sampling was reasonably efficient and therefore likely to be reliable
  • Tail_ESS: effective sample sizes of estimates in the tails of the posterior. The tails (more extreme regions of the posterior) are where the sampler is most likely to get stuck (have divergent transitions etc.). The fraction of effective samples from this region is relatively high, implying that the sampler did not get stuck in regions that are poorly supported by the data.
  • Intercept: when \(x=0\) (its average, since it is standardised), the expected value of \(y\) is 0.424 and we are 95% confident that the true value is between -0.831 and 1.817. Since \(x=0\) corresponds to the mean of the observed \(x\) values, this y-intercept is meaningful.
  • x: (the slope) - the rate of change in \(y\) per unit (=1) change in \(x\). Recall that since \(x\) is standardised, 1 unit represents a span of 1 standard deviation of \(x\). So for every one standard deviation change in \(x\), \(y\) increases by (on average) 2.079 units and we are 95% confident that this change is between 0.215 and 5.17

Note, the point estimates above are posterior means and the intervals are quantile-based credible intervals.

As an alternative to the regular summary (which is designed to resemble that traditionally provided by frequentist analyses - and thus be instantly familiar), it is possible to define exactly how each of the parameters is summarised.

In the following, I am nominating that I want to summarise each parameter posterior by:

  • the median
  • the 95% highest probability density interval (credibility interval)
  • Rhat
  • total number of draws
  • bulk and tail effective sample sizes
dat5c.brm2 |>
  posterior::summarise_draws(
    median,
    HDInterval::hdi,
    rhat,
    length,
    ess_bulk, ess_tail
)
# A tibble: 7 × 8
  variable         median    lower upper  rhat length ess_bulk ess_tail
  <chr>             <dbl>    <dbl> <dbl> <dbl>  <dbl>    <dbl>    <dbl>
1 b_Intercept      0.409   -0.899   1.73  1.00   2400    2297.    2232.
2 b_scalex         1.84     0.0104  4.69  1.00   2400    2331.    2289.
3 Intercept        0.409   -0.899   1.73  1.00   2400    2297.    2232.
4 prior_Intercept -0.0150  -2.01    1.96  1.00   2400    2303.    2410.
5 prior_b         -0.0234  -3.46    2.82  1.00   2400    2384.    2367.
6 lprior          -3.73    -6.57   -1.92  1.00   2400    2362.    2214.
7 lp__            -7.42   -10.1    -6.56  1.00   2400    2418.    2289.

Conclusions:

  • in the initial block of information, we are reminded of the formula as well as the chain fitting characteristics. We are also informed that the total number of post-warmup MCMC samples is 2400.
  • the Rhat values for each parameter are all below 1.01, confirming convergence of the parameter estimates across the chains
  • Bulk_ESS: effective sample sizes for the bulk of the posterior range are a high fraction (> 0.5) of the total number of draws. This suggests that the sampling was reasonably efficient and therefore likely to be reliable
  • Tail_ESS: effective sample sizes of estimates in the tails of the posterior. The tails (more extreme regions of the posterior) are where the sampler is most likely to get stuck (have divergent transitions etc.). The fraction of effective samples from this region is relatively high, implying that the sampler did not get stuck in regions that are poorly supported by the data.
  • b_Intercept: when \(x=0\) (its average, since it is standardised), the expected value of \(y\) is 0.409 and we are 95% confident that the true value is between -0.899 and 1.734. Since \(x=0\) corresponds to the mean of the observed \(x\) values, this y-intercept is meaningful.
  • b_x: (the slope) - the rate of change in \(y\) per unit (=1) change in \(x\). Recall that since \(x\) is standardised, 1 unit represents a span of 1 standard deviation of \(x\). So for every one standard deviation change in \(x\), \(y\) increases by (on average) 1.844 units and we are 95% confident that this change is between 0.01 and 4.691

As a yet more flexible alternative, if we first extract the full posterior draws (with as_draws_df()), we can then use various tidyverse functions to focus on just the most ecologically interpretable parameters before summarising.

In the following, I will use select() with a regex (regular expression) to match only the columns that:

  • start with (^) “b_” followed by any amount of (*) any character (.)
  • start with (^) “sigma”

It will also use mutate() to transform the results (by exponentiating the posteriors) before summarising; note that for a logit link, exponentiation yields odds and odds ratios rather than values on the response scale.

dat5c.brm2 |>
    brms::as_draws_df() |>
    dplyr::select(matches("^b_.*|^sigma")) |>
    mutate(across(everything(), exp)) |>
    posterior::summarise_draws(
        median,
        HDInterval::hdi,
        rhat,
        length,
        ess_bulk, ess_tail
    )
Warning: Dropping 'draws_df' class as required metadata was removed.
# A tibble: 2 × 8
  variable    median lower upper  rhat length ess_bulk ess_tail
  <chr>        <dbl> <dbl> <dbl> <dbl>  <dbl>    <dbl>    <dbl>
1 b_Intercept   1.51 0.176  4.75  1.00   2400    2297.    2198.
2 b_scalex      6.32 0.767 94.0   1.00   2400    2312.    2283.

Conclusions:

  • in the initial block of information, we are reminded of the formula as well as the chain fitting characteristics. We are also informed that the total number of post-warmup MCMC samples is 2400.
  • the Rhat values for each parameter are all below 1.01, confirming convergence of the parameter estimates across the chains
  • Bulk_ESS: effective sample sizes for the bulk of the posterior range are a high fraction (> 0.5) of the total number of draws. This suggests that the sampling was reasonably efficient and therefore likely to be reliable
  • Tail_ESS: effective sample sizes of estimates in the tails of the posterior. The tails (more extreme regions of the posterior) are where the sampler is most likely to get stuck (have divergent transitions etc.). The fraction of effective samples from this region is relatively high, implying that the sampler did not get stuck in regions that are poorly supported by the data.
  • b_Intercept: when \(x=0\) (its average, since it is standardised), the expected odds of success are 1.506 and we are 95% confident that the true value is between 0.176 and 4.75. Since \(x=0\) corresponds to the mean of the observed \(x\) values, this y-intercept is meaningful.
  • b_x: (the slope) - the rate of change in the odds per unit (=1) change in \(x\). Recall that since \(x\) is standardised, 1 unit represents a span of 1 standard deviation of \(x\). So for every one standard deviation change in \(x\), the odds of success increase by (on average) a factor of 6.32 and we are 95% confident that this factor is between 0.767 and 94.0 (an interval that includes 1, i.e. no change). This represents a \((6.32 - 1) \times 100 = 532\)% increase in the odds per standard deviation increase in \(x\) (the sketch below converts this to a per-unit-of-\(x\) effect).
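
If a per-unit-of-\(x\) odds ratio is preferred over the per-standard-deviation one, the standardised slope can be rescaled by the standard deviation of \(x\) before exponentiating. A minimal sketch, assuming dat5 contains the raw predictor x:

sd_x <- sd(dat5$x)
dat5c.brm2 |>
  brms::as_draws_df() |>
  dplyr::select(matches("^b_.*x")) |>
  ## dividing by sd(x) converts the per-SD slope to a per-raw-unit slope
  mutate(across(everything(), \(b) exp(b / sd_x))) |>
  posterior::summarise_draws(median, HDInterval::hdi)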

The gather_draws() function performs the equivalent of an as_draws_df() followed by a pivot_longer() in order to return the full posteriors in long format, where they are more suitable for graphing. Note that the name of the slope parameter gets very awkward when \(x\) is scaled, so it is more convenient to refer to this parameter via a regular expression.

dat5c.brm2 |> get_variables()
 [1] "b_Intercept"     "b_scalex"        "Intercept"       "prior_Intercept"
 [5] "prior_b"         "lprior"          "lp__"            "accept_stat__"  
 [9] "stepsize__"      "treedepth__"     "n_leapfrog__"    "divergent__"    
[13] "energy__"       
dat5c.brm2 |>
  gather_draws(`b_Intercept`,`b_.*x`, regex = TRUE) |> 
  mutate(across(everything(), exp)) |>
  ggplot() +
  geom_histogram(aes(x = .value)) +
  facet_wrap(~.variable, scales = "free")
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Alternatively, there are various other representations supported by the ggdist package.

dat5c.brm2 |>
  gather_draws(`b_Intercept`,`b_.*x`, regex = TRUE) |> 
  mutate(across(everything(), exp)) |>
  ggplot() +
  stat_halfeye(aes(x = .value, y = .variable))

dat5c.brm2 |>
    bayes_R2(summary = FALSE) |>
    median_hdci()
          y       ymin      ymax .width .point .interval
1 0.4896796 0.04655454 0.6721203   0.95 median      hdci

Conclusions:

  • 48.968% of the total variability in \(y\) can be explained by its relationship to \(x\)
  • we are 95% confident that the strength of this relationship is between 4.655% and 67.212%
dat6a.brm2 |> summary()
 Family: binomial 
  Links: mu = logit 
Formula: count | trials(total) ~ x 
   Data: dat6 (Number of observations: 10) 
  Draws: 3 chains, each with iter = 5000; warmup = 1000; thin = 5;
         total post-warmup draws = 2400

Regression Coefficients:
          Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept    -3.26      0.94    -5.21    -1.56 1.00     2503     2367
x             0.65      0.17     0.36     1.00 1.00     2470     2347

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
Important

The results presented by the summary() function are on the link scale (in this case, the logit, or log-odds, scale). As such, they can be awkward to interpret and are particularly punishing for your audience.

Conclusions:

  • in the initial block of information, we are reminded of the formula as well as the chain fitting characteristics. We are also informed that the total number of post-warmup MCMC samples is 2400.
  • the Rhat values for each parameter are <1.01, confirming convergence of the parameter estimates across the chains
  • Bulk_ESS: effective sample sizes for the bulk of the posterior are a high fraction (>0.5) of the total number of draws. This suggests that the sampling was reasonably efficient and therefore likely to be reliable
  • Tail_ESS: effective sample sizes of estimates in the tails of the posterior. The tails (more extreme regions of the posterior) are where the sampler is most likely to get stuck (have divergent transitions etc). The fraction of effective samples from this region is relatively high, implying that the sampler did not get stuck in areas that are poorly supported by the data.
  • Intercept: when \(x=0\), the expected value of \(y\) on the log-odds scale is -3.262 and we are 95% confident that the true value is between -5.207 and -1.561. Since \(x=0\) is outside the observed range of \(x\), this y-intercept is of very limited value.
  • x: (the slope) - the rate of change in \(y\) per unit (=1) change in \(x\). So for every one unit change in \(x\), the log-odds of \(y\) increase by (on average) 0.65 and we are 95% confident that this change is between 0.355 and 0.997

Note that the point estimates are posterior means and the intervals are equal-tailed quantiles.
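
Since link-scale values are hard to digest, a quick sketch back-transforms the point estimates quoted above (values copied from the summary; plogis() is the inverse-logit):

b0 <- -3.26  # intercept: log-odds of success at x = 0
b1 <-  0.65  # slope: change in log-odds per unit of x
plogis(b0)   # expected probability of success at x = 0 (~0.04)
exp(b1)      # multiplicative change in the odds per unit of x (~1.9)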

As an alternative to the regular summary (which is designed to resemble that traditionally provided by frequentist analyses - and thus be instantly familiar), it is possible to define exactly how each of the parameters is summarised.
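
Before doing so, it is worth remembering that highest-density and equal-tailed (quantile) intervals need not coincide. A quick simulated illustration (the exponential "posterior" is deliberately skewed):

set.seed(1)
draws <- rexp(2400)              # a skewed stand-in for a posterior
quantile(draws, c(0.025, 0.975)) # equal-tailed (quantile) interval
HDInterval::hdi(draws)           # highest-density interval sits further left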

In the following, I am nominating that I want to summarise each parameter posterior by:

  • the median
  • the 95% highest probability density interval (credibility interval)
  • Rhat
  • total number of draws
  • bulk and tail effective sample sizes
dat6a.brm2 |>
  posterior::summarise_draws(
    median,
    HDInterval::hdi,
    rhat,
    length,
    ess_bulk, ess_tail
)
# A tibble: 7 × 8
  variable           median   lower   upper  rhat length ess_bulk ess_tail
  <chr>               <dbl>   <dbl>   <dbl> <dbl>  <dbl>    <dbl>    <dbl>
1 b_Intercept      -3.20     -5.11   -1.52  1.00    2400    2503.    2367.
2 b_x               0.640     0.346   0.982 1.00    2400    2470.    2347.
3 Intercept         0.317    -0.163   0.791 1.00    2400    2453.    2543.
4 prior_Intercept  -0.233    -0.869   0.402 1.00    2400    2390.    2225.
5 prior_b          -0.00697  -1.65    1.70  0.999   2400    2442.    2287.
6 lprior           -2.37     -5.29   -0.417 1.00    2400    2447.    2412.
7 lp__            -14.4     -16.8   -13.7   1.00    2400    2354.    2168.
Important

The results presented by the above function are on the link scale (in this case, the logit, or log-odds, scale). As such, they can be awkward to interpret and are particularly punishing for your audience.

Conclusions:

  • in the initial block of information, we are reminded of the formula as well as the chain fitting characteristics. We are also informed that the total number of post-warmup MCMC samples is 2400.
  • the Rhat values for each parameter are <1.01, confirming convergence of the parameter estimates across the chains
  • Bulk_ESS: effective sample sizes for the bulk of the posterior are a high fraction (>0.5) of the total number of draws. This suggests that the sampling was reasonably efficient and therefore likely to be reliable
  • Tail_ESS: effective sample sizes of estimates in the tails of the posterior. The tails (more extreme regions of the posterior) are where the sampler is most likely to get stuck (have divergent transitions etc). The fraction of effective samples from this region is relatively high, implying that the sampler did not get stuck in areas that are poorly supported by the data.
  • b_Intercept: when \(x=0\), the expected value of \(y\) on the log-odds scale is -3.2 and we are 95% confident that the true value is between -5.111 and -1.516. Since \(x=0\) is outside the observed range of \(x\), this y-intercept is of very limited value.
  • b_x: (the slope) - the rate of change in \(y\) per unit (=1) change in \(x\). So for every one unit change in \(x\), the log-odds of \(y\) increase by (on average) 0.64 and we are 95% confident that this change is between 0.346 and 0.982

As a yet more flexible alternative, if we first extract the full posterior draws (with as_draws_df()), we can then use the various tidyverse functions to focus on just the most ecologically interpretable parameters before summarising.

In the following, I will use select() with a regex (regular expression) to match only the columns that:

  • start with (^) “b_” followed by any amount of (*) any character (.)
  • start with (^) “sigma”

I will also use mutate() to transform the results back from the link scale (by exponentiating the posteriors) before summarising. Note that for a logit link, exponentiation yields the odds scale rather than the response (probability) scale.

dat6a.brm2 |>
    brms::as_draws_df() |>
    dplyr::select(matches("^b_.*|^sigma")) |>
    mutate(across(everything(), exp)) |>
    posterior::summarise_draws(
        median,
        HDInterval::hdi,
        rhat,
        length,
        ess_bulk, ess_tail
    )
Warning: Dropping 'draws_df' class as required metadata was removed.
# A tibble: 2 × 8
  variable    median    lower upper  rhat length ess_bulk ess_tail
  <chr>        <dbl>    <dbl> <dbl> <dbl>  <dbl>    <dbl>    <dbl>
1 b_Intercept 0.0406 0.000717 0.164  1.00   2400    2491.    2362.
2 b_x         1.90   1.37     2.62   1.00   2400    2459.    2329.

Conclusions:

  • in the initial block of information, we are reminded of the formula as well as the chain fitting characteristics. We are also informed that the total number of post-warmup MCMC samples is 2400.
  • the Rhat values for each parameter are <1.01, confirming convergence of the parameter estimates across the chains
  • Bulk_ESS: effective sample sizes for the bulk of the posterior are a high fraction (>0.5) of the total number of draws. This suggests that the sampling was reasonably efficient and therefore likely to be reliable
  • Tail_ESS: effective sample sizes of estimates in the tails of the posterior. The tails (more extreme regions of the posterior) are where the sampler is most likely to get stuck (have divergent transitions etc). The fraction of effective samples from this region is relatively high, implying that the sampler did not get stuck in areas that are poorly supported by the data.
  • b_Intercept: when \(x=0\), the expected odds (of success) are 0.041 and we are 95% confident that the true value is between 0.0007 and 0.164. Since \(x=0\) is outside the observed range of \(x\), this y-intercept is of very limited value.
  • b_x: (the slope) - for every one unit change in \(x\), the odds increase by (on average) a factor of 1.896 and we are 95% confident that this change is between 1.37 and 2.62. This represents a ((value - 1) * 100) 89.6% increase in the odds per unit increase in \(x\) (see the arithmetic sketch below).
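
For completeness, here is the percent-change arithmetic from the final bullet spelled out (a trivial sketch using the median from the table above):

exp_slope <- 1.896    # median of the back-transformed (odds-scale) slope
(exp_slope - 1) * 100 # ~89.6% increase in the odds per unit increase in x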

The gather_draws() function performs the equivalent of an as_draws_df() followed by a pivot_longer() in order to return the full posteriors in long format where they are more suitable for graphing.

dat6a.brm2 |>
    gather_draws(b_Intercept, b_x) |>
    mutate(across(everything(), exp)) |>
    ggplot() +
    geom_histogram(aes(x = .value)) +
    facet_wrap(~.variable, scales = "free")
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Alternatively, there are various other representations supported by the ggdist package.

dat6a.brm2 |>
  gather_draws(b_Intercept, b_x) |>
  mutate(across(everything(), exp)) |>
  ggplot() +
  stat_halfeye(aes(x = .value, y = .variable))

dat6a.brm2 |>
    bayes_R2(summary = FALSE) |>
    median_hdci()
          y      ymin      ymax .width .point .interval
1 0.7603549 0.5651125 0.8234649   0.95 median      hdci

Conclusions:

  • 76.035% of the total variability in \(y\) can be explained by its relationship to \(x\)
  • we are 95% confident that the strength of this relationship is between 56.511% and 82.346%
dat6b.brm2 |> summary()
 Family: binomial 
  Links: mu = logit 
Formula: count | trials(total) ~ scale(x, scale = FALSE) 
   Data: dat6 (Number of observations: 10) 
  Draws: 3 chains, each with iter = 5000; warmup = 1000; thin = 5;
         total post-warmup draws = 2400

Regression Coefficients:
                   Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept              0.31      0.25    -0.19     0.80 1.00     2202     2411
scalexscaleEQFALSE     0.66      0.17     0.37     1.02 1.00     2344     2208

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
Important

The results presented by the summary() function are on the link scale (in this case, the logit, or log-odds, scale). As such, they can be awkward to interpret and are particularly punishing for your audience.

Conclusions:

  • in the initial block of information, we are reminded of the formula as well as the chain fitting characteristics. We are also informed that the total number of post-warmup MCMC samples is 2400.
  • the Rhat values for each parameter are <1.01, confirming convergence of the parameter estimates across the chains
  • Bulk_ESS: effective sample sizes for the bulk of the posterior are a high fraction (>0.5) of the total number of draws. This suggests that the sampling was reasonably efficient and therefore likely to be reliable
  • Tail_ESS: effective sample sizes of estimates in the tails of the posterior. The tails (more extreme regions of the posterior) are where the sampler is most likely to get stuck (have divergent transitions etc). The fraction of effective samples from this region is relatively high, implying that the sampler did not get stuck in areas that are poorly supported by the data.
  • Intercept: when \(x=0\) (the mean of \(x\), since it is centred), the expected value of \(y\) on the log-odds scale is 0.308 and we are 95% confident that the true value is between -0.191 and 0.797. Since \(x=0\) corresponds to the centre of the observed range of \(x\), this y-intercept is directly interpretable.
  • x: (the slope) - the rate of change in \(y\) per unit (=1) change in \(x\). So for every one unit change in \(x\), the log-odds of \(y\) increase by (on average) 0.659 and we are 95% confident that this change is between 0.367 and 1.024

Note that the point estimates are posterior means and the intervals are equal-tailed quantiles.

As an alternative to the regular summary (which is designed to resemble that traditionally provided by frequentist analyses - and thus be instantly familiar), it is possible to define exactly how each of the parameters is summarised.

In the following, I am nominating that I want to summarise each parameter posterior by:

  • the median
  • the 95% highest probability density interval (credibility interval)
  • Rhat
  • total number of draws
  • bulk and tail effective sample sizes
dat6b.brm2 |>
  posterior::summarise_draws(
    median,
    HDInterval::hdi,
    rhat,
    length,
    ess_bulk, ess_tail
)
# A tibble: 7 × 8
  variable                median   lower   upper  rhat length ess_bulk ess_tail
  <chr>                    <dbl>   <dbl>   <dbl> <dbl>  <dbl>    <dbl>    <dbl>
1 b_Intercept            0.310    -0.174   0.805  1.00   2400    2202.    2411.
2 b_scalexscaleEQFALSE   0.647     0.366   1.01   1.00   2400    2344.    2208.
3 Intercept              0.310    -0.174   0.805  1.00   2400    2202.    2411.
4 prior_Intercept       -0.218    -0.864   0.403  1.00   2400    2308.    2427.
5 prior_b                0.00307  -1.83    1.59   1.00   2400    2471.    2498.
6 lprior                -2.34     -5.30   -0.580  1.00   2400    2316.    2445.
7 lp__                 -14.4     -16.9   -13.7    1.00   2400    2339.    2368.
Important

The results presented by the above function are on the link scale (in this case, the logit, or log-odds, scale). As such, they can be awkward to interpret and are particularly punishing for your audience.

Conclusions:

  • in the initial block of information, we are reminded of the formula as well as the chain fitting characteristics. We are also informed that the total number of post-warmup MCMC samples is 2400.
  • the Rhat values for each parameter are <1.01, confirming convergence of the parameter estimates across the chains
  • Bulk_ESS: effective sample sizes for the bulk of the posterior are a high fraction (>0.5) of the total number of draws. This suggests that the sampling was reasonably efficient and therefore likely to be reliable
  • Tail_ESS: effective sample sizes of estimates in the tails of the posterior. The tails (more extreme regions of the posterior) are where the sampler is most likely to get stuck (have divergent transitions etc). The fraction of effective samples from this region is relatively high, implying that the sampler did not get stuck in areas that are poorly supported by the data.
  • b_Intercept: when \(x=0\) (the mean of \(x\), since it is centred), the expected value of \(y\) on the log-odds scale is 0.31 and we are 95% confident that the true value is between -0.174 and 0.805. Since \(x=0\) corresponds to the centre of the observed range of \(x\), this y-intercept is directly interpretable.
  • b_scalexscaleEQFALSE: (the slope) - the rate of change in \(y\) per unit (=1) change in \(x\). So for every one unit change in \(x\), the log-odds of \(y\) increase by (on average) 0.647 and we are 95% confident that this change is between 0.366 and 1.015

As a yet more flexible alternative, if we first extract the full posterior draws (with as_draws_df()), we can then use the various tidyverse functions to focus on just the most ecologically interpretable parameters before summarising.

In the following, I will use select() with a regex (regular expression) to match only the columns that:

  • start with (^) “b_” followed by any amount of (*) any character (.)
  • start with (^) “sigma”

I will also use mutate() to transform the results back from the link scale (by exponentiating the posteriors) before summarising. Note that for a logit link, exponentiation yields the odds scale rather than the response (probability) scale.

dat6b.brm2 |>
    brms::as_draws_df() |>
    dplyr::select(matches("^b_.*|^sigma")) |>
    mutate(across(everything(), exp)) |>
    posterior::summarise_draws(
        median,
        HDInterval::hdi,
        rhat,
        length,
        ess_bulk, ess_tail
    )
Warning: Dropping 'draws_df' class as required metadata was removed.
# A tibble: 2 × 8
  variable             median lower upper  rhat length ess_bulk ess_tail
  <chr>                 <dbl> <dbl> <dbl> <dbl>  <dbl>    <dbl>    <dbl>
1 b_Intercept            1.36 0.730  2.07  1.00   2400    2178.    2383.
2 b_scalexscaleEQFALSE   1.91 1.37   2.65  1.00   2400    2336.    2153.

Conclusions:

  • in the initial block of information, we are reminded of the formula as well as the chain fitting characteristics. We are also informed that the total number of post-warmup MCMC samples is 2400.
  • the Rhat values for each parameter are <1.01, confirming convergence of the parameter estimates across the chains
  • Bulk_ESS: effective sample sizes for the bulk of the posterior are a high fraction (>0.5) of the total number of draws. This suggests that the sampling was reasonably efficient and therefore likely to be reliable
  • Tail_ESS: effective sample sizes of estimates in the tails of the posterior. The tails (more extreme regions of the posterior) are where the sampler is most likely to get stuck (have divergent transitions etc). The fraction of effective samples from this region is relatively high, implying that the sampler did not get stuck in areas that are poorly supported by the data.
  • b_Intercept: when \(x=0\) (the mean of \(x\), since it is centred), the expected odds (of success) are 1.363 and we are 95% confident that the true value is between 0.730 and 2.07. Since \(x=0\) corresponds to the centre of the observed range of \(x\), this y-intercept is directly interpretable.
  • b_scalexscaleEQFALSE: (the slope) - for every one unit change in \(x\), the odds increase by (on average) a factor of 1.91 and we are 95% confident that this change is between 1.37 and 2.65. This represents a ((value - 1) * 100) 91% increase in the odds per unit increase in \(x\).

The gather_draws() function performs the equivalent of an as_draws_df() followed by a pivot_longer() in order to return the full posteriors in long format where they are more suitable for graphing.

dat6b.brm2 |>
    gather_draws(`b_Intercept`, `b_.*x.*`, regex = TRUE) |>
    mutate(across(everything(), exp)) |>
    ggplot() +
    geom_histogram(aes(x = .value)) +
    facet_wrap(~.variable, scales = "free")
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Alternatively, there are various other representations supported by the ggdist package.

dat6b.brm2 |>
  gather_draws(`b_Intercept`, `b_.*x.*`, regex = TRUE) |>
  mutate(across(everything(), exp)) |>
  ggplot() +
  stat_halfeye(aes(x = .value, y = .variable))

dat6b.brm2 |>
    bayes_R2(summary = FALSE) |>
    median_hdci()
          y      ymin      ymax .width .point .interval
1 0.7621002 0.5838024 0.8209675   0.95 median      hdci

Conclusions:

  • 76.21% of the total variability in \(y\) can be explained by its relationship to \(x\)
  • we are 95% confident that the strength of this relationship is between 58.38% and 82.097%
dat6c.brm2 |> summary()
 Family: binomial 
  Links: mu = logit 
Formula: count | trials(total) ~ scale(x) 
   Data: dat6 (Number of observations: 10) 
  Draws: 3 chains, each with iter = 5000; warmup = 1000; thin = 5;
         total post-warmup draws = 2400

Regression Coefficients:
          Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept     0.28      0.25    -0.19     0.76 1.00     2332     1948
scalex        1.62      0.51     0.70     2.72 1.00     2297     2293

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
Important

The results presented by the summary() function are on the link scale (in this case, the logit, or log-odds, scale). As such, they can be awkward to interpret and are particularly punishing for your audience.

Conclusions:

  • in the initial block of information, we are reminded of the formula as well as the chain fitting characteristics. We are also informed that the total number of post-warmup MCMC samples is 2400.
  • the Rhat values for each parameter are <1.01, confirming convergence of the parameter estimates across the chains
  • Bulk_ESS: effective sample sizes for the bulk of the posterior are a high fraction (>0.5) of the total number of draws. This suggests that the sampling was reasonably efficient and therefore likely to be reliable
  • Tail_ESS: effective sample sizes of estimates in the tails of the posterior. The tails (more extreme regions of the posterior) are where the sampler is most likely to get stuck (have divergent transitions etc). The fraction of effective samples from this region is relatively high, implying that the sampler did not get stuck in areas that are poorly supported by the data.
  • Intercept: when \(x=0\) (its average, since \(x\) is standardised), the expected value of \(y\) on the log-odds scale is 0.28 and we are 95% confident that the true value is between -0.189 and 0.765. Since \(x=0\) corresponds to the mean of \(x\), this y-intercept is directly interpretable.
  • x: (the slope) - the rate of change in \(y\) per unit (=1) change in \(x\). Recall that since \(x\) is standardised, 1 unit represents a span of 1 standard deviation of \(x\). So for every one standard deviation change in \(x\), the log-odds of \(y\) increase by (on average) 1.622 and we are 95% confident that this change is between 0.701 and 2.721

Note that the point estimates are posterior means and the intervals are equal-tailed quantiles.

As an alternative to the regular summary (which is designed to resemble that traditionally provided by frequentist analyses - and thus be instantly familiar), it is possible to define exactly how each of the parameters is summarised.

In the following, I am nominating that I want to summarise each parameter posterior by:

  • the median
  • the 95% highest probability density interval (credibility interval)
  • Rhat
  • total number of draws
  • bulk and tail effective sample sizes
dat6c.brm2 |>
  posterior::summarise_draws(
    median,
    HDInterval::hdi,
    rhat,
    length,
    ess_bulk, ess_tail
)
# A tibble: 7 × 8
  variable           median   lower   upper  rhat length ess_bulk ess_tail
  <chr>               <dbl>   <dbl>   <dbl> <dbl>  <dbl>    <dbl>    <dbl>
1 b_Intercept       0.279    -0.189   0.765 1.00    2400    2332.    1948.
2 b_scalex          1.61      0.642   2.59  0.999   2400    2298.    2293.
3 Intercept         0.279    -0.189   0.765 1.00    2400    2332.    1948.
4 prior_Intercept  -0.234    -0.855   0.410 1.00    2400    2359.    2366.
5 prior_b          -0.00711  -0.923   1.01  1.00    2400    2094.    2290.
6 lprior           -5.26     -8.95   -2.08  1.00    2400    2217.    2188.
7 lp__            -17.9     -20.3   -17.1   0.999   2400    2155.    2148.

Conclusions:

  • in the initial block of information, we are reminded of the formula as well as the chain fitting characteristics. We are also informed that the total number of post-warmup MCMC samples is 2400.
  • the Rhat values for each parameter are <1.01, confirming convergence of the parameter estimates across the chains
  • Bulk_ESS: effective sample sizes for the bulk of the posterior are a high fraction (>0.5) of the total number of draws. This suggests that the sampling was reasonably efficient and therefore likely to be reliable
  • Tail_ESS: effective sample sizes of estimates in the tails of the posterior. The tails (more extreme regions of the posterior) are where the sampler is most likely to get stuck (have divergent transitions etc). The fraction of effective samples from this region is relatively high, implying that the sampler did not get stuck in areas that are poorly supported by the data.
  • b_Intercept: when \(x=0\) (its average, since \(x\) is standardised), the expected value of \(y\) on the log-odds scale is 0.279 and we are 95% confident that the true value is between -0.189 and 0.765. Since \(x=0\) corresponds to the mean of \(x\), this y-intercept is directly interpretable.
  • b_scalex: (the slope) - the rate of change in \(y\) per unit (=1) change in \(x\). Recall that since \(x\) is standardised, 1 unit represents a span of 1 standard deviation of \(x\). So for every one standard deviation change in \(x\), the log-odds of \(y\) increase by (on average) 1.609 and we are 95% confident that this change is between 0.642 and 2.59

As a yet more flexible alternative, if we first extract the full posterior draws (with as_draws_df()), we can then use the various tidyverse functions to focus on just the most ecologically interpretable parameters before summarising.

In the following, I will use select() with a regex (regular expression) to match only the columns that:

  • start with (^) “b_” followed by any amount of (*) any character (.)
  • start with (^) “sigma”

I will also use mutate() to transform the results back from the link scale (by exponentiating the posteriors) before summarising. Note that for a logit link, exponentiation yields the odds scale rather than the response (probability) scale.

dat6c.brm2 |>
    brms::as_draws_df() |>
    dplyr::select(matches("^b_.*|^sigma")) |>
    mutate(across(everything(), exp)) |>
    posterior::summarise_draws(
        median,
        HDInterval::hdi,
        rhat,
        length,
        ess_bulk, ess_tail
    )
Warning: Dropping 'draws_df' class as required metadata was removed.
# A tibble: 2 × 8
  variable    median lower upper  rhat length ess_bulk ess_tail
  <chr>        <dbl> <dbl> <dbl> <dbl>  <dbl>    <dbl>    <dbl>
1 b_Intercept   1.32 0.756  2.07  1.00   2400    2318.    1883.
2 b_scalex      5.00 1.47  12.3   1.00   2400    2272.    2260.

Conclusions:

  • in the initial block of information, we are reminded of the formula as well as the chain fitting characteristics. We are also informed that the total number of post-warmup MCMC samples is 2400.
  • the Rhat values for each parameter are <1.01, confirming convergence of the parameter estimates across the chains
  • Bulk_ESS: effective sample sizes for the bulk of the posterior are a high fraction (>0.5) of the total number of draws. This suggests that the sampling was reasonably efficient and therefore likely to be reliable
  • Tail_ESS: effective sample sizes of estimates in the tails of the posterior. The tails (more extreme regions of the posterior) are where the sampler is most likely to get stuck (have divergent transitions etc). The fraction of effective samples from this region is relatively high, implying that the sampler did not get stuck in areas that are poorly supported by the data.
  • b_Intercept: when \(x=0\) (its average, since \(x\) is standardised), the expected odds (of success) are 1.322 and we are 95% confident that the true value is between 0.756 and 2.07. Since \(x=0\) corresponds to the mean of \(x\), this y-intercept is directly interpretable.
  • b_scalex: (the slope) - recall that since \(x\) is standardised, 1 unit represents a span of 1 standard deviation of \(x\). So for every one standard deviation change in \(x\), the odds increase by (on average) a factor of 4.999 and we are 95% confident that this change is between 1.47 and 12.3. This represents a ((value - 1) * 100) 399.9% increase in the odds per standard deviation increase in \(x\).

The gather_draws() function performs the equivalent of an as_draws_df() followed by a pivot_longer() in order to return the full posteriors in long format, where they are more suitable for graphing. Note that the name of the slope parameter gets very awkward when \(x\) is scaled, so it is more convenient to refer to this parameter via a regular expression.

dat6c.brm2 |> get_variables()
 [1] "b_Intercept"     "b_scalex"        "Intercept"       "prior_Intercept"
 [5] "prior_b"         "lprior"          "lp__"            "accept_stat__"  
 [9] "stepsize__"      "treedepth__"     "n_leapfrog__"    "divergent__"    
[13] "energy__"       
dat6c.brm2 |>
  gather_draws(`b_Intercept`,`b_.*x`, regex = TRUE) |> 
  mutate(across(everything(), exp)) |>
  ggplot() +
  geom_histogram(aes(x = .value)) +
  facet_wrap(~.variable, scales = "free")
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Alternatively, there are various other representations supported by the ggdist package.

dat6c.brm2 |>
  gather_draws(`b_Intercept`,`b_.*x`, regex = TRUE) |> 
  mutate(across(everything(), exp)) |>
  ggplot() +
  stat_halfeye(aes(x = .value, y = .variable))

dat6c.brm2 |>
    bayes_R2(summary = FALSE) |>
    median_hdci()
         y      ymin      ymax .width .point .interval
1 0.710588 0.3429293 0.8207865   0.95 median      hdci

Conclusions:

  • 71.059% of the total variability in \(y\) can be explained by its relationship to \(x\)
  • we are 95% confident that the strength of this relationship is between 34.293% and 82.079%

12 Predictions

Whilst linear models are useful for estimating effects (relative differences), they are not good at absolute predictions because they are low dimensional (they focus on only a small number of covariates). Nevertheless, predicting values from linear models provides the basis for investigating/estimating additional effects and for generating various graphics to visualise the estimates.

There are a large number of candidate routines for performing prediction, and we will go through some of these. It is worth noting that in this context, prediction is technically the act of estimating what we expect to get if we were to collect a single new observation from a particular population (e.g. a specific value of the predictor). Often this is not what we want. Often we want the fitted values - estimates of what we expect to get if we were to collect multiple new observations and average them.

So while fitted values represent the expected underlying processes occurring in the system, predicted values represent our expectations from sampling from such processes.
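
To make this distinction concrete, here is a small self-contained simulation (all numbers are invented for illustration):

set.seed(1)
mu_draws <- rnorm(2400, mean = 0.06, sd = 1.3)        # hypothetical posterior of a mean
sigma_draws <- abs(rnorm(2400, mean = 1.8, sd = 0.3)) # hypothetical posterior of sigma
y_draws <- rnorm(2400, mu_draws, sigma_draws)         # draws for a single new observation
quantile(mu_draws, c(0.025, 0.975)) # "fitted" interval
quantile(y_draws, c(0.025, 0.975))  # "predicted" interval - necessarily wider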

Package     Function              Description                                                                               Summarise with
emmeans     emmeans               Estimated marginal means from which posteriors can be drawn                              median_hdci()
                                  (via tidy_draws() or gather_emmeans_draws())
rstantools  posterior_predict     Draw from the posterior of a prediction (includes sigma) - predicts single observations  summarise_draws()
rstantools  posterior_linpred     Draw from the posterior of the fitted values (on the link scale) - predicts average      summarise_draws()
                                  observations
rstantools  posterior_epred       Draw from the posterior of the fitted values (on the response scale) - predicts          summarise_draws()
                                  average observations
tidybayes   predicted_draws       Extract the posterior of prediction values                                               median_hdci()
tidybayes   epred_draws           Extract the posterior of expected values                                                 median_hdci()
tidybayes   fitted_draws          Extract the posterior of fitted values                                                   median_hdci()
tidybayes   add_predicted_draws   Adds draws from the posterior of predictions to a data frame (of prediction data)        median_hdci()
tidybayes   add_fitted_draws      Adds draws from the posterior of fitted values to a data frame (of prediction data)      median_hdci()

For simple models, prediction is essentially taking the model formula, complete with parameter (coefficient) estimates, and solving for new values of the predictor. To explore this, we will use each fitted model to predict \(y\) at new values of \(x\).

We will therefore start by establishing this prediction domain as a data frame to use across all of the prediction routines.
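
As a minimal sketch, that data frame is simply (the values match those used throughout this section):

newdata <- data.frame(x = c(2.5, 5)) # prediction domain: new x values at which to predict y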

In each case, we will predict \(y\) when \(x\) is 2.5 and 5

Note, for a Gaussian model emmeans, posterior_epred and posterior_linpred will all yield the same outputs. posterior_predict will yield wider credible intervals as it is predicting for individual observations rather than predicting means of large samples.
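
As a quick check of that equivalence (a sketch, assuming the dat1a.brm2 fit from earlier), the link-scale and response-scale fitted draws should be identical under an identity link:

# identity link: the linear predictor IS the expected value, so these match
all.equal(
  posterior_linpred(dat1a.brm2, newdata = data.frame(x = c(2.5, 5))),
  posterior_epred(dat1a.brm2, newdata = data.frame(x = c(2.5, 5)))
)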

dat1a.brm2 |> emmeans(~x, at = list(x = c(2.5, 5)))
   x  emmean lower.HPD upper.HPD
 2.5  0.0595     -2.59      2.55
 5.0 -0.1437     -4.81      4.67

Point estimate displayed: median 
HPD interval probability: 0.95 
# OR with more control over the way posteriors are summarised
dat1a.brm2 |>
    emmeans(~x, at = list(x = c(2.5, 5))) |>
    gather_emmeans_draws() |>
    median_hdci()
# A tibble: 2 × 7
      x  .value .lower .upper .width .point .interval
  <dbl>   <dbl>  <dbl>  <dbl>  <dbl> <chr>  <chr>    
1   2.5  0.0595  -2.59   2.55   0.95 median hdci     
2   5   -0.144   -4.81   4.67   0.95 median hdci     
dat1a.emmeans <- dat1a.brm2 |>
    emmeans(~x, at = list(x = c(2.5, 5))) |>
    as.data.frame()

Conclusions:

  • the predicted (estimated) mean \(y\) associated with an \(x\) of 2.5 is 0.06
  • the predicted (estimated) mean \(y\) associated with an \(x\) of 5 is -0.144
  • 95% HPD intervals also given
dat1a.brm2 |>
    posterior_predict(newdata = data.frame(x = c(2.5, 5))) |>
    summarise_draws(median, HDInterval::hdi)
# A tibble: 2 × 4
  variable  median lower upper
  <chr>      <dbl> <dbl> <dbl>
1 ...1      0.0813 -3.54  3.38
2 ...2     -0.138  -5.86  4.89
# Or for even more control and ability to add other summaries
dat1a.brm2 |>
    add_predicted_draws(newdata = data.frame(x = c(2.5, 5))) |>
    dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
    summarise_draws(
      median,
      HDInterval::hdi
    )
# A tibble: 2 × 6
# Groups:   x, .row [2]
      x  .row variable    median lower upper
  <dbl> <int> <chr>        <dbl> <dbl> <dbl>
1   2.5     1 .prediction  0.100 -3.45  3.36
2   5       2 .prediction -0.168 -5.27  5.11
dat1a.pred <- dat1a.brm2 |>
    posterior_predict(newdata = data.frame(x = c(2.5, 5))) |>
    summarise_draws(median, HDInterval::hdi)

Conclusions:

  • the predicted \(y\) for a single new observation at an \(x\) of 2.5 is 0.081 (because posterior_predict() simulates new observations, repeated calls yield slightly different summaries)
  • the predicted \(y\) for a single new observation at an \(x\) of 5 is -0.138
  • 95% HPD intervals also given
dat1a.brm2 |>
    posterior_epred(newdata = data.frame(x = c(2.5, 5))) |>
    summarise_draws(median, HDInterval::hdi)
# A tibble: 2 × 4
  variable  median lower upper
  <chr>      <dbl> <dbl> <dbl>
1 ...1      0.0595 -2.59  2.55
2 ...2     -0.144  -4.81  4.67
# Or for even more control and ability to add other summaries
dat1a.brm2 |>
    add_epred_draws(newdata = data.frame(x = c(2.5, 5))) |>
    dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
    summarise_draws(
      median,
      HDInterval::hdi
    )
# A tibble: 2 × 6
# Groups:   x, .row [2]
      x  .row variable  median lower upper
  <dbl> <int> <chr>      <dbl> <dbl> <dbl>
1   2.5     1 .epred    0.0595 -2.59  2.55
2   5       2 .epred   -0.144  -4.81  4.67
dat1a.epred <- dat1a.brm2 |>
    posterior_epred(newdata = data.frame(x = c(2.5, 5))) |>
    summarise_draws(median, HDInterval::hdi)

Conclusions:

  • the fitted mean \(y\) associated with an \(x\) of 2.5 is 0.06
  • the fitted mean \(y\) associated with an \(x\) of 5 is -0.144
  • 95% HPD intervals also given
dat1a.brm2 |>
    posterior_linpred(newdata = data.frame(x = c(2.5, 5))) |>
    summarise_draws(median, HDInterval::hdi)
# A tibble: 2 × 4
  variable  median lower upper
  <chr>      <dbl> <dbl> <dbl>
1 ...1      0.0595 -2.59  2.55
2 ...2     -0.144  -4.81  4.67
# Or for even more control and ability to add other summaries
dat1a.brm2 |>
    add_linpred_draws(newdata = data.frame(x = c(2.5, 5))) |>
    dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
    summarise_draws(
      median,
      HDInterval::hdi
    )
# A tibble: 2 × 6
# Groups:   x, .row [2]
      x  .row variable  median lower upper
  <dbl> <int> <chr>      <dbl> <dbl> <dbl>
1   2.5     1 .linpred  0.0595 -2.59  2.55
2   5       2 .linpred -0.144  -4.81  4.67
dat1a.linpred <- dat1a.brm2 |>
    posterior_linpred(newdata = data.frame(x = c(2.5, 5))) |>
    summarise_draws(median, HDInterval::hdi)

Conclusions:

  • the fitted mean \(y\) associated with an \(x\) of 2.5 is 0.06
  • the fitted mean \(y\) associated with an \(x\) of 5 is -0.144
  • 95% HPD intervals also given

In each case, we will predict \(y\) when \(x\) is 2.5 and 5

Note, for a Gaussian model emmeans, posterior_epred and posterior_linpred will all yield the same outputs. posterior_predict will yield wider credible intervals as it is predicting for individual observations rather than predicting means of large samples.

dat1b.brm2 |> emmeans(~x, at = list(x = c(2.5, 5)))
   x  emmean lower.HPD upper.HPD
 2.5  0.0433     -2.59      2.51
 5.0 -0.1834     -4.86      4.57

Point estimate displayed: median 
HPD interval probability: 0.95 
# OR with more control over the way posteriors are summarised
dat1b.brm2 |>
    emmeans(~x, at = list(x = c(2.5, 5))) |>
    gather_emmeans_draws() |>
    median_hdci()
# A tibble: 2 × 7
      x  .value .lower .upper .width .point .interval
  <dbl>   <dbl>  <dbl>  <dbl>  <dbl> <chr>  <chr>    
1   2.5  0.0433  -2.59   2.51   0.95 median hdci     
2   5   -0.183   -4.86   4.57   0.95 median hdci     
dat1b.emmeans <- dat1b.brm2 |>
    emmeans(~x, at = list(x = c(2.5, 5))) |>
    as.data.frame()

Conclusions:

  • the predicted (estimated) mean \(y\) associated with an \(x\) of 2.5 is 0.043
  • the predicted (estimated) mean \(y\) associated with an \(x\) of 5 is -0.183
  • 95% HPD intervals also given
dat1b.brm2 |>
    posterior_predict(newdata = data.frame(x = c(2.5, 5))) |>
    summarise_draws(median, HDInterval::hdi)
# A tibble: 2 × 4
  variable  median lower upper
  <chr>      <dbl> <dbl> <dbl>
1 ...1      0.0305 -3.42  3.56
2 ...2     -0.206  -5.36  5.40
# Or for even more control and ability to add other summaries
dat1b.brm2 |>
    add_predicted_draws(newdata = data.frame(x = c(2.5, 5))) |>
    dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
    summarise_draws(
      median,
      HDInterval::hdi
    )
# A tibble: 2 × 6
# Groups:   x, .row [2]
      x  .row variable     median lower upper
  <dbl> <int> <chr>         <dbl> <dbl> <dbl>
1   2.5     1 .prediction  0.0336 -3.33  3.44
2   5       2 .prediction -0.177  -5.31  5.11
dat1b.pred <- dat1b.brm2 |>
    posterior_predict(newdata = data.frame(x = c(2.5, 5))) |>
    summarise_draws(median, HDInterval::hdi)

Conclusions:

  • the predicted \(y\) for a single new observation at an \(x\) of 2.5 is 0.031
  • the predicted \(y\) for a single new observation at an \(x\) of 5 is -0.206
  • 95% HPD intervals also given
dat1b.brm2 |>
    posterior_epred(newdata = data.frame(x = c(2.5, 5))) |>
    summarise_draws(median, HDInterval::hdi)
# A tibble: 2 × 4
  variable  median lower upper
  <chr>      <dbl> <dbl> <dbl>
1 ...1      0.0433 -2.59  2.51
2 ...2     -0.183  -4.86  4.57
# Or for even more control and ability to add other summaries
dat1b.brm2 |>
    add_epred_draws(newdata = data.frame(x = c(2.5, 5))) |>
    dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
    summarise_draws(
      median,
      HDInterval::hdi
    )
# A tibble: 2 × 6
# Groups:   x, .row [2]
      x  .row variable  median lower upper
  <dbl> <int> <chr>      <dbl> <dbl> <dbl>
1   2.5     1 .epred    0.0433 -2.59  2.51
2   5       2 .epred   -0.183  -4.86  4.57
dat1b.epred <- dat1b.brm2 |>
    posterior_epred(newdata = data.frame(x = c(2.5, 5))) |>
    summarise_draws(median, HDInterval::hdi)

Conclusions:

  • the fitted mean \(y\) associated with an \(x\) of 2.5 is 0.043
  • the fitted mean \(y\) associated with an \(x\) of 5 is -0.183
  • 95% HPD intervals also given
dat1b.brm2 |>
    posterior_linpred(newdata = data.frame(x = c(2.5, 5))) |>
    median_hdci()
            y      ymin     ymax .width .point .interval
1 -0.05408044 -4.058454 3.885879   0.95 median      hdci
# Or for even more control and ability to add other summaries
dat1b.brm2 |>
    add_linpred_draws(newdata = data.frame(x = c(2.5, 5))) |>
    dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
    summarise_draws(
      median,
      HDInterval::hdi
    )
# A tibble: 2 × 6
# Groups:   x, .row [2]
      x  .row variable  median lower upper
  <dbl> <int> <chr>      <dbl> <dbl> <dbl>
1   2.5     1 .linpred  0.0433 -2.59  2.51
2   5       2 .linpred -0.183  -4.86  4.57
dat1b.linpred <- dat1b.brm2 |>
    posterior_linpred(newdata = data.frame(x = c(2.5, 5))) |>
    median_hdci()

Conclusions:

  • median_hdci() applied directly to the bare draws matrix pools all of the draws into the single row shown first above, which is not meaningful here - interpret the per-\(x\) summaries from add_linpred_draws() instead (see the illustration below)
  • the fitted mean \(y\) associated with an \(x\) of 2.5 is 0.043
  • the fitted mean \(y\) associated with an \(x\) of 5 is -0.183
  • 95% HPD intervals also given
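
A small simulated illustration of that pooling pitfall (all numbers invented):

set.seed(1)
m <- cbind(rnorm(1000, mean = 20), rnorm(1000, mean = 12)) # two hypothetical posteriors
median_hdci(as.vector(m)) # one pooled (misleading) summary row
apply(m, 2, median)       # per-column medians (~20 and ~12) - what we actually want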

In each case, we will predict \(y\) when \(x\) is 2.5 and 5

Note, for a Gaussian model emmeans, posterior_epred and posterior_linpred will all yield the same outputs. posterior_predict will yield wider credible intervals as it is predicting for individual observations rather than predicting means of large samples.

dat1c.brm2 |> emmeans(~x, at = list(x = c(2.5, 5)))
   x emmean lower.HPD upper.HPD
 2.5  0.037     -2.48      2.74
 5.0 -0.181     -5.10      4.80

Point estimate displayed: median 
HPD interval probability: 0.95 
# OR with more control over the way posteriors are summarised
dat1c.brm2 |>
    emmeans(~x, at = list(x = c(2.5, 5))) |>
    gather_emmeans_draws() |>
    median_hdci()
# A tibble: 2 × 7
      x  .value .lower .upper .width .point .interval
  <dbl>   <dbl>  <dbl>  <dbl>  <dbl> <chr>  <chr>    
1   2.5  0.0370  -2.48   2.74   0.95 median hdci     
2   5   -0.181   -5.06   4.85   0.95 median hdci     
dat1c.emmeans <- dat1c.brm2 |>
    emmeans(~x, at = list(x = c(2.5, 5))) |>
    as.data.frame()

Conclusions:

  • the predicted (estimated) mean \(y\) associated with an \(x\) of 2.5 is 0.037
  • the predicted (estimated) mean \(y\) associated with an \(x\) of 5 is -0.181
  • 95% HPD intervals also given
dat1c.brm2 |>
    posterior_predict(newdata = data.frame(x = c(2.5, 5))) |>
    summarise_draws(median, HDInterval::hdi)
# A tibble: 2 × 4
  variable  median lower upper
  <chr>      <dbl> <dbl> <dbl>
1 ...1      0.0862 -3.68  3.66
2 ...2     -0.197  -6.05  5.08
# Or for even more control and ability to add other summaries
dat1c.brm2 |>
    add_predicted_draws(newdata = data.frame(x = c(2.5, 5))) |>
    dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
    summarise_draws(
      median,
      HDInterval::hdi
    )
# A tibble: 2 × 6
# Groups:   x, .row [2]
      x  .row variable     median lower upper
  <dbl> <int> <chr>         <dbl> <dbl> <dbl>
1   2.5     1 .prediction  0.0852 -3.74  3.54
2   5       2 .prediction -0.178  -5.26  6.03
dat1c.pred <- dat1c.brm2 |>
    posterior_predict(newdata = data.frame(x = c(2.5, 5))) |>
    summarise_draws(median, HDInterval::hdi)

Conclusions:

  • the predicted \(y\) for a single new observation at an \(x\) of 2.5 is 0.086
  • the predicted \(y\) for a single new observation at an \(x\) of 5 is -0.197
  • 95% HPD intervals also given
dat1c.brm2 |>
    posterior_epred(newdata = data.frame(x = c(2.5, 5))) |>
    summarise_draws(median, HDInterval::hdi)
# A tibble: 2 × 4
  variable  median lower upper
  <chr>      <dbl> <dbl> <dbl>
1 ...1      0.0370 -2.48  2.74
2 ...2     -0.181  -5.10  4.80
# Or for even more control and ability to add other summaries
dat1c.brm2 |>
    add_epred_draws(newdata = data.frame(x = c(2.5, 5))) |>
    dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
    summarise_draws(
      median,
      HDInterval::hdi
    )
# A tibble: 2 × 6
# Groups:   x, .row [2]
      x  .row variable  median lower upper
  <dbl> <int> <chr>      <dbl> <dbl> <dbl>
1   2.5     1 .epred    0.0370 -2.48  2.74
2   5       2 .epred   -0.181  -5.10  4.80
dat1c.epred <- dat1c.brm2 |>
    posterior_epred(newdata = data.frame(x = c(2.5, 5))) |>
    summarise_draws(median, HDInterval::hdi)

Conclusions:

  • the fitted mean \(y\) associated with an \(x\) of 2.5 is 0.037
  • the fitted mean \(y\) associated with an \(x\) of 5 is -0.181
  • 95% HPD intervals also given
dat1c.brm2 |>
    posterior_linpred(newdata = data.frame(x = c(2.5, 5))) |>
    median_hdci()
            y     ymin     ymax .width .point .interval
1 -0.04678397 -4.02588 4.194853   0.95 median      hdci
# Or for even more control and ability to add other summaries
dat1c.brm2 |>
    add_linpred_draws(newdata = data.frame(x = c(2.5, 5))) |>
    dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
    summarise_draws(
      median,
      HDInterval::hdi
    )
# A tibble: 2 × 6
# Groups:   x, .row [2]
      x  .row variable  median lower upper
  <dbl> <int> <chr>      <dbl> <dbl> <dbl>
1   2.5     1 .linpred  0.0370 -2.48  2.74
2   5       2 .linpred -0.181  -5.10  4.80
dat1c.linpred <- dat1c.brm2 |>
    posterior_linpred(newdata = data.frame(x = c(2.5, 5))) |>
    median_hdci()

Conclusions:

  • as noted earlier, median_hdci() applied directly to the bare draws matrix pools all of the draws into the single row shown first above - interpret the per-\(x\) summaries from add_linpred_draws() instead
  • the fitted mean \(y\) associated with an \(x\) of 2.5 is 0.037
  • the fitted mean \(y\) associated with an \(x\) of 5 is -0.181
  • 95% HPD intervals also given

In each case, we will predict \(y\) when \(x\) is “control”, “medium” and “high”

Note, for a Gaussian model emmeans, posterior_epred and posterior_linpred will all yield the same outputs. posterior_predict will yield wider credible intervals as it is predicting for individual observations rather than predicting means of large samples.

dat2a.brm2 |> emmeans(~x)
 x       emmean lower.HPD upper.HPD
 control   20.9     16.99      25.1
 medium    20.0     16.08      24.2
 high      12.1      7.95      16.4

Point estimate displayed: median 
HPD interval probability: 0.95 
# OR with more control over the way posteriors are summarised
dat2a.brm2 |>
    emmeans(~x) |>
    gather_emmeans_draws() |>
    median_hdci()
# A tibble: 3 × 7
  x       .value .lower .upper .width .point .interval
  <fct>    <dbl>  <dbl>  <dbl>  <dbl> <chr>  <chr>    
1 control   20.9  17.0    25.1   0.95 median hdci     
2 medium    20.0  16.3    24.4   0.95 median hdci     
3 high      12.1   7.95   16.4   0.95 median hdci     
dat2a.emmeans <- dat2a.brm2 |>
    emmeans(~x) |>
    as.data.frame()

Conclusions:

  • the predicted (estimated) mean \(y\) associated with an \(x\) of “control” is 20.875
  • the predicted (estimated) mean \(y\) associated with an \(x\) of “medium” is 20.007
  • the predicted (estimated) mean \(y\) associated with an \(x\) of “high” is 12.094
  • 95% HPD intervals also given
dat2a.brm2 |>
  posterior_predict(newdata = data.frame(x = c("control", "medium", "high"))) |>
  summarise_draws(median, HDInterval::hdi)
# A tibble: 3 × 4
  variable median lower upper
  <chr>     <dbl> <dbl> <dbl>
1 ...1       20.9 10.3   30.6
2 ...2       19.9 10.3   30.0
3 ...3       11.9  2.17  22.7
# Or for even more control and ability to add other summaries
dat2a.brm2 |>
    add_predicted_draws(newdata = data.frame(x = c("control", "medium", "high"))) |>
    dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
    summarise_draws(
      median,
      HDInterval::hdi
    )
# A tibble: 3 × 6
# Groups:   x, .row [3]
  x        .row variable    median lower upper
  <chr>   <int> <chr>        <dbl> <dbl> <dbl>
1 control     1 .prediction   20.7 11.1   31.4
2 high        3 .prediction   12.1  2.93  23.4
3 medium      2 .prediction   20.0 10.8   30.2
dat2a.pred <- dat2a.brm2 |>
    posterior_predict(newdata = data.frame(x = c("control", "medium", "high"))) |>
    summarise_draws(median, HDInterval::hdi)

Conclusions:

  • the predicted \(y\) for a single new observation at an \(x\) of “control” is 20.9
  • the predicted \(y\) for a single new observation at an \(x\) of “medium” is 19.9
  • the predicted \(y\) for a single new observation at an \(x\) of “high” is 11.9
  • 95% HPD intervals also given
dat2a.brm2 |>
    posterior_epred(newdata = data.frame(x = c("control", "medium", "high"))) |>
    summarise_draws(median, HDInterval::hdi)
# A tibble: 3 × 4
  variable median lower upper
  <chr>     <dbl> <dbl> <dbl>
1 ...1       20.9 17.0   25.1
2 ...2       20.0 16.1   24.2
3 ...3       12.1  7.95  16.4
# Or for even more control and ability to add other summaries
dat2a.brm2 |>
    add_epred_draws(newdata = data.frame(x = c("control", "medium", "high"))) |>
    dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
    summarise_draws(
      median,
      HDInterval::hdi
    )
# A tibble: 3 × 6
# Groups:   x, .row [3]
  x        .row variable median lower upper
  <chr>   <int> <chr>     <dbl> <dbl> <dbl>
1 control     1 .epred     20.9 17.0   25.1
2 high        3 .epred     12.1  7.95  16.4
3 medium      2 .epred     20.0 16.1   24.2
dat2a.epred <- dat2a.brm2 |>
    posterior_epred(newdata = data.frame(x = c("control", "medium", "high"))) |>
    summarise_draws(median, HDInterval::hdi)

Conclusions:

  • the predicted (estimated) mean \(y\) associated with an \(x\) of “control” is 20.875
  • the predicted (estimated) mean \(y\) associated with an \(x\) of “medium” is 20.007
  • the predicted (estimated) mean \(y\) associated with an \(x\) of “high” is 12.094
  • 95% HPD intervals also given
dat2a.brm2 |>
    posterior_linpred(newdata = data.frame(x = c("control", "medium", "high"))) |>
    median_hdci()
         y     ymin     ymax .width .point .interval
1 19.13564 9.279738 24.10563   0.95 median      hdci
# Or for even more control and ability to add other summaries
dat2a.brm2 |>
    add_linpred_draws(newdata = data.frame(x = c("control", "medium", "high"))) |>
    dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
    summarise_draws(
      median,
      HDInterval::hdi
    )
# A tibble: 3 × 6
# Groups:   x, .row [3]
  x        .row variable median lower upper
  <chr>   <int> <chr>     <dbl> <dbl> <dbl>
1 control     1 .linpred   20.9 17.0   25.1
2 high        3 .linpred   12.1  7.95  16.4
3 medium      2 .linpred   20.0 16.1   24.2
dat2a.linpred <- dat2a.brm2 |>
    posterior_linpred(newdata = data.frame(x = c("control", "medium", "high"))) |>
    median_hdci()

Conclusions:

  • as illustrated earlier, median_hdci() applied directly to the bare draws matrix pools all of the draws into the single row shown first above - interpret the per-level summaries from add_linpred_draws() instead
  • the fitted mean \(y\) associated with an \(x\) of “control” is 20.9
  • the fitted mean \(y\) associated with an \(x\) of “medium” is 20.0
  • the fitted mean \(y\) associated with an \(x\) of “high” is 12.1
  • 95% HPD intervals also given

In each case, we will predict \(y\) when \(x\) is “control”, “medium” and “high”

Note, for a Gaussian model emmeans, posterior_epred and posterior_linpred will all yield the same outputs. posterior_predict will yield wider credible intervals as it is predicting for individual observations rather than predicting means of large samples.

dat2b.brm2 |> emmeans(~x)
 x       emmean lower.HPD upper.HPD
 control   20.3     15.85      24.5
 medium    18.6     14.74      23.4
 high      11.0      6.49      15.6

Point estimate displayed: median 
HPD interval probability: 0.95 
# OR with more control over the way posteriors are summarised
dat2b.brm2 |>
    emmeans(~x) |>
    gather_emmeans_draws() |>
    median_hdci()
# A tibble: 3 × 7
  x       .value .lower .upper .width .point .interval
  <fct>    <dbl>  <dbl>  <dbl>  <dbl> <chr>  <chr>    
1 control   20.3  15.8    24.5   0.95 median hdci     
2 medium    18.6  14.5    23.1   0.95 median hdci     
3 high      11.0   6.29   15.5   0.95 median hdci     
dat2b.emmeans <- dat2b.brm2 |>
    emmeans(~x) |>
    as.data.frame()

Conclusions:

  • the predicted (estimated) mean \(y\) associated with an \(x\) of “control” is 20.276
  • the predicted (estimated) mean \(y\) associated with an \(x\) of “medium” is 18.635
  • the predicted (estimated) mean \(y\) associated with an \(x\) of “high” is 11.028
  • 95% HPD intervals also given
dat2b.brm2 |>
  posterior_predict(newdata = data.frame(x = c("control", "medium", "high"))) |>
  summarise_draws(median, HDInterval::hdi)
# A tibble: 3 × 4
  variable median lower upper
  <chr>     <dbl> <dbl> <dbl>
1 ...1       20.1 9.96   29.8
2 ...2       18.8 8.45   29.2
3 ...3       11.1 0.795  21.6
# Or for even more control and ability to add other summaries
dat2b.brm2 |>
    add_predicted_draws(newdata = data.frame(x = c("control", "medium", "high"))) |>
    dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
    summarise_draws(
      median,
      HDInterval::hdi
    )
# A tibble: 3 × 6
# Groups:   x, .row [3]
  x        .row variable    median lower upper
  <chr>   <int> <chr>        <dbl> <dbl> <dbl>
1 control     1 .prediction   20.5 10.6   30.7
2 high        3 .prediction   11.0  1.68  21.9
3 medium      2 .prediction   18.8  9.04  29.4
dat2b.pred <- dat2b.brm2 |>
    posterior_predict(newdata = data.frame(x = c("control", "medium", "high"))) |>
    summarise_draws(median, HDInterval::hdi)

Conclusions:

  • the predicted \(y\) for a single new observation at an \(x\) of “control” is 20.1
  • the predicted \(y\) for a single new observation at an \(x\) of “medium” is 18.8
  • the predicted \(y\) for a single new observation at an \(x\) of “high” is 11.1
  • 95% HPD intervals also given
dat2b.brm2 |>
    posterior_epred(newdata = data.frame(x = c("control", "medium", "high"))) |>
    summarise_draws(median, HDInterval::hdi)
# A tibble: 3 × 4
  variable median lower upper
  <chr>     <dbl> <dbl> <dbl>
1 ...1       20.3 15.8   24.5
2 ...2       18.6 14.7   23.4
3 ...3       11.0  6.49  15.6
# Or for even more control and ability to add other summaries
dat2b.brm2 |>
    add_epred_draws(newdata = data.frame(x = c("control", "medium", "high"))) |>
    dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
    summarise_draws(
      median,
      HDInterval::hdi
    )
# A tibble: 3 × 6
# Groups:   x, .row [3]
  x        .row variable median lower upper
  <chr>   <int> <chr>     <dbl> <dbl> <dbl>
1 control     1 .epred     20.3 15.8   24.5
2 high        3 .epred     11.0  6.49  15.6
3 medium      2 .epred     18.6 14.7   23.4
dat2b.epred <- dat2b.brm2 |>
    posterior_epred(newdata = data.frame(x = c("control", "medium", "high"))) |>
    summarise_draws(median, HDInterval::hdi)

Conclusions:

  • the predicted (estimated) mean \(y\) associated with an \(x\) of “control” is 20.276
  • the predicted (estimated) mean \(y\) associated with an \(x\) of “medium” is 18.635
  • the predicted (estimated) mean \(y\) associated with an \(x\) of “high” is 11.028
  • 95% HPD intervals also given
dat2b.brm2 |>
    posterior_linpred(newdata = data.frame(x = c("control", "medium", "high"))) |>
    median_hdci()
         y     ymin     ymax .width .point .interval
1 17.98865 8.309788 23.74624   0.95 median      hdci
# Or for even more control and ability to add other summaries
dat2b.brm2 |>
    add_linpred_draws(newdata = data.frame(x = c("control", "medium", "high"))) |>
    dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
    summarise_draws(
      median,
      HDInterval::hdi
    )
# A tibble: 3 × 6
# Groups:   x, .row [3]
  x        .row variable median lower upper
  <chr>   <int> <chr>     <dbl> <dbl> <dbl>
1 control     1 .linpred   20.3 15.8   24.5
2 high        3 .linpred   11.0  6.49  15.6
3 medium      2 .linpred   18.6 14.7   23.4
dat2b.linpred <- dat2b.brm2 |>
    posterior_linpred(newdata = data.frame(x = c("control", "medium", "high"))) |>
    median_hdci()

Conclusions:

  • as illustrated earlier, median_hdci() applied directly to the bare draws matrix pools all of the draws into the single row shown first above - interpret the per-level summaries from add_linpred_draws() instead
  • the fitted mean \(y\) associated with an \(x\) of “control” is 20.3
  • the fitted mean \(y\) associated with an \(x\) of “medium” is 18.6
  • the fitted mean \(y\) associated with an \(x\) of “high” is 11.0
  • 95% HPD intervals also given

In each case, we will predict \(y\) when \(x\) is 2.5 and 5

Note, for this Poisson model (log link), emmeans (with type = "response") and posterior_epred yield estimates on the response scale, whereas posterior_linpred returns values on the link (log) scale unless explicitly back-transformed. posterior_predict will yield wider credible intervals as it is predicting for individual observations rather than predicting means of large samples.

dat3a.brm2 |> emmeans(~x, at = list(x = c(2.5, 5)), type = "response")
   x rate lower.HPD upper.HPD
 2.5 2.45      1.41      3.73
 5.0 5.79      4.07      7.56

Point estimate displayed: median 
Results are back-transformed from the log scale 
HPD interval probability: 0.95 
# OR with more control over the way posteriors are summarised
dat3a.brm2 |>
    emmeans(~x, at = list(x = c(2.5, 5))) |>
    gather_emmeans_draws() |>
    mutate(.value = exp(.value)) |> 
    median_hdci()
# A tibble: 2 × 7
      x .value .lower .upper .width .point .interval
  <dbl>  <dbl>  <dbl>  <dbl>  <dbl> <chr>  <chr>    
1   2.5   2.45   1.41   3.73   0.95 median hdci     
2   5     5.79   4.07   7.56   0.95 median hdci     
# OR with yet more control over the way posteriors are summarised
dat3a.brm2 |>
    emmeans(~x, at = list(x = c(2.5, 5))) |>
    gather_emmeans_draws() |>
    dplyr::select(-.chain) |> 
    summarise_draws(
      median,
      HDInterval::hdi
    )
# A tibble: 2 × 5
# Groups:   x [2]
      x variable median lower upper
  <dbl> <chr>     <dbl> <dbl> <dbl>
1   2.5 .value    0.894 0.391  1.36
2   5   .value    1.76  1.42   2.04
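
Note that this last summary is still on the link (log) scale because no back-transformation was applied. A sketch adding the exponentiation step (combining the two patterns shown above):

dat3a.brm2 |>
  emmeans(~x, at = list(x = c(2.5, 5))) |>
  gather_emmeans_draws() |>
  mutate(.value = exp(.value)) |>  # back-transform to the response (rate) scale
  dplyr::select(-.chain) |>
  summarise_draws(median, HDInterval::hdi)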

Conclusions:

  • the predicted (estimated) mean \(y\) associated with an \(x\) of 2.5 is 2.445
  • the predicted (estimated) mean \(y\) associated with an \(x\) of 5 is 5.794
  • 95% HPD intervals also given
dat3a.brm2 |>
    posterior_predict(newdata = data.frame(x = c(2.5, 5))) |>
    summarise_draws(median, HDInterval::hdi)
# A tibble: 2 × 4
  variable median lower upper
  <chr>     <dbl> <dbl> <dbl>
1 ...1          2     0     6
2 ...2          5     1    10
# Or for even more control and ability to add other summaries
dat3a.brm2 |>
    add_predicted_draws(newdata = data.frame(x = c(2.5, 5))) |>
    dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
    summarise_draws(
      median,
      HDInterval::hdi
    )
# A tibble: 2 × 6
# Groups:   x, .row [2]
      x  .row variable    median lower upper
  <dbl> <int> <chr>        <dbl> <dbl> <dbl>
1   2.5     1 .prediction    2       0     6
2   5       2 .prediction    5.5     1    10
dat3a.pred <- dat3a.brm2 |>
    posterior_predict(newdata = data.frame(x = c(2.5, 5))) |>
    summarise_draws(median, HDInterval::hdi)

Conclusions:

  • the predicted \(y\) for an individual observation associated with an \(x\) of 2.5 is 2
  • the predicted \(y\) for an individual observation associated with an \(x\) of 5 is 5 (5.5 in the second summary; the small discrepancy is Monte Carlo variation)
  • 95% HPD intervals also given
dat3a.brm2 |>
    posterior_epred(newdata = data.frame(x = c(2.5, 5))) |>
    summarise_draws(median, HDInterval::hdi)
# A tibble: 2 × 4
  variable median lower upper
  <chr>     <dbl> <dbl> <dbl>
1 ...1       2.45  1.41  3.73
2 ...2       5.79  4.07  7.56
# Or for even more control and ability to add other summaries
dat3a.brm2 |>
    add_epred_draws(newdata = data.frame(x = c(2.5, 5))) |>
    dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
    summarise_draws(
      median,
      HDInterval::hdi
    )
# A tibble: 2 × 6
# Groups:   x, .row [2]
      x  .row variable median lower upper
  <dbl> <int> <chr>     <dbl> <dbl> <dbl>
1   2.5     1 .epred     2.45  1.41  3.73
2   5       2 .epred     5.79  4.07  7.56

Conclusions:

  • the fitted mean \(y\) associated with an \(x\) of 2.5 is 2.445
  • the fitted mean \(y\) associated with an \(x\) of 5 is 5.794
  • 95% HPD intervals also given
dat3a.brm2 |>
    posterior_linpred(newdata = data.frame(x = c(2.5, 5))) |>
    exp() |> 
    summarise_draws(median, HDInterval::hdi)
# A tibble: 2 × 4
  variable median lower upper
  <chr>     <dbl> <dbl> <dbl>
1 ...1       2.45  1.41  3.73
2 ...2       5.79  4.07  7.56
# Or for even more control and ability to add other summaries
dat3a.brm2 |>
    add_linpred_draws(newdata = data.frame(x = c(2.5, 5))) |>
    mutate(.linpred = exp(.linpred)) |>
    dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
    summarise_draws(
      median,
      HDInterval::hdi
    )
# A tibble: 2 × 6
# Groups:   x, .row [2]
      x  .row variable median lower upper
  <dbl> <int> <chr>     <dbl> <dbl> <dbl>
1   2.5     1 .linpred   2.45  1.41  3.73
2   5       2 .linpred   5.79  4.07  7.56

Conclusions:

  • the fitted mean \(y\) associated with an \(x\) of 2.5 is 2.445
  • the fitted mean \(y\) associated with an \(x\) of 5 is 5.794
  • 95% HPD intervals also given

In each case, we will predict \(y\) when \(x\) is 2.5 and 5

Note, since this model uses a log link, emmeans (with type = "response"), posterior_epred and the exponentiated posterior_linpred all yield the same outputs on the response scale. posterior_predict will yield wider credible intervals as it predicts individual observations rather than means.

dat3b.brm2 |> emmeans(~x, at = list(x = c(2.5, 5)))
   x emmean lower.HPD upper.HPD
 2.5  0.901     0.377      1.37
 5.0  1.757     1.450      2.05

Point estimate displayed: median 
Results are given on the log (not the response) scale. 
HPD interval probability: 0.95 
# OR with more control over the way posteriors are summarised
dat3b.brm2 |>
    emmeans(~x, at = list(x = c(2.5, 5))) |>
    gather_emmeans_draws() |>
    median_hdci()
# A tibble: 2 × 7
      x .value .lower .upper .width .point .interval
  <dbl>  <dbl>  <dbl>  <dbl>  <dbl> <chr>  <chr>    
1   2.5  0.901  0.388   1.38   0.95 median hdci     
2   5    1.76   1.45    2.06   0.95 median hdci     
dat3b.emmeans <- dat3b.brm2 |>
    emmeans(~x, at = list(x = c(2.5, 5))) |>
    as.data.frame()

Conclusions:

  • the predicted (estimated) mean \(y\) (on the log scale) associated with an \(x\) of 2.5 is 0.901
  • the predicted (estimated) mean \(y\) (on the log scale) associated with an \(x\) of 5 is 1.757
  • 95% HPD intervals also given
dat3b.brm2 |>
    posterior_predict(newdata = data.frame(x = c(2.5, 5))) |>
    summarise_draws(median, HDInterval::hdi)
# A tibble: 2 × 4
  variable median lower upper
  <chr>     <dbl> <dbl> <dbl>
1 ...1          2     0     5
2 ...2          6     1    10
# Or for even more control and ability to add other summaries
dat3b.brm2 |>
    add_predicted_draws(newdata = data.frame(x = c(2.5, 5))) |>
    dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
    summarise_draws(
      median,
      HDInterval::hdi
    )
# A tibble: 2 × 6
# Groups:   x, .row [2]
      x  .row variable    median lower upper
  <dbl> <int> <chr>        <dbl> <dbl> <dbl>
1   2.5     1 .prediction      2     0     6
2   5       2 .prediction      6     0    10
dat3b.pred <- dat3b.brm2 |>
    posterior_predict(newdata = data.frame(x = c(2.5, 5))) |>
    summarise_draws(median, HDInterval::hdi)

Conclusions:

  • the predicted \(y\) for an individual observation associated with an \(x\) of 2.5 is 2
  • the predicted \(y\) for an individual observation associated with an \(x\) of 5 is 6
  • 95% HPD intervals also given
dat3b.brm2 |>
    posterior_epred(newdata = data.frame(x = c(2.5, 5))) |>
    summarise_draws(median, HDInterval::hdi)
# A tibble: 2 × 4
  variable median lower upper
  <chr>     <dbl> <dbl> <dbl>
1 ...1       2.46  1.40  3.84
2 ...2       5.80  4.26  7.79
# Or for even more control and ability to add other summaries
dat3b.brm2 |>
    add_epred_draws(newdata = data.frame(x = c(2.5, 5))) |>
    dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
    summarise_draws(
      median,
      HDInterval::hdi
    )
# A tibble: 2 × 6
# Groups:   x, .row [2]
      x  .row variable median lower upper
  <dbl> <int> <chr>     <dbl> <dbl> <dbl>
1   2.5     1 .epred     2.46  1.40  3.84
2   5       2 .epred     5.80  4.26  7.79
dat3b.epred <- dat3b.brm2 |>
    posterior_epred(newdata = data.frame(x = c(2.5, 5))) |>
    summarise_draws(median, HDInterval::hdi)

Conclusions:

  • the fitted mean \(y\) associated with an \(x\) of 2.5 is 2.463
  • the fitted mean \(y\) associated with an \(x\) of 5 is 5.795
  • 95% HPD intervals also given
dat3b.brm2 |>
    posterior_linpred(newdata = data.frame(x = c(2.5, 5))) |>
    exp() |> 
    summarise_draws(median, HDInterval::hdi)
# A tibble: 2 × 4
  variable median lower upper
  <chr>     <dbl> <dbl> <dbl>
1 ...1       2.46  1.40  3.84
2 ...2       5.80  4.26  7.79
# Or for even more control and ability to add other summaries
dat3b.brm2 |>
    add_linpred_draws(newdata = data.frame(x = c(2.5, 5))) |>
    mutate(.linpred = exp(.linpred)) |>
    dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
    summarise_draws(
      median,
      HDInterval::hdi
    )
# A tibble: 2 × 6
# Groups:   x, .row [2]
      x  .row variable median lower upper
  <dbl> <int> <chr>     <dbl> <dbl> <dbl>
1   2.5     1 .linpred   2.46  1.40  3.84
2   5       2 .linpred   5.80  4.26  7.79
dat3b.linpred <- dat3b.brm2 |>
    posterior_linpred(newdata = data.frame(x = c(2.5, 5))) |>
    exp() |> 
    summarise_draws(median, HDInterval::hdi)

Conclusions:

  • the fitted mean \(y\) associated with an \(x\) of 2.5 is 2.463
  • the fitted mean \(y\) associated with an \(x\) of 5 is 5.795
  • 95% HPD intervals also given

In each case, we will predict \(y\) when \(x\) is 2.5 and 5

Note, since this model uses a log link, emmeans (with type = "response"), posterior_epred and the exponentiated posterior_linpred all yield the same outputs on the response scale. posterior_predict will yield wider credible intervals as it predicts individual observations rather than means.

dat3c.brm2 |> emmeans(~x, at = list(x = c(2.5, 5)))
   x emmean lower.HPD upper.HPD
 2.5   0.91     0.421      1.40
 5.0   1.76     1.442      2.06

Point estimate displayed: median 
Results are given on the log (not the response) scale. 
HPD interval probability: 0.95 
# OR with more control over the way posteriors are summarised
dat3c.brm2 |>
    emmeans(~x, at = list(x = c(2.5, 5))) |>
    gather_emmeans_draws() |>
    median_hdci()
# A tibble: 2 × 7
      x .value .lower .upper .width .point .interval
  <dbl>  <dbl>  <dbl>  <dbl>  <dbl> <chr>  <chr>    
1   2.5  0.910  0.421   1.40   0.95 median hdci     
2   5    1.76   1.44    2.06   0.95 median hdci     
dat3c.emmeans <- dat3c.brm2 |>
    emmeans(~x, at = list(x = c(2.5, 5))) |>
    as.data.frame()

Conclusions:

  • the predicted (estimated) mean \(y\) (on the log scale) associated with an \(x\) of 2.5 is 0.91
  • the predicted (estimated) mean \(y\) (on the log scale) associated with an \(x\) of 5 is 1.761
  • 95% HPD intervals also given
dat3c.brm2 |>
    posterior_predict(newdata = data.frame(x = c(2.5, 5))) |>
    summarise_draws(median, HDInterval::hdi)
# A tibble: 2 × 4
  variable median lower upper
  <chr>     <dbl> <dbl> <dbl>
1 ...1          2     0     6
2 ...2          6     1    10
# Or for even more control and ability to add other summaries
dat3c.brm2 |>
    add_predicted_draws(newdata = data.frame(x = c(2.5, 5))) |>
    dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
    summarise_draws(
      median,
      HDInterval::hdi
    )
# A tibble: 2 × 6
# Groups:   x, .row [2]
      x  .row variable    median lower upper
  <dbl> <int> <chr>        <dbl> <dbl> <dbl>
1   2.5     1 .prediction      2     0     6
2   5       2 .prediction      6     1    11
dat3c.pred <- dat3c.brm2 |>
    posterior_predict(newdata = data.frame(x = c(2.5, 5))) |>
    summarise_draws(median, HDInterval::hdi)

Conclusions:

  • the predicted \(y\) for an individual observation associated with an \(x\) of 2.5 is 2
  • the predicted \(y\) for an individual observation associated with an \(x\) of 5 is 6
  • 95% HPD intervals also given
dat3c.brm2 |>
    posterior_epred(newdata = data.frame(x = c(2.5, 5))) |>
    summarise_draws(median, HDInterval::hdi)
# A tibble: 2 × 4
  variable median lower upper
  <chr>     <dbl> <dbl> <dbl>
1 ...1       2.48  1.45  3.88
2 ...2       5.82  4.17  7.76
# Or for even more control and ability to add other summaries
dat3c.brm2 |>
    add_epred_draws(newdata = data.frame(x = c(2.5, 5))) |>
    dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
    summarise_draws(
      median,
      HDInterval::hdi
    )
# A tibble: 2 × 6
# Groups:   x, .row [2]
      x  .row variable median lower upper
  <dbl> <int> <chr>     <dbl> <dbl> <dbl>
1   2.5     1 .epred     2.48  1.45  3.88
2   5       2 .epred     5.82  4.17  7.76
dat3c.epred <- dat3c.brm2 |>
    posterior_epred(newdata = data.frame(x = c(2.5, 5))) |>
    summarise_draws(median, HDInterval::hdi)

Conclusions:

  • the fitted mean \(y\) associated with an \(x\) of 2.5 is 2.483
  • the fitted mean \(y\) associated with an \(x\) of 5 is 5.818
  • 95% HPD intervals also given
dat3c.brm2 |>
    posterior_linpred(newdata = data.frame(x = c(2.5, 5))) |>
    exp() |> 
    summarise_draws(median, HDInterval::hdi)
# A tibble: 2 × 4
  variable median lower upper
  <chr>     <dbl> <dbl> <dbl>
1 ...1       2.48  1.45  3.88
2 ...2       5.82  4.17  7.76
# Or for even more control and ability to add other summaries
dat3c.brm2 |>
    add_linpred_draws(newdata = data.frame(x = c(2.5, 5))) |>
    mutate(.linpred = exp(.linpred)) |>
    dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
    summarise_draws(
      median,
      HDInterval::hdi
    )
# A tibble: 2 × 6
# Groups:   x, .row [2]
      x  .row variable median lower upper
  <dbl> <int> <chr>     <dbl> <dbl> <dbl>
1   2.5     1 .linpred   2.48  1.45  3.88
2   5       2 .linpred   5.82  4.17  7.76
dat3c.linpred <- dat3c.brm2 |>
    posterior_linpred(newdata = data.frame(x = c(2.5, 5))) |>
    exp() |> 
    summarise_draws(median, HDInterval::hdi)

Conclusions:

  • the fitted mean \(y\) associated with an \(x\) of 2.5 is 2.483
  • the fitted mean \(y\) associated with an \(x\) of 5 is 5.818
  • 95% HPD intervals also given

In each case, we will predict \(y\) when \(x\) is 2.5 and 5

Note, since this model uses a log link, emmeans (with type = "response"), posterior_epred and the exponentiated posterior_linpred all yield the same outputs on the response scale. posterior_predict will yield wider credible intervals as it predicts individual observations rather than means.

dat4a.brm2 |> emmeans(~x, at = list(x = c(2.5, 5)), type = "response")
   x prob lower.HPD upper.HPD
 2.5 2.93      1.50      4.68
 5.0 5.92      4.02      8.11

Point estimate displayed: median 
Results are back-transformed from the log scale 
HPD interval probability: 0.95 
# OR with more control over the way posteriors are summarised
dat4a.brm2 |>
    emmeans(~x, at = list(x = c(2.5, 5))) |>
    gather_emmeans_draws() |>
    mutate(.value = exp(.value)) |> 
    median_hdci()
# A tibble: 2 × 7
      x .value .lower .upper .width .point .interval
  <dbl>  <dbl>  <dbl>  <dbl>  <dbl> <chr>  <chr>    
1   2.5   2.93   1.50   4.68   0.95 median hdci     
2   5     5.92   4.02   8.11   0.95 median hdci     
# OR with yet more control over the way posteriors are summarised
dat4a.brm2 |>
    emmeans(~x, at = list(x = c(2.5, 5))) |>
    gather_emmeans_draws() |>
    dplyr::select(-.chain) |> 
    summarise_draws(
      median,
      HDInterval::hdi
    )
# A tibble: 2 × 5
# Groups:   x [2]
      x variable median lower upper
  <dbl> <chr>     <dbl> <dbl> <dbl>
1   2.5 .value     1.07 0.563  1.63
2   5   .value     1.78 1.46   2.14

Conclusions:

  • the predicted (estimated) mean \(y\) associated with an \(x\) of 2.5 is 2.928
  • the predicted (estimated) mean \(y\) associated with an \(x\) of 5 is 5.92
  • 95% HPD intervals also given
dat4a.brm2 |>
    posterior_predict(newdata = data.frame(x = c(2.5, 5))) |>
    summarise_draws(median, HDInterval::hdi)
# A tibble: 2 × 4
  variable median lower upper
  <chr>     <dbl> <dbl> <dbl>
1 ...1          3     0     7
2 ...2          6     0    11
# Or for even more control and ability to add other summaries
dat4a.brm2 |>
    add_predicted_draws(newdata = data.frame(x = c(2.5, 5))) |>
    dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
    summarise_draws(
      median,
      HDInterval::hdi
    )
# A tibble: 2 × 6
# Groups:   x, .row [2]
      x  .row variable    median lower upper
  <dbl> <int> <chr>        <dbl> <dbl> <dbl>
1   2.5     1 .prediction      3     0     7
2   5       2 .prediction      6     0    11
dat4a.pred <- dat4a.brm2 |>
    posterior_predict(newdata = data.frame(x = c(2.5, 5))) |>
    summarise_draws(median, HDInterval::hdi)

Conclusions:

  • the predicted \(y\) for an individual observation associated with an \(x\) of 2.5 is 3
  • the predicted \(y\) for an individual observation associated with an \(x\) of 5 is 6
  • 95% HPD intervals also given
dat4a.brm2 |>
    posterior_epred(newdata = data.frame(x = c(2.5, 5))) |>
    summarise_draws(median, HDInterval::hdi)
# A tibble: 2 × 4
  variable median lower upper
  <chr>     <dbl> <dbl> <dbl>
1 ...1       2.93  1.50  4.68
2 ...2       5.92  4.02  8.11
# Or for even more control and ability to add other summaries
dat4a.brm2 |>
    add_epred_draws(newdata = data.frame(x = c(2.5, 5))) |>
    dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
    summarise_draws(
      median,
      HDInterval::hdi
    )
# A tibble: 2 × 6
# Groups:   x, .row [2]
      x  .row variable median lower upper
  <dbl> <int> <chr>     <dbl> <dbl> <dbl>
1   2.5     1 .epred     2.93  1.50  4.68
2   5       2 .epred     5.92  4.02  8.11

Conclusions:

  • the fitted mean \(y\) associated with an \(x\) of 2.5 is 2.928
  • the fitted mean \(y\) associated with an \(x\) of 5 is 5.92
  • 95% HPD intervals also given
dat4a.brm2 |>
    posterior_linpred(newdata = data.frame(x = c(2.5, 5))) |>
    exp() |> 
    summarise_draws(median, HDInterval::hdi)
# A tibble: 2 × 4
  variable median lower upper
  <chr>     <dbl> <dbl> <dbl>
1 ...1       2.93  1.50  4.68
2 ...2       5.92  4.02  8.11
# Or for even more control and ability to add other summaries
dat4a.brm2 |>
    add_linpred_draws(newdata = data.frame(x = c(2.5, 5))) |>
    mutate(.linpred = exp(.linpred)) |>
    dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
    summarise_draws(
      median,
      HDInterval::hdi
    )
# A tibble: 2 × 6
# Groups:   x, .row [2]
      x  .row variable median lower upper
  <dbl> <int> <chr>     <dbl> <dbl> <dbl>
1   2.5     1 .linpred   2.93  1.50  4.68
2   5       2 .linpred   5.92  4.02  8.11

Conclusions:

  • the fitted mean \(y\) associated with an \(x\) of 2.5 is 2.928
  • the fitted mean \(y\) associated with an \(x\) of 5 is 5.92
  • 95% HPD intervals also given

In each case, we will predict \(y\) when \(x\) is 2.5 and 5

Note, since this model uses a log link, emmeans (with type = "response"), posterior_epred and the exponentiated posterior_linpred all yield the same outputs on the response scale. posterior_predict will yield wider credible intervals as it predicts individual observations rather than means.

dat4b.brm2 |> emmeans(~x, at = list(x = c(2.5, 5)))
   x emmean lower.HPD upper.HPD
 2.5   1.07      0.54      1.61
 5.0   1.78      1.44      2.09

Point estimate displayed: median 
Results are given on the log (not the response) scale. 
HPD interval probability: 0.95 
# OR with more control over the way posteriors are summarised
dat4b.brm2 |>
    emmeans(~x, at = list(x = c(2.5, 5))) |>
    gather_emmeans_draws() |>
    median_hdci()
# A tibble: 2 × 7
      x .value .lower .upper .width .point .interval
  <dbl>  <dbl>  <dbl>  <dbl>  <dbl> <chr>  <chr>    
1   2.5   1.07  0.533   1.60   0.95 median hdci     
2   5     1.78  1.44    2.09   0.95 median hdci     
dat4b.emmeans <- dat4b.brm2 |>
    emmeans(~x, at = list(x = c(2.5, 5))) |>
    as.data.frame()

Conclusions:

  • the predicted (estimated) mean \(y\) (on the log scale) associated with an \(x\) of 2.5 is 1.068
  • the predicted (estimated) mean \(y\) (on the log scale) associated with an \(x\) of 5 is 1.78
  • 95% HPD intervals also given
dat4b.brm2 |>
    posterior_predict(newdata = data.frame(x = c(2.5, 5))) |>
    summarise_draws(median, HDInterval::hdi)
# A tibble: 2 × 4
  variable median lower upper
  <chr>     <dbl> <dbl> <dbl>
1 ...1          3     0     7
2 ...2          6     0    11
# Or for even more control and ability to add other summaries
dat4b.brm2 |>
    add_predicted_draws(newdata = data.frame(x = c(2.5, 5))) |>
    dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
    summarise_draws(
      median,
      HDInterval::hdi
    )
# A tibble: 2 × 6
# Groups:   x, .row [2]
      x  .row variable    median lower upper
  <dbl> <int> <chr>        <dbl> <dbl> <dbl>
1   2.5     1 .prediction      3     0     7
2   5       2 .prediction      6     0    11
dat4b.pred <- dat4b.brm2 |>
    posterior_predict(newdata = data.frame(x = c(2.5, 5))) |>
    summarise_draws(median, HDInterval::hdi)

Conclusions:

  • the predicted \(y\) for an individual observation associated with an \(x\) of 2.5 is 3
  • the predicted \(y\) for an individual observation associated with an \(x\) of 5 is 6
  • 95% HPD intervals also given
dat4b.brm2 |>
    posterior_epred(newdata = data.frame(x = c(2.5, 5))) |>
    summarise_draws(median, HDInterval::hdi)
# A tibble: 2 × 4
  variable median lower upper
  <chr>     <dbl> <dbl> <dbl>
1 ...1       2.91  1.50  4.55
2 ...2       5.93  4.08  7.97
# Or for even more control and ability to add other summaries
dat4b.brm2 |>
    add_epred_draws(newdata = data.frame(x = c(2.5, 5))) |>
    dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
    summarise_draws(
      median,
      HDInterval::hdi
    )
# A tibble: 2 × 6
# Groups:   x, .row [2]
      x  .row variable median lower upper
  <dbl> <int> <chr>     <dbl> <dbl> <dbl>
1   2.5     1 .epred     2.91  1.50  4.55
2   5       2 .epred     5.93  4.08  7.97
dat4b.epred <- dat4b.brm2 |>
    posterior_epred(newdata = data.frame(x = c(2.5, 5))) |>
    summarise_draws(median, HDInterval::hdi)

Conclusions:

  • the fitted mean \(y\) associated with an \(x\) of 2.5 is 2.911
  • the fitted mean \(y\) associated with an \(x\) of 5 is 5.929
  • 95% HPD intervals also given
dat4b.brm2 |>
    posterior_linpred(newdata = data.frame(x = c(2.5, 5))) |>
    exp() |> 
    summarise_draws(median, HDInterval::hdi)
# A tibble: 2 × 4
  variable median lower upper
  <chr>     <dbl> <dbl> <dbl>
1 ...1       2.91  1.50  4.55
2 ...2       5.93  4.08  7.97
# Or for even more control and ability to add other summaries
dat4b.brm2 |>
    add_linpred_draws(newdata = data.frame(x = c(2.5, 5))) |>
    mutate(.linpred = exp(.linpred)) |>
    dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
    summarise_draws(
      median,
      HDInterval::hdi
    )
# A tibble: 2 × 6
# Groups:   x, .row [2]
      x  .row variable median lower upper
  <dbl> <int> <chr>     <dbl> <dbl> <dbl>
1   2.5     1 .linpred   2.91  1.50  4.55
2   5       2 .linpred   5.93  4.08  7.97
dat4b.linpred <- dat4b.brm2 |>
    posterior_linpred(newdata = data.frame(x = c(2.5, 5))) |>
    exp() |> 
    summarise_draws(median, HDInterval::hdi)

Conclusions:

  • the fitted mean \(y\) associated with an \(x\) of 2.5 is 2.911
  • the fitted mean \(y\) associated with an \(x\) of 5 is 5.929
  • 95% HPD intervals also given

In each case, we will predict \(y\) when \(x\) is 2.5 and 5

Note, since this model uses a log link, emmeans (with type = "response"), posterior_epred and the exponentiated posterior_linpred all yield the same outputs on the response scale. posterior_predict will yield wider credible intervals as it predicts individual observations rather than means.

dat4c.brm2 |> emmeans(~x, at = list(x = c(2.5, 5)))
   x emmean lower.HPD upper.HPD
 2.5   1.10     0.584      1.61
 5.0   1.79     1.421      2.09

Point estimate displayed: median 
Results are given on the log (not the response) scale. 
HPD interval probability: 0.95 
# OR with more control over the way posteriors are summarised
dat4c.brm2 |>
    emmeans(~x, at = list(x = c(2.5, 5))) |>
    gather_emmeans_draws() |>
    median_hdci()
# A tibble: 2 × 7
      x .value .lower .upper .width .point .interval
  <dbl>  <dbl>  <dbl>  <dbl>  <dbl> <chr>  <chr>    
1   2.5   1.10  0.587   1.61   0.95 median hdci     
2   5     1.79  1.43    2.11   0.95 median hdci     
dat4c.emmeans <- dat4c.brm2 |>
    emmeans(~x, at = list(x = c(2.5, 5))) |>
    as.data.frame()

Conclusions:

  • the predicted (estimated) mean \(y\) (on the log scale) associated with an \(x\) of 2.5 is 1.1
  • the predicted (estimated) mean \(y\) (on the log scale) associated with an \(x\) of 5 is 1.79
  • 95% HPD intervals also given
dat4c.brm2 |>
    posterior_predict(newdata = data.frame(x = c(2.5, 5))) |>
    summarise_draws(median, HDInterval::hdi)
# A tibble: 2 × 4
  variable median lower upper
  <chr>     <dbl> <dbl> <dbl>
1 ...1          3     0     7
2 ...2          6     1    12
# Or for even more control and ability to add other summaries
dat4c.brm2 |>
    add_predicted_draws(newdata = data.frame(x = c(2.5, 5))) |>
    dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
    summarise_draws(
      median,
      HDInterval::hdi
    )
# A tibble: 2 × 6
# Groups:   x, .row [2]
      x  .row variable    median lower upper
  <dbl> <int> <chr>        <dbl> <dbl> <dbl>
1   2.5     1 .prediction      3     0     7
2   5       2 .prediction      6     1    12
dat4c.pred <- dat4c.brm2 |>
    posterior_predict(newdata = data.frame(x = c(2.5, 5))) |>
    summarise_draws(median, HDInterval::hdi)

Conclusions:

  • the predicted \(y\) for an individual observation associated with an \(x\) of 2.5 is 3
  • the predicted \(y\) for an individual observation associated with an \(x\) of 5 is 6
  • 95% HPD intervals also given
dat4c.brm2 |>
    posterior_epred(newdata = data.frame(x = c(2.5, 5))) |>
    summarise_draws(median, HDInterval::hdi)
# A tibble: 2 × 4
  variable median lower upper
  <chr>     <dbl> <dbl> <dbl>
1 ...1       3.00  1.68  4.83
2 ...2       5.99  4.14  8.11
# Or for even more control and ability to add other summaries
dat4c.brm2 |>
    add_epred_draws(newdata = data.frame(x = c(2.5, 5))) |>
    dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
    summarise_draws(
      median,
      HDInterval::hdi
    )
# A tibble: 2 × 6
# Groups:   x, .row [2]
      x  .row variable median lower upper
  <dbl> <int> <chr>     <dbl> <dbl> <dbl>
1   2.5     1 .epred     3.00  1.68  4.83
2   5       2 .epred     5.99  4.14  8.11
dat4c.epred <- dat4c.brm2 |>
    posterior_epred(newdata = data.frame(x = c(2.5, 5))) |>
    summarise_draws(median, HDInterval::hdi)

Conclusions:

  • the fitted mean \(y\) associated with an \(x\) of 2.5 is 3.004
  • the fitted mean \(y\) associated with an \(x\) of 5 is 5.989
  • 95% HPD intervals also given
dat4c.brm2 |>
    posterior_linpred(newdata = data.frame(x = c(2.5, 5))) |>
    exp() |> 
    summarise_draws(median, HDInterval::hdi)
# A tibble: 2 × 4
  variable median lower upper
  <chr>     <dbl> <dbl> <dbl>
1 ...1       3.00  1.68  4.83
2 ...2       5.99  4.14  8.11
# Or for even more control and ability to add other summaries
dat4c.brm2 |>
    add_linpred_draws(newdata = data.frame(x = c(2.5, 5))) |>
    mutate(.linpred = exp(.linpred)) |>
    dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
    summarise_draws(
      median,
      HDInterval::hdi
    )
# A tibble: 2 × 6
# Groups:   x, .row [2]
      x  .row variable median lower upper
  <dbl> <int> <chr>     <dbl> <dbl> <dbl>
1   2.5     1 .linpred   3.00  1.68  4.83
2   5       2 .linpred   5.99  4.14  8.11
dat4c.linpred <- dat4c.brm2 |>
    posterior_linpred(newdata = data.frame(x = c(2.5, 5))) |>
    exp() |> 
    summarise_draws(median, HDInterval::hdi)

Conclusions:

  • the fitted mean \(y\) associated with an \(x\) of 2.5 is 3.004
  • the fitted mean \(y\) associated with an \(x\) of 5 is 5.989
  • 95% HPD intervals also given

In each case, we will predict \(y\) when \(x\) is 2.5 and 5

Note, since this model uses a logit link, emmeans (with type = "response"), posterior_epred and posterior_linpred back-transformed via plogis all yield the same outputs on the response (probability) scale. posterior_predict will yield wider credible intervals as it predicts individual observations rather than means.
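
As a quick check of this equivalence, the inverse-logit (plogis) of the posterior_linpred draws should match the posterior_epred draws exactly. A minimal sketch (newdat is simply a hypothetical name for the prediction grid used throughout this section):

newdat <- data.frame(x = c(2.5, 5))
## for a logit-link model the expected value is plogis(linear predictor),
## so this comparison should return TRUE
all.equal(
    plogis(posterior_linpred(dat5a.brm2, newdata = newdat)),
    posterior_epred(dat5a.brm2, newdata = newdat)
)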

dat5a.brm2 |> emmeans(~x, at = list(x = c(2.5, 5)), type = "response")
   x   prob lower.HPD upper.HPD
 2.5 0.0704  2.59e-05     0.387
 5.0 0.4975  1.78e-01     0.783

Point estimate displayed: median 
Results are back-transformed from the logit scale 
HPD interval probability: 0.95 
# OR with more control over the way posteriors are summarised
dat5a.brm2 |>
    emmeans(~x, at = list(x = c(2.5, 5))) |>
    gather_emmeans_draws() |>
    mutate(.value = plogis(.value)) |> 
    median_hdci()
# A tibble: 2 × 7
      x .value    .lower .upper .width .point .interval
  <dbl>  <dbl>     <dbl>  <dbl>  <dbl> <chr>  <chr>    
1   2.5 0.0704 0.0000258  0.386   0.95 median hdci     
2   5   0.498  0.179      0.785   0.95 median hdci     
# OR with yet more control over the way posteriors are summarised
dat5a.brm2 |>
    emmeans(~x, at = list(x = c(2.5, 5))) |>
    gather_emmeans_draws() |>
    dplyr::select(-.chain) |> 
    summarise_draws(
      median,
      HDInterval::hdi
    )
# A tibble: 2 × 5
# Groups:   x [2]
      x variable   median lower  upper
  <dbl> <chr>       <dbl> <dbl>  <dbl>
1   2.5 .value   -2.58    -6.73 0.0680
2   5   .value   -0.00980 -1.53 1.28  

Conclusions:

  • the predicted (estimated) mean \(y\) associated with an \(x\) of 2.5 is 0.07
  • the predicted (estimated) mean \(y\) associated with an \(x\) of 5 is 0.498
  • 95% HPD intervals also given
dat5a.brm2 |>
    posterior_predict(newdata = data.frame(x = c(2.5, 5))) |>
    summarise_draws(median, HDInterval::hdi)
# A tibble: 2 × 4
  variable median lower upper
  <chr>     <dbl> <dbl> <dbl>
1 ...1          0     0     1
2 ...2          0     0     1
# Or for even more control and ability to add other summaries
dat5a.brm2 |>
    add_predicted_draws(newdata = data.frame(x = c(2.5, 5))) |>
    dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
    summarise_draws(
      median,
      HDInterval::hdi
    )
# A tibble: 2 × 6
# Groups:   x, .row [2]
      x  .row variable    median lower upper
  <dbl> <int> <chr>        <dbl> <dbl> <dbl>
1   2.5     1 .prediction      0     0     1
2   5       2 .prediction      0     0     1
dat5a.pred <- dat5a.brm2 |>
    posterior_predict(newdata = data.frame(x = c(2.5, 5))) |>
    summarise_draws(median, HDInterval::hdi)

Conclusions:

  • the predicted \(y\) for an individual observation associated with an \(x\) of 2.5 is 0
  • the predicted \(y\) for an individual observation associated with an \(x\) of 5 is 0
  • 95% HPD intervals also given
dat5a.brm2 |>
    posterior_epred(newdata = data.frame(x = c(2.5, 5))) |>
    summarise_draws(median, HDInterval::hdi)
# A tibble: 2 × 4
  variable median     lower upper
  <chr>     <dbl>     <dbl> <dbl>
1 ...1     0.0704 0.0000258 0.387
2 ...2     0.498  0.178     0.783
# Or for even more control and ability to add other summaries
dat5a.brm2 |>
    add_epred_draws(newdata = data.frame(x = c(2.5, 5))) |>
    dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
    summarise_draws(
      median,
      HDInterval::hdi
    )
# A tibble: 2 × 6
# Groups:   x, .row [2]
      x  .row variable median     lower upper
  <dbl> <int> <chr>     <dbl>     <dbl> <dbl>
1   2.5     1 .epred   0.0704 0.0000258 0.387
2   5       2 .epred   0.498  0.178     0.783

Conclusions:

  • the fitted mean \(y\) associated with an \(x\) of 2.5 is 0.07
  • the fitted mean \(y\) associated with an \(x\) of 5 is 0.498
  • 95% HPD intervals also given
dat5a.brm2 |>
    posterior_linpred(newdata = data.frame(x = c(2.5, 5))) |>
    plogis() |> 
    summarise_draws(median, HDInterval::hdi)
# A tibble: 2 × 4
  variable median     lower upper
  <chr>     <dbl>     <dbl> <dbl>
1 ...1     0.0704 0.0000258 0.387
2 ...2     0.498  0.178     0.783
# Or for even more control and ability to add other summaries
dat5a.brm2 |>
    add_linpred_draws(newdata = data.frame(x = c(2.5, 5))) |>
    mutate(.linpred = plogis(.linpred)) |>
    dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
    summarise_draws(
      median,
      HDInterval::hdi
    )
# A tibble: 2 × 6
# Groups:   x, .row [2]
      x  .row variable median     lower upper
  <dbl> <int> <chr>     <dbl>     <dbl> <dbl>
1   2.5     1 .linpred 0.0704 0.0000258 0.387
2   5       2 .linpred 0.498  0.178     0.783

Conclusions:

  • the fitted mean \(y\) associated with an \(x\) of 2.5 is 0.07
  • the fitted mean \(y\) associated with an \(x\) of 5 is 0.498
  • 95% HPD intervals also given

In each case, we will predict \(y\) when \(x\) is 2.5 and 5

Note, since this model uses a logit link, emmeans (with type = "response"), posterior_epred and posterior_linpred back-transformed via plogis all yield the same outputs on the response (probability) scale. posterior_predict will yield wider credible intervals as it predicts individual observations rather than means.

dat5b.brm2 |> emmeans(~x, at = list(x = c(2.5, 5)))
   x  emmean lower.HPD upper.HPD
 2.5 -2.6167     -6.70     0.244
 5.0 -0.0311     -1.45     1.501

Point estimate displayed: median 
Results are given on the logit (not the response) scale. 
HPD interval probability: 0.95 
# OR with more control over the way posteriors are summarised
dat5b.brm2 |>
    emmeans(~x, at = list(x = c(2.5, 5))) |>
    gather_emmeans_draws() |>
    median_hdci()
# A tibble: 2 × 7
      x  .value .lower .upper .width .point .interval
  <dbl>   <dbl>  <dbl>  <dbl>  <dbl> <chr>  <chr>    
1   2.5 -2.62    -6.53  0.429   0.95 median hdci     
2   5   -0.0311  -1.58  1.41    0.95 median hdci     
dat5b.emmeans <- dat5b.brm2 |>
    emmeans(~x, at = list(x = c(2.5, 5))) |>
    as.data.frame()

Conclusions:

  • the predicted (estimated) mean \(y\) (on the logit scale) associated with an \(x\) of 2.5 is -2.617
  • the predicted (estimated) mean \(y\) (on the logit scale) associated with an \(x\) of 5 is -0.031
  • 95% HPD intervals also given
dat5b.brm2 |>
    posterior_predict(newdata = data.frame(x = c(2.5, 5))) |>
    summarise_draws(median, HDInterval::hdi)
# A tibble: 2 × 4
  variable median lower upper
  <chr>     <dbl> <dbl> <dbl>
1 ...1          0     0     1
2 ...2          1     0     1
# Or for even more control and ability to add other summaries
dat5b.brm2 |>
    add_predicted_draws(newdata = data.frame(x = c(2.5, 5))) |>
    dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
    summarise_draws(
      median,
      HDInterval::hdi
    )
# A tibble: 2 × 6
# Groups:   x, .row [2]
      x  .row variable    median lower upper
  <dbl> <int> <chr>        <dbl> <dbl> <dbl>
1   2.5     1 .prediction      0     0     1
2   5       2 .prediction      0     0     1
dat5b.pred <- dat5b.brm2 |>
    posterior_predict(newdata = data.frame(x = c(2.5, 5))) |>
    summarise_draws(median, HDInterval::hdi)

Conclusions:

  • the predicted \(y\) for an individual observation associated with an \(x\) of 2.5 is 0
  • the predicted \(y\) for an individual observation associated with an \(x\) of 5 is 1 in the first summary and 0 in the second (the underlying probability is close to 0.5, so the median prediction is unstable)
  • 95% HPD intervals also given
dat5b.brm2 |>
    posterior_epred(newdata = data.frame(x = c(2.5, 5))) |>
    summarise_draws(median, HDInterval::hdi)
# A tibble: 2 × 4
  variable median      lower upper
  <chr>     <dbl>      <dbl> <dbl>
1 ...1     0.0681 0.00000402 0.396
2 ...2     0.492  0.189      0.818
# Or for even more control and ability to add other summaries
dat5b.brm2 |>
    add_epred_draws(newdata = data.frame(x = c(2.5, 5))) |>
    dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
    summarise_draws(
      median,
      HDInterval::hdi
    )
# A tibble: 2 × 6
# Groups:   x, .row [2]
      x  .row variable median      lower upper
  <dbl> <int> <chr>     <dbl>      <dbl> <dbl>
1   2.5     1 .epred   0.0681 0.00000402 0.396
2   5       2 .epred   0.492  0.189      0.818
dat5b.epred <- dat5b.brm2 |>
    posterior_epred(newdata = data.frame(x = c(2.5, 5))) |>
    summarise_draws(median, HDInterval::hdi)

Conclusions:

  • the fitted mean \(y\) associated with an \(x\) of 2.5 is 0.068
  • the fitted mean \(y\) associated with an \(x\) of 5 is 0.492
  • 95% HPD intervals also given
dat5b.brm2 |>
    posterior_linpred(newdata = data.frame(x = c(2.5, 5))) |>
    plogis() |> 
    summarise_draws(median, HDInterval::hdi)
# A tibble: 2 × 4
  variable median      lower upper
  <chr>     <dbl>      <dbl> <dbl>
1 ...1     0.0681 0.00000402 0.396
2 ...2     0.492  0.189      0.818
# Or for even more control and ability to add other summaries
dat5b.brm2 |>
    add_linpred_draws(newdata = data.frame(x = c(2.5, 5))) |>
    mutate(.linpred = plogis(.linpred)) |>
    dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
    summarise_draws(
      median,
      HDInterval::hdi
    )
# A tibble: 2 × 6
# Groups:   x, .row [2]
      x  .row variable median      lower upper
  <dbl> <int> <chr>     <dbl>      <dbl> <dbl>
1   2.5     1 .linpred 0.0681 0.00000402 0.396
2   5       2 .linpred 0.492  0.189      0.818
dat5b.linpred <- dat5b.brm2 |>
    posterior_linpred(newdata = data.frame(x = c(2.5, 5))) |>
    plogis() |> 
    summarise_draws(median, HDInterval::hdi)

Conclusions:

  • the fitted mean \(y\) associated with an \(x\) of 2.5 is 0.068
  • the fitted mean \(y\) associated with an \(x\) of 5 is 0.492
  • 95% HPD intervals also given

In each case, we will predict \(y\) when \(x\) is 2.5 and 5

Note, since this model uses a logit link, emmeans (with type = "response"), posterior_epred and posterior_linpred back-transformed via plogis all yield the same outputs on the response (probability) scale. posterior_predict will yield wider credible intervals as it predicts individual observations rather than means.

dat5c.brm2 |> emmeans(~x, at = list(x = c(2.5, 5)))
   x  emmean lower.HPD upper.HPD
 2.5 -1.4481     -4.37     0.737
 5.0  0.0908     -1.29     1.378

Point estimate displayed: median 
Results are given on the logit (not the response) scale. 
HPD interval probability: 0.95 
# OR with more control over the way posteriors are summarised
dat5c.brm2 |>
    emmeans(~x, at = list(x = c(2.5, 5))) |>
    gather_emmeans_draws() |>
    median_hdci()
# A tibble: 2 × 7
      x  .value .lower .upper .width .point .interval
  <dbl>   <dbl>  <dbl>  <dbl>  <dbl> <chr>  <chr>    
1   2.5 -1.45    -4.37  0.737   0.95 median hdci     
2   5    0.0908  -1.29  1.38    0.95 median hdci     
dat5c.emmeans <- dat5c.brm2 |>
    emmeans(~x, at = list(x = c(2.5, 5))) |>
    as.data.frame()

Conclusions:

  • the predicted (estimated) mean \(y\) (on the logit scale) associated with an \(x\) of 2.5 is -1.448
  • the predicted (estimated) mean \(y\) (on the logit scale) associated with an \(x\) of 5 is 0.091
  • 95% HPD intervals also given
dat5c.brm2 |>
    posterior_predict(newdata = data.frame(x = c(2.5, 5))) |>
    summarise_draws(median, HDInterval::hdi)
# A tibble: 2 × 4
  variable median lower upper
  <chr>     <dbl> <dbl> <dbl>
1 ...1          0     0     1
2 ...2          1     0     1
# Or for even more control and ability to add other summaries
dat5c.brm2 |>
    add_predicted_draws(newdata = data.frame(x = c(2.5, 5))) |>
    dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
    summarise_draws(
      median,
      HDInterval::hdi
    )
# A tibble: 2 × 6
# Groups:   x, .row [2]
      x  .row variable    median lower upper
  <dbl> <int> <chr>        <dbl> <dbl> <dbl>
1   2.5     1 .prediction      0     0     1
2   5       2 .prediction      1     0     1
dat5c.pred <- dat5c.brm2 |>
    posterior_predict(newdata = data.frame(x = c(2.5, 5))) |>
    summarise_draws(median, HDInterval::hdi)

Conclusions:

  • the predicted \(y\) for an individual observation associated with an \(x\) of 2.5 is 0
  • the predicted \(y\) for an individual observation associated with an \(x\) of 5 is 1
  • 95% HPD intervals also given
dat5c.brm2 |>
    posterior_epred(newdata = data.frame(x = c(2.5, 5))) |>
    summarise_draws(median, HDInterval::hdi)
# A tibble: 2 × 4
  variable median    lower upper
  <chr>     <dbl>    <dbl> <dbl>
1 ...1      0.190 0.000832 0.551
2 ...2      0.523 0.232    0.815
# Or for even more control and ability to add other summaries
dat5c.brm2 |>
    add_epred_draws(newdata = data.frame(x = c(2.5, 5))) |>
    dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
    summarise_draws(
      median,
      HDInterval::hdi
    )
# A tibble: 2 × 6
# Groups:   x, .row [2]
      x  .row variable median    lower upper
  <dbl> <int> <chr>     <dbl>    <dbl> <dbl>
1   2.5     1 .epred    0.190 0.000832 0.551
2   5       2 .epred    0.523 0.232    0.815
dat5c.epred <- dat5c.brm2 |>
    posterior_epred(newdata = data.frame(x = c(2.5, 5))) |>
    summarise_draws(median, HDInterval::hdi)

Conclusions:

  • the fitted mean \(y\) associated with an \(x\) of 2.5 is 0.19
  • the fitted mean \(y\) associated with an \(x\) of 5 is 0.523
  • 95% HPD intervals also given
dat5c.brm2 |>
    posterior_linpred(newdata = data.frame(x = c(2.5, 5))) |>
    plogis() |> 
    summarise_draws(median, HDInterval::hdi)
# A tibble: 2 × 4
  variable median    lower upper
  <chr>     <dbl>    <dbl> <dbl>
1 ...1      0.190 0.000832 0.551
2 ...2      0.523 0.232    0.815
# Or for even more control and ability to add other summaries
dat5c.brm2 |>
    add_linpred_draws(newdata = data.frame(x = c(2.5, 5))) |>
    mutate(.linpred = plogis(.linpred)) |>
    dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
    summarise_draws(
      median,
      HDInterval::hdi
    )
# A tibble: 2 × 6
# Groups:   x, .row [2]
      x  .row variable median    lower upper
  <dbl> <int> <chr>     <dbl>    <dbl> <dbl>
1   2.5     1 .linpred  0.190 0.000832 0.551
2   5       2 .linpred  0.523 0.232    0.815
dat5c.linpred <- dat5c.brm2 |>
    posterior_linpred(newdata = data.frame(x = c(2.5, 5))) |>
    plogis() |> 
    summarise_draws(median, HDInterval::hdi)

Conclusions:

  • the fitted mean \(y\) associated with an \(x\) of 2.5 is 0.19
  • the fitted mean \(y\) associated with an \(x\) of 5 is 0.523
  • 95% HPD intervals also given

In each case, we will predict \(y\) when \(x\) is 2.5 and 5

Note, since this model uses a logit link, emmeans (with type = "response"), posterior_epred and posterior_linpred back-transformed via plogis all yield the same outputs on the response (probability) scale. posterior_predict will yield wider credible intervals as it predicts individual observations rather than means. Because the binomial model was fitted with the number of trials (total), newdata must also supply a total column; with total = 1 the predictions are expressed per trial (i.e. as proportions).

dat6a.brm2 |> emmeans(~x, at = list(x = c(2.5, 5)), type = "response")
   x  prob lower.HPD upper.HPD
 2.5 0.167    0.0482     0.326
 5.0 0.499    0.3747     0.613

Point estimate displayed: median 
Results are back-transformed from the logit scale 
HPD interval probability: 0.95 
# OR with more control over the way posteriors are summarised
dat6a.brm2 |>
    emmeans(~x, at = list(x = c(2.5, 5))) |>
    gather_emmeans_draws() |>
    mutate(.value = plogis(.value)) |> 
    median_hdci()
# A tibble: 2 × 7
      x .value .lower .upper .width .point .interval
  <dbl>  <dbl>  <dbl>  <dbl>  <dbl> <chr>  <chr>    
1   2.5  0.167 0.0482  0.326   0.95 median hdci     
2   5    0.499 0.375   0.613   0.95 median hdci     
# OR with yet more control over the way posteriors are summarised
dat6a.brm2 |>
    emmeans(~x, at = list(x = c(2.5, 5))) |>
    gather_emmeans_draws() |>
    dplyr::select(-.chain) |> 
    summarise_draws(
      median,
      HDInterval::hdi
    )
# A tibble: 2 × 5
# Groups:   x [2]
      x variable   median  lower  upper
  <dbl> <chr>       <dbl>  <dbl>  <dbl>
1   2.5 .value   -1.61    -2.76  -0.659
2   5   .value   -0.00525 -0.512  0.459

Conclusions:

  • the predicted (estimated) mean \(y\) associated with an \(x\) of 2.5 is 0.167
  • the predicted (estimated) mean \(y\) associated with an \(x\) of 5 is 0.499
  • 95% HPD intervals also given
dat6a.brm2 |>
    posterior_predict(newdata = data.frame(x = c(2.5, 5), total = 1)) |>
    summarise_draws(median, HDInterval::hdi)
# A tibble: 2 × 4
  variable median lower upper
  <chr>     <dbl> <dbl> <dbl>
1 ...1          0     0     1
2 ...2          1     0     1
# Or for even more control and ability to add other summaries
dat6a.brm2 |>
    add_predicted_draws(newdata = data.frame(x = c(2.5, 5), total = 1)) |>
    dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
    summarise_draws(
      median,
      HDInterval::hdi
    )
# A tibble: 2 × 7
# Groups:   x, total, .row [2]
      x total  .row variable    median lower upper
  <dbl> <dbl> <int> <chr>        <dbl> <dbl> <dbl>
1   2.5     1     1 .prediction      0     0     1
2   5       1     2 .prediction      1     0     1
dat6a.pred <- dat6a.brm2 |>
    posterior_predict(newdata = data.frame(x = c(2.5, 5), total = 1)) |>
    summarise_draws(median, HDInterval::hdi)

Conclusions:

  • the predicted \(y\) for an individual observation associated with an \(x\) of 2.5 is 0
  • the predicted \(y\) for an individual observation associated with an \(x\) of 5 is 1
  • 95% HPD intervals also given
dat6a.brm2 |>
    posterior_epred(newdata = data.frame(x = c(2.5, 5), total = 1)) |>
    summarise_draws(median, HDInterval::hdi)
# A tibble: 2 × 4
  variable median  lower upper
  <chr>     <dbl>  <dbl> <dbl>
1 ...1      0.167 0.0482 0.326
2 ...2      0.499 0.375  0.613
# Or for even more control and ability to add other summaries
dat6a.brm2 |>
    add_epred_draws(newdata = data.frame(x = c(2.5, 5), total = 1)) |>
    dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
    summarise_draws(
      median,
      HDInterval::hdi
    )
# A tibble: 2 × 7
# Groups:   x, total, .row [2]
      x total  .row variable median  lower upper
  <dbl> <dbl> <int> <chr>     <dbl>  <dbl> <dbl>
1   2.5     1     1 .epred    0.167 0.0482 0.326
2   5       1     2 .epred    0.499 0.375  0.613

Conclusions:

  • the fitted mean \(y\) associated with an \(x\) of 2.5 is 0.167
  • the fitted mean \(y\) associated with an \(x\) of 5 is 0.499
  • 95% HPD intervals also given
dat6a.brm2 |>
    posterior_linpred(newdata = data.frame(x = c(2.5, 5), total = 1)) |>
    plogis() |> 
    summarise_draws(median, HDInterval::hdi)
# A tibble: 2 × 4
  variable median  lower upper
  <chr>     <dbl>  <dbl> <dbl>
1 ...1      0.167 0.0482 0.326
2 ...2      0.499 0.375  0.613
# Or for even more control and ability to add other summaries
dat6a.brm2 |>
    add_linpred_draws(newdata = data.frame(x = c(2.5, 5), total = 1)) |>
    mutate(.linpred = plogis(.linpred)) |>
    dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
    summarise_draws(
      median,
      HDInterval::hdi
    )
# A tibble: 2 × 7
# Groups:   x, total, .row [2]
      x total  .row variable median  lower upper
  <dbl> <dbl> <int> <chr>     <dbl>  <dbl> <dbl>
1   2.5     1     1 .linpred  0.167 0.0482 0.326
2   5       1     2 .linpred  0.499 0.375  0.613

Conclusions:

  • the fitted mean \(y\) associated with an \(x\) of 2.5 is 0.167
  • the fitted mean \(y\) associated with an \(x\) of 5 is 0.499
  • 95% HPD intervals also given

In each case, we will predict \(y\) when \(x\) is 2.5 and 5

Note, since this model uses a logit link, emmeans (with type = "response"), posterior_epred and posterior_linpred back-transformed via plogis all yield the same outputs on the response (probability) scale. posterior_predict will yield wider credible intervals as it predicts individual observations rather than means. Because the binomial model was fitted with the number of trials (total), newdata must also supply a total column; with total = 1 the predictions are expressed per trial (i.e. as proportions).

dat6b.brm2 |> emmeans(~x, at = list(x = c(2.5, 5)))
   x  emmean lower.HPD upper.HPD
 2.5 -1.6495    -2.746    -0.650
 5.0 -0.0187    -0.521     0.481

Point estimate displayed: median 
Results are given on the logit (not the response) scale. 
HPD interval probability: 0.95 
# OR with more control over the way posteriors are summarised
dat6b.brm2 |>
    emmeans(~x, at = list(x = c(2.5, 5))) |>
    gather_emmeans_draws() |>
    median_hdci()
# A tibble: 2 × 7
      x  .value .lower .upper .width .point .interval
  <dbl>   <dbl>  <dbl>  <dbl>  <dbl> <chr>  <chr>    
1   2.5 -1.65   -2.75  -0.650   0.95 median hdci     
2   5   -0.0187 -0.555  0.452   0.95 median hdci     
dat6b.emmeans <- dat6b.brm2 |>
    emmeans(~x, at = list(x = c(2.5, 5))) |>
    as.data.frame()

Conclusions:

  • the predicted (estimated) mean \(y\) (on the logit scale) associated with an \(x\) of 2.5 is -1.65
  • the predicted (estimated) mean \(y\) (on the logit scale) associated with an \(x\) of 5 is -0.019
  • 95% HPD intervals also given
dat6b.brm2 |>
    posterior_predict(newdata = data.frame(x = c(2.5, 5), total = 1)) |>
    summarise_draws(median, HDInterval::hdi)
# A tibble: 2 × 4
  variable median lower upper
  <chr>     <dbl> <dbl> <dbl>
1 ...1          0     0     1
2 ...2          0     0     1
# Or for even more control and ability to add other summaries
dat6b.brm2 |>
    add_predicted_draws(newdata = data.frame(x = c(2.5, 5), total = 1)) |>
    dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
    summarise_draws(
      median,
      HDInterval::hdi
    )
# A tibble: 2 × 7
# Groups:   x, total, .row [2]
      x total  .row variable    median lower upper
  <dbl> <dbl> <int> <chr>        <dbl> <dbl> <dbl>
1   2.5     1     1 .prediction      0     0     1
2   5       1     2 .prediction      0     0     1
dat6b.pred <- dat6b.brm2 |>
    posterior_predict(newdata = data.frame(x = c(2.5, 5), total = 1)) |>
    summarise_draws(median, HDInterval::hdi)

Conclusions:

  • the predicted \(y\) for an individual observation associated with an \(x\) of 2.5 is 0
  • the predicted \(y\) for an individual observation associated with an \(x\) of 5 is 0
  • 95% HPD intervals also given
dat6b.brm2 |>
    posterior_epred(newdata = data.frame(x = c(2.5, 5), total = 1)) |>
    summarise_draws(median, HDInterval::hdi)
# A tibble: 2 × 4
  variable median  lower upper
  <chr>     <dbl>  <dbl> <dbl>
1 ...1      0.161 0.0480 0.315
2 ...2      0.495 0.373  0.618
# Or for even more control and ability to add other summaries
dat6b.brm2 |>
    add_epred_draws(newdata = data.frame(x = c(2.5, 5), total = 1)) |>
    dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
    summarise_draws(
      median,
      HDInterval::hdi
    )
# A tibble: 2 × 7
# Groups:   x, total, .row [2]
      x total  .row variable median  lower upper
  <dbl> <dbl> <int> <chr>     <dbl>  <dbl> <dbl>
1   2.5     1     1 .epred    0.161 0.0480 0.315
2   5       1     2 .epred    0.495 0.373  0.618
dat6b.epred <- dat6b.brm2 |>
    posterior_epred(newdata = data.frame(x = c(2.5, 5), total = 1)) |>
    summarise_draws(median, HDInterval::hdi)

Conclusions:

  • the fitted mean \(y\) associated with an \(x\) of 2.5 is 0.161
  • the fitted mean \(y\) associated with an \(x\) of 5 is 0.495
  • 95% HPD intervals also given
dat6b.brm2 |>
    posterior_linpred(newdata = data.frame(x = c(2.5, 5), total = 1)) |>
    plogis() |> 
    summarise_draws(median, HDInterval::hdi)
# A tibble: 2 × 4
  variable median  lower upper
  <chr>     <dbl>  <dbl> <dbl>
1 ...1      0.161 0.0480 0.315
2 ...2      0.495 0.373  0.618
# Or for even more control and ability to add other summaries
dat6b.brm2 |>
    add_linpred_draws(newdata = data.frame(x = c(2.5, 5), total = 1)) |>
    mutate(.linpred = plogis(.linpred)) |>
    dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
    summarise_draws(
      median,
      HDInterval::hdi
    )
# A tibble: 2 × 7
# Groups:   x, total, .row [2]
      x total  .row variable median  lower upper
  <dbl> <dbl> <int> <chr>     <dbl>  <dbl> <dbl>
1   2.5     1     1 .linpred  0.161 0.0480 0.315
2   5       1     2 .linpred  0.495 0.373  0.618
dat6b.linpred <- dat6b.brm2 |>
    posterior_linpred(newdata = data.frame(x = c(2.5, 5), total = 1)) |>
    plogis() |> 
    summarise_draws(median, HDInterval::hdi)

Conclusions:

  • the fitted mean \(y\) associated with an \(x\) of 2.5 is 0.161
  • the fitted mean \(y\) associated with an \(x\) of 5 is 0.495
  • 95% HPD intervals also given

In each case, we will predict \(y\) when \(x\) is 2.5 and 5

Note, since this model uses a logit link, emmeans (with type = "response"), posterior_epred and posterior_linpred back-transformed via plogis all yield the same outputs on the response (probability) scale. posterior_predict will yield wider credible intervals as it predicts individual observations rather than means. Because the binomial model was fitted with the number of trials (total), newdata must also supply a total column; with total = 1 the predictions are expressed per trial (i.e. as proportions).

dat6c.brm2 |> emmeans(~x, at = list(x = c(2.5, 5)))
   x  emmean lower.HPD upper.HPD
 2.5 -1.3104    -2.459    -0.322
 5.0  0.0105    -0.485     0.502

Point estimate displayed: median 
Results are given on the logit (not the response) scale. 
HPD interval probability: 0.95 
# OR with more control over the way posteriors are summarised
dat6c.brm2 |>
    emmeans(~x, at = list(x = c(2.5, 5))) |>
    gather_emmeans_draws() |>
    median_hdci()
# A tibble: 2 × 7
      x  .value .lower .upper .width .point .interval
  <dbl>   <dbl>  <dbl>  <dbl>  <dbl> <chr>  <chr>    
1   2.5 -1.31   -2.43  -0.279   0.95 median hdci     
2   5    0.0105 -0.485  0.502   0.95 median hdci     
dat6c.emmeans <- dat6c.brm2 |>
    emmeans(~x, at = list(x = c(2.5, 5))) |>
    as.data.frame()

Conclusions:

  • the predicted (estimated) mean \(y\) (on the logit scale) associated with an \(x\) of 2.5 is -1.31
  • the predicted (estimated) mean \(y\) (on the logit scale) associated with an \(x\) of 5 is 0.011
  • 95% HPD intervals also given
dat6c.brm2 |>
    posterior_predict(newdata = data.frame(x = c(2.5, 5), total = 1)) |>
    summarise_draws(median, HDInterval::hdi)
# A tibble: 2 × 4
  variable median lower upper
  <chr>     <dbl> <dbl> <dbl>
1 ...1          0     0     1
2 ...2          1     0     1
# Or for even more control and ability to add other summaries
dat6c.brm2 |>
    add_predicted_draws(newdata = data.frame(x = c(2.5, 5), total = 1)) |>
    dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
    summarise_draws(
      median,
      HDInterval::hdi
    )
# A tibble: 2 × 7
# Groups:   x, total, .row [2]
      x total  .row variable    median lower upper
  <dbl> <dbl> <int> <chr>        <dbl> <dbl> <dbl>
1   2.5     1     1 .prediction      0     0     1
2   5       1     2 .prediction      1     0     1
dat6c.pred <- dat6c.brm2 |>
    posterior_predict(newdata = data.frame(x = c(2.5, 5), total = 1)) |>
    summarise_draws(median, HDInterval::hdi)

Conclusions:

  • the predicted \(y\) for an individual observation associated with an \(x\) of 2.5 is 0
  • the predicted \(y\) for an individual observation associated with an \(x\) of 5 is 1
  • 95% HPD intervals also given
dat6c.brm2 |>
    posterior_epred(newdata = data.frame(x = c(2.5, 5), total = 1)) |>
    summarise_draws(median, HDInterval::hdi)
# A tibble: 2 × 4
  variable median  lower upper
  <chr>     <dbl>  <dbl> <dbl>
1 ...1      0.212 0.0560 0.388
2 ...2      0.503 0.381  0.623
# Or for even more control and ability to add other summaries
dat6c.brm2 |>
    add_epred_draws(newdata = data.frame(x = c(2.5, 5), total = 1)) |>
    dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
    summarise_draws(
      median,
      HDInterval::hdi
    )
# A tibble: 2 × 7
# Groups:   x, total, .row [2]
      x total  .row variable median  lower upper
  <dbl> <dbl> <int> <chr>     <dbl>  <dbl> <dbl>
1   2.5     1     1 .epred    0.212 0.0560 0.388
2   5       1     2 .epred    0.503 0.381  0.623
dat6c.epred <- dat6c.brm2 |>
    posterior_epred(newdata = data.frame(x = c(2.5, 5), total = 1)) |>
    summarise_draws(median, HDInterval::hdi)

Conclusions:

  • the fitted mean \(y\) associated with an \(x\) of 2.5 is 0.212
  • the fitted mean \(y\) associated with an \(x\) of 5 is 0.503
  • 95% HPD intervals also given
dat6c.brm2 |>
    posterior_linpred(newdata = data.frame(x = c(2.5, 5), total = 1)) |>
    plogis() |> 
    summarise_draws(median, HDInterval::hdi)
# A tibble: 2 × 4
  variable median  lower upper
  <chr>     <dbl>  <dbl> <dbl>
1 ...1      0.212 0.0560 0.388
2 ...2      0.503 0.381  0.623
# Or for even more control and ability to add other summaries
dat6c.brm2 |>
    add_linpred_draws(newdata = data.frame(x = c(2.5, 5), total = 1)) |>
    mutate(.linpred = plogis(.linpred)) |>
    dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
    summarise_draws(
      median,
      HDInterval::hdi
    )
# A tibble: 2 × 7
# Groups:   x, total, .row [2]
      x total  .row variable median  lower upper
  <dbl> <dbl> <int> <chr>     <dbl>  <dbl> <dbl>
1   2.5     1     1 .linpred  0.212 0.0560 0.388
2   5       1     2 .linpred  0.503 0.381  0.623
dat6c.linpred <- dat6c.brm2 |>
    posterior_linpred(newdata = data.frame(x = c(2.5, 5), total = 1)) |>
    plogis() |> 
    summarise_draws(median, HDInterval::hdi)

Conclusions:

  • the fitted mean \(y\) associated with an \(x\) of 2.5 is 0.212
  • the fitted mean \(y\) associated with an \(x\) of 5 is 0.503
  • 95% HPD intervals also given

13 Further investigations

Since we have the entire posterior, we are able to make direct probability statements. We simply count the number of MCMC draws that satisfy a condition (e.g. a slope greater than 0) and divide by the total number of draws, as sketched below.
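
For example, the probability that the slope is positive is just the proportion of draws of the b_x parameter that exceed zero. A minimal sketch (reusing the dat1a.brm2 model fitted earlier):

dat1a.brm2 |>
    as_draws_df() |>
    ## exceedance probability: proportion of posterior draws with a positive slope
    summarise(`P(slope > 0)` = mean(b_x > 0))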

Now that we have the full posteriors, we are free to use these to garner evidence on a range of hypotheses. To demonstrate, we will consider the following hypotheses:

  1. a change in \(x\) is associated with an increase in \(y\)
  2. a doubling of \(x\) (from 2.5 to 5) is associated with an increase in \(y\) of > 50% (a sketch for this hypothesis follows the list)
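
A hedged sketch of how the second hypothesis could be assessed (it assumes the dat1a.brm2 model and a strictly positive response, so that a percentage change is meaningful):

dat1a.brm2 |>
    emmeans(~x, at = list(x = c(2.5, 5))) |>
    gather_emmeans_draws() |>
    ungroup() |>
    ## one column of draws per x value (x2.5 and x5)
    pivot_wider(names_from = x, values_from = .value, names_prefix = "x") |>
    ## proportion of draws in which the mean at x = 5 exceeds the mean at x = 2.5 by more than 50%
    summarise(`P(increase > 50%)` = mean((x5 - x2.5) / x2.5 > 0.5))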

The procedure highlighted above for calculating exceedance probabilities evaluates the degree of evidence for an effect in a particular direction. However, there are other instances where there is a desire to evaluate the evidence that something has changed (either increased or decreased). Such purposes are similar to the frequentist pursuit of testing a null hypothesis (e.g. effect = 0).

The Region of Practical Equivalence (ROPE) evaluates evidence that an effect is “practically equivalent” to a value (e.g. 0) by calculating the proportion of effects that are within a nominated range. Kruschke (2018) argued that for standardized parameters, the range of -0.1 to 0.1 would envelop a negligible effect based on Cohen (1988). Kruschke (2018) also suggested that this range could be extended to non-standardized parameters by multiplying by the standard deviation of the response. Accordingly, calculating the proportion of posterior density within this ROPE could act as a form of “null-hypothesis” testing in a Bayesian framework.

  • if the HDI of the focal parameter falls completely outside the ROPE, there is strong evidence that there is an effect
  • if the HDI of the focal parameter falls completely inside the ROPE, there is strong evidence that there is no effect
  • otherwise there is no clear evidence either way (these three rules are sketched in code below)
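
These rules are simple interval comparisons; the toy helper below (hypothetical, not part of bayestestR) makes the logic explicit using the HDI and ROPE reported further down:

rope_decision <- function(hdi, rope) {
    if (hdi[2] < rope[1] || hdi[1] > rope[2]) {
        "HDI completely outside the ROPE: strong evidence for an effect"
    } else if (hdi[1] >= rope[1] && hdi[2] <= rope[2]) {
        "HDI completely inside the ROPE: strong evidence for no effect"
    } else {
        "undecided: no clear evidence either way"
    }
}
rope_decision(hdi = c(-0.97, 0.82), rope = c(-0.09, 0.09)) # "undecided..."
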
Note

ROPE and equivalence tests are of most use when you decide that there is not enough evidence to support a hypothesis that there is an effect. Such a “non-significant” result may arise because there genuinely is no effect OR because you do not have enough power to detect the effect. Performing an equivalence test provides a mechanism to tease these two apart.

I provide the following example purely to illustrate how such a test would be performed. In this case, as we have already demonstrated strong evidence for an effect, the equivalence test does not yield any additional insights.

## Calculate ROPE range manually
ROPE <- c(-0.1, 0.1) * with(dat, sd(y))
## OR calculate ROPE range via rope_range function
ROPE <- bayestestR::rope_range(dat1a.brm2)

dat1a.brm2 |> bayestestR::equivalence_test(parameters = "b_x", range = ROPE, ci_method = "HDI")
# Test for Practical Equivalence

  ROPE: [-0.09 0.09]

Parameter |        H0 | inside ROPE |      95% HDI
--------------------------------------------------
x         | Undecided |     15.88 % | [-0.97 0.82]
dat1a.brm2 |>
    bayestestR::equivalence_test(parameters = "b_x", range = ROPE, ci_method = "HDI") |>
    plot()
Picking joint bandwidth of 0.0734

Conclusions:

  • the percentage of the HPD for the slope that is inside the ROPE is 0.159
  • there is strong evidence for an effect

OR using the rope function.

## Calculate ROPE range manually
dat1a.brm2 |> bayestestR::rope(parameters = "x", range = ROPE, ci_method = "HDI")
# Proportion of samples inside the ROPE [-0.09, 0.09]:

Parameter | inside ROPE
-----------------------
x         |     15.87 %
dat1a.brm2 |>
    bayestestR::rope(parameters = "x", range = ROPE, ci_method = "HDI") |>
    plot()

The above demonstration, was applied to the simple comparison that the slope was not equal to 0, however, it can similarly be applied to any hypothesis (although typically only if there is no evidence of an effect)

Now that we have the full posteriors, we are free to use these to garner evidence on a range of hypotheses. To demonstrate, we will consider the following hypotheses:

  1. a change in \(x\) is associated with an increase in \(y\)
  2. a doubling of \(x\) (from 2.5 to 5) is associated with an increase in \(y\) of > 50%

The procedure highlighted above for calculating excedence probabilities evaluates the degree of evidence for a effect in a particular direction. However, there are other instances where there is a desire to evaluate the evidence that something has change (either increased or decreased). Such purposes are similar to the Frequentist pursuit of testing a null hypothesis (e.g. effect = 0).

The Region of Practical Equivalence (ROPE) evaluates evidence that an effect is “practically equivalent” to a value (e.g. 0) by calculating the proportion of effects that are within a nominated range. Kruschke (2018) argued that for standardized parameters, the range of -0.1 to 0.1 would envelop a negligible effect based on Cohen (1988). Kruschke (2018) also suggested that this range could be extended to non-standardized parameters by multiplying by the standard deviation of the response. Accordingly, calculating the proportion of posterior density within this ROPE could act as a form of “null-hypothesis” testing in a Bayesian framework.

  • if the HDI of the focal parameter falls completely outside the ROPE, there is strong evidence that there is an effect
  • if the HDI of the focal parameter falls completely inside the ROPE, there is strong evidence that there is not an effect
  • otherwise there is not clear evidence either way
Note

ROPE and equivalence tests are of most use when you decide that there is not enough evidence to support an hypothesis that there is an effect. Such a “non-significant” result may be because there genuinely is not effect OR you do not have enough power to detect the effect. Performing an equivalence test provides a mechanism to tease these two appart.

I provide the following example purely to illustrate how such a test would be performed. In this case, as we have already demonstrated strong evidence for an effect, the equivalence test does not yield any additional insights.

## Calculate ROPE range manually
ROPE <- c(-0.1, 0.1) * with(dat, sd(y))
## OR calculate ROPE range via rope_range function
ROPE <- bayestestR::rope_range(dat1b.brm2)

dat1b.brm2 |> bayestestR::equivalence_test(parameters = "b_.*x.*", range = ROPE, ci_method = "HDI")
# Test for Practical Equivalence

  ROPE: [-0.09 0.09]

Parameter          |        H0 | inside ROPE |      95% HDI
-----------------------------------------------------------
scalexscaleEQFALSE | Undecided |     18.33 % | [-0.97 0.85]
dat1b.brm2 |>
    bayestestR::equivalence_test(parameters = "b_.*x.*", range = ROPE, ci_method = "HDI") |>
    plot()
Picking joint bandwidth of 0.0727

Conclusions:

  • the percentage of the HPD for the slope that is inside the ROPE is 0.183
  • there is strong evidence for an effect

OR using the rope function.

## Calculate ROPE range manually
dat1b.brm2 |> bayestestR::rope(parameters = "x", range = ROPE, ci_method = "HDI")
# Proportion of samples inside the ROPE [-0.09, 0.09]:

Parameter          | inside ROPE
--------------------------------
scalexscaleEQFALSE |     18.33 %
dat1b.brm2 |>
    bayestestR::rope(parameters = "x", range = ROPE, ci_method = "HDI") |>
    plot()

The above demonstration, was applied to the simple comparison that the slope was not equal to 0, however, it can similarly be applied to any hypothesis (although typically only if there is no evidence of an effect)

Now that we have the full posteriors, we are free to use these to garner evidence on a range of hypotheses. To demonstrate, we will consider the following hypotheses:

  1. a change in \(x\) is associated with an increase in \(y\)
  2. a doubling of \(x\) (from 2.5 to 5) is associated with an increase in \(y\) of > 50%

The procedure highlighted above for calculating excedence probabilities evaluates the degree of evidence for a effect in a particular direction. However, there are other instances where there is a desire to evaluate the evidence that something has change (either increased or decreased). Such purposes are similar to the Frequentist pursuit of testing a null hypothesis (e.g. effect = 0).

The Region of Practical Equivalence (ROPE) evaluates evidence that an effect is “practically equivalent” to a value (e.g. 0) by calculating the proportion of effects that are within a nominated range. Kruschke (2018) argued that for standardized parameters, the range of -0.1 to 0.1 would envelop a negligible effect based on Cohen (1988). Kruschke (2018) also suggested that this range could be extended to non-standardized parameters by multiplying by the standard deviation of the response. Accordingly, calculating the proportion of posterior density within this ROPE could act as a form of “null-hypothesis” testing in a Bayesian framework.

  • if the HDI of the focal parameter falls completely outside the ROPE, there is strong evidence that there is an effect
  • if the HDI of the focal parameter falls completely inside the ROPE, there is strong evidence that there is not an effect
  • otherwise there is not clear evidence either way
Note

ROPE and equivalence tests are of most use when you decide that there is not enough evidence to support an hypothesis that there is an effect. Such a “non-significant” result may be because there genuinely is not effect OR you do not have enough power to detect the effect. Performing an equivalence test provides a mechanism to tease these two appart.

I provide the following example purely to illustrate how such a test would be performed. In this case, as we have already demonstrated strong evidence for an effect, the equivalence test does not yield any additional insights.

## Calculate ROPE range manually
ROPE <- c(-0.1, 0.1) * with(dat, sd(y))
## OR calculate ROPE range via rope_range function
ROPE <- bayestestR::rope_range(dat1c.brm2)

dat1c.brm2 |> bayestestR::equivalence_test(parameters = "b_.*x.*", range = ROPE, ci_method = "HDI")
# Test for Practical Equivalence

  ROPE: [-0.09 0.09]

Parameter |        H0 | inside ROPE |      95% HDI
--------------------------------------------------
scalex    | Undecided |     19.43 % | [-0.87 0.71]
dat1c.brm2 |>
    bayestestR::equivalence_test(parameters = "b_.*x.*", range = ROPE, ci_method = "HDI") |>
    plot()
Picking joint bandwidth of 0.0637

Conclusions:

  • the percentage of the HPD for the slope that is inside the ROPE is 0.194
  • there is strong evidence for an effect

OR using the rope function.

## Calculate ROPE range manually
dat1c.brm2 |> bayestestR::rope(parameters = "x", range = ROPE, ci_method = "HDI")
# Proportion of samples inside the ROPE [-0.09, 0.09]:

Parameter | inside ROPE
-----------------------
scalex    |     19.42 %
dat1c.brm2 |>
    bayestestR::rope(parameters = "x", range = ROPE, ci_method = "HDI") |>
    plot()

The above demonstration, was applied to the simple comparison that the slope was not equal to 0, however, it can similarly be applied to any hypothesis (although typically only if there is no evidence of an effect)

Now that we have the full posteriors, we are free to use these to garner evidence on a range of hypotheses. To demonstrate, we will consider the following hypotheses:

  1. all pairwise comparisons (compare each level of \(x\) to each other
  2. define a specific set of contrasts that include comparing the average of medium and high treatments to the control treatment.

We have already seen that there is no evidence of a difference in \(y\) between the “control” and “medium” groups. This could be because either there is not enough power to detect the difference or that the populations are not different. It would be nice to be able to gain some insights into which of these is most likely. And we can. If we establish the range of values that represent an insubstantial effect, we can then quantify the proportion of the posterior that falls inside this Region of Practical Equivalence (ROPE).

Conventionally, the ROPE represents within 10% - that is, if the effect is less than 10% change, then we might consider it insubstantial.

## Calculate ROPE range manually
ROPE <- c(-0.1, 0.1) * with(dat2, sd(y))
## OR calculate ROPE range via rope_range function
ROPE <- bayestestR::rope_range(dat2a.brm2)

dat2a.brm2 |> bayestestR::equivalence_test(parameters = "b_x", range = ROPE, ci_method = "HDI")
# Test for Practical Equivalence

  ROPE: [-0.60 0.60]

Parameter |        H0 | inside ROPE |        95% HDI
----------------------------------------------------
xmedium   | Undecided |     17.50 % | [ -6.31  5.12]
xhigh     |  Rejected |      0.00 % | [-14.74 -2.34]
dat2a.brm2 |>
    bayestestR::equivalence_test(parameters = "b_x", range = ROPE, ci_method = "HDI") |>
    plot()
Picking joint bandwidth of 0.484

Conclusions:

  • there is insufficient evidence to conclude that there is a difference in \(y\) between “control” and “medium” groups
  • we cannot conclude that there is evidence of no effect
## Calculate ROPE range manually
dat2a.brm2 |> bayestestR::rope(parameters = "x", range = ROPE, ci_method = "HDI")
# Proportion of samples inside the ROPE [-0.60, 0.60]:

Parameter | inside ROPE
-----------------------
xmedium   |     17.49 %
xhigh     |      0.00 %
dat2a.brm2 |>
    bayestestR::rope(parameters = "x", range = ROPE, ci_method = "HDI") |>
    plot()

14 Summary plots

dat1a.grid <- list(x = modelr::seq_range(dat$x, n = 100))
dat1a.brm2 |>
    emmeans(~x, at = dat1a.grid) |>
    as.data.frame() |>
    ggplot(aes(y = emmean, x = x)) +
    geom_ribbon(aes(ymin = lower.HPD, ymax = upper.HPD), fill = "orange", alpha = 0.3) +
    geom_point(data = dat, aes(y = y)) +
    geom_line() +
    theme_classic()

As a spaghetti plot

dat1a.grid <- list(x = modelr::seq_range(dat$x, n = 100))
dat1a.brm2 |>
    emmeans(~x, at = dat1a.grid) |>
    gather_emmeans_draws() |> 
    ggplot(aes(y = .value, x = x)) +
    geom_line(aes(group=.draw),  colour = 'orange', alpha=0.01) +
    geom_point(data = dat, aes(y = y)) +
    theme_classic()

dat1b.grid <- list(x = modelr::seq_range(dat$x, n = 100))
dat1b.brm2 |>
    emmeans(~x, at = dat1b.grid) |>
    as.data.frame() |>
    ggplot(aes(y = emmean, x = x)) +
    geom_ribbon(aes(ymin = lower.HPD, ymax = upper.HPD), fill = "orange", alpha = 0.3) +
    geom_point(data = dat, aes(y = y)) +
    geom_line() +
    theme_classic()

As a spaghetti plot

dat1b.grid <- list(x = modelr::seq_range(dat$x, n = 100))
dat1b.brm2 |>
    emmeans(~x, at = dat1b.grid) |>
    gather_emmeans_draws() |> 
    ggplot(aes(y = .value, x = x)) +
    geom_line(aes(group=.draw),  colour = 'orange', alpha=0.01) +
    geom_point(data = dat, aes(y = y)) +
    theme_classic()

dat1c.grid <- list(x = modelr::seq_range(dat$x, n = 100))
dat1c.brm2 |>
    emmeans(~x, at = dat1c.grid) |>
    as.data.frame() |>
    ggplot(aes(y = emmean, x = x)) +
    geom_ribbon(aes(ymin = lower.HPD, ymax = upper.HPD), fill = "orange", alpha = 0.3) +
    geom_point(data = dat, aes(y = y)) +
    geom_line() +
    theme_classic()

As a spaghetti plot

dat1c.grid <- list(x = modelr::seq_range(dat$x, n = 100))
dat1c.brm2 |>
    emmeans(~x, at = dat1c.grid) |>
    gather_emmeans_draws() |> 
    ggplot(aes(y = .value, x = x)) +
    geom_line(aes(group=.draw),  colour = 'orange', alpha=0.01) +
    geom_point(data = dat, aes(y = y)) +
    theme_classic()

dat2a.brm2 |>
  emmeans(~x) |>
  as.data.frame() |>
  ggplot(aes(y = emmean, x = x)) +
  geom_pointrange(aes(ymin = lower.HPD, ymax = upper.HPD)) +
  theme_classic()

15 References

Cohen, J. 1988. Statistical Power Analysis for the Behavioral Sciences. Lawrence Erlbaum Associates.
Kruschke, John K. 2018. “Rejecting or Accepting Parameter Values in Bayesian Estimation.” Advances in Methods and Practices in Psychological Science 1 (2): 270–80. https://doi.org/10.1177/2515245918771304.
Wade, Paul R. 2000. “Bayesian Methods in Conservation Biology.” Conservation Biology 14 (5): 1308–16. https://doi.org/https://doi.org/10.1046/j.1523-1739.2000.99415.x.