library(tidyverse) #for data wrangling and plotting
library(DHARMa) #for simulated residuals
library(performance) #for model diagnostics
library(see) #for model diagnostics
library(brms) #for Bayesian models
library(tidybayes) #for exploring Bayesian models
library(rstan) #for diagnostics plots
library(bayesplot) #for diagnostic plots
library(patchwork) #for arranging multiple plots
library(gridGraphics) #for arranging multiple plots - needed for some patchwork plots
library(HDInterval) #for HPD intervals
library(bayestestR) #for ROPE
library(emmeans) #for estimated marginal means
library(standist) #for plotting distributions
library(cmdstanr) #for the backend
source("helperfunctions.R")

Bayesian generalised linear models

1 Preparations

The necessary libraries are loaded above.
Many biologists and ecologists get a little twitchy and nervous around mathematical and statistical formulae and nomenclature. Whilst it is possible to perform basic statistics without too much regard for the actual equation (model) being employed, as the complexity of the analysis increases, the need to understand the underlying model becomes increasingly important. Moreover, model specification in BUGS (the language used to program Bayesian modelling) aligns very closely to the underlying formulae. Hence a good understanding of the underlying model is vital to be able to create a sensible Bayesian model. Consequently, I will always present the linear model formulae along with the analysis. If you start to feel some form of disorder starting to develop, you might like to run through the Tutorials and Workshops twice (the first time ignoring the formulae).
This tutorial will introduce the concept of Bayesian (generalised) linear models and demonstrate how to fit simple models to a set of simple fabricated data sets, each representing major data types encountered in ecological research. Subsequent tutorials will build on these fundamentals with increasingly more complex data and models.
2 A philosophical note
To introduce the philosophical and mathematical differences between classical (frequentist) and Bayesian statistics, Wade (2000) presented a provocative yet compelling trend analysis of two hypothetical populations. The temporal trend of one of the populations shows very little variability around a very subtle linear decline. By contrast, the second population appears to decline more dramatically, yet has substantially more variability.
Wade (2000) neatly illustrates the contrasting conclusions (particularly with respect to interpreting probability) that would be drawn by the frequentist and Bayesian approaches and in so doing highlights how and why the Bayesian approach provides outcomes that are more aligned with management requirements.
This tutorial will start by replicating the demonstration of Wade (2000).
| Population | n | Slope | t | p-value |
|---|---|---|---|---|
| A | 10 | -0.1022 | -2.3252 | 0.0485 |
| B | 10 | -10.2318 | -2.2115 | 0.0579 |
| C | 100 | -10.4713 | -6.6457 | <0.0001 |
From a traditional frequentist perspective, we would conclude that there is a 'significant' relationship in Populations A and C (\(p<0.05\)), yet not in Population B (\(p>0.05\)). Note, Populations B and C were both generated from the same random distribution; Population C simply has a substantially larger number of observations.
The above illustrates a couple of things:

- statistical significance does not necessarily translate into biological importance. The percentage decline for Population A is 0.46, whereas the percentage decline for Population B is 45.26. That is, Population B is declining at nearly 100 times the rate of Population A. That sounds rather important, yet on the basis of the hypothesis test, we would dismiss the decline in Population B.
- a p-value largely reflects statistical power - essentially, the probability that the sample size is large enough to detect an effect or relationship.
Let us now look at it from a Bayesian perspective. I will just provide the posterior distributions (densities scaled to 0-1 so that they can be plotted together) for the slope for each population.
Focusing on Populations A and B, we would conclude:
the mean (plus or minus CI) slopes for Population A and B are -0.1 (-0.22,0.01) and -10.4 (-21.35,-0.04) respectively.
the Bayesian approach allows us to query the posterior distribution in many other ways in order to ask sensible biological questions. For example, we might consider that a rate of change of 5% or greater represents an important biological impact. For Populations A and B, the probability that the rate is 5% or greater is 0 and 0.85 respectively.
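As a sketch of this style of posterior query: with hypothetical draws standing in for the MCMC samples of Population B's slope (the mean and spread below are assumptions chosen to roughly match the interval quoted above; real draws would be extracted from the fitted model), the probability of a decline of 5 or more is simply the proportion of draws at or below -5.

```r
# Hypothetical posterior draws for the slope of Population B
# (assumed values; in practice these come from the fitted Bayesian model)
set.seed(1)
draws <- rnorm(4000, mean = -10.4, sd = 5.4)

# Probability that the decline is 5 (units) or greater
mean(draws <= -5)
```

This is the key practical payoff: any question that can be phrased as a condition on the parameters can be answered as a simple proportion of posterior draws.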
3 Review of (generalised) linear models
I would highly recommend reviewing the information in the tutorial on generalised linear models, particularly the sections describing linear models, assumption checking and generalised linear models (GLMs). Whilst there are philosophical differences between frequentist and Bayesian statistics that have implications for how models are fit and interpreted, model choice and assumption checking principles are common between the two approaches. Hence, many of these topics will be assumed, and not fully described in the current tutorial.
Recall from the tutorial on generalised linear models that simple linear regression is a linear modelling process that models a single response variable against one or more predictors with a linear combination of coefficients and (in the case of a Gaussian model) can be expressed as:
\[y_i = \beta_0+ \beta_1 x_i+\epsilon_i \hspace{1cm}\epsilon\sim{}N(0,\sigma^2)\]
where:
\(y_i\) is the response value for each of the \(i\) observations
\(\beta_0\) is the y-intercept (value of \(y\) when \(x=0\))
\(\beta_1\) is the slope (rate of change in \(y\) per unit change in \(x\))
\(x_i\) is the predictor value for each of the \(i\) observations
\(\epsilon_i\) is the residual value of each of the \(i\) observations. A residual is the difference between the observed value and the value expected by the model.
\(\epsilon\sim{}N(0,\sigma^2)\) indicates that the residuals are normally distributed with a constant amount of variance
The above can be re-expressed and generalised as:
\[ \begin{align} y_i&\sim{}Dist(\mu_i, ...) \\ g(\mu_i) &= \beta_0+ \beta_1 x_i \end{align} \]
where:
- \(Dist\) represents a distribution from the exponential family (such as Gaussian, Poisson, Binomial, etc)
- \(...\) represents additional parameters relevant to the nominated distribution (such as \(\sigma^2\): Gaussian, \(n\): Binomial and \(\phi\): Negative Binomial, etc)
- \(g()\) represents the link function (e.g. log: Poisson, logit: Binomial, etc)
The reliability of any model depends on the degree to which the data adheres to the model assumptions. Hence, as with frequentist models, exploratory data analysis (EDA) is a vital component of Bayesian modelling and since the model structures are similar between frequentist and Bayesian approaches, so too is EDA.
4 Bayesian (generalised) linear models
For the purpose of introduction, we will start by exploring a Gaussian model with a very simple fabricated data set representing the relationship between a response (\(y\)) and a continuous predictor (\(x = [1,2,3,4,5,6,7,8,9,10]\)). The fabricated data set will comprise 10 observations each drawn from normal distributions with a set standard deviation of 4. The means of the 10 populations will be determined by the following equation:
\[ \mu_i = 2 + 5\times x_i \]
Let us generate these data.
set.seed(234)
dat <- data.frame(x = 1:10) |>
mutate(y = round(rnorm(n = 10, mean = 2 + (5 * x), sd = 4), digits = 2))
dat
    x     y
1 1 9.64
2 2 3.79
3 3 11.00
4 4 27.88
5 5 32.84
6 6 32.56
7 7 37.84
8 8 29.86
9 9 45.05
10 10 47.65
The model we will be fitting is:
\[ \begin{align} y_i&\sim{}N(\mu_i, \sigma^2)\\ \mu_i &= \beta_0+ \beta_1 x_i \end{align} \]
The parameters that we are going to attempt to estimate are the y-intercept (\(\beta_0\)), the slope (\(\beta_1\)) and the underlying variance (\(\sigma^2\)). Recall (from tutorials on statistical philosophies and estimation) that Bayesian models calculate posterior probabilities (\(P(H|D)\)) from the likelihood (\(P(D|H)\)) and prior expectations (\(P(H)\)). Therefore, in preparation for fitting a Bayesian model, we must consider what our prior expectations are for all parameters.
The individual responses (\(y_i\), the observed values) are each expected to have been independently drawn from normal (Gaussian) distributions (\(\mathcal{N}\)). These distributions represent all the possible values of \(y\) we could have obtained at the specific (\(i^{th}\)) level of \(x\). Hence the \(i^{th}\) \(y\) observation is expected to have been drawn from a normal distribution with a mean of \(\mu_i\).
Although each distribution is expected to come from populations that differ in their means, we assume that all of these distributions have the same variance (\(\sigma^2\)).
4.1 Priors
We need to supply priors for each of the parameters to be estimated (\(\beta_0\), \(\beta_1\) and \(\sigma\)). Whilst we want these priors to be sufficiently vague as to not influence the outcomes of the analysis (and thus be equivalent to the frequentist analysis), we do not want the priors to be so vague (wide) that they permit the MCMC sampler to drift off into parameter space that is both illogical as well as numerically awkward.
Proffering sensible priors is one of the most difficult aspects of performing Bayesian analyses. For instances where there is some previous knowledge available and a desire to incorporate those data, the difficulty is in ensuring that the information is incorporated correctly. However, for instances where there is no previous relevant information and thus a desire to have the posteriors driven entirely by the new data, the difficulty is in defining priors that are vague enough (so as not to bias results in their direction) yet not so vague as to allow the MCMC sampler to drift off into unsupported regions (and thus get stuck and yield spurious estimates).
For early implementations of MCMC sampling routines (such as Metropolis-Hastings and Gibbs), it was fairly common to see very vague priors being defined. For example, the priors on effects were typically normal priors with a mean of 0 and a variance of 1e+06 (1,000,000). These are very vague priors. Yet for some samplers (e.g. NUTS), such vague priors can encourage poor behaviour of the sampler - particularly if the posterior is complex. It is now generally advised that priors should (where possible) be somewhat weakly informative and, to some extent, represent the bounds of what are feasible and sensible estimates.
The degree to which priors influence an outcome (whether by having a pulling effect on the estimates or by encouraging the sampler to drift off into unsupported regions of the posterior) is dependent on:
- the relative sparsity of the data - the larger the data, the less weight the priors have and thus less influence they exert.
- the complexity of the model (and thus posterior) - the more parameters, the more sensitive the sampler is to the priors.
The sampled posterior is the product of both the likelihood and the prior - all of which are multidimensional. For most applications, it would be virtually impossible to define a sensible multidimensional prior. Hence, our only option is to define priors on individual parameters (e.g. the intercept, slope(s), variance etc) and to hope that if they are individually sensible, they will remain collectively sensible.
So having (hopefully) impressed upon the notion that priors are an important consideration, I will now attempt to synthesise some of the approaches that can be employed to arrive at weakly informative priors that have been gleaned from various sources. Largely, this advice has come from the following resources:
- https://github.com/stan-dev/stan/wiki/Prior-Choice-Recommendations
- http://svmiller.com/blog/2021/02/thinking-about-your-priors-bayesian-analysis/
I will outline some of the current main recommendations before summarising some approaches in a table.
- weakly informative priors should contain enough information so as to regularise (discourage unreasonable parameter estimates whilst allowing all reasonable estimates).
- for effects parameters on scaled (standardised) data, an argument could be made for a normal distribution with a standard deviation of 1 (e.g. normal(0,1)), although some prefer a t distribution with 3 degrees of freedom and a standard deviation of 1 (e.g. student_t(3,0,1)) - a flatter t distribution is apparently more robust than a normal as an uninformative prior
- for un-scaled data, the above priors can be scaled by using the standard deviation of the data as the prior standard deviation (e.g. student_t(3,0,sd(y)), or student_t(3,0,sd(y)/sd(x)))
- for priors on hierarchical standard deviations, priors should encourage shrinkage towards 0 (particularly if the number of groups is small, since otherwise the sampler will tend to be more responsive to "noise")
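Expressed as brms prior specifications, the recommendations above might look like the following sketch (the class names are standard brms ones; the value 5 is a placeholder standing in for sd(y)/sd(x) computed from the data at hand):

```r
library(brms)

# weakly informative prior on effects ('b') for standardised data
p_scaled <- prior(student_t(3, 0, 1), class = "b")

# for un-scaled data, widen the scale by sd(y)/sd(x)
# (5 here is a hypothetical value of that ratio)
p_unscaled <- prior(student_t(3, 0, 5), class = "b")

# hierarchical standard deviations: a half-t encouraging shrinkage towards 0
p_sd <- prior(student_t(3, 0, 1), class = "sd")
```

These objects would then be combined (with `+`) and passed to `brm()` via its `prior` argument.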
In this tutorial series, we will perform Bayesian analysis in the STAN language via an R interface. Two popular interfaces that greatly simplify the specification of Bayesian models are brms and rstanarm. We will exclusively focus on the former as it is far more flexible.
| Family | Parameter | brms | rstanarm |
|---|---|---|---|
| Gaussian | Intercept | student_t(3, median(y), mad(y)) | normal(mean(y), 2.5*sd(y)) |
| | 'Population effects' (slopes, betas) | flat, improper priors | normal(0, 2.5*sd(y)/sd(x)) |
| | Sigma | student_t(3, 0, mad(y)) | exponential(1/sd(y)) |
| | 'Group-level effects' | student_t(3, 0, mad(y)) | decov(1,1,1,1) |
| | Correlation on group-level effects | lkj_corr_cholesky(1) | |
| Poisson | Intercept | student_t(3, median(y), mad(y)) | normal(mean(y), 2.5*sd(y)) |
| | 'Population effects' (slopes, betas) | flat, improper priors | normal(0, 2.5*sd(y)/sd(x)) |
| | 'Group-level effects' | student_t(3, 0, mad(y)) | decov(1,1,1,1) |
| | Correlation on group-level effects | lkj_corr_cholesky(1) | |
| Negative binomial | Intercept | student_t(3, median(y), mad(y)) | normal(mean(y), 2.5*sd(y)) |
| | 'Population effects' (slopes, betas) | flat, improper priors | normal(0, 2.5*sd(y)/sd(x)) |
| | Shape | gamma(0.01, 0.01) | exponential(1/sd(y)) |
| | 'Group-level effects' | student_t(3, 0, mad(y)) | decov(1,1,1,1) |
| | Correlation on group-level effects | lkj_corr_cholesky(1) | |
Notes:
brms
https://github.com/paul-buerkner/brms/blob/c2b24475d727c8afd8bfc95947c18793b8ce2892/R/priors.R
- In the above, for non-Gaussian families, y is first transformed according to the family link. If the family link is log, then 0.1 is first added to 0 values.
- In brms, the minimum standard deviation for the Intercept prior is 2.5.
- In brms, the minimum standard deviation for group-level priors is 10.
rstanarm
http://mc-stan.org/rstanarm/articles/priors.html
- In rstanarm, priors on the standard deviation and correlation associated with group-level effects are packaged up into a single prior (decov, which is a decomposition of the variance-covariance matrix).
In my experience, I find that the above priors tend to be a little bit too wide for many ecological applications and I often prefer to use 1.5 rather than 2.5 as the multiplier.
In Bayesian models, centering of predictors offers huge numerical advantages. So important is it to center that brms automatically centers any continuous predictors for you. However, since the user has not necessarily centered the predictors, the user might misinterpret the outputs from a brms model. Consequently, when fitting a model, brms also generates y-intercept values that are consistent with un-centered values and these are the estimates returned to the user.
Nevertheless, I would recommend that you always explicitly center continuous predictors to provide more meaningful interpretations of the y-intercept. I would also highly recommend standardising continuous predictors - this will not only help speed up and stabilise the model, it will also simplify the specification of priors - see the specific examples later in this tutorial.
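A sketch of centring and standardising the predictor prior to fitting, using the fabricated data from above (plain base R; `cx` and `sx` are just illustrative column names):

```r
dat <- data.frame(x = 1:10,
                  y = c(9.64, 3.79, 11.00, 27.88, 32.84,
                        32.56, 37.84, 29.86, 45.05, 47.65))

# centred predictor: the y-intercept now corresponds to the mean of x
dat$cx <- dat$x - mean(dat$x)

# standardised predictor: mean 0, standard deviation 1
dat$sx <- (dat$x - mean(dat$x)) / sd(dat$x)
```

With `sx`, a slope prior like student_t(3, 0, 1) is immediately reasonable, since a one-unit change in `sx` is one standard deviation of `x`.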
Based on the above, for our fabricated data, let's assign the following priors:

- \(\beta_0\): Normal prior centred at 31.21 (the median of \(y\)) with a standard deviation of 15.17 (the MAD of \(y\))
- \(\beta_1\): t distribution (3 degrees of freedom) prior centred at 0 with a standard deviation of 4.09 (MAD of \(y\) divided by MAD of \(x\)). Since effects are differences and we may not want to pre-determine the direction (polarity) of any trends, it is typical to define effects priors with means of 0.
- \(\sigma\): (half) t distribution (3 degrees of freedom) centred at 0 with a standard deviation of 15.17
Note, again, when fitting models through either rstanarm or brms, the priors assume that the predictor(s) have been centred and are to be applied on the link scale. In this case the link scale is an identity.
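To make the origin of these values concrete, here is a sketch of how they can be computed from the data and assembled into a brms prior specification (the prior classes are the standard brms ones):

```r
library(brms)

dat <- data.frame(x = 1:10,
                  y = c(9.64, 3.79, 11.00, 27.88, 32.84,
                        32.56, 37.84, 29.86, 45.05, 47.65))

median(dat$y)            # 31.21: centre of the intercept prior
mad(dat$y)               # ~15.17: scale of the intercept and sigma priors
mad(dat$y) / mad(dat$x)  # ~4.09: scale of the slope prior

priors <- prior(normal(31.21, 15.17), class = "Intercept") +
  prior(student_t(3, 0, 4.09), class = "b") +
  prior(student_t(3, 0, 15.17), class = "sigma")
```

Note that brms automatically folds a half-t prior on `sigma` (the positive constraint is applied for you), so the same student_t specification serves for both bounded and unbounded parameters.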
Similar logic can be applied for models that employ different distributions. In the following sections, we will define numerous sets of data (each of which represents a different major form of ecological data) and see how we can set appropriate priors in each case. In working through these examples, it is worth reflecting on how much simpler prior specification is if we use standardised predictors.
5 Example data
This tutorial will blend theoretical discussions with actual calculations and model fits. I believe that by bridging the divide between theory and application, we all gain better understanding. The applied components of this tutorial will be motivated by numerous fabricated data sets. The advantage of simulated data over real data is that with simulated data, we know the ‘truth’ and can therefore gauge the accuracy of estimates.
The motivating examples are:
- Example 1 - simulated samples drawn from a Gaussian (normal) distribution reminiscent of data collected on measurements (such as body mass)
- Example 2 - simulated Gaussian samples drawn from three different populations representing three different treatment levels (e.g. body masses of three different species)
- Example 3 - simulated samples drawn from a Poisson distribution reminiscent of count data (such as number of individuals of a species within quadrats)
- Example 4 - simulated samples drawn from a Negative Binomial distribution reminiscent of over-dispersed count data (such as number of individuals of a species that tends to aggregate in groups)
- Example 5 - simulated samples drawn from a Bernoulli (binomial with \(n = 1\)) distribution reminiscent of binary data (such as the presence/absence of a species within sites)
- Example 6 - simulated samples drawn from a Binomial distribution reminiscent of proportional data (such as counts of a particular taxa out of a total number of individuals)
Lets formally simulate the data illustrated above. The underlying process dictates that on average a one unit change in the predictor (x) will be associated with a five unit change in response (y) and when the predictor has a value of 0, the response will typically be 2. Hence, the response (y) will be related to the predictor (x) via the following:
\[ y = 2 + 5x \]
This is a deterministic model; it has no uncertainty. In order to simulate actual data, we need to add some random noise. We will assume that the residuals are drawn from a Gaussian distribution with a mean of zero and standard deviation of 4. The predictor will comprise 10 uniformly distributed integer values between 1 and 10. We will round the response to two decimal places.
For repeatability, a seed will be employed on the random number generator. Note, the smaller the dataset, the less it is likely to represent the underlying deterministic equation, so we should keep this in mind when we look at how closely our estimated parameters approximate the ‘true’ values. Hence, the seed has been chosen to yield data that maintain a general trend that is consistent with the defining parameters.
set.seed(234)
dat <- data.frame(x = 1:10) |>
mutate(y = round(2 + 5*x + rnorm(n = 10, mean = 0, sd = 4), digits = 2))
dat
    x     y
1 1 9.64
2 2 3.79
3 3 11.00
4 4 27.88
5 5 32.84
6 6 32.56
7 7 37.84
8 8 29.86
9 9 45.05
10 10 47.65
We will use these data in two ways. Firstly, to estimate the mean and variance of the response (y) ignoring the predictor (x), and secondly to estimate the relationship between the response and predictor.
For the former, we know that the mean and variance of the response (y) can be calculated as:
\[ \begin{align} \bar{y} &= \frac{1}{n}\sum^n_{i=1}y_i\\ var(y) &= \frac{1}{n-1}\sum^n_{i=1}(y_i-\bar{y})^2\\ sd(y) &= \sqrt{var(y)} \end{align} \]
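These quantities can be computed directly in R (note that R's `var()` and `sd()` use the n - 1 denominator, i.e. the sample rather than population estimators):

```r
y <- c(9.64, 3.79, 11.00, 27.88, 32.84, 32.56, 37.84, 29.86, 45.05, 47.65)

mean(y)  # ~27.81
var(y)   # sample variance (n - 1 denominator)
sd(y)    # square root of the variance
```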
As previously described, categorical predictors are transformed into dummy codes prior to the fitting of the linear model. We will simulate a small data set with a single categorical predictor comprising a control and two treatment levels ('medium', 'high'). To simplify things we will assume a Gaussian distribution; however, most of the modelling steps would be the same regardless of the chosen distribution.
The data will be drawn from three Gaussian distributions with a standard deviation of 4 and means of 20, 15 and 10. We will draw a total of 12 observations, four from each of the three populations.
set.seed(123)
beta_0 <- 20
beta <- c(-2, -10)
sigma <- 4
n <- 12
x <- gl(3, 4, 12, labels = c('control', 'medium', 'high'))
y <- (model.matrix(~x) %*% c(beta_0, beta)) + rnorm(12, 0, sigma)
dat2 <- data.frame(x = x, y = y)
dat2
         x         y
1 control 17.758097
2 control 19.079290
3 control 26.234833
4 control 20.282034
5 medium 18.517151
6 medium 24.860260
7 medium 19.843665
8 medium 12.939755
9 high 7.252589
10 high 8.217352
11 high 14.896327
12 high 11.439255
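The dummy coding that `model.matrix()` performed in the simulation above can be inspected directly. With R's default treatment contrasts, the first level ('control') is absorbed into the intercept and the remaining levels each get an indicator column:

```r
# Rebuild the factor used above and expand it into a design matrix
x <- gl(3, 4, 12, labels = c('control', 'medium', 'high'))
X <- model.matrix(~x)
head(X, 6)
# columns: (Intercept), xmedium, xhigh
# control rows: 1 0 0;  medium rows: 1 1 0;  high rows: 1 0 1
```

Multiplying this matrix by the coefficient vector (as the simulation code does with `%*%`) yields the population mean for each observation.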
The Poisson distribution is only parameterized by a single parameter (\(\lambda\)) which represents both the mean and variance. Furthermore, Poisson data can only be positive integers.
Unlike a simple trend between Gaussian variables, modelling against a Poisson distribution shifts the scale to logarithms (via the log link). This needs to be taken into account when we simulate the data. The parameters that we use to define the underlying processes need to either be on a logarithmic scale, or else converted to a logarithmic scale prior to using them for generating the random data.
Moreover, for any model that involves a non-identity link function (such as a logarithmic link function for Poisson models), ‘slope’ is only constant on the scale of the link function. When it is back transformed onto the natural scale (scale of the data), it takes on a different meaning and interpretation.
We will choose \(\beta_0\) to represent a value of 1 when x=0. As for the 'effect' of the predictor on the response, let's say that for every one unit increase in the predictor the response increases by 40% (on the natural scale). Hence, on the log scale, the slope will be \(log(1.4)=\) 0.3364722.
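To illustrate the back-transformation (the slope is additive on the log scale but multiplicative on the natural scale):

```r
beta_1 <- log(1.4)  # slope on the log (link) scale
beta_1              # 0.3364722...
exp(beta_1)         # 1.4: each unit of x multiplies the expected response by 1.4

# expected response at x = 0, 1, 2 with an intercept of log(1)
exp(log(1) + beta_1 * 0:2)  # 1, 1.4, 1.96
```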
In theory, count data should follow a Poisson distribution and therefore have properties like the mean equal to the variance (e.g. \(\textnormal{Dispersion}=\frac{\sigma^2}{\mu}=1\)). However, as simple linear models are low dimensional representations of a system, it is often unlikely that such a simple model can capture all the variability in the response (counts). For example, if we were modelling the abundance of a species of intertidal snail within quadrats in relation to water depth, it is highly unlikely that water depth alone drives snail abundance. There are countless other influences that the model has not accounted for. As a result, the observed data might be more variable than a Poisson (of a particular mean) would expect; in such cases, the model is over-dispersed (more variance than expected).
Over-dispersed models under-estimate the variability, and thus over-state the precision, of estimates, resulting in inflated confidence in outcomes (elevated Type I errors).
There are numerous causes of over-dispersed count data (one of which is alluded to above). These are:
- additional sources of variability not being accounted for in the model (see above)
- when the items being counted aggregate together. Although the underlying items may have been generated by a Poisson process, the items clump together. When the items are counted, they are more likely to be either in relatively low or relatively high numbers - hence the data are more varied than would be expected from their overall mean.
- imperfect detection resulting in excessive zeros. Again the underlying items may have been generated by a Poisson process, however detecting and counting the items might not be completely straight forward (particularly for more cryptic items). Hence, the researcher may have recorded no individuals in a quadrat and yet there was one or more present, they were just not obvious and were not detected. That is, layered over the Poisson process is another process that determines the detectability. So while the Poisson might expect a certain proportion of zeros, the observed data might have a substantially higher proportion of zeros - and thus higher variance.
This example will generate data that is drawn from a negative binomial distribution so as to broadly represent any one of the above causes.
We will choose \(\beta_0\) to represent a value of 1 when x=0. As for the 'effect' of the predictor on the response, let's say that for every one unit increase in the predictor the response increases by 40% (on the natural scale). Hence, on the log scale, the slope will be \(log(1.4)=\) 0.3364722. Finally, the dispersion ('size') parameter will be 10.
set.seed(234)
beta <- c(1, 1.40)
beta <- log(beta)
n <- 10
size <- 10
dat4 <- data.frame(x = seq(from = 1, to = 10, len = n)) |>
mutate(
mu = exp(beta[1] + beta[2] * x),
y = rnbinom(n, size = size, mu = mu)
)
dat4
    x        mu  y
1 1 1.400000 0
2 2 1.960000 3
3 3 2.744000 7
4 4 3.841600 3
5 5 5.378240 5
6 6 7.529536 9
7 7 10.541350 13
8 8 14.757891 10
9 9 20.661047 17
10 10 28.925465 26
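The extra variance introduced by the negative binomial can be seen from its mean-variance relationship, \(Var(y) = \mu + \mu^2/size\), whereas the Poisson expects \(Var(y) = \mu\):

```r
mu <- 10
size <- 10  # the 'size' (dispersion) parameter used in rnbinom() above

mu                 # variance a Poisson would expect at this mean
mu + mu^2 / size   # negative binomial variance: 20, double the Poisson
```

So the smaller the `size`, the greater the over-dispersion; as `size` grows very large, the negative binomial converges on the Poisson.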
Binary data (presence/absence, dead/alive, yes/no, heads/tails, etc) pose unique challenges for linear modeling. Linear regression, designed for continuous outcomes, may not be directly applicable to binary responses. The nature of binary data violates assumptions of normality and homoscedasticity, which are fundamental to linear regression. Furthermore, linear models may predict probabilities outside the [0, 1] range, leading to unrealistic predictions.
This example will generate data that are drawn from a Bernoulli distribution so as to broadly represent presence/absence data.
We will choose \(\beta_0\) to represent the odds of a value of 1 when \(x=0\), equal to \(0.02\). This is equivalent to a probability of \(y\) being one when \(x=0\) of \(\frac{0.02}{1+0.02}=0.0196\). That is, at low \(x\), the response is likely to be 0. For every one unit increase in \(x\), we will stipulate a 2 times increase in the odds that the expected response is equal to 1.
Similar to binary data, proportional (binomial) data tend to violate normality and homogeneity of variance (particularly as mean proportions approach either 0% or 100%).
This example will generate data that is drawn from a binomial distribution so as to broadly represent proportion data.
We will choose \(\beta_0\) to represent the odds of a particular trial (e.g. an individual) being of a particular type (e.g. species 1) when \(x=0\), equal to \(0.02\). This is equivalent to a probability of \(y\) being of the focal type when \(x=0\) of \(\frac{0.02}{1+0.02}=0.0196\). That is, at low \(x\), the probability that an individual is of taxa 1 is likely to be close to 0. For every one unit increase in \(x\), we will stipulate a 2.5 times increase in the odds that the expected response is equal to 1.
For this example, we will also convert the counts into proportions (\(y2\)) by division with the number of trials (\(5\)).
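The conversions between odds, probability and the logit (link) scale used above can be sketched with base R's `plogis()`/`qlogis()`:

```r
odds <- 0.02             # odds of 'success' when x = 0
p <- odds / (1 + odds)   # probability: 0.0196...

log(odds)                # log-odds: the value on the logit link scale
qlogis(p)                # identical: logit(p) = log(p / (1 - p))
plogis(log(odds))        # back-transform the log-odds to the probability
```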
set.seed(123)
beta <- c(0.02, 2.5)
beta <- log(beta)
n <- 10
trials <- 5
dat6 <- data.frame(x = seq(from = 1, to = 10, len = n)) |>
mutate(
count = as.numeric(rbinom(n, size = trials, prob = plogis(beta[1] + beta[2] * x))),
total = trials,
y = count/total
)
dat6
    x count total   y
1 1 0 5 0.0
2 2 1 5 0.2
3 3 1 5 0.2
4 4 4 5 0.8
5 5 2 5 0.4
6 6 5 5 1.0
7 7 5 5 1.0
8 8 4 5 0.8
9 9 5 5 1.0
10 10 5 5 1.0
6 Exploratory data analysis
Statistical models utilize data and the inherent statistical properties of distributions to discern patterns, relationships, and trends, enabling the extraction of meaningful insights, predictions, or inferences about the phenomena under investigation. To do so, statistical models make assumptions about the likely distributions from which the data were collected. Consequently, the reliability and validity of any statistical model depend upon adherence to these underlying assumptions.
Exploratory Data Analysis (EDA) and assumption checking therefore play pivotal roles in the process of statistical analysis, offering essential tools to glean insights, assess the reliability of statistical methods, and ensure the validity of conclusions drawn from data. EDA involves visually and statistically examining datasets to understand their underlying patterns, distributions, and potential outliers. These initial steps provide an intuitive understanding of the data's structure and guide subsequent analyses. By scrutinizing assumptions, such as normality, homoscedasticity, and independence, researchers can identify potential limitations or violations that may impact the accuracy and reliability of their findings.
Exploratory Data Analysis within the context of ecological statistical models usually comprises a set of targeted graphical summaries. These are not to be considered definitive diagnostics of the model assumptions, but rather a first pass to assess the obvious violations prior to the fitting of models. More definitive diagnostics can only be achieved after a model has been fit.
In addition to graphical summaries, there are numerous statistical tests to help explore possible violations of various statistical assumptions. These tests are less commonly used in ecology since they are often more sensitive to deviations from ideal than are the models that we are seeking to ensure.
Simple classic regression models are often the easiest models to fit and interpret and as such often represent a standard by which alternative models are gauged. As you will see later in this tutorial, such models can actually be fit using closed-form (exact solution) matrix algebra that can be performed by hand. Nevertheless, and perhaps as a result, they also impose some of the strictest assumptions. Although these collective assumptions are specific to Gaussian models, they do provide a good introduction to model assumptions in general, so we will use them to motivate the wider discussion.
Simple (Gaussian) linear models (represented below) make the following assumptions:
The data depicted above were generated using the following R code:
The observations represent
- single observations drawn from 10 normal populations
- each population had a standard deviation of 4
- the mean of each population varied linearly according to the value of x (\(2 + 5x\))
- normality: the residuals (and thus observations) must be drawn from populations that are normally distributed. The right hand figure underlays the fictitious normally distributed populations from which the observed values have been sampled.
Estimation and inference testing in linear regression assumes that the response is normally distributed in each of the populations. In this case, the populations are all possible measurements that could be collected at each level of \(x\) - hence there are 16 populations. Typically, however, we only collect a single observation from each population (as is also the case here). How then can we evaluate whether each of these populations is likely to have been normal?
For a given response, the population distributions should follow much the same distribution shapes. Therefore provided the single samples from each population are unbiased representations of those populations, a boxplot of all observations should reflect the population distributions.
The two figures above show the relationships between the individual population distributions and the overall distribution. The left hand figure shows a distribution drawn from single representatives of each of the 16 populations. Since the 16 individual populations were normally distributed, the distribution of the 16 observations is also normal.
By contrast, the right hand figure shows 16 log-normally distributed populations and the resulting distribution of 16 single observations drawn from these populations. The overall boxplot mirrors each of the individual population distributions.
Whilst traditionally, non-normal data would typically be normalised via a scale transformation (such as a logarithmic transformation), these days it is arguably more appropriate to attempt to match the data to a more suitable distribution (see later in this tutorial).
You may have noticed that we have only explored the distribution of the response (y-axis). What about the distribution of the predictor (independent, x-axis) variable - does it matter? The distributional assumption applies to the residuals (which are purely in the direction of the y-axis). Indeed, technically it is assumed that there is no uncertainty associated with the predictor variable: the predictor values are assumed to be set by the researcher, and thus there is no error associated with the values observed. Whilst this might not always be reasonable, it is an assumption nonetheless.
Given that the predictor values are expected to be set rather than measured, we technically assume that they are uniformly distributed. In practice, the exact distribution of the predictor values is not that important, provided it is reasonably symmetrical and does not give rise to outliers (unusually small or large values).
As with exploring the distribution of the response variable, boxplots, histograms and density plots can be useful means of exploring the distribution of predictor variable(s). When such diagnostics reveal distributional issues, scale transformations (such as logarithmic transformations) are appropriate.
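These graphical diagnostics can be sketched in base R. The data below are fabricated for illustration (they are not one of the tutorial's data sets): a right-skewed (log-normal) response whose skew is revealed by simple summaries and largely resolved by a logarithmic transformation.

```r
# Fabricated example: a right-skewed (log-normal) variable
set.seed(123)
y <- rlnorm(100, meanlog = 2, sdlog = 1)

# for a symmetric distribution, the mean and median should be similar;
# for right-skewed data the mean is dragged above the median
skew_raw <- mean(y) - median(y)
skew_log <- mean(log(y)) - median(log(y))

# graphical diagnostics: boxplot, histogram and density plot
par(mfrow = c(1, 3))
boxplot(y, main = "Boxplot")
hist(y, main = "Histogram")
plot(density(y), main = "Density")
```

On the raw scale the mean sits well above the median (asymmetry); after the log transformation the two are nearly identical.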
- homogeneity of variance: the residuals (and thus observations) must be drawn from populations that are equally varied. The model as shown only estimates a single variance (\(\sigma^2\)) parameter - it is assumed that this is a good overall representation of all the underlying populations. The right hand figure underlays the fictitious normally distributed and equally varied populations from which the observations have been sampled.
Moreover, since the expected values (obtained by solving the deterministic component of the model) and the variance must be estimated from the same data, they need to be independent (not related to one another).
Simple linear regression also assumes that each of the populations is equally varied. Actually, it is the prospect of a relationship between the mean and variance of the y-values across the x-values that is of greatest concern. Strictly, the assumption is that the distributions of y-values at each x-value are equally varied and that there is no relationship between mean and variance.
However, as we only have a single y-value for each x-value, it is difficult to directly determine whether the assumption of homogeneity of variance is likely to have been violated (mean of one value is meaningless and variability can’t be assessed from a single value). The figure below depicts the ideal (and almost never realistic) situation in which (left hand figure) the populations are all equally varied. The middle figure simulates drawing a single observation from each of the populations. When the populations are equally varied, the spread of observed values around the trend line is fairly even - that is, there is no trend in the spread of values along the line.
If we then plot the residuals (the differences between the observed values and those predicted by the trendline) against the predicted values, there is a definite lack of pattern. This lack of pattern indicates that homogeneity of variance is not an issue.
If we now contrast the above with a situation where the population variance is related to the mean (unequal variance), we see that the observations drawn from these populations are not evenly distributed along the trendline (they get more spread out as the mean predicted value increases). This pattern is emphasised in the residual plot, which displays a characteristic "wedge" shape.
Hence, looking at the spread of values around a trendline on a scatterplot of \(y\) against \(x\) is a useful way of identifying gross violations of homogeneity of variance. Residual plots provide an even better diagnostic: the presence of a wedge shape indicates that the population mean and variance are related.
- linearity: the underlying relationships must be simple linear trends, since the line of best fit through the data (of which the slope is estimated) is linear. The right hand figure depicts a linear trend through the underlying populations.
It is important to clarify the meaning of the word "linear" in the term "linear regression". Technically, it refers to a linear combination of regression coefficients. For example, the following are all linear models:
- \(y_i = \beta_0 + \beta_1 x_i\)
- \(y_i = \beta_0 + \beta_1 x_i + \beta_2 z_i\)
- \(y_i = \beta_0 + \beta_1 x_i + \beta_2 x^2_i\)
All of the coefficients (\(\beta_0\), \(\beta_1\), \(\beta_2\)) enter as linear terms. Note that the last of the above examples is still a linear model, even though it describes a non-linear trend. Contrast the above models with the following non-linear model:
- \(y_i = \beta_0 + x_i^{\beta_1}\)
In this case, the model is not a linear combination of the coefficients (one of them enters as an exponent).
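The distinction is easy to demonstrate in R. The data below are fabricated for illustration: a model containing a quadratic term is still fitted with `lm()` (it is linear in its coefficients) even though the fitted trend is a curve.

```r
# Fabricated data with a curved (quadratic) trend
set.seed(42)
x <- seq(1, 10, length.out = 50)
y <- 2 + 5 * x - 0.4 * x^2 + rnorm(50, sd = 1)

# y ~ x + I(x^2) is a *linear* model - a linear combination of
# beta_0, beta_1 and beta_2 - despite describing a non-linear trend
fit <- lm(y ~ x + I(x^2))
coef(fit)  # three linear coefficients
```

By contrast, a model such as \(y_i = \beta_0 + x_i^{\beta_1}\) cannot be fitted with `lm()` and would require a non-linear fitting routine such as `nls()`.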
That said, a simple linear regression usually fits a straight (linear) line through the data. Therefore, prior to fitting such a model, it is necessary to establish whether this really is the most sensible way of describing the relationship. That is, does the relationship appear to be linearly related or could some other non-linear function describe the relationship better. Scatterplots and residual plots are useful diagnostics.
To see how a residual plot can be useful, consider the following. The first row of figures illustrates the residuals resulting from data drawn from a linear trend. The residuals are effectively random noise. By contrast, the second row shows the residuals resulting from data drawn from a non-linear relationship that has nevertheless been modelled as a linear trend. There is still a clear pattern remaining in the residuals.
The above might be an obvious and somewhat overly contrived example, yet it does illustrate the point - that a pattern in the residuals could point to a mis-specified model.
If non-linearity does exist (as in the second case above), then fitting a straight line through what is obviously not a straight-line relationship is likely to poorly represent the true nature of the relationship. There are numerous causes of non-linearity:
- underlying distributional issues can result in non-linearity. For example, if we assume a gaussian distribution and the data are non-normal, the relationships will often appear non-linear. Addressing the distributional issues can therefore also resolve the linearity issues.
- the underlying relationship might truly be non-linear in which case this should be reflected in some way by the model formula. If the model formula fails to describe the non-linear trend, then problems will persist.
- the model proposed is missing an important covariate that might help standardise the data in a way that results in linearity
- independence: the residuals (and thus observations) must be drawn independently from the populations. That is, the correlation between all pairs of observations is assumed to be 0 (the off-diagonals of the covariance matrix). More practically, there should be no pattern to the correlations between observations.
Random sampling and random treatment assignment are experimental design elements that are intended to mitigate many types of sampling biases that cause dependencies between observations. Nevertheless, there are aspects of sampling designs that are either logistically difficult to randomise or, in some cases, not logically possible to randomise. For example, the residuals from observations sampled closer together in space and time will likely be more similar to one another than those of observations spaced further apart. Since neither space nor time can be randomised, data collected from sampling designs that involve sampling over space and/or time need to be assessed for spatial and temporal dependencies. These concepts will be explored in a later tutorial, in the context of introducing designs that are susceptible to them.
The above is only a very brief overview of the model assumptions that apply to just one specific model (simple linear gaussian regression). For the remainder of this section, we will graphically explore the two motivating example data sets so as to gain insights into which distributional assumptions might be most valid, and thus help guide modelling choices. Similarly, for subsequent tutorials in this series (which introduce progressively more complex models), all the associated assumptions will be explored and detailed.
Conclusions
- there are no obvious violations of the linear regression model assumptions
- we can now fit the suggested model
- full confirmation about the model’s goodness of fit should be reserved until after exploring the additional diagnostics that are only available after fitting the model.
Conclusions
- the spread of noise in each group seems reasonably similar
- more importantly, there does not seem to be a relationship between the mean (as approximated by the position of the boxplots along the y-axis) and the variance (as approximated by the spread of the boxplots).
- that is, the size of the boxplots does not vary with the elevation of the boxplots.
Linearity is not an issue for categorical predictors, since the model effectively fits separate lines between pairs of points (and a line between two points can only ever be linear).
Conclusions
- no evidence of non-normality
- no evidence of non-homogeneity of variance
Conclusions
- the spread of noise does not look random along the line of best fit.
- homogeneity of variance is difficult to assess in the presence of distributional issues (such as non-normality in this case) as they can result in non-linearity (apparent here)
Conclusions
- the data do not appear to be linear
- the red line is a loess smoother and it is clear that the data are not linear
Conclusions
- there are obvious violations of the linear regression model assumptions
- we should consider a different model that does not assume normality
Conclusions
- the spread of noise does not look random along the line of best fit.
- homogeneity of variance is difficult to assess in the presence of distributional issues (such as non-normality in this case) as they can result in non-linearity (apparent here)
Conclusions
- the data do not appear to be linear
- the red line is a loess smoother and it is clear that the data are not linear
Conclusions
- there are obvious violations of the linear regression model assumptions
- we should consider a different model that does not assume normality
Conclusions
- the data are clearly not linear
- the red line is a loess smoother and it is clear that the data are not linear
Conclusions
- there are obvious violations of the linear regression model assumptions
- we should consider a different model that does not assume normality
Conclusions
- although there is no evidence of non-linearity from this small data set, it is worth noting that the line of best fit does extend outside the logical response range [0, 1] within the range of observed \(x\) values. That is, a simple linear model would predict proportions higher than 100% at high values of \(x\)
- this is a common issue with binomial data and is often addressed by fitting a logistic regression model
Conclusions
- there are obvious violations of the linear regression model assumptions
- we should consider a different model that does not assume normality
7 Fitting models
One way to assess the priors is to have the MCMC sampler draw purely from the prior predictive distribution, without conditioning on the observed data. Doing so provides a glimpse of the range of predictions possible under the priors. On the one hand, wide-ranging predictions would ensure that the priors are unlikely to influence the actual predictions once they are conditioned on the data. On the other hand, if they are too wide, the sampler is permitted to traverse regions of parameter space that are not logically possible in the actual underlying ecological context. Not only does this mean that illogical parameter estimates are possible, but when the sampler traverses regions of parameter space that are not supported by the actual data, it can become unstable and have difficulty converging.
In brms, we can instruct the sampler to draw from the prior predictive distribution, instead of conditioning on the response, by running the model with the sample_prior = 'only' argument. Unfortunately, this cannot be applied when there are flat priors (since the prior predictions would necessarily extend to negative and positive infinity). Therefore, in order to use this useful routine, we need to make sure that we have defined a proper prior for every parameter.
Earlier we suggested the following priors might be useful:
- \(\beta_0\): Normal prior centred at 31.21 with a variance of 15.17
mean:
variance:
- \(\beta_1\): t distribution (3 degrees of freedom) prior centred at 0 with a variance of 4.09
mean: since effects are differences and we may not want to pre-determine the direction (polarity) of any trends, it is typical to define effects priors with means of 0
variance:
- \(\sigma\): (half) t distribution (3 degrees of freedom) centred at 0 with a variance of 15.17
variance:
It might be useful to understand what some of these distributions look like. For example, we have used a normal (Gaussian) distribution and a flatter t distribution for the y-intercept and slope respectively. This was a somewhat arbitrary choice - we could easily have gone with either normal or t distributions for all of the above parameters. To visualise prior distributions for the slope based on both normal and t distributions:
Evidently, the t distribution (with 3 degrees of freedom) is wider than the normal distribution. The t distribution should therefore be more robust to values that are less concentrated around the mean.
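A comparison along these lines can be sketched in base R. The scale value 4.09 is taken from the slope prior suggested above (an assumption on my part that the stated value is used as the scale); note that a location-scale t density is obtained as `dt(x / s, df) / s`.

```r
# Compare a normal prior with a t prior (3 df) of the same scale
xx <- seq(-15, 15, length.out = 500)
s  <- 4.09  # the slope prior scale suggested above (assumed)

plot(xx, dnorm(xx, 0, s), type = "l", lwd = 2,
     xlab = expression(beta[1]), ylab = "Density")
lines(xx, dt(xx / s, df = 3) / s, lwd = 2, lty = 2)  # scaled t density
legend("topright",
       legend = c("normal(0, 4.09)", "student_t(3, 0, 4.09)"),
       lwd = 2, lty = c(1, 2))
```

The t density has a lower peak and heavier tails than the normal density of the same scale, which is what makes it the more permissive (flatter) prior.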
\[ \begin{align} y_i \sim{}& N(\mu_i, \sigma^2)\\ \mu_i =& \beta_0 + \beta_1 x_i\\ \end{align} \]
- start by fitting the model and sampling from the priors only
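The code behind this step might look something like the following sketch. It assumes the data frame is called `dat` (with response `y` and predictor `x`), and it passes the suggested prior values directly as the scale arguments of `normal()` and `student_t()` - note that brms parameterises these by standard deviation rather than variance.

```r
library(brms)

# suggested priors (values assumed to be used as scales)
priors <- c(
  prior(normal(31.21, 15.17), class = Intercept),
  prior(student_t(3, 0, 4.09), class = b),
  prior(student_t(3, 0, 15.17), class = sigma)
)

# sample from the prior predictive distribution only
# (the response is ignored when sample_prior = "only")
dat.brm <- brm(bf(y ~ x),
  data = dat,
  prior = priors,
  sample_prior = "only",
  iter = 2000, warmup = 1000, chains = 4,
  backend = "cmdstanr"
)
```

The iteration and chain settings above simply mirror those visible in the sampler output that follows.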
Compiling Stan program...
Start sampling
SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 1).
Chain 1:
Chain 1: Gradient evaluation took 7e-06 seconds
Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 0.07 seconds.
Chain 1: Adjust your expectations accordingly!
Chain 1:
Chain 1:
Chain 1: Iteration: 1 / 2000 [ 0%] (Warmup)
Chain 1: Iteration: 200 / 2000 [ 10%] (Warmup)
Chain 1: Iteration: 400 / 2000 [ 20%] (Warmup)
Chain 1: Iteration: 600 / 2000 [ 30%] (Warmup)
Chain 1: Iteration: 800 / 2000 [ 40%] (Warmup)
Chain 1: Iteration: 1000 / 2000 [ 50%] (Warmup)
Chain 1: Iteration: 1001 / 2000 [ 50%] (Sampling)
Chain 1: Iteration: 1200 / 2000 [ 60%] (Sampling)
Chain 1: Iteration: 1400 / 2000 [ 70%] (Sampling)
Chain 1: Iteration: 1600 / 2000 [ 80%] (Sampling)
Chain 1: Iteration: 1800 / 2000 [ 90%] (Sampling)
Chain 1: Iteration: 2000 / 2000 [100%] (Sampling)
Chain 1:
Chain 1: Elapsed Time: 0.012 seconds (Warm-up)
Chain 1: 0.012 seconds (Sampling)
Chain 1: 0.024 seconds (Total)
Chain 1:
SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 2).
Chain 2:
Chain 2: Gradient evaluation took 4e-06 seconds
Chain 2: 1000 transitions using 10 leapfrog steps per transition would take 0.04 seconds.
Chain 2: Adjust your expectations accordingly!
Chain 2:
Chain 2:
Chain 2: Iteration: 1 / 2000 [ 0%] (Warmup)
Chain 2: Iteration: 200 / 2000 [ 10%] (Warmup)
Chain 2: Iteration: 400 / 2000 [ 20%] (Warmup)
Chain 2: Iteration: 600 / 2000 [ 30%] (Warmup)
Chain 2: Iteration: 800 / 2000 [ 40%] (Warmup)
Chain 2: Iteration: 1000 / 2000 [ 50%] (Warmup)
Chain 2: Iteration: 1001 / 2000 [ 50%] (Sampling)
Chain 2: Iteration: 1200 / 2000 [ 60%] (Sampling)
Chain 2: Iteration: 1400 / 2000 [ 70%] (Sampling)
Chain 2: Iteration: 1600 / 2000 [ 80%] (Sampling)
Chain 2: Iteration: 1800 / 2000 [ 90%] (Sampling)
Chain 2: Iteration: 2000 / 2000 [100%] (Sampling)
Chain 2:
Chain 2: Elapsed Time: 0.011 seconds (Warm-up)
Chain 2: 0.012 seconds (Sampling)
Chain 2: 0.023 seconds (Total)
Chain 2:
SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 3).
Chain 3:
Chain 3: Gradient evaluation took 4e-06 seconds
Chain 3: 1000 transitions using 10 leapfrog steps per transition would take 0.04 seconds.
Chain 3: Adjust your expectations accordingly!
Chain 3:
Chain 3:
Chain 3: Iteration: 1 / 2000 [ 0%] (Warmup)
Chain 3: Iteration: 200 / 2000 [ 10%] (Warmup)
Chain 3: Iteration: 400 / 2000 [ 20%] (Warmup)
Chain 3: Iteration: 600 / 2000 [ 30%] (Warmup)
Chain 3: Iteration: 800 / 2000 [ 40%] (Warmup)
Chain 3: Iteration: 1000 / 2000 [ 50%] (Warmup)
Chain 3: Iteration: 1001 / 2000 [ 50%] (Sampling)
Chain 3: Iteration: 1200 / 2000 [ 60%] (Sampling)
Chain 3: Iteration: 1400 / 2000 [ 70%] (Sampling)
Chain 3: Iteration: 1600 / 2000 [ 80%] (Sampling)
Chain 3: Iteration: 1800 / 2000 [ 90%] (Sampling)
Chain 3: Iteration: 2000 / 2000 [100%] (Sampling)
Chain 3:
Chain 3: Elapsed Time: 0.012 seconds (Warm-up)
Chain 3: 0.013 seconds (Sampling)
Chain 3: 0.025 seconds (Total)
Chain 3:
SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 4).
Chain 4:
Chain 4: Gradient evaluation took 3e-06 seconds
Chain 4: 1000 transitions using 10 leapfrog steps per transition would take 0.03 seconds.
Chain 4: Adjust your expectations accordingly!
Chain 4:
Chain 4:
Chain 4: Iteration: 1 / 2000 [ 0%] (Warmup)
Chain 4: Iteration: 200 / 2000 [ 10%] (Warmup)
Chain 4: Iteration: 400 / 2000 [ 20%] (Warmup)
Chain 4: Iteration: 600 / 2000 [ 30%] (Warmup)
Chain 4: Iteration: 800 / 2000 [ 40%] (Warmup)
Chain 4: Iteration: 1000 / 2000 [ 50%] (Warmup)
Chain 4: Iteration: 1001 / 2000 [ 50%] (Sampling)
Chain 4: Iteration: 1200 / 2000 [ 60%] (Sampling)
Chain 4: Iteration: 1400 / 2000 [ 70%] (Sampling)
Chain 4: Iteration: 1600 / 2000 [ 80%] (Sampling)
Chain 4: Iteration: 1800 / 2000 [ 90%] (Sampling)
Chain 4: Iteration: 2000 / 2000 [100%] (Sampling)
Chain 4:
Chain 4: Elapsed Time: 0.012 seconds (Warm-up)
Chain 4: 0.012 seconds (Sampling)
Chain 4: 0.024 seconds (Total)
Chain 4:
Family: gaussian
Links: mu = identity; sigma = identity
Formula: y ~ x
Data: dat (Number of observations: 10)
Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
total post-warmup draws = 4000
Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept 0.27 0.35 -0.41 0.99 1.00 2634 2171
x -0.09 0.45 -0.99 0.81 1.00 2536 2195
Family Specific Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma 1.08 0.32 0.65 1.88 1.00 2161 2142
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
- explore the range of posterior predictions resulting from the priors alone
Conclusions:
- the grey ribbon above represents the credible range of the posterior predictions
- this ribbon is substantially wider than the observed data (black points), suggesting that the priors are not overly narrow
- the range of posterior predictions is not wildly unreasonable (although we could argue that negative predictions might be illogical), since the maximums are not multiple orders of magnitude above the observed data.
- now refit the model such that it samples from both the priors and likelihood (that is allow the data to have an impact on the estimates)
- re-explore the range of posterior predictions resulting from the fitted model
Conclusions:
- the range of the posteriors is now substantially reduced (now that the model includes both data and priors)
- this suggests that the patterns are being driven predominantly by the data
- compare the priors and posteriors to further confirm that the priors are not overly influential
When we have indicated that the posterior should be informed by both the prior and the likelihood, both prior draws (governed by the priors alone) and posterior draws (governed by both priors and data/likelihood) are returned. These can be compared by exploring the probabilities associated with specific hypotheses - the most obvious of which is that of no effect (that the parameter = 0).
When doing so, ideally the posterior should be distinct from the prior. By distinct, I mean that the posteriors should be clearly distinguishable from the priors; ideally, the posteriors should be less variable than the priors. If this is not the case, it might suggest that the posteriors are being too strongly driven by the priors.
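One way to perform this comparison is sketched below. It assumes the model object is called `dat.brm` and was fitted with `sample_prior = "yes"`; the draw names (`b_x`, `prior_b`) would need to match those reported by `variables()` for your model.

```r
library(brms)
library(tidyverse)

# overlay prior and posterior densities for the slope
dat.brm |>
  as_draws_df() |>
  select(b_x, prior_b) |>
  pivot_longer(everything(), names_to = "Type", values_to = "Value") |>
  ggplot(aes(x = Value, colour = Type)) +
  geom_density()

# alternatively, hypothesis() evaluates a specific hypothesis (here, no
# effect) and its plot method displays prior and posterior together
h <- hypothesis(dat.brm, "x = 0")
plot(h)
```

Either display makes it easy to judge whether the posterior is distinct from (and narrower than) the prior.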
[1] "b_Intercept" "b_x" "sigma" "prior_Intercept"
[5] "prior_b" "prior_sigma" "lprior" "lp__"
[9] "accept_stat__" "stepsize__" "treedepth__" "n_leapfrog__"
[13] "divergent__" "energy__"
Unfortunately, it is not possible to do this comparison sensibly for the intercept. The reason is that the prior for the intercept was applied to an intercept associated with centred continuous predictors (predictors are centred behind the scenes). Since we did not centre the predictor ourselves, the intercept returned is on the uncentred scale. Hence, the prior and posterior are on different scales.
[1] "b_Intercept" "b_x" "sigma" "prior_Intercept"
[5] "prior_b" "prior_sigma" "lprior" "lp__"
[9] "accept_stat__" "stepsize__" "treedepth__" "n_leapfrog__"
[13] "divergent__" "energy__"
Conclusions:
- each of the priors are substantially wider than the posteriors
- the posteriors are definitely distinct from their respective priors, and thus we can conclude that the priors are not driving the posteriors (i.e. they are not influencing the outcomes)
- the priors are simply regularising the parameters such that they are only sampled from plausible regions
Unless you explicitly direct brm to include a user-defined intercept, the priors on the default intercept should assume that the predictor(s) are centred (because brm will automatically centre all continuous predictors).
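If you do want the intercept prior to apply on the original (uncentred) scale, brms offers the `0 + Intercept` formula syntax, under which the intercept is treated as an ordinary population-level coefficient and the predictors are not centred behind the scenes. A sketch only, reusing the earlier suggested prior values and an assumed data frame `dat`:

```r
library(brms)

# with 0 + Intercept, the intercept is a regular "b" coefficient named
# "Intercept", so its prior is set via class = b, coef = Intercept
priors <- c(
  prior(normal(31.21, 15.17), class = b, coef = Intercept),
  prior(student_t(3, 0, 4.09), class = b),
  prior(student_t(3, 0, 15.17), class = sigma)
)

dat.brm2 <- brm(bf(y ~ 0 + Intercept + x),
  data = dat,
  prior = priors,
  sample_prior = "yes",
  backend = "cmdstanr"
)
```

Under this parameterisation the prior and posterior for the intercept are on the same (uncentred) scale and can be compared directly.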
Let's try the following priors:
- \(\beta_0\): Normal prior centred at 0.61 with a variance of 0.48
mean:
variance:
- \(\beta_1\): t distribution (3 degrees of freedom) prior centred at 0 with a variance of 0.75
mean: since effects are differences and we may not want to pre-determine the direction (polarity) of any trends, it is typical to define effects priors with means of 0
variance:
- \(\sigma\): (half) t distribution (3 degrees of freedom) centred at 0 with a variance of 0.48
variance:
\[ \begin{align} y_i \sim{}& N(\mu_i, \sigma^2)\\ \mu_i =& \beta_0 + \beta_1 (x_i - \bar{x})\\ \end{align} \]
- start by fitting the model and sampling from the priors only
- explore the range of posterior predictions resulting from the priors alone
Conclusions:
- the grey ribbon above represents the credible range of the posterior predictions
- this ribbon is substantially wider than the observed data (black points), suggesting that the priors are not overly narrow
- the range of posterior predictions is not wildly unreasonable (although we could argue that negative predictions might be illogical), since the maximums are not multiple orders of magnitude above the observed data.
- now refit the model such that it samples from both the priors and likelihood (that is allow the data to have an impact on the estimates)
- re-explore the range of posterior predictions resulting from the fitted model
Conclusions:
- the range of the posteriors is now substantially reduced (now that the model includes both data and priors)
- this suggests that the patterns are being driven predominantly by the data
- compare the priors and posteriors to further confirm that the priors are not overly influential
When we have indicated that the posterior should be informed by both the prior and the likelihood, both prior draws (governed by the priors alone) and posterior draws (governed by both priors and data/likelihood) are returned. These can be compared by exploring the probabilities associated with specific hypotheses - the most obvious of which is that of no effect (that the parameter = 0).
When doing so, ideally the posterior should be distinct from the prior. By distinct, I mean that the posteriors should be clearly distinguishable from the priors; ideally, the posteriors should be less variable than the priors. If this is not the case, it might suggest that the posteriors are being too strongly driven by the priors.
[1] "b_Intercept" "b_scalexscaleEQFALSE" "sigma"
[4] "prior_Intercept" "prior_b" "prior_sigma"
[7] "lprior" "lp__" "accept_stat__"
[10] "stepsize__" "treedepth__" "n_leapfrog__"
[13] "divergent__" "energy__"
Unfortunately, it is not possible to do this comparison sensibly for the intercept. The reason is that the prior for the intercept was applied to an intercept associated with centred continuous predictors (predictors are centred behind the scenes). Since we did not centre the predictor ourselves, the intercept returned is on the uncentred scale. Hence, the prior and posterior are on different scales.
[1] "b_Intercept" "b_scalexscaleEQFALSE" "sigma"
[4] "prior_Intercept" "prior_b" "prior_sigma"
[7] "lprior" "lp__" "accept_stat__"
[10] "stepsize__" "treedepth__" "n_leapfrog__"
[13] "divergent__" "energy__"
Conclusions:
- each of the priors are substantially wider than the posteriors
- the posteriors are definitely distinct from their respective priors, and thus we can conclude that the priors are not driving the posteriors (i.e. they are not influencing the outcomes)
- the priors are simply regularising the parameters such that they are only sampled from plausible regions
When the predictor is standardised, it simplifies prior definition because we no longer need to consider the scale of the predictor.
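A quick base-R reminder (with fabricated values) of what standardising does, and why it frees the slope prior from the predictor's original units:

```r
# Standardising centres a predictor and rescales it to unit standard
# deviation: z = (x - mean(x)) / sd(x)
x <- c(2, 4, 6, 8, 10)
z <- as.numeric(scale(x))

round(mean(z), 10)  # centred: mean is 0
sd(z)               # scaled: standard deviation is 1
```

Because a standardised predictor always has mean 0 and standard deviation 1, a slope prior expresses "change in response per standard deviation of \(x\)" regardless of what units \(x\) was measured in.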
Let's try the following priors:
- \(\beta_0\): Normal prior centred at 0.61 with a variance of 0.48
mean:
variance:
- \(\beta_1\): t distribution (3 degrees of freedom) prior centred at 0 with a variance of 0.48
mean: since effects are differences and we may not want to pre-determine the direction (polarity) of any trends, it is typical to define effects priors with means of 0
variance:
- \(\sigma\): (half) t distribution (3 degrees of freedom) centred at 0 with a variance of 0.48
variance:
\[ \begin{align} y_i \sim{}& N(\mu_i, \sigma^2)\\ \mu_i =& \beta_0 + \beta_1 (x_i - \bar{x})/\sigma_{x}\\ \end{align} \]
- start by fitting the model and sampling from the priors only
- explore the range of posterior predictions resulting from the priors alone
Conclusions:
- the grey ribbon above represents the credible range of the posterior predictions
- this ribbon is substantially wider than the observed data (black points), suggesting that the priors are not overly narrow
- the range of posterior predictions is not wildly unreasonable (although we could argue that negative predictions might be illogical), since the maximums are not multiple orders of magnitude above the observed data.
- now refit the model such that it samples from both the priors and likelihood (that is allow the data to have an impact on the estimates)
- re-explore the range of posterior predictions resulting from the fitted model
Conclusions:
- the range of the posteriors is now substantially reduced (now that the model includes both data and priors)
- this suggests that the patterns are being driven predominantly by the data
- compare the priors and posteriors to further confirm that the priors are not overly influential
When we have indicated that the posterior should be informed by both the prior and the likelihood, both prior draws (governed by the priors alone) and posterior draws (governed by both priors and data/likelihood) are returned. These can be compared by exploring the probabilities associated with specific hypotheses - the most obvious of which is that of no effect (that the parameter = 0).
When doing so, ideally the posterior should be distinct from the prior. By distinct, I mean that the posteriors should be clearly distinguishable from the priors; ideally, the posteriors should be less variable than the priors. If this is not the case, it might suggest that the posteriors are being too strongly driven by the priors.
[1] "b_Intercept" "b_scalex" "sigma" "prior_Intercept"
[5] "prior_b" "prior_sigma" "lprior" "lp__"
[9] "accept_stat__" "stepsize__" "treedepth__" "n_leapfrog__"
[13] "divergent__" "energy__"
Unfortunately, it is not possible to do this comparison sensibly for the intercept. The reason is that the prior for the intercept was applied to an intercept associated with centred continuous predictors (predictors are centred behind the scenes). Since we did not centre the predictor ourselves, the intercept returned is on the uncentred scale. Hence, the prior and posterior are on different scales.
[1] "b_Intercept" "b_scalex" "sigma" "prior_Intercept"
[5] "prior_b" "prior_sigma" "lprior" "lp__"
[9] "accept_stat__" "stepsize__" "treedepth__" "n_leapfrog__"
[13] "divergent__" "energy__"
Conclusions:
- each of the priors are substantially wider than the posteriors
- the posteriors are definitely distinct from their respective priors, and thus we can conclude that the priors are not driving the posteriors (i.e. they are not influencing the outcomes)
- the priors are simply regularising the parameters such that they are only sampled from plausible regions
Let's try the following priors:
- \(\beta_0\): Normal prior centred at 19.68 with a variance of 1.87
mean:
variance:
- \(\beta_1\): t distribution (3 degrees of freedom) prior centred at 0 with a variance of 9.35
mean: since effects are differences and we may not want to pre-determine the direction (polarity) of any trends, it is typical to define effects priors with means of 0
variance:
- \(\sigma\): (half) t distribution (3 degrees of freedom) centred at 0 with a variance of 4.7
variance:
\[ \begin{align} y_i \sim{}& N(\mu_i, \sigma^2)\\ \mu_i =& \beta_0 + \sum{\beta_j x_{ij}}\\ \end{align} \]
- start by fitting the model and sampling from the priors only
- explore the range of posterior predictions resulting from the priors alone
Conclusions:
- the grey ribbon above represents the credible range of the posterior predictions
- this ribbon is substantially wider than the observed data (black points), suggesting that the priors are not overly narrow
- the range of posterior predictions is not wildly unreasonable (although we could argue that negative predictions might be illogical), since the maximums are not multiple orders of magnitude above the observed data.
- now refit the model such that it samples from both the priors and likelihood (that is allow the data to have an impact on the estimates)
- re-explore the range of posterior predictions resulting from the fitted model
Conclusions:
- the range of the posteriors is now substantially reduced (now that the model includes both data and priors)
- this suggests that the patterns are being driven predominantly by the data
- compare the priors and posteriors to further confirm that the priors are not overly influential
When we have indicated that the posterior should be informed by both the prior and the likelihood, both prior draws (governed by the priors alone) and posterior draws (governed by both priors and data/likelihood) are returned. These can be compared by exploring the probabilities associated with specific hypotheses - the most obvious of which is that of no effect (that the parameter = 0).
When doing so, ideally the posterior should be distinct from the prior. By distinct, I mean that the posteriors should be clearly distinguishable from the priors; ideally, the posteriors should be less variable than the priors. If this is not the case, it might suggest that the posteriors are being too strongly driven by the priors.
[1] "b_Intercept" "b_xmedium" "b_xhigh" "sigma"
[5] "prior_Intercept" "prior_b" "prior_sigma" "lprior"
[9] "lp__" "accept_stat__" "stepsize__" "treedepth__"
[13] "n_leapfrog__" "divergent__" "energy__"
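One way to visualise this comparison is to gather the matching prior and posterior draws (using the variable names listed above) and overlay their densities. This is a sketch; the fitted object name `dat1.brm` is a placeholder for the model fitted in this section.

```r
library(tidyverse)
library(brms)

## Overlay the posterior for one effect against its prior draws
dat1.brm |>
  as_draws_df() |>
  select(b_xmedium, prior_b) |>
  pivot_longer(everything(), names_to = "type", values_to = "value") |>
  ggplot(aes(x = value, colour = type)) +
  geom_density()
```

A posterior density that is much narrower than, and clearly shifted relative to, its prior indicates the data are doing the work.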
Unfortunately, it is not possible to do this comparison sensibly for the intercept. The reason for this is that the prior for intercept was applied to an intercept that is associated with centred continuous predictors (predictors are centred behind the scenes). Since we did not centre the predictor, the intercept returned is as if uncentred. Hence, the prior and posterior are on different scales.
[1] "b_Intercept" "b_xmedium" "b_xhigh" "sigma"
[5] "prior_Intercept" "prior_b" "prior_sigma" "lprior"
[9] "lp__" "accept_stat__" "stepsize__" "treedepth__"
[13] "n_leapfrog__" "divergent__" "energy__"
Conclusions:
- each of the priors is substantially wider than its corresponding posterior
- the posteriors are clearly distinct from their respective priors, so we can conclude that the priors are not driving the posteriors (i.e. they are not unduly influencing the outcomes)
- the priors are simply regularising the parameters such that they are only sampled from plausible regions
Let's try the following priors:
- \(\beta\): t distribution (3 degrees of freedom) prior centred at 18.14 with a variance of 6.26
mean: since each group's mean is being estimated separately, they could either all have different priors, or more commonly, the same priors.
variance:
- \(\sigma\): (half) t distribution (3 degrees of freedom) centred at 0 with a variance of 4.7
variance:
\[ \begin{align} y_i \sim{}& N(\mu_i, \sigma^2)\\ \mu_i =& \beta_0 + \sum{\beta_j x_{ij}}\\ \end{align} \]
- start by fitting the model and sampling from the priors only
- explore the range of posterior predictions resulting from the priors alone
Conclusions:
- the grey ribbon above represents the credible range of the posterior predictions
- this ribbon is substantially wider than the observed data (black points), suggesting that the priors are not overly narrow
- the range of posterior predictions is not wildly unreasonable (although we could argue that negative predictions might be illogical), since the maxima are not multiple orders of magnitude above the observed data.
- now refit the model such that it samples from both the priors and likelihood (that is allow the data to have an impact on the estimates)
- re-explore the range of posterior predictions resulting from the fitted model
Conclusions:
- the range of the posteriors is now substantially reduced (now that the model includes both data and priors)
- this suggests that the patterns are being driven predominantly by the data
- compare the priors and posteriors to further confirm that the priors are not overly influential
When we indicate that the posterior should be informed by both the prior and the likelihood, draws are returned for both the prior (governed by the priors alone) and the posterior (governed by both priors and data/likelihood). These can be compared by exploring the probabilities associated with specific hypotheses - the most obvious of which is that of no effect (that the parameter = 0).
When doing so, ideally the posterior should be distinct from the prior. By distinct, I mean that the posteriors should be clearly distinguishable from the priors and, ideally, less variable than them. If this is not the case, it might suggest that the posteriors are being too strongly driven by the priors.
[1] "b_xcontrol" "b_xmedium" "b_xhigh" "sigma"
[5] "prior_b" "prior_sigma" "lprior" "lp__"
[9] "accept_stat__" "stepsize__" "treedepth__" "n_leapfrog__"
[13] "divergent__" "energy__"
Note that since this model uses a means parameterisation (there is no separate intercept; each group mean has its own parameter drawn against the same class `b` prior, as the variable list above shows), this comparison can be performed directly for all parameters.
[1] "b_xcontrol" "b_xmedium" "b_xhigh" "sigma"
[5] "prior_b" "prior_sigma" "lprior" "lp__"
[9] "accept_stat__" "stepsize__" "treedepth__" "n_leapfrog__"
[13] "divergent__" "energy__"
Conclusions:
- each of the priors is substantially wider than its corresponding posterior
- the posteriors are clearly distinct from their respective priors, so we can conclude that the priors are not driving the posteriors (i.e. they are not unduly influencing the outcomes)
- the priors are simply regularising the parameters such that they are only sampled from plausible regions
\[ \begin{align} y_i \sim{}& Pois(\lambda_i)\\ log(\lambda_i) =& \beta_0 + \beta_1 x_i\\ \end{align} \]
When considering priors, it is important to remember that they apply to parameters on the link scale. For Poisson models, the link scale is log. So the following priors might be appropriate:
- \(\beta_0\): Normal prior centred at 1.99 with a variance of 1.33
mean:
variance:
- \(\beta_1\): t distribution (3 degrees of freedom) prior centred at 0 with a variance of 2.05
mean: since effects are differences and we may not want to pre-determine the direction (polarity) of any trends, it is typical to define effects priors with means of 0
variance:
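Because these priors are defined on the log link scale, it can help to back-transform them to the response scale when judging whether they are sensible. For example, an intercept prior centred at 1.99 corresponds to an expected count of roughly 7.3:

```r
## Back-transform the prior centre from the log (link) scale to the response (count) scale
exp(1.99)
#> [1] 7.3155  (approximately)
```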
- start by fitting the model and sampling from the priors only
- explore the range of posterior predictions resulting from the priors alone
Conclusions:
- the grey ribbon above represents the credible range of the posterior predictions
- this ribbon is substantially wider than the observed data (black points), suggesting that the priors are not overly narrow
- the range of posterior predictions is not wildly unreasonable (although we could argue that negative predictions might be illogical), since the maxima are not multiple orders of magnitude above the observed data.
- now refit the model such that it samples from both the priors and likelihood (that is allow the data to have an impact on the estimates)
- re-explore the range of posterior predictions resulting from the fitted model
Conclusions:
- the range of the posteriors is now substantially reduced (now that the model includes both data and priors)
- this suggests that the patterns are being driven predominantly by the data
- compare the priors and posteriors to further confirm that the priors are not overly influential
When we indicate that the posterior should be informed by both the prior and the likelihood, draws are returned for both the prior (governed by the priors alone) and the posterior (governed by both priors and data/likelihood). These can be compared by exploring the probabilities associated with specific hypotheses - the most obvious of which is that of no effect (that the parameter = 0).
When doing so, ideally the posterior should be distinct from the prior. By distinct, I mean that the posteriors should be clearly distinguishable from the priors and, ideally, less variable than them. If this is not the case, it might suggest that the posteriors are being too strongly driven by the priors.
[1] "b_Intercept" "b_x" "prior_Intercept" "prior_b"
[5] "lprior" "lp__" "accept_stat__" "stepsize__"
[9] "treedepth__" "n_leapfrog__" "divergent__" "energy__"
Unfortunately, it is not possible to do this comparison sensibly for the intercept. The reason is that the prior for the intercept was applied to an intercept associated with centred predictors (brms centres the predictors behind the scenes). Since we did not centre the predictor ourselves, the intercept returned is on the uncentred scale. Hence, the prior and posterior are on different scales.
[1] "b_Intercept" "b_x" "prior_Intercept" "prior_b"
[5] "lprior" "lp__" "accept_stat__" "stepsize__"
[9] "treedepth__" "n_leapfrog__" "divergent__" "energy__"
Conclusions:
- each of the priors is substantially wider than its corresponding posterior
- the posteriors are clearly distinct from their respective priors, so we can conclude that the priors are not driving the posteriors (i.e. they are not unduly influencing the outcomes)
- the priors are simply regularising the parameters such that they are only sampled from plausible regions
\[ \begin{align} y_i \sim{}& Pois(\lambda_i)\\ log(\lambda_i) =& \beta_0 + \beta_1 (x_i - \bar{x})\\ \end{align} \]
When considering priors, it is important to remember that they apply to parameters on the link scale. For Poisson models, the link scale is log. So the following priors might be appropriate:
- \(\beta_0\): Normal prior centred at 1.99 with a variance of 1.33
mean:
variance:
- \(\beta_1\): t distribution (3 degrees of freedom) prior centred at 0 with a variance of 2.05
mean: since effects are differences and we may not want to pre-determine the direction (polarity) of any trends, it is typical to define effects priors with means of 0
variance:
- start by fitting the model and sampling from the priors only
- explore the range of posterior predictions resulting from the priors alone
Conclusions:
- the grey ribbon above represents the credible range of the posterior predictions
- this ribbon is substantially wider than the observed data (black points), suggesting that the priors are not overly narrow
- the range of posterior predictions is not wildly unreasonable (although we could argue that negative predictions might be illogical), since the maxima are not multiple orders of magnitude above the observed data.
- now refit the model such that it samples from both the priors and likelihood (that is allow the data to have an impact on the estimates)
- re-explore the range of posterior predictions resulting from the fitted model
Conclusions:
- the range of the posteriors is now substantially reduced (now that the model includes both data and priors)
- this suggests that the patterns are being driven predominantly by the data
- compare the priors and posteriors to further confirm that the priors are not overly influential
When we indicate that the posterior should be informed by both the prior and the likelihood, draws are returned for both the prior (governed by the priors alone) and the posterior (governed by both priors and data/likelihood). These can be compared by exploring the probabilities associated with specific hypotheses - the most obvious of which is that of no effect (that the parameter = 0).
When doing so, ideally the posterior should be distinct from the prior. By distinct, I mean that the posteriors should be clearly distinguishable from the priors and, ideally, less variable than them. If this is not the case, it might suggest that the posteriors are being too strongly driven by the priors.
[1] "b_Intercept" "b_scalexscaleEQFALSE" "prior_Intercept"
[4] "prior_b" "lprior" "lp__"
[7] "accept_stat__" "stepsize__" "treedepth__"
[10] "n_leapfrog__" "divergent__" "energy__"
Unfortunately, it is not possible to do this comparison sensibly for the intercept. The reason for this is that the prior for intercept was applied to an intercept that is associated with centred continuous predictors (predictors are centred behind the scenes). Since we did not centre the predictor, the intercept returned is as if uncentred. Hence, the prior and posterior are on different scales.
[1] "b_Intercept" "b_scalexscaleEQFALSE" "prior_Intercept"
[4] "prior_b" "lprior" "lp__"
[7] "accept_stat__" "stepsize__" "treedepth__"
[10] "n_leapfrog__" "divergent__" "energy__"
Conclusions:
- each of the priors is substantially wider than its corresponding posterior
- the posteriors are clearly distinct from their respective priors, so we can conclude that the priors are not driving the posteriors (i.e. they are not unduly influencing the outcomes)
- the priors are simply regularising the parameters such that they are only sampled from plausible regions
\[ \begin{align} y_i \sim{}& Pois(\lambda_i)\\ log(\lambda_i) =& \beta_0 + \beta_1 (x_i - \bar{x})/\sigma_x\\ \end{align} \]
When considering priors, it is important to remember that they apply to parameters on the link scale. For Poisson models, the link scale is log. So the following priors might be appropriate:
- \(\beta_0\): Normal prior centred at 1.99 with a variance of 1.33
mean:
variance:
- \(\beta_1\): t distribution (3 degrees of freedom) prior centred at 0 with a variance of 1.33
mean: since effects are differences and we may not want to pre-determine the direction (polarity) of any trends, it is typical to define effects priors with means of 0
variance:
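The `scale()` calls used to express these centred and standardised models both subtract the predictor's mean; standardising additionally divides by its standard deviation. The following illustrates the difference on some arbitrary values:

```r
x <- c(2, 4, 6, 8, 10)
scale(x, scale = FALSE)[, 1]  # centred only: -4 -2  0  2  4
scale(x)[, 1]                 # centred and divided by sd(x) = 3.162
```

Centring only changes the interpretation of the intercept; standardising also rescales the slope (hence the different prior variance for \(\beta_1\) above).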
- start by fitting the model and sampling from the priors only
- explore the range of posterior predictions resulting from the priors alone
Conclusions:
- the grey ribbon above represents the credible range of the posterior predictions
- this ribbon is substantially wider than the observed data (black points), suggesting that the priors are not overly narrow
- the range of posterior predictions is not wildly unreasonable (although we could argue that negative predictions might be illogical), since the maxima are not multiple orders of magnitude above the observed data.
- now refit the model such that it samples from both the priors and likelihood (that is allow the data to have an impact on the estimates)
- re-explore the range of posterior predictions resulting from the fitted model
Conclusions:
- the range of the posteriors is now substantially reduced (now that the model includes both data and priors)
- this suggests that the patterns are being driven predominantly by the data
- compare the priors and posteriors to further confirm that the priors are not overly influential
When we indicate that the posterior should be informed by both the prior and the likelihood, draws are returned for both the prior (governed by the priors alone) and the posterior (governed by both priors and data/likelihood). These can be compared by exploring the probabilities associated with specific hypotheses - the most obvious of which is that of no effect (that the parameter = 0).
When doing so, ideally the posterior should be distinct from the prior. By distinct, I mean that the posteriors should be clearly distinguishable from the priors and, ideally, less variable than them. If this is not the case, it might suggest that the posteriors are being too strongly driven by the priors.
[1] "b_Intercept" "b_scalex" "prior_Intercept" "prior_b"
[5] "lprior" "lp__" "accept_stat__" "stepsize__"
[9] "treedepth__" "n_leapfrog__" "divergent__" "energy__"
Unfortunately, it is not possible to do this comparison sensibly for the intercept. The reason for this is that the prior for intercept was applied to an intercept that is associated with centred continuous predictors (predictors are centred behind the scenes). Since we did not centre the predictor, the intercept returned is as if uncentred. Hence, the prior and posterior are on different scales.
[1] "b_Intercept" "b_scalex" "prior_Intercept" "prior_b"
[5] "lprior" "lp__" "accept_stat__" "stepsize__"
[9] "treedepth__" "n_leapfrog__" "divergent__" "energy__"
Conclusions:
- each of the priors is substantially wider than its corresponding posterior
- the posteriors are clearly distinct from their respective priors, so we can conclude that the priors are not driving the posteriors (i.e. they are not unduly influencing the outcomes)
- the priors are simply regularising the parameters such that they are only sampled from plausible regions
\[ \begin{align} y_i \sim{}& NB(\lambda_i, \phi)\\ log(\lambda_i) =& \beta_0 + \beta_1 x_i\\ \end{align} \]
When considering priors, it is important to remember that they apply to parameters on the link scale. For Negative Binomial models, the link scale is log. So the following priors might be appropriate:
- \(\beta_0\): Normal prior centred at 2.07 with a variance of 0.93
mean:
variance:
- \(\beta_1\): t distribution (3 degrees of freedom) prior centred at 0 with a variance of 1.43
mean: since effects are differences and we may not want to pre-determine the direction (polarity) of any trends, it is typical to define effects priors with means of 0
variance:
- start by fitting the model and sampling from the priors only
dat4a.form <- bf(y ~ x, family = negbinomial(link = "log"))
dat4a.brm <- brm(dat4a.form,
                 data = dat4,
                 prior = priors,
                 sample_prior = "only",
                 iter = 5000,
                 warmup = 1000,
                 chains = 3, cores = 3,
                 thin = 5,
                 backend = "rstan",
                 refresh = 0)
Compiling Stan program...
Start sampling
Warning: There were 1614 divergent transitions after warmup. See
https://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
to find out why this is a problem and how to eliminate them.
Warning: Examine the pairs() plot to diagnose sampling problems
Warning: Tail Effective Samples Size (ESS) is too low, indicating posterior variances and tail quantiles may be unreliable.
Running the chains for more iterations may help. See
https://mc-stan.org/misc/warnings.html#tail-ess
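Divergent transitions like these are not unusual when sampling from vague, heavy-tailed priors alone and are less of a concern here than they would be for a posterior fit. They can be tallied directly from the fitted object (a sketch, assuming the `dat4a.brm` object above) via bayesplot's `nuts_params()`:

```r
library(bayesplot)

## Extract NUTS sampler diagnostics and count the divergent transitions
np <- nuts_params(dat4a.brm)
sum(subset(np, Parameter == "divergent__")$Value)
```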
- explore the range of posterior predictions resulting from the priors alone
Warning in scale_y_log10(): log-10 transformation introduced infinite values.
Conclusions:
- the grey ribbon above represents the credible range of the posterior predictions
- this ribbon is substantially wider than the observed data (black points), suggesting that the priors are not overly narrow
- the range of posterior predictions is not wildly unreasonable (although we could argue that negative predictions might be illogical), since the maxima are not multiple orders of magnitude above the observed data.
- now refit the model such that it samples from both the priors and likelihood (that is allow the data to have an impact on the estimates)
- re-explore the range of posterior predictions resulting from the fitted model
Conclusions:
- the range of the posteriors is now substantially reduced (now that the model includes both data and priors)
- this suggests that the patterns are being driven predominantly by the data
- compare the priors and posteriors to further confirm that the priors are not overly influential
When we indicate that the posterior should be informed by both the prior and the likelihood, draws are returned for both the prior (governed by the priors alone) and the posterior (governed by both priors and data/likelihood). These can be compared by exploring the probabilities associated with specific hypotheses - the most obvious of which is that of no effect (that the parameter = 0).
When doing so, ideally the posterior should be distinct from the prior. By distinct, I mean that the posteriors should be clearly distinguishable from the priors and, ideally, less variable than them. If this is not the case, it might suggest that the posteriors are being too strongly driven by the priors.
[1] "b_Intercept" "b_x" "shape" "prior_Intercept"
[5] "prior_b" "prior_shape" "lprior" "lp__"
[9] "accept_stat__" "stepsize__" "treedepth__" "n_leapfrog__"
[13] "divergent__" "energy__"
Unfortunately, it is not possible to do this comparison sensibly for the intercept. The reason is that the prior for the intercept was applied to an intercept associated with centred predictors (brms centres the predictors behind the scenes). Since we did not centre the predictor ourselves, the intercept returned is on the uncentred scale. Hence, the prior and posterior are on different scales.
[1] "b_Intercept" "b_x" "shape" "prior_Intercept"
[5] "prior_b" "prior_shape" "lprior" "lp__"
[9] "accept_stat__" "stepsize__" "treedepth__" "n_leapfrog__"
[13] "divergent__" "energy__"
Conclusions:
- each of the priors is substantially wider than its corresponding posterior
- the posteriors are clearly distinct from their respective priors, so we can conclude that the priors are not driving the posteriors (i.e. they are not unduly influencing the outcomes)
- the priors are simply regularising the parameters such that they are only sampled from plausible regions
\[ \begin{align} y_i \sim{}& NB(\lambda_i, \phi)\\ log(\lambda_i) =& \beta_0 + \beta_1 (x_i - \bar{x})\\ \end{align} \]
When considering priors, it is important to remember that they apply to parameters on the link scale. For Negative Binomial models, the link scale is log. So the following priors might be appropriate:
- \(\beta_0\): Normal prior centred at 2.07 with a variance of 0.93
mean:
variance:
- \(\beta_1\): t distribution (3 degrees of freedom) prior centred at 0 with a variance of 1.43
mean: since effects are differences and we may not want to pre-determine the direction (polarity) of any trends, it is typical to define effects priors with means of 0
variance:
- start by fitting the model and sampling from the priors only
dat4b.form <- bf(y ~ scale(x, scale = FALSE), family = negbinomial(link = "log"))
dat4b.brm <- brm(dat4b.form,
                 data = dat4,
                 prior = priors,
                 sample_prior = "only",
                 iter = 5000,
                 warmup = 1000,
                 chains = 3, cores = 3,
                 thin = 5,
                 backend = "rstan",
                 refresh = 0)
Compiling Stan program...
Start sampling
Warning: There were 1695 divergent transitions after warmup. See
https://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
to find out why this is a problem and how to eliminate them.
Warning: Examine the pairs() plot to diagnose sampling problems
- explore the range of posterior predictions resulting from the priors alone
Warning in scale_y_log10(): log-10 transformation introduced infinite values.
Conclusions:
- the grey ribbon above represents the credible range of the posterior predictions
- this ribbon is substantially wider than the observed data (black points), suggesting that the priors are not overly narrow
- the range of posterior predictions is not wildly unreasonable (although we could argue that negative predictions might be illogical), since the maxima are not multiple orders of magnitude above the observed data.
- now refit the model such that it samples from both the priors and likelihood (that is allow the data to have an impact on the estimates)
- re-explore the range of posterior predictions resulting from the fitted model
Conclusions:
- the range of the posteriors is now substantially reduced (now that the model includes both data and priors)
- this suggests that the patterns are being driven predominantly by the data
- compare the priors and posteriors to further confirm that the priors are not overly influential
When we indicate that the posterior should be informed by both the prior and the likelihood, draws are returned for both the prior (governed by the priors alone) and the posterior (governed by both priors and data/likelihood). These can be compared by exploring the probabilities associated with specific hypotheses - the most obvious of which is that of no effect (that the parameter = 0).
When doing so, ideally the posterior should be distinct from the prior. By distinct, I mean that the posteriors should be clearly distinguishable from the priors and, ideally, less variable than them. If this is not the case, it might suggest that the posteriors are being too strongly driven by the priors.
[1] "b_Intercept" "b_scalexscaleEQFALSE" "shape"
[4] "prior_Intercept" "prior_b" "prior_shape"
[7] "lprior" "lp__" "accept_stat__"
[10] "stepsize__" "treedepth__" "n_leapfrog__"
[13] "divergent__" "energy__"
Unfortunately, it is not possible to do this comparison sensibly for the intercept. The reason for this is that the prior for intercept was applied to an intercept that is associated with centred continuous predictors (predictors are centred behind the scenes). Since we did not centre the predictor, the intercept returned is as if uncentred. Hence, the prior and posterior are on different scales.
[1] "b_Intercept" "b_scalexscaleEQFALSE" "shape"
[4] "prior_Intercept" "prior_b" "prior_shape"
[7] "lprior" "lp__" "accept_stat__"
[10] "stepsize__" "treedepth__" "n_leapfrog__"
[13] "divergent__" "energy__"
Conclusions:
- each of the priors is substantially wider than its corresponding posterior
- the posteriors are clearly distinct from their respective priors, so we can conclude that the priors are not driving the posteriors (i.e. they are not unduly influencing the outcomes)
- the priors are simply regularising the parameters such that they are only sampled from plausible regions
\[ \begin{align} y_i \sim{}& NB(\lambda_i, \phi)\\ log(\lambda_i) =& \beta_0 + \beta_1 (x_i - \bar{x})/\sigma_x\\ \end{align} \]
When considering priors, it is important to remember that they apply to parameters on the link scale. For Negative Binomial models, the link scale is log. So the following priors might be appropriate:
- \(\beta_0\): Normal prior centred at 2.07 with a variance of 0.93
mean:
variance:
- \(\beta_1\): t distribution (3 degrees of freedom) prior centred at 0 with a variance of 0.93
mean: since effects are differences and we may not want to pre-determine the direction (polarity) of any trends, it is typical to define effects priors with means of 0
variance:
- start by fitting the model and sampling from the priors only
dat4c.form <- bf(y ~ scale(x), family = negbinomial(link = "log"))
dat4c.brm <- brm(dat4c.form,
                 data = dat4,
                 prior = priors,
                 sample_prior = "only",
                 iter = 5000,
                 warmup = 1000,
                 chains = 3, cores = 3,
                 thin = 5,
                 backend = "rstan",
                 refresh = 0)
Compiling Stan program...
Start sampling
Warning: There were 1170 divergent transitions after warmup. See
https://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
to find out why this is a problem and how to eliminate them.
Warning: Examine the pairs() plot to diagnose sampling problems
- explore the range of posterior predictions resulting from the priors alone
Warning in scale_y_log10(): log-10 transformation introduced infinite values.
Conclusions:
- the grey ribbon above represents the credible range of the posterior predictions
- this ribbon is substantially wider than the observed data (black points), suggesting that the priors are not overly narrow
- the range of posterior predictions is not wildly unreasonable (although we could argue that negative predictions might be illogical), since the maxima are not multiple orders of magnitude above the observed data.
- now refit the model such that it samples from both the priors and likelihood (that is allow the data to have an impact on the estimates)
- re-explore the range of posterior predictions resulting from the fitted model
Conclusions:
- the range of the posteriors is now substantially reduced (now that the model includes both data and priors)
- this suggests that the patterns are being driven predominantly by the data
- compare the priors and posteriors to further confirm that the priors are not overly influential
When we indicate that the posterior should be informed by both the prior and the likelihood, draws are returned for both the prior (governed by the priors alone) and the posterior (governed by both priors and data/likelihood). These can be compared by exploring the probabilities associated with specific hypotheses - the most obvious of which is that of no effect (that the parameter = 0).
When doing so, ideally the posterior should be distinct from the prior. By distinct, I mean that the posteriors should be clearly distinguishable from the priors and, ideally, less variable than them. If this is not the case, it might suggest that the posteriors are being too strongly driven by the priors.
[1] "b_Intercept" "b_scalex" "shape" "prior_Intercept"
[5] "prior_b" "prior_shape" "lprior" "lp__"
[9] "accept_stat__" "stepsize__" "treedepth__" "n_leapfrog__"
[13] "divergent__" "energy__"
Unfortunately, it is not possible to do this comparison sensibly for the intercept. The reason for this is that the prior for intercept was applied to an intercept that is associated with centred continuous predictors (predictors are centred behind the scenes). Since we did not centre the predictor, the intercept returned is as if uncentred. Hence, the prior and posterior are on different scales.
[1] "b_Intercept" "b_scalex" "shape" "prior_Intercept"
[5] "prior_b" "prior_shape" "lprior" "lp__"
[9] "accept_stat__" "stepsize__" "treedepth__" "n_leapfrog__"
[13] "divergent__" "energy__"
Conclusions:
- each of the priors is substantially wider than its corresponding posterior
- the posteriors are clearly distinct from their respective priors, so we can conclude that the priors are not driving the posteriors (i.e. they are not unduly influencing the outcomes)
- the priors are simply regularising the parameters such that they are only sampled from plausible regions
\[ \begin{align} y_i \sim{}& Bin(\pi_i, 1)\\ log(\frac{\pi_i}{1-\pi_i}) =& \beta_0 + \beta_1 x_i\\ \end{align} \]
When considering priors, it is important to remember that they apply to parameters on the link scale. For Binary models, the link scale is logit. Binomial data are notoriously difficult to define priors for. Nevertheless, the following considerations are useful:
- the observed response values are only ever either 0 or 1
- a linear model is exploring whether the probability of a 1 changes from high to low or low to high according to the linear predictor
- the switch in probability is likely to be somewhere near the middle of the \(x\) range
- with a centred predictor, the mean response is expected to be approximately 0.5
- on a logit (log odds) scale, this corresponds to a value of 0.
- on a logit (log odds) scale, values of -3 and 3 are considered very wide
- on a logit scale, values between -1 and 1 are reasonable.
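The correspondence between the logit and probability scales underlying these rules of thumb can be checked directly by back-transforming with the inverse logit:

```r
## Back-transform logit (log odds) values to probabilities
plogis(0)           # 0.5   - the centre of the probability scale
plogis(c(-3, 3))    # about 0.047 and 0.953 - nearly the full range
plogis(c(-1, 1))    # about 0.269 and 0.731 - a moderate range
```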
So the following priors might be appropriate:
- \(\beta_0\): Normal prior centred at 0 with a variance of 1
- \(\beta_1\): t distribution (3 degrees of freedom) prior centred at 0 with a variance of 1
- start by fitting the model and sampling from the priors only
dat5a.form <- bf(y | trials(1) ~ x, family = binomial(link = "logit"))
dat5a.brm <- brm(dat5a.form,
                 data = dat5,
                 prior = priors,
                 sample_prior = "only",
                 iter = 5000,
                 warmup = 1000,
                 chains = 3, cores = 3,
                 thin = 5,
                 backend = "rstan",
                 refresh = 0)
Only 2 levels detected so that family 'bernoulli' might be a more efficient choice.
Compiling Stan program...
Start sampling
- explore the range of posterior predictions resulting from the priors alone
For Binary data, it is often more useful to explore the predictions on the link scale. Ribbons that extend much beyond -3 and 3 would definitely be considered wide enough.
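In brms, link-scale (logit) predictions can be extracted with `posterior_linpred()`. This is a sketch, assuming the prior-only fit `dat5a.brm` from above:

```r
## Draws of the linear predictor on the logit (link) scale
eta <- posterior_linpred(dat5a.brm)

## For suitably vague priors this range should comfortably span beyond -3 and 3
range(eta)
```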
Conclusions:
- the grey ribbon above represents the credible range of the posterior predictions
- this ribbon is substantially wider than the observed data (black points), suggesting that the priors are not overly narrow
- the range of posterior predictions is not wildly unreasonable (although we could argue that negative predictions might be illogical), since the maxima are not multiple orders of magnitude above the observed data.
- now refit the model such that it samples from both the priors and likelihood (that is allow the data to have an impact on the estimates)
- re-explore the range of posterior predictions resulting from the fitted model
Conclusions:
- the range of the posteriors is now substantially reduced (now that the model includes both data and priors)
- this suggests that the patterns are being driven predominantly by the data
- compare the priors and posteriors to further confirm that the priors are not overly influential
When we indicate that the posterior should be informed by both the prior and the likelihood, draws are returned for both the prior (governed by the priors alone) and the posterior (governed by both priors and data/likelihood). These can be compared by exploring the probabilities associated with specific hypotheses - the most obvious of which is that of no effect (that the parameter = 0).
When doing so, ideally the posterior should be distinct from the prior. By distinct, I mean that the posteriors should be clearly distinguishable from the priors and, ideally, less variable than them. If this is not the case, it might suggest that the posteriors are being too strongly driven by the priors.
[1] "b_Intercept" "b_x" "prior_Intercept" "prior_b"
[5] "lprior" "lp__" "accept_stat__" "stepsize__"
[9] "treedepth__" "n_leapfrog__" "divergent__" "energy__"
Unfortunately, it is not possible to do this comparison sensibly for the intercept. The reason is that the prior for the intercept was applied to an intercept associated with centred predictors (brms centres the predictors behind the scenes). Since we did not centre the predictor ourselves, the intercept returned is on the uncentred scale. Hence, the prior and posterior are on different scales.
[1] "b_Intercept" "b_x" "prior_Intercept" "prior_b"
[5] "lprior" "lp__" "accept_stat__" "stepsize__"
[9] "treedepth__" "n_leapfrog__" "divergent__" "energy__"
Conclusions:
- each of the priors are substantially wider than the posteriors
- the posteriors are definitely distinct from their respective priors and thus we can conclude that the priors are not driving the posteriors (i.e. they are not unduly influencing the outcomes)
- the priors are simply regularising the parameters such that they are only sampled from plausible regions
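The prior/posterior comparison described above can be sketched by extracting both sets of draws and overlaying their densities. This is only a sketch: it assumes a brms model (here called `mod`) that was fitted with `sample_prior = 'yes'`, so that the draws contain both the posterior (`b_x`) and prior (`prior_b`) columns listed earlier.

```r
library(brms)
library(tidyverse)
# assumes `mod` is a fitted brmsfit with sample_prior = "yes",
# so the draws include b_x (posterior) and prior_b (prior)
mod |>
  as_draws_df() |>
  select(b_x, prior_b) |>
  pivot_longer(everything(), names_to = "type", values_to = "value") |>
  ggplot(aes(x = value, colour = type)) +
  geom_density()

# a numerical assessment of the "no effect" hypothesis (parameter = 0)
hypothesis(mod, "x = 0")
```

If the priors are only regularising, the `prior_b` density should be much wider than, and clearly distinguishable from, the `b_x` density.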
\[ \begin{align} y_i \sim{}& Bin(\pi_i, 1)\\ log(\frac{\pi_i}{1 -\pi_i}) =& \beta_0 + \beta_1 (x_i - \bar{x})\\ \end{align} \]
- \(\beta_0\): Normal prior centred at 0 with a variance of 1
- \(\beta_1\): t distribution (3 degrees of freedom) prior centred at 0 with a variance of 1
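The priors listed above could be encoded as follows. This is a sketch only - the object name `priors` is chosen to match the one referenced in the `brm()` calls, and brms parameterises these distributions by standard deviation (here a variance of 1 corresponds to a standard deviation of 1):

```r
library(brms)
# Normal(0, 1) prior on the intercept and a Student-t(3, 0, 1) prior
# on the slope, matching the priors described above
priors <- prior(normal(0, 1), class = "Intercept") +
  prior(student_t(3, 0, 1), class = "b")
priors
```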
- start by fitting the model and sampling from the priors only
dat5b.form <- bf(y | trials(1) ~ scale(x, scale = FALSE), family = binomial(link = "logit"))
dat5b.brm <- brm(dat5b.form,
data=dat5,
prior=priors,
sample_prior = 'only',
iter = 5000,
warmup = 1000,
chains = 3, cores = 3,
thin = 5,
backend = "rstan",
refresh = 0)
Only 2 levels detected so that family 'bernoulli' might be a more efficient choice.
Compiling Stan program...
Start sampling
- explore the range of posterior predictions resulting from the priors alone
Conclusions:
- the grey ribbon above represents the credible range of the posterior predictions
- this ribbon is substantially wider than the observed data (black points), suggesting that the priors are not overly narrow
- the range of posterior predictions is not wildly unreasonable (although we could argue that negative predictions might be illogical) since the maxima are not multiple orders of magnitude above the observed data.
- now refit the model such that it samples from both the priors and likelihood (that is allow the data to have an impact on the estimates)
- re-explore the range of posterior predictions resulting from the fitted model
Conclusions:
- the range of the posteriors is now substantially reduced (now that the model includes both data and priors)
- this suggests that the patterns are being driven predominantly by the data
- compare the priors and posteriors to further confirm that the priors are not overly influential
When we have indicated that the posterior should be informed by both the prior and the likelihood, both prior (governed by the priors alone) and posterior (governed by both priors and data/likelihood) draws are returned. These can be compared by exploring the probabilities associated with specific hypotheses - the most obvious of which is that of no effect (that the parameter = 0).
When doing so, ideally the posterior should be distinct from the prior - that is, clearly distinguishable from it. The posteriors should also be less variable than the priors. If this is not the case, it might suggest that the posteriors are being too strongly driven by the priors.
[1] "b_Intercept" "b_scalexscaleEQFALSE" "prior_Intercept"
[4] "prior_b" "lprior" "lp__"
[7] "accept_stat__" "stepsize__" "treedepth__"
[10] "n_leapfrog__" "divergent__" "energy__"
Unfortunately, it is not possible to do this comparison sensibly for the intercept. The reason is that the prior for the intercept was applied to an intercept associated with centred continuous predictors (predictors are centred behind the scenes). Since we did not centre the predictor, the intercept returned is on the uncentred scale. Hence, the prior and posterior are on different scales.
[1] "b_Intercept" "b_scalexscaleEQFALSE" "prior_Intercept"
[4] "prior_b" "lprior" "lp__"
[7] "accept_stat__" "stepsize__" "treedepth__"
[10] "n_leapfrog__" "divergent__" "energy__"
Conclusions:
- each of the priors are substantially wider than the posteriors
- the posteriors are definitely distinct from their respective priors and thus we can conclude that the priors are not driving the posteriors (i.e. they are not unduly influencing the outcomes)
- the priors are simply regularising the parameters such that they are only sampled from plausible regions
\[ \begin{align} y_i \sim{}& Bin(\pi_i, 1)\\ log(\frac{\pi_i}{1 -\pi_i}) =& \beta_0 + \beta_1 (x_i - \bar{x})/\sigma_x\\ \end{align} \]
- \(\beta_0\): Normal prior centred at 0 with a variance of 1
- \(\beta_1\): t distribution (3 degrees of freedom) prior centred at 0 with a variance of 1
- start by fitting the model and sampling from the priors only
dat5c.form <- bf(y | trials(1) ~ scale(x), family = binomial(link = "logit"))
dat5c.brm <- brm(dat5c.form,
data=dat5,
prior=priors,
sample_prior = 'only',
iter = 5000,
warmup = 1000,
chains = 3, cores = 3,
thin = 5,
backend = "rstan",
refresh = 0)
Only 2 levels detected so that family 'bernoulli' might be a more efficient choice.
Compiling Stan program...
Start sampling
- explore the range of posterior predictions resulting from the priors alone
Conclusions:
- the grey ribbon above represents the credible range of the posterior predictions
- this ribbon is substantially wider than the observed data (black points), suggesting that the priors are not overly narrow
- the range of posterior predictions is not wildly unreasonable (although we could argue that negative predictions might be illogical) since the maxima are not multiple orders of magnitude above the observed data.
- now refit the model such that it samples from both the priors and likelihood (that is allow the data to have an impact on the estimates)
- re-explore the range of posterior predictions resulting from the fitted model
Conclusions:
- the range of the posteriors is now substantially reduced (now that the model includes both data and priors)
- this suggests that the patterns are being driven predominantly by the data
- compare the priors and posteriors to further confirm that the priors are not overly influential
When we have indicated that the posterior should be informed by both the prior and the likelihood, both prior (governed by the priors alone) and posterior (governed by both priors and data/likelihood) draws are returned. These can be compared by exploring the probabilities associated with specific hypotheses - the most obvious of which is that of no effect (that the parameter = 0).
When doing so, ideally the posterior should be distinct from the prior - that is, clearly distinguishable from it. The posteriors should also be less variable than the priors. If this is not the case, it might suggest that the posteriors are being too strongly driven by the priors.
[1] "b_Intercept" "b_scalex" "prior_Intercept" "prior_b"
[5] "lprior" "lp__" "accept_stat__" "stepsize__"
[9] "treedepth__" "n_leapfrog__" "divergent__" "energy__"
Unfortunately, it is not possible to do this comparison sensibly for the intercept. The reason is that the prior for the intercept was applied to an intercept associated with centred continuous predictors (predictors are centred behind the scenes). Since we did not centre the predictor, the intercept returned is on the uncentred scale. Hence, the prior and posterior are on different scales.
[1] "b_Intercept" "b_scalex" "prior_Intercept" "prior_b"
[5] "lprior" "lp__" "accept_stat__" "stepsize__"
[9] "treedepth__" "n_leapfrog__" "divergent__" "energy__"
Conclusions:
- each of the priors are substantially wider than the posteriors
- the posteriors are definitely distinct from their respective priors and thus we can conclude that the priors are not driving the posteriors (i.e. they are not unduly influencing the outcomes)
- the priors are simply regularising the parameters such that they are only sampled from plausible regions
\[ \begin{align} y_i \sim{}& Bin(\pi_i, n_i)\\ log(\frac{\pi_i}{1-\pi_i}) =& \beta_0 + \beta_1 x_i\\ \end{align} \]
When considering priors, it is important to remember that they apply to parameters on the link scale. For Binomial models, the link scale is logit. Binomial data is notoriously difficult to define priors for. Nevertheless the following considerations are useful:
- the expected \(\pi\) values are only ever between 0 and 1
- on a logit (log odds) scale, values of -3 and 3 are considered very wide
- on a logit scale, values between -1 and 1 are reasonable.
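These logit-scale intuitions can be checked directly in base R, where `plogis()` is the inverse-logit (converting log odds back to probabilities):

```r
# inverse-logit: map logit (log odds) values back to the probability scale
plogis(0)   # 0.5   - a logit of 0 is an even chance
plogis(-3)  # ~0.047
plogis(3)   # ~0.953 - so [-3, 3] spans nearly the whole probability scale
plogis(-1)  # ~0.269
plogis(1)   # ~0.731 - [-1, 1] covers the "reasonable" middle range
```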
So the following priors might be appropriate:
- \(\beta_0\): Normal prior centred at 0 with a variance of 1
- \(\beta_1\): t distribution (3 degrees of freedom) prior centred at 0 with a variance of 0.51
mean: since effects are differences and we may not want to pre-determine the direction (polarity) of any trends, it is typical to define effects priors with means of 0
- start by fitting the model and sampling from the priors only
- explore the range of posterior predictions resulting from the priors alone
For Binary data, it is often more useful to explore the predictions on the link scale. Ribbons that extend much beyond -3 and 3 would definitely be considered wide enough.
Setting all 'trials' variables to 1 by default if not specified otherwise.
dat6a.brm |>
conditional_effects(conditions = data.frame(total = dat6$total)) |>
plot(points = TRUE)
Conclusions:
- the grey ribbon above represents the credible range of the posterior predictions
- this ribbon is substantially wider than the observed data (black points), suggesting that the priors are not overly narrow
- the range of posterior predictions is not wildly unreasonable (although we could argue that negative predictions might be illogical) since the maxima are not multiple orders of magnitude above the observed data.
- now refit the model such that it samples from both the priors and likelihood (that is allow the data to have an impact on the estimates)
- re-explore the range of posterior predictions resulting from the fitted model
dat6a.brm2 |>
conditional_effects(conditions = data.frame(total = dat6$total)) |>
plot(points = TRUE)
Conclusions:
- the range of the posteriors is now substantially reduced (now that the model includes both data and priors)
- this suggests that the patterns are being driven predominantly by the data
- compare the priors and posteriors to further confirm that the priors are not overly influential
When we have indicated that the posterior should be informed by both the prior and the likelihood, both prior (governed by the priors alone) and posterior (governed by both priors and data/likelihood) draws are returned. These can be compared by exploring the probabilities associated with specific hypotheses - the most obvious of which is that of no effect (that the parameter = 0).
When doing so, ideally the posterior should be distinct from the prior - that is, clearly distinguishable from it. The posteriors should also be less variable than the priors. If this is not the case, it might suggest that the posteriors are being too strongly driven by the priors.
[1] "b_Intercept" "b_x" "prior_Intercept" "prior_b"
[5] "lprior" "lp__" "accept_stat__" "stepsize__"
[9] "treedepth__" "n_leapfrog__" "divergent__" "energy__"
Unfortunately, it is not possible to do this comparison sensibly for the intercept. The reason is that the prior for the intercept was applied to an intercept associated with centred continuous predictors (predictors are centred behind the scenes). Since we did not centre the predictor, the intercept returned is on the uncentred scale. Hence, the prior and posterior are on different scales.
[1] "b_Intercept" "b_x" "prior_Intercept" "prior_b"
[5] "lprior" "lp__" "accept_stat__" "stepsize__"
[9] "treedepth__" "n_leapfrog__" "divergent__" "energy__"
Conclusions:
- each of the priors are substantially wider than the posteriors
- the posteriors are definitely distinct from their respective priors and thus we can conclude that the priors are not driving the posteriors (i.e. they are not unduly influencing the outcomes)
- the priors are simply regularising the parameters such that they are only sampled from plausible regions
\[ \begin{align} y_i \sim{}& Bin(\pi_i, n_i)\\ log(\frac{\pi_i}{1-\pi_i}) =& \beta_0 + \beta_1 (x_i - \bar{x})\\ \end{align} \]
When considering priors, it is important to remember that they apply to parameters on the link scale. For Binomial models, the link scale is logit. Binomial data is notoriously difficult to define priors for. Nevertheless the following considerations are useful:
- the expected \(\pi\) values are only ever between 0 and 1
- on a logit (log odds) scale, values of -3 and 3 are considered very wide
- on a logit scale, values between -1 and 1 are reasonable.
So the following priors might be appropriate:
- \(\beta_0\): Normal prior centred at 0 with a variance of 1
- \(\beta_1\): t distribution (3 degrees of freedom) prior centred at 0 with a variance of 0.51
mean: since effects are differences and we may not want to pre-determine the direction (polarity) of any trends, it is typical to define effects priors with means of 0
- start by fitting the model and sampling from the priors only
dat6b.form <- bf(count | trials(total) ~ scale(x, scale = FALSE),
family = binomial(link = "logit"))
dat6b.brm <- brm(dat6b.form,
data=dat6,
prior=priors,
sample_prior = 'only',
iter = 5000,
warmup = 1000,
chains = 3, cores = 3,
thin = 5,
backend = "rstan",
refresh = 0)
Compiling Stan program...
Start sampling
- explore the range of posterior predictions resulting from the priors alone
For Binomial data, it is often more useful to explore the predictions on the link scale. Ribbons that extend much beyond -3 and 3 would definitely be considered wide enough.
Setting all 'trials' variables to 1 by default if not specified otherwise.
dat6b.brm |>
conditional_effects(conditions = data.frame(total = dat6$total)) |>
plot(points = TRUE)
Conclusions:
- the grey ribbon above represents the credible range of the posterior predictions
- this ribbon is substantially wider than the observed data (black points), suggesting that the priors are not overly narrow
- the range of posterior predictions is not wildly unreasonable (although we could argue that negative predictions might be illogical) since the maxima are not multiple orders of magnitude above the observed data.
- now refit the model such that it samples from both the priors and likelihood (that is allow the data to have an impact on the estimates)
- re-explore the range of posterior predictions resulting from the fitted model
dat6b.brm2 |>
conditional_effects(conditions = data.frame(total = dat6$total)) |>
plot(points = TRUE)
Conclusions:
- the range of the posteriors is now substantially reduced (now that the model includes both data and priors)
- this suggests that the patterns are being driven predominantly by the data
- compare the priors and posteriors to further confirm that the priors are not overly influential
When we have indicated that the posterior should be informed by both the prior and the likelihood, both prior (governed by the priors alone) and posterior (governed by both priors and data/likelihood) draws are returned. These can be compared by exploring the probabilities associated with specific hypotheses - the most obvious of which is that of no effect (that the parameter = 0).
When doing so, ideally the posterior should be distinct from the prior - that is, clearly distinguishable from it. The posteriors should also be less variable than the priors. If this is not the case, it might suggest that the posteriors are being too strongly driven by the priors.
[1] "b_Intercept" "b_scalexscaleEQFALSE" "prior_Intercept"
[4] "prior_b" "lprior" "lp__"
[7] "accept_stat__" "stepsize__" "treedepth__"
[10] "n_leapfrog__" "divergent__" "energy__"
Unfortunately, it is not possible to do this comparison sensibly for the intercept. The reason is that the prior for the intercept was applied to an intercept associated with centred continuous predictors (predictors are centred behind the scenes). Since we did not centre the predictor, the intercept returned is on the uncentred scale. Hence, the prior and posterior are on different scales.
[1] "b_Intercept" "b_scalexscaleEQFALSE" "prior_Intercept"
[4] "prior_b" "lprior" "lp__"
[7] "accept_stat__" "stepsize__" "treedepth__"
[10] "n_leapfrog__" "divergent__" "energy__"
Conclusions:
- each of the priors are substantially wider than the posteriors
- the posteriors are definitely distinct from their respective priors and thus we can conclude that the priors are not driving the posteriors (i.e. they are not unduly influencing the outcomes)
- the priors are simply regularising the parameters such that they are only sampled from plausible regions
\[ \begin{align} y_i \sim{}& Bin(\pi_i, n_i)\\ log(\frac{\pi_i}{1-\pi_i}) =& \beta_0 + \beta_1 (x_i - \bar{x})/\sigma_x\\ \end{align} \]
When considering priors, it is important to remember that they apply to parameters on the link scale. For Binomial models, the link scale is logit. Binomial data is notoriously difficult to define priors for. Nevertheless the following considerations are useful:
- the expected \(\pi\) values are only ever between 0 and 1
- on a logit (log odds) scale, values of -3 and 3 are considered very wide
- on a logit scale, values between -1 and 1 are reasonable.
So the following priors might be appropriate:
- \(\beta_0\): Normal prior centred at 0 with a variance of 1
- \(\beta_1\): t distribution (3 degrees of freedom) prior centred at 0 with a variance of 0.33
mean: since effects are differences and we may not want to pre-determine the direction (polarity) of any trends, it is typical to define effects priors with means of 0
- start by fitting the model and sampling from the priors only
- explore the range of posterior predictions resulting from the priors alone
For Binomial data, it is often more useful to explore the predictions on the link scale. Ribbons that extend much beyond -3 and 3 would definitely be considered wide enough.
Setting all 'trials' variables to 1 by default if not specified otherwise.
dat6c.brm |>
conditional_effects(conditions = data.frame(total = dat6$total)) |>
plot(points = TRUE)
Conclusions:
- the grey ribbon above represents the credible range of the posterior predictions
- this ribbon is substantially wider than the observed data (black points), suggesting that the priors are not overly narrow
- the range of posterior predictions is not wildly unreasonable (although we could argue that negative predictions might be illogical) since the maxima are not multiple orders of magnitude above the observed data.
- now refit the model such that it samples from both the priors and likelihood (that is allow the data to have an impact on the estimates)
- re-explore the range of posterior predictions resulting from the fitted model
dat6c.brm2 |>
conditional_effects(conditions = data.frame(total = dat6$total)) |>
plot(points = TRUE)
Conclusions:
- the range of the posteriors is now substantially reduced (now that the model includes both data and priors)
- this suggests that the patterns are being driven predominantly by the data
- compare the priors and posteriors to further confirm that the priors are not overly influential
When we have indicated that the posterior should be informed by both the prior and the likelihood, both prior (governed by the priors alone) and posterior (governed by both priors and data/likelihood) draws are returned. These can be compared by exploring the probabilities associated with specific hypotheses - the most obvious of which is that of no effect (that the parameter = 0).
When doing so, ideally the posterior should be distinct from the prior - that is, clearly distinguishable from it. The posteriors should also be less variable than the priors. If this is not the case, it might suggest that the posteriors are being too strongly driven by the priors.
[1] "b_Intercept" "b_scalex" "prior_Intercept" "prior_b"
[5] "lprior" "lp__" "accept_stat__" "stepsize__"
[9] "treedepth__" "n_leapfrog__" "divergent__" "energy__"
Unfortunately, it is not possible to do this comparison sensibly for the intercept. The reason is that the prior for the intercept was applied to an intercept associated with centred continuous predictors (predictors are centred behind the scenes). Since we did not centre the predictor, the intercept returned is on the uncentred scale. Hence, the prior and posterior are on different scales.
[1] "b_Intercept" "b_scalex" "prior_Intercept" "prior_b"
[5] "lprior" "lp__" "accept_stat__" "stepsize__"
[9] "treedepth__" "n_leapfrog__" "divergent__" "energy__"
Conclusions:
- each of the priors are substantially wider than the posteriors
- the posteriors are definitely distinct from their respective priors and thus we can conclude that the priors are not driving the posteriors (i.e. they are not unduly influencing the outcomes)
- the priors are simply regularising the parameters such that they are only sampled from plausible regions
8 MCMC sampling diagnostics
MCMC sampling behaviour
Since the purpose of the MCMC sampling is to estimate the posterior of an unknown joint likelihood, it is important that we explore a range of diagnostics designed to help identify when the resulting likelihood might not be accurate.
- traceplots - plots of the individual draws in sequence. Traces that resemble noise suggest that all likelihood features are likely to have been traversed. Obvious steps or blocks of noise are likely to represent distinct features and could imply that there are yet other features that have not yet been traversed - necessitating additional iterations. Furthermore, each chain should be indistinguishable from the others
- autocorrelation function - plots of the degree of correlation between pairs of draws for a range of lags (distances along the chains). High levels of correlation (after a lag of 0, which correlates each draw with itself) suggest a lack of independence between the draws and, therefore, that summaries such as the mean and median will be biased estimates. Ideally, all non-zero lag correlations should be less than 0.2. The left hand figure below demonstrates a clear pattern of autocorrelation, whereas the right hand figure shows no autocorrelation.
- convergence diagnostics - there are a range of diagnostics aimed at exploring whether the multiple chains are likely to have converged upon similar posteriors
- R hat - this metric compares between and within chain model parameter estimates, with the expectation that if the chains have converged, the between and within rank normalised estimates should be very similar (and Rhat should be close to 1). The more one chain deviates from the others, the higher the Rhat value. Values less than 1.05 are considered evidence of convergence.
- Bulk ESS - this is a measure of the effective sample size from the whole (bulk) of the posterior and is a good measure of the sampling efficiency of draws across the entire posterior
- Tail ESS - this is a measure of the effective sample size from the 5% and 95% quantiles (tails) of the posterior and is a good measure of the sampling efficiency of draws from the tail (areas of the posterior with least support and where samplers can get stuck).
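All three convergence diagnostics can be obtained together via the posterior package. This is a sketch that assumes a fitted brms model named `mod`; `summarise_draws()` accepts any object coercible to draws:

```r
library(brms)
library(posterior)
# one row per parameter, with Rhat, bulk ESS and tail ESS side by side;
# assumes `mod` is a fitted brmsfit
mod |>
  as_draws_df() |>
  summarise_draws(rhat, ess_bulk, ess_tail)
```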
There are numerous packages in R that support MCMC diagnostics. Popular packages include:
bayesplot, rstan and ggmcmc
Some of the most useful diagnostics are presented in the following table.
| Package | Description | function | rstanarm | brms |
|---|---|---|---|---|
| bayesplot | Traceplot | mcmc_trace | plot(mod, plotfun='trace') | mcmc_plot(mod, type='trace') |
| | Density plot | mcmc_dens | plot(mod, plotfun='dens') | mcmc_plot(mod, type='dens') |
| | Density & Trace | mcmc_combo | plot(mod, plotfun='combo') | mcmc_plot(mod, type='combo') |
| | ACF | mcmc_acf_bar | plot(mod, plotfun='acf_bar') | mcmc_plot(mod, type='acf_bar') |
| | Rhat hist | mcmc_rhat_hist | plot(mod, plotfun='rhat_hist') | mcmc_plot(mod, type='rhat_hist') |
| | No. Effective | mcmc_neff_hist | plot(mod, plotfun='neff_hist') | mcmc_plot(mod, type='neff_hist') |
| rstan | Traceplot | stan_trace | stan_trace(mod) | stan_trace(mod) |
| | ACF | stan_ac | stan_ac(mod) | stan_ac(mod) |
| | Rhat | stan_rhat | stan_rhat(mod) | stan_rhat(mod) |
| | No. Effective | stan_ess | stan_ess(mod) | stan_ess(mod) |
| | Density plot | stan_dens | stan_dens(mod) | stan_dens(mod) |
| ggmcmc | Traceplot | ggs_traceplot | ggs_traceplot(ggs(mod)) | ggs_traceplot(ggs(mod)) |
| | ACF | ggs_autocorrelation | ggs_autocorrelation(ggs(mod)) | ggs_autocorrelation(ggs(mod)) |
| | Rhat | ggs_Rhat | ggs_Rhat(ggs(mod)) | ggs_Rhat(ggs(mod)) |
| | No. Effective | ggs_effective | ggs_effective(ggs(mod)) | ggs_effective(ggs(mod)) |
| | Cross correlation | ggs_crosscorrelation | ggs_crosscorrelation(ggs(mod)) | ggs_crosscorrelation(ggs(mod)) |
| | Scale reduction | ggs_grb | ggs_grb(ggs(mod)) | ggs_grb(ggs(mod)) |
I personally prefer the rstan version of plots and thus these are the ones I will showcase.
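As a sketch of the workflow, the four rstan-style diagnostics can be generated and arranged with patchwork (assuming a fitted model named `mod`; per the table above, the stan_* plotting functions accept brms models directly):

```r
library(rstan)
library(patchwork)
# trace and autocorrelation on the top row; Rhat and effective sample
# size on the bottom row; assumes `mod` is a fitted model
(stan_trace(mod) + stan_ac(mod)) /
  (stan_rhat(mod) + stan_ess(mod))
```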
Bayesian samplers involve many calls to randomisation functions. As a result, the estimates will vary slightly each time the routines are run. You should expect that the outputs that you obtain will differ slightly from those that I am displaying. Nevertheless, the main conclusions should remain robust across subsequent runs.
Plots of the estimates of each parameter over the post-warmup length of each MCMC chain. Each chain is plotted in a different colour, with each parameter in its own facet. Ideally, each trace should just look like noise without any discernible drift and each of the traces for a specific parameter should look the same (i.e., should not be displaced above or below any other trace for that parameter).
Conclusions:
- the chains appear well mixed and very similar
Conclusions:
- there is no evidence of autocorrelation in the MCMC samples
Rhat is a scale reduction factor measure of convergence between the chains. The closer the values are to 1, the more the chains have converged. Values greater than 1.05 indicate a lack of convergence. There will be an Rhat value for each parameter estimated.
Conclusions:
- all Rhat values are below 1.05, suggesting the chains have converged.
The number of effective samples - the ratio of the number of effective samples (those not rejected by the sampler) to the number of samples provides an indication of the effectiveness (and efficiency) of the MCMC sampler. Ratios that are less than 0.5 for a parameter suggest that the sampler spent considerable time in difficult areas of the sampling domain and rejected more than half of the samples (replacing them with the previous effective sample).
If the ratios are low, tightening the priors may help.
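The ratio described above can be computed and visualised with bayesplot (again a sketch, assuming a fitted brms model named `mod`):

```r
library(brms)
library(bayesplot)
# ratio of effective sample size to total draws, per parameter;
# assumes `mod` is a fitted brmsfit
ratios <- neff_ratio(mod)
mcmc_neff(ratios)  # ratios below ~0.5 are highlighted as problematic
```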
Conclusions:
- ratios are all very high
Conclusions:
- all the diagnostics appear reasonable
- we can conclude that the chains are all well mixed and have converged on a stable posterior.
Plots of the estimates of each parameter over the post-warmup length of each MCMC chain. Each chain is plotted in a different colour, with each parameter in its own facet. Ideally, each trace should just look like noise without any discernible drift and each of the traces for a specific parameter should look the same (i.e., should not be displaced above or below any other trace for that parameter).
Conclusions:
- the chains appear well mixed and very similar
Conclusions:
- there is no evidence of autocorrelation in the MCMC samples
Rhat is a scale reduction factor measure of convergence between the chains. The closer the values are to 1, the more the chains have converged. Values greater than 1.05 indicate a lack of convergence. There will be an Rhat value for each parameter estimated.
Conclusions:
- all Rhat values are below 1.05, suggesting the chains have converged.
The number of effective samples - the ratio of the number of effective samples (those not rejected by the sampler) to the number of samples provides an indication of the effectiveness (and efficiency) of the MCMC sampler. Ratios that are less than 0.5 for a parameter suggest that the sampler spent considerable time in difficult areas of the sampling domain and rejected more than half of the samples (replacing them with the previous effective sample).
If the ratios are low, tightening the priors may help.
Conclusions:
- ratios are all very high
Conclusions:
- all the diagnostics appear reasonable
- we can conclude that the chains are all well mixed and have converged on a stable posterior.
Plots of the estimates of each parameter over the post-warmup length of each MCMC chain. Each chain is plotted in a different colour, with each parameter in its own facet. Ideally, each trace should just look like noise without any discernible drift and each of the traces for a specific parameter should look the same (i.e., should not be displaced above or below any other trace for that parameter).
Conclusions:
- the chains appear well mixed and very similar
Conclusions:
- there is no evidence of autocorrelation in the MCMC samples
Rhat is a scale reduction factor measure of convergence between the chains. The closer the values are to 1, the more the chains have converged. Values greater than 1.05 indicate a lack of convergence. There will be an Rhat value for each parameter estimated.
Conclusions:
- all Rhat values are below 1.05, suggesting the chains have converged.
The number of effective samples - the ratio of the number of effective samples (those not rejected by the sampler) to the number of samples provides an indication of the effectiveness (and efficiency) of the MCMC sampler. Ratios that are less than 0.5 for a parameter suggest that the sampler spent considerable time in difficult areas of the sampling domain and rejected more than half of the samples (replacing them with the previous effective sample).
If the ratios are low, tightening the priors may help.
Conclusions:
- ratios are all very high
Conclusions:
- all the diagnostics appear reasonable
- we can conclude that the chains are all well mixed and have converged on a stable posterior.
Plots of the estimates of each parameter over the post-warmup length of each MCMC chain. Each chain is plotted in a different colour, with each parameter in its own facet. Ideally, each trace should just look like noise without any discernible drift and each of the traces for a specific parameter should look the same (i.e., should not be displaced above or below any other trace for that parameter).
Conclusions:
- the chains appear well mixed and very similar
Conclusions:
- there is no evidence of autocorrelation in the MCMC samples
Rhat is a scale reduction factor measure of convergence between the chains. The closer the values are to 1, the more the chains have converged. Values greater than 1.05 indicate a lack of convergence. There will be an Rhat value for each parameter estimated.
Conclusions:
- all Rhat values are below 1.05, suggesting the chains have converged.
The number of effective samples - the ratio of the number of effective samples (those not rejected by the sampler) to the number of samples provides an indication of the effectiveness (and efficiency) of the MCMC sampler. Ratios that are less than 0.5 for a parameter suggest that the sampler spent considerable time in difficult areas of the sampling domain and rejected more than half of the samples (replacing them with the previous effective sample).
If the ratios are low, tightening the priors may help.
Conclusions:
- ratios are all very high
Conclusions:
- all the diagnostics appear reasonable
- we can conclude that the chains are all well mixed and have converged on a stable posterior.
Plots of the estimates of each parameter over the post-warmup length of each MCMC chain. Each chain is plotted in a different colour, with each parameter in its own facet. Ideally, each trace should just look like noise without any discernible drift and each of the traces for a specific parameter should look the same (i.e., should not be displaced above or below any other trace for that parameter).
Conclusions:
- the chains appear well mixed and very similar
Conclusions:
- there is no evidence of autocorrelation in the MCMC samples
Rhat is a scale reduction factor measure of convergence between the chains. The closer the values are to 1, the more the chains have converged. Values greater than 1.05 indicate a lack of convergence. There will be an Rhat value for each parameter estimated.
Conclusions:
- all Rhat values are below 1.05, suggesting the chains have converged.
The number of effective samples - the ratio of the number of effective samples (those not rejected by the sampler) to the number of samples provides an indication of the effectiveness (and efficiency) of the MCMC sampler. Ratios that are less than 0.5 for a parameter suggest that the sampler spent considerable time in difficult areas of the sampling domain and rejected more than half of the samples (replacing them with the previous effective sample).
If the ratios are low, tightening the priors may help.
Conclusions:
- ratios are all very high
Conclusions:
- all the diagnostics appear reasonable
- we can conclude that the chains are all well mixed and have converged on a stable posterior.
Plots of the estimates of each parameter over the post-warmup length of each MCMC chain. Each chain is plotted in a different colour, with each parameter in its own facet. Ideally, each trace should just look like noise without any discernible drift and each of the traces for a specific parameter should look the same (i.e, should not be displaced above or below any other trace for that parameter).
Conclusions:
- the chains appear well mixed and very similar
Conclusions:
- there is no evidence of autocorrelation in the MCMC samples
Rhat is a scale reduction factor measure of convergence between the chains. The closer the values are to 1, the more the chains have converged. Values greater than 1.05 indicate a lack of convergence. There will be an Rhat value for each parameter estimated.
Conclusions:
- all Rhat values are below 1.05, suggesting the chains have converged.
The number of effective samples - the ratio of the number of effective samples (those not rejected by the sampler) to the number of samples provides an indication of the effectiveness (and efficiency) of the MCMC sampler. Ratios that are less than 0.5 for a parameter suggest that the sampler spent considerable time in difficult areas of the sampling domain and rejected more than half of the samples (replacing them with the previous effective sample).
If the ratios are low, tightening the priors may help.
Conclusions:
- ratios are all very high
Conclusions:
- all the diagnostics appear reasonable
- we can conclude that the chains are all well mixed and have converged on a stable posterior.
Plots of the estimates of each parameter over the post-warmup length of each MCMC chain. Each chain is plotted in a different colour, with each parameter in its own facet. Ideally, each trace should just look like noise without any discernible drift and each of the traces for a specific parameter should look the same (i.e, should not be displaced above or below any other trace for that parameter).
Conclusions:
- the chains appear well mixed and very similar
Conclusions:
- there is no evidence of autocorrelation in the MCMC samples
Rhat is a scale reduction factor measure of convergence between the chains. The closer the values are to 1, the more the chains have converged. Values greater than 1.05 indicate a lack of convergence. There will be an Rhat value for each parameter estimated.
Conclusions:
- all Rhat values are below 1.05, suggesting the chains have converged.
The number of effective samples - the ratio of the number of effective samples (those not rejected by the sampler) to the number of samples provides an indication of the effectiveness (and efficiency) of the MCMC sampler. Ratios that are less than 0.5 for a parameter suggest that the sampler spent considerable time in difficult areas of the sampling domain and rejected more than half of the samples (replacing them with the previous effective sample).
If the ratios are low, tightening the priors may help.
Conclusions:
- ratios are all very high
Conclusions:
- all the diagnostics appear reasonable
- we can conclude that the chains are all well mixed and have converged on a stable posterior.
Plots of the estimates of each parameter over the post-warmup length of each MCMC chain. Each chain is plotted in a different colour, with each parameter in its own facet. Ideally, each trace should just look like noise without any discernible drift and each of the traces for a specific parameter should look the same (i.e, should not be displaced above or below any other trace for that parameter).
Conclusions:
- the chains appear well mixed and very similar
Conclusions:
- there is no evidence of autocorrelation in the MCMC samples
Rhat is a scale reduction factor measure of convergence between the chains. The closer the values are to 1, the more the chains have converged. Values greater than 1.05 indicate a lack of convergence. There will be an Rhat value for each parameter estimated.
Conclusions:
- all Rhat values are below 1.05, suggesting the chains have converged.
The number of effective samples - the ratio of the number of effective samples (those not rejected by the sampler) to the number of samples provides an indication of the effectiveness (and efficiency) of the MCMC sampler. Ratios that are less than 0.5 for a parameter suggest that the sampler spent considerable time in difficult areas of the sampling domain and rejected more than half of the samples (replacing them with the previous effective sample).
If the ratios are low, tightening the priors may help.
Conclusions:
- ratios are all very high
Conclusions:
- all the diagnostics appear reasonable
- we can conclude that the chains are all well mixed and have converged on a stable posterior.
Plots of the estimates of each parameter over the post-warmup length of each MCMC chain. Each chain is plotted in a different colour, with each parameter in its own facet. Ideally, each trace should just look like noise without any discernible drift and each of the traces for a specific parameter should look the same (i.e, should not be displaced above or below any other trace for that parameter).
Conclusions:
- the chains appear well mixed and very similar
Conclusions:
- there is no evidence of autocorrelation in the MCMC samples
Rhat is a scale reduction factor measure of convergence between the chains. The closer the values are to 1, the more the chains have converged. Values greater than 1.05 indicate a lack of convergence. There will be an Rhat value for each parameter estimated.
Conclusions:
- all Rhat values are below 1.05, suggesting the chains have converged.
The number of effective samples - the ratio of the number of effective samples (those not rejected by the sampler) to the number of samples provides an indication of the effectiveness (and efficiency) of the MCMC sampler. Ratios that are less than 0.5 for a parameter suggest that the sampler spent considerable time in difficult areas of the sampling domain and rejected more than half of the samples (replacing them with the previous effective sample).
If the ratios are low, tightening the priors may help.
Conclusions:
- ratios are all very high
Conclusions:
- all the diagnostics appear reasonable
- we can conclude that the chains are all well mixed and have converged on a stable posterior.
Plots of the estimates of each parameter over the post-warmup length of each MCMC chain. Each chain is plotted in a different colour, with each parameter in its own facet. Ideally, each trace should just look like noise without any discernible drift and each of the traces for a specific parameter should look the same (i.e, should not be displaced above or below any other trace for that parameter).
Conclusions:
- the chains appear well mixed and very similar
Conclusions:
- there is no evidence of autocorrelation in the MCMC samples
Rhat is a scale reduction factor measure of convergence between the chains. The closer the values are to 1, the more the chains have converged. Values greater than 1.05 indicate a lack of convergence. There will be an Rhat value for each parameter estimated.
Conclusions:
- all Rhat values are below 1.05, suggesting the chains have converged.
The number of effective samples - the ratio of the number of effective samples (those not rejected by the sampler) to the number of samples provides an indication of the effectiveness (and efficiency) of the MCMC sampler. Ratios that are less than 0.5 for a parameter suggest that the sampler spent considerable time in difficult areas of the sampling domain and rejected more than half of the samples (replacing them with the previous effective sample).
If the ratios are low, tightening the priors may help.
Conclusions:
- ratios are all very high
Conclusions:
- all the diagnostics appear reasonable
- we can conclude that the chains are all well mixed and have converged on a stable posterior.
Plots of the estimates of each parameter over the post-warmup length of each MCMC chain. Each chain is plotted in a different colour, with each parameter in its own facet. Ideally, each trace should just look like noise without any discernible drift and each of the traces for a specific parameter should look the same (i.e, should not be displaced above or below any other trace for that parameter).
Conclusions:
- the chains appear well mixed and very similar
Conclusions:
- there is no evidence of autocorrelation in the MCMC samples
Rhat is a scale reduction factor measure of convergence between the chains. The closer the values are to 1, the more the chains have converged. Values greater than 1.05 indicate a lack of convergence. There will be an Rhat value for each parameter estimated.
Conclusions:
- all Rhat values are below 1.05, suggesting the chains have converged.
The number of effective samples - the ratio of the number of effective samples (those not rejected by the sampler) to the number of samples provides an indication of the effectiveness (and efficiency) of the MCMC sampler. Ratios that are less than 0.5 for a parameter suggest that the sampler spent considerable time in difficult areas of the sampling domain and rejected more than half of the samples (replacing them with the previous effective sample).
If the ratios are low, tightening the priors may help.
Conclusions:
- ratios are all very high
Conclusions:
- all the diagnostics appear reasonable
- we can conclude that the chains are all well mixed and have converged on a stable posterior.
Plots of the estimates of each parameter over the post-warmup length of each MCMC chain. Each chain is plotted in a different colour, with each parameter in its own facet. Ideally, each trace should just look like noise without any discernible drift and each of the traces for a specific parameter should look the same (i.e, should not be displaced above or below any other trace for that parameter).
Conclusions:
- the chains appear well mixed and very similar
Conclusions:
- there is no evidence of autocorrelation in the MCMC samples
Rhat is a scale reduction factor measure of convergence between the chains. The closer the values are to 1, the more the chains have converged. Values greater than 1.05 indicate a lack of convergence. There will be an Rhat value for each parameter estimated.
Conclusions:
- all Rhat values are below 1.05, suggesting the chains have converged.
The number of effective samples - the ratio of the number of effective samples (those not rejected by the sampler) to the number of samples provides an indication of the effectiveness (and efficiency) of the MCMC sampler. Ratios that are less than 0.5 for a parameter suggest that the sampler spent considerable time in difficult areas of the sampling domain and rejected more than half of the samples (replacing them with the previous effective sample).
If the ratios are low, tightening the priors may help.
Conclusions:
- ratios are all very high
Conclusions:
- all the diagnostics appear reasonable
- we can conclude that the chains are all well mixed and have converged on a stable posterior.
Plots of the estimates of each parameter over the post-warmup length of each MCMC chain. Each chain is plotted in a different colour, with each parameter in its own facet. Ideally, each trace should just look like noise without any discernible drift and each of the traces for a specific parameter should look the same (i.e, should not be displaced above or below any other trace for that parameter).
Conclusions:
- the chains appear well mixed and very similar
Conclusions:
- there is no evidence of autocorrelation in the MCMC samples
Rhat is a scale reduction factor measure of convergence between the chains. The closer the values are to 1, the more the chains have converged. Values greater than 1.05 indicate a lack of convergence. There will be an Rhat value for each parameter estimated.
Conclusions:
- all Rhat values are below 1.05, suggesting the chains have converged.
The number of effective samples - the ratio of the number of effective samples (those not rejected by the sampler) to the number of samples provides an indication of the effectiveness (and efficiency) of the MCMC sampler. Ratios that are less than 0.5 for a parameter suggest that the sampler spent considerable time in difficult areas of the sampling domain and rejected more than half of the samples (replacing them with the previous effective sample).
If the ratios are low, tightening the priors may help.
Conclusions:
- ratios are all very high
Conclusions:
- all the diagnostics appear reasonable
- we can conclude that the chains are all well mixed and have converged on a stable posterior.
Plots of the estimates of each parameter over the post-warmup length of each MCMC chain. Each chain is plotted in a different colour, with each parameter in its own facet. Ideally, each trace should just look like noise without any discernible drift and each of the traces for a specific parameter should look the same (i.e, should not be displaced above or below any other trace for that parameter).
Conclusions:
- the chains appear well mixed and very similar
Conclusions:
- there is no evidence of autocorrelation in the MCMC samples
Rhat is a scale reduction factor measure of convergence between the chains. The closer the values are to 1, the more the chains have converged. Values greater than 1.05 indicate a lack of convergence. There will be an Rhat value for each parameter estimated.
Conclusions:
- all Rhat values are below 1.05, suggesting the chains have converged.
The number of effective samples - the ratio of the number of effective samples (those not rejected by the sampler) to the number of samples provides an indication of the effectiveness (and efficiency) of the MCMC sampler. Ratios that are less than 0.5 for a parameter suggest that the sampler spent considerable time in difficult areas of the sampling domain and rejected more than half of the samples (replacing them with the previous effective sample).
If the ratios are low, tightening the priors may help.
Conclusions:
- ratios are all very high
Conclusions:
- all the diagnostics appear reasonable
- we can conclude that the chains are all well mixed and have converged on a stable posterior.
Plots of the estimates of each parameter over the post-warmup length of each MCMC chain. Each chain is plotted in a different colour, with each parameter in its own facet. Ideally, each trace should just look like noise without any discernible drift and each of the traces for a specific parameter should look the same (i.e, should not be displaced above or below any other trace for that parameter).
Conclusions:
- the chains appear well mixed and very similar
Conclusions:
- there is no evidence of autocorrelation in the MCMC samples
Rhat is a scale reduction factor measure of convergence between the chains. The closer the values are to 1, the more the chains have converged. Values greater than 1.05 indicate a lack of convergence. There will be an Rhat value for each parameter estimated.
Conclusions:
- all Rhat values are below 1.05, suggesting the chains have converged.
The number of effective samples - the ratio of the number of effective samples (those not rejected by the sampler) to the number of samples provides an indication of the effectiveness (and efficiency) of the MCMC sampler. Ratios that are less than 0.5 for a parameter suggest that the sampler spent considerable time in difficult areas of the sampling domain and rejected more than half of the samples (replacing them with the previous effective sample).
If the ratios are low, tightening the priors may help.
Conclusions:
- ratios are all very high
Conclusions:
- all the diagnostics appear reasonable
- we can conclude that the chains are all well mixed and have converged on a stable posterior.
Plots of the estimates of each parameter over the post-warmup length of each MCMC chain. Each chain is plotted in a different colour, with each parameter in its own facet. Ideally, each trace should just look like noise without any discernible drift and each of the traces for a specific parameter should look the same (i.e, should not be displaced above or below any other trace for that parameter).
Conclusions:
- the chains appear well mixed and very similar
Conclusions:
- there is no evidence of autocorrelation in the MCMC samples
Rhat is a scale reduction factor measure of convergence between the chains. The closer the values are to 1, the more the chains have converged. Values greater than 1.05 indicate a lack of convergence. There will be an Rhat value for each parameter estimated.
Conclusions:
- all Rhat values are below 1.05, suggesting the chains have converged.
The number of effective samples - the ratio of the number of effective samples (those not rejected by the sampler) to the number of samples provides an indication of the effectiveness (and efficiency) of the MCMC sampler. Ratios that are less than 0.5 for a parameter suggest that the sampler spent considerable time in difficult areas of the sampling domain and rejected more than half of the samples (replacing them with the previous effective sample).
If the ratios are low, tightening the priors may help.
Conclusions:
- ratios are all very high
Conclusions:
- all the diagnostics appear reasonable
- we can conclude that the chains are all well mixed and have converged on a stable posterior.
Plots of the estimates of each parameter over the post-warmup length of each MCMC chain. Each chain is plotted in a different colour, with each parameter in its own facet. Ideally, each trace should just look like noise without any discernible drift and each of the traces for a specific parameter should look the same (i.e, should not be displaced above or below any other trace for that parameter).
Conclusions:
- the chains appear well mixed and very similar
Conclusions:
- there is no evidence of autocorrelation in the MCMC samples
Rhat is a scale reduction factor measure of convergence between the chains. The closer the values are to 1, the more the chains have converged. Values greater than 1.05 indicate a lack of convergence. There will be an Rhat value for each parameter estimated.
Conclusions:
- all Rhat values are below 1.05, suggesting the chains have converged.
The number of effective samples - the ratio of the number of effective samples (those not rejected by the sampler) to the number of samples provides an indication of the effectiveness (and efficiency) of the MCMC sampler. Ratios that are less than 0.5 for a parameter suggest that the sampler spent considerable time in difficult areas of the sampling domain and rejected more than half of the samples (replacing them with the previous effective sample).
If the ratios are low, tightening the priors may help.
Conclusions:
- ratios are all very high
Conclusions:
- all the diagnostics appear reasonable
- we can conclude that the chains are all well mixed and have converged on a stable posterior.
9 Model validation
Model validation involves exploring the model diagnostics and fit to ensure that the model is broadly appropriate for the data. As such, exploration of the residuals should be routine.
For more complex models (those that contain multiple effects), it is also advisable to plot the residuals against each of the individual predictors. For sampling designs that involve sample collection over space or time, it is also a good idea to explore whether there are any temporal or spatial patterns in the residuals.
There are numerous situations (e.g. when applying specific variance-covariance structures to a model) where raw residuals do not reflect the interior workings of the model. Typically, this is because they do not take into account the variance-covariance matrix or assume a very simple variance-covariance matrix. Since the purpose of exploring residuals is to evaluate the model, for these cases, it is arguably better to draw conclusions based on standardized (or studentized) residuals.
Unfortunately, the definitions of standardised and studentised residuals appear to vary and the two terms get used interchangeably. I will adopt the following definitions:
- Standardized residuals
- the raw residuals divided by the true standard deviation of the residuals (which of course is rarely known).
- Studentized residuals
- the raw residuals divided by the estimated standard deviation of the residuals. Note that externally studentised residuals are calculated by dividing each raw residual by a standard deviation estimated from a regression that leaves that observation out.
- Pearson residuals
- the raw residuals divided by the standard deviation of the response variable.
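To make these definitions concrete, here is a small illustration using an ordinary linear model on fabricated data (for Bayesian models the same logic applies, just with posterior draws):

```r
set.seed(1)
dat <- data.frame(x = 1:30)
dat$y <- 2 + 3 * dat$x + rnorm(30, sd = 2)
fit <- lm(y ~ x, data = dat)
raw <- resid(fit)

# Internally studentised: raw residuals scaled by their estimated standard
# deviation (equivalent to rstandard(fit))
studentised <- raw / (sigma(fit) * sqrt(1 - hatvalues(fit)))

# Externally studentised: the scaling standard deviation for each observation
# comes from a refit that leaves that observation out (rstudent(fit))
ext_studentised <- rstudent(fit)

# Pearson (as defined above): raw residuals scaled by the standard deviation
# of the response
pearson <- raw / sd(dat$y)
```
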
The mark of a good model is being able to predict well. In an ideal world, we would have a sufficiently large sample size to permit us to hold a fraction (such as 25%) back, allowing us to train the model on the remaining 75% of the data and then see how well the model can predict the withheld 25%. Unfortunately, such a luxury is still rare in ecology.
The next best option is to see how well the model can predict the observed data. Models tend to struggle most with the extremes of trends and have particular issues when the extremes approach logical boundaries (such as zero for count data and standard deviations). We can use the fitted model to generate random predicted observations and then explore some properties of these compared to the actual observed data.
| Package | Description | Function | rstanarm | brms |
|---|---|---|---|---|
| bayesplot | Density overlay | ppc_dens_overlay | pp_check(mod, plotfun='dens_overlay') | pp_check(mod, type='dens_overlay') |
| bayesplot | Obs vs Pred error | ppc_error_scatter_avg | pp_check(mod, plotfun='error_scatter_avg') | pp_check(mod, type='error_scatter_avg') |
| bayesplot | Pred error vs x | ppc_error_scatter_avg_vs_x | pp_check(mod, x=, plotfun='error_scatter_avg_vs_x') | pp_check(mod, x=, type='error_scatter_avg_vs_x') |
| bayesplot | Preds vs x | ppc_intervals | pp_check(mod, x=, plotfun='intervals') | pp_check(mod, x=, type='intervals') |
| bayesplot | Partial plot | ppc_ribbon | pp_check(mod, x=, plotfun='ribbon') | pp_check(mod, x=, type='ribbon') |
The bayesplot PPC module provides the following functions:
ppc_bars
ppc_bars_grouped
ppc_boxplot
ppc_dens
ppc_dens_overlay
ppc_dens_overlay_grouped
ppc_ecdf_overlay
ppc_ecdf_overlay_grouped
ppc_error_binned
ppc_error_hist
ppc_error_hist_grouped
ppc_error_scatter
ppc_error_scatter_avg
ppc_error_scatter_avg_grouped
ppc_error_scatter_avg_vs_x
ppc_freqpoly
ppc_freqpoly_grouped
ppc_hist
ppc_intervals
ppc_intervals_grouped
ppc_km_overlay
ppc_km_overlay_grouped
ppc_loo_intervals
ppc_loo_pit
ppc_loo_pit_overlay
ppc_loo_pit_qq
ppc_loo_ribbon
ppc_pit_ecdf
ppc_pit_ecdf_grouped
ppc_ribbon
ppc_ribbon_grouped
ppc_rootogram
ppc_scatter
ppc_scatter_avg
ppc_scatter_avg_grouped
ppc_stat
ppc_stat_2d
ppc_stat_freqpoly
ppc_stat_freqpoly_grouped
ppc_stat_grouped
ppc_violin_grouped
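In brms, any of these can be requested through pp_check() by passing the function name (minus the ppc_ prefix) as type. For example, assuming a fitted brmsfit called mod with a predictor named x (placeholder names):

```r
library(brms)  # a fitted brmsfit called `mod` is assumed to already exist

# Density overlay of 50 posterior draws against the observed data
pp_check(mod, type = "dens_overlay", ndraws = 50)

# Average prediction errors against the predictor x
pp_check(mod, type = "error_scatter_avg_vs_x", x = "x")

# Observed data over posterior predictive intervals
pp_check(mod, type = "intervals", x = "x")
```

Note that the _vs_x, intervals and ribbon types require the x= argument naming a predictor in the model's data.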
Bayesian samplers involve many calls to randomisation functions. As a result, the estimates will vary slightly each time the routines are run. You should expect that the outputs that you obtain will differ slightly from those that I am displaying. Nevertheless, the main conclusions should remain robust across subsequent runs.
resid <- resid(dat1a.brm2)[, "Estimate"]
fit <- fitted(dat1a.brm2)[, "Estimate"]
ggplot() +
    geom_point(data = NULL, aes(y = resid, x = fit))

We should also plot the residuals against each of the predictor variables (as well as any other important unmodelled predictors - particularly time and space if relevant and available).
resid <- resid(dat1a.brm2)[, "Estimate"]
ggplot() +
    geom_point(data = NULL, aes(y = resid, x = dat$x))

Conclusions:
- there does not appear to be any pattern in the residuals
Posterior predictive checks provide additional diagnostics about the fit of the model. Specifically, they provide a comparison between predictions drawn from the model and the observed data used to train the model.
Density overlay
These are plots of the density distribution of the observed data (black line) overlaid on 50 density distributions generated from draws from the model (light blue). Ideally, the 50 realisations should be roughly consistent with the observed data.
Conclusions:
- the model draws appear to be consistent with the observed data
Error scatter
These are plots of the observed values against the average residuals. Similar to a residual plot, we do not want to see any patterns in this plot.
Using all posterior draws for ppc type 'error_scatter_avg' by default.
Conclusions:
- there is no obvious pattern in the residuals.
Intervals
These are plots of the observed data overlaid on posterior predictions associated with each level of the predictor. Ideally, the observed data should all fall within the predictive intervals.
Using all posterior draws for ppc type 'intervals' by default.
Conclusions:
- the posterior predictions are not inconsistent with the observed data
Ribbon
These are just an alternative way of expressing the interval plot.
Using all posterior draws for ppc type 'ribbon' by default.
Conclusions:
- the posterior predictions are not inconsistent with the observed data
DHARMa residuals provide very useful diagnostics. Unfortunately, we cannot directly use the simulateResiduals() function to generate the simulated residuals. However, if we are willing to calculate some of the components ourselves, we can still obtain simulated residuals from the fitted stan model.
We need to supply:
- simulated (predicted) responses associated with each observation.
- observed values
- fitted (predicted) responses (averaged) associated with each observation
If you are using RStudio (particularly with a quarto document), the above code may produce a graphic that is too large for the currently available device. The document should still render without problems; it is just the live display within RStudio that cannot accommodate the figure.
To address this, you can either:
- break the multi-panel figure up into four separate figures by removing the outer wrap_elements() and plot_layout() functions.
- copy the above code into the console and view it in a larger graphics device
Conclusions:
- the Q-Q plot looks reasonable (points broadly follow the angled line)
- there are no flagged issues with the following tests:
- KS test: conformity to the nominated distribution (family)
- Dispersion test: would not normally expect this to be an issue for a Gaussian family unless there are other issues with the residuals
- Outlier test: the influence of each observation
- there does not appear to be any pattern in the residuals - each of the three quantile trends is considered flat and centered around 0.25, 0.5 and 0.75
- the observed dispersion is well within the simulated range indicating that there is no issue with dispersion (again this is expected for a Gaussian model)
Conclusions:
- there is no evidence of a lack of fit
- the model is likely to be reliable
resid <- resid(dat1b.brm2)[, "Estimate"]
fit <- fitted(dat1b.brm2)[, "Estimate"]
ggplot() +
    geom_point(data = NULL, aes(y = resid, x = fit))

We should also plot the residuals against each of the predictor variables (as well as any other important unmodelled predictors - particularly time and space if relevant and available).
resid <- resid(dat1b.brm2)[, "Estimate"]
ggplot() +
    geom_point(data = NULL, aes(y = resid, x = dat$x))

Conclusions:
- there does not appear to be any pattern in the residuals
Posterior predictive checks provide additional diagnostics about the fit of the model. Specifically, they provide a comparison between predictions drawn from the model and the observed data used to train the model.
Density overlay
These are plots of the density distribution of the observed data (black line) overlaid on 50 density distributions generated from draws from the model (light blue). Ideally, the 50 realisations should be roughly consistent with the observed data.
Conclusions:
- the model draws appear to be consistent with the observed data
Error scatter
These are plots of the observed values against the average residuals. Similar to a residual plot, we do not want to see any patterns in this plot.
Using all posterior draws for ppc type 'error_scatter_avg' by default.
Conclusions:
- there is no obvious pattern in the residuals.
Intervals
These are plots of the observed data overlaid on posterior predictions associated with each level of the predictor. Ideally, the observed data should all fall within the predictive intervals.
Using all posterior draws for ppc type 'intervals' by default.
Conclusions:
- the posterior predictions are not inconsistent with the observed data
Ribbon
These are just an alternative way of expressing the interval plot.
Using all posterior draws for ppc type 'ribbon' by default.
Conclusions:
- the posterior predictions are not inconsistent with the observed data
DHARMa residuals provide very useful diagnostics. Unfortunately, we cannot directly use the simulateResiduals() function to generate the simulated residuals. However, if we are willing to calculate some of the components ourselves, we can still obtain simulated residuals from the fitted stan model.
We need to supply:
- simulated (predicted) responses associated with each observation.
- observed values
- fitted (predicted) responses (averaged) associated with each observation
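The helper used below, make_brms_dharma_res() (defined in helperfunctions.R), assembles those three components into a DHARMa residuals object. A sketch of roughly what such a helper might do, assuming a fitted brmsfit called mod (placeholder name):

```r
library(DHARMa)
library(brms)

# Posterior predictive draws: one row per draw, one column per observation
preds <- posterior_predict(mod)

dharma_res <- createDHARMa(
    simulatedResponse = t(preds),                     # observations in rows, draws in columns
    observedResponse = standata(mod)$Y,               # the observed response values
    fittedPredictedResponse = apply(preds, 2, mean),  # average prediction per observation
    integerResponse = FALSE                           # set TRUE for count or binary responses
)
```

The resulting object can then be passed to the usual DHARMa functions (plotResiduals(), testUniformity(), testDispersion(), etc.).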
dat1b.resids <- make_brms_dharma_res(dat1b.brm2, integerResponse = FALSE)
wrap_elements(~ testUniformity(dat1b.resids)) +
    wrap_elements(~ plotResiduals(dat1b.resids)) +
    wrap_elements(~ testDispersion(dat1b.resids)) +
    plot_layout(nrow = 1)

If you are using RStudio (particularly with a quarto document), the above code may produce a graphic that is too large for the currently available device. The document should still render without problems; it is just the live display within RStudio that cannot accommodate the figure.
To address this, you can either:
- break the multi-panel figure up into four separate figures by removing the outer wrap_elements() and plot_layout() functions.
- copy the above code into the console and view it in a larger graphics device
Conclusions:
- the Q-Q plot looks reasonable (points broadly follow the angled line)
- there are no flagged issues with the following tests:
- KS test: conformity to the nominated distribution (family)
- Dispersion test: would not normally expect this to be an issue for a Gaussian family unless there are other issues with the residuals
- Outlier test: the influence of each observation
- there does not appear to be any pattern in the residuals - each of the three quantile trends is considered flat and centered around 0.25, 0.5 and 0.75
- the observed dispersion is well within the simulated range indicating that there is no issue with dispersion (again this is expected for a Gaussian model)
Conclusions:
- there is no evidence of a lack of fit
- the model is likely to be reliable
resid <- resid(dat1c.brm2)[, "Estimate"]
fit <- fitted(dat1c.brm2)[, "Estimate"]
ggplot() +
geom_point(data = NULL, aes(y = resid, x = fit))
We should also plot the residuals against each of the predictor variables (as well as any other important unmodelled predictors - particularly time and space, if relevant and available).
resid <- resid(dat1c.brm2)[, "Estimate"]
ggplot() +
geom_point(data = NULL, aes(y = resid, x = dat$x))
Conclusions:
- there does not appear to be any pattern in the residuals
Posterior predictive checks provide additional diagnostics about the fit of the model. Specifically, they provide a comparison between predictions drawn from the model and the observed data used to train the model.
Density overlay
These are plots of the density distribution of the observed data (black line) overlaid on 50 density distributions generated from draws from the model (light blue). Ideally, the 50 realisations should be roughly consistent with the observed data.
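In brms, this plot can be produced via pp_check(); a minimal sketch using the model fitted in this section (ndraws, in recent versions of brms, sets the number of posterior realisations drawn):

```r
# Density overlay posterior predictive check (sketch)
pp_check(dat1c.brm2, type = "dens_overlay", ndraws = 50)
```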
Conclusions:
- the model draws appear to be consistent with the observed data
Error scatter
These are plots of the observed values against the average residuals. Similar to a residual plot, we do not want to see any patterns in this plot.
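A sketch of the corresponding pp_check() call (same model object as above):

```r
# Average error vs observed value posterior predictive check (sketch)
pp_check(dat1c.brm2, type = "error_scatter_avg")
```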
Using all posterior draws for ppc type 'error_scatter_avg' by default.
Conclusions:
- there is no obvious pattern in the residuals.
Intervals
These are plots of the observed data overlaid on posterior predictions associated with each level of the predictor. Ideally, the observed data should all fall within the predictive intervals.
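A sketch of the corresponding call (the x argument names the predictor on the horizontal axis; "x" follows this tutorial's variable naming):

```r
# Interval posterior predictive check against the predictor (sketch)
pp_check(dat1c.brm2, type = "intervals", x = "x")
```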
Using all posterior draws for ppc type 'intervals' by default.
Conclusions:
- the posterior predictions are not inconsistent with the observed data
Ribbon
These are just an alternative way of expressing the interval plot.
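A sketch of the corresponding call (same assumptions as for the interval plot):

```r
# Ribbon posterior predictive check against the predictor (sketch)
pp_check(dat1c.brm2, type = "ribbon", x = "x")
```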
Using all posterior draws for ppc type 'ribbon' by default.
Conclusions:
- the posterior predictions are not inconsistent with the observed data
DHARMa residuals provide very useful diagnostics. Unfortunately, we cannot directly use the simulateResiduals() function to generate the simulated residuals. However, if we are willing to calculate some of the components ourselves, we can still obtain the simulated residuals from the fitted Stan model.
We need to supply:
- simulated (predicted) responses associated with each observation.
- observed values
- fitted (predicted) responses (averaged) associated with each observation
dat1c.resids <- make_brms_dharma_res(dat1c.brm2, integerResponse = FALSE)
wrap_elements(~ testUniformity(dat1c.resids)) +
wrap_elements(~ plotResiduals(dat1c.resids)) +
wrap_elements(~ testDispersion(dat1c.resids)) +
plot_layout(nrow = 1)
If you are using RStudio (particularly with a quarto document), the above code may produce a graphic that is too large for the currently available device. The document should still render without problems; it is just the live display within RStudio that cannot accommodate the figure.
To address this, you can either:
- break the multi-panel figure up into separate figures by removing the outer wrap_elements() and plot_layout() functions
- copy the above code into the console and view it in a larger graphics device
Conclusions:
- the Q-Q plot looks reasonable (points broadly follow the angled line)
- there are no flagged issues with any of the tests:
- KS test: conformity to the nominated distribution (family)
- Dispersion test: would not normally expect this to be an issue for a Gaussian family unless there are other issues with the residuals
- Outlier test: the influence of each observation
- there does not appear to be any pattern in the residuals - each of the three quantile trends is approximately flat and centered around 0.25, 0.5 and 0.75
- the observed dispersion is well within the simulated range, indicating that there is no issue with dispersion (again, this is expected for a Gaussian model)
Conclusions:
- there is no evidence of a lack of fit
- the model is likely to be reliable
resid <- resid(dat2a.brm2)[, "Estimate"]
fit <- fitted(dat2a.brm2)[, "Estimate"]
ggplot() +
geom_point(data = NULL, aes(y = resid, x = fit))
We should also plot the residuals against each of the predictor variables (as well as any other important unmodelled predictors - particularly time and space, if relevant and available).
resid <- resid(dat2a.brm2)[, "Estimate"]
ggplot() +
geom_point(data = NULL, aes(y = resid, x = dat2$x))
Conclusions:
- there does not appear to be any pattern in the residuals
Posterior predictive checks provide additional diagnostics about the fit of the model. Specifically, they provide a comparison between predictions drawn from the model and the observed data used to train the model.
Density overlay
These are plots of the density distribution of the observed data (black line) overlaid on 50 density distributions generated from draws from the model (light blue). Ideally, the 50 realisations should be roughly consistent with the observed data.
Conclusions:
- the model draws appear to be consistent with the observed data
Error scatter
These are plots of the observed values against the average residuals. Similar to a residual plot, we do not want to see any patterns in this plot.
Using all posterior draws for ppc type 'error_scatter_avg' by default.
Conclusions:
- there is no obvious pattern in the residuals.
DHARMa residuals provide very useful diagnostics. Unfortunately, we cannot directly use the simulateResiduals() function to generate the simulated residuals. However, if we are willing to calculate some of the components ourselves, we can still obtain the simulated residuals from the fitted Stan model.
We need to supply:
- simulated (predicted) responses associated with each observation.
- observed values
- fitted (predicted) responses (averaged) associated with each observation
dat2a.resids <- make_brms_dharma_res(dat2a.brm2, integerResponse = FALSE)
wrap_elements(~ testUniformity(dat2a.resids)) +
wrap_elements(~ plotResiduals(dat2a.resids)) +
wrap_elements(~ testDispersion(dat2a.resids)) +
plot_layout(nrow = 1)
If you are using RStudio (particularly with a quarto document), the above code may produce a graphic that is too large for the currently available device. The document should still render without problems; it is just the live display within RStudio that cannot accommodate the figure.
To address this, you can either:
- break the multi-panel figure up into separate figures by removing the outer wrap_elements() and plot_layout() functions
- copy the above code into the console and view it in a larger graphics device
Conclusions:
- the Q-Q plot looks reasonable (points broadly follow the angled line)
- there are no flagged issues with any of the tests:
- KS test: conformity to the nominated distribution (family)
- Dispersion test: would not normally expect this to be an issue for a Gaussian family unless there are other issues with the residuals
- Outlier test: the influence of each observation
- there does not appear to be any pattern in the residuals - each of the three quantile trends is approximately flat and centered around 0.25, 0.5 and 0.75
- the observed dispersion is well within the simulated range, indicating that there is no issue with dispersion (again, this is expected for a Gaussian model)
Conclusions:
- there is no evidence of a lack of fit
- the model is likely to be reliable
resid <- resid(dat2b.brm2)[, "Estimate"]
fit <- fitted(dat2b.brm2)[, "Estimate"]
ggplot() +
geom_point(data = NULL, aes(y = resid, x = fit))
We should also plot the residuals against each of the predictor variables (as well as any other important unmodelled predictors - particularly time and space, if relevant and available).
resid <- resid(dat2b.brm2)[, "Estimate"]
ggplot() +
geom_point(data = NULL, aes(y = resid, x = dat2$x))
Conclusions:
- there does not appear to be any pattern in the residuals
Posterior predictive checks provide additional diagnostics about the fit of the model. Specifically, they provide a comparison between predictions drawn from the model and the observed data used to train the model.
Density overlay
These are plots of the density distribution of the observed data (black line) overlaid on 50 density distributions generated from draws from the model (light blue). Ideally, the 50 realisations should be roughly consistent with the observed data.
Conclusions:
- the model draws appear to be consistent with the observed data
Error scatter
These are plots of the observed values against the average residuals. Similar to a residual plot, we do not want to see any patterns in this plot.
Using all posterior draws for ppc type 'error_scatter_avg' by default.
Conclusions:
- there is no obvious pattern in the residuals.
DHARMa residuals provide very useful diagnostics. Unfortunately, we cannot directly use the simulateResiduals() function to generate the simulated residuals. However, if we are willing to calculate some of the components ourselves, we can still obtain the simulated residuals from the fitted Stan model.
We need to supply:
- simulated (predicted) responses associated with each observation.
- observed values
- fitted (predicted) responses (averaged) associated with each observation
dat2b.resids <- make_brms_dharma_res(dat2b.brm2, integerResponse = FALSE)
wrap_elements(~ testUniformity(dat2b.resids)) +
wrap_elements(~ plotResiduals(dat2b.resids)) +
wrap_elements(~ testDispersion(dat2b.resids)) +
plot_layout(nrow = 1)
If you are using RStudio (particularly with a quarto document), the above code may produce a graphic that is too large for the currently available device. The document should still render without problems; it is just the live display within RStudio that cannot accommodate the figure.
To address this, you can either:
- break the multi-panel figure up into separate figures by removing the outer wrap_elements() and plot_layout() functions
- copy the above code into the console and view it in a larger graphics device
Conclusions:
- the Q-Q plot looks reasonable (points broadly follow the angled line)
- there are no flagged issues with any of the tests:
- KS test: conformity to the nominated distribution (family)
- Dispersion test: would not normally expect this to be an issue for a Gaussian family unless there are other issues with the residuals
- Outlier test: the influence of each observation
- there does not appear to be any pattern in the residuals - each of the three quantile trends is approximately flat and centered around 0.25, 0.5 and 0.75
- the observed dispersion is well within the simulated range, indicating that there is no issue with dispersion (again, this is expected for a Gaussian model)
Conclusions:
- there is no evidence of a lack of fit
- the model is likely to be reliable
resid <- resid(dat3a.brm2)[, "Estimate"]
fit <- fitted(dat3a.brm2)[, "Estimate"]
ggplot() +
geom_point(data = NULL, aes(y = resid, x = fit))
We should also plot the residuals against each of the predictor variables (as well as any other important unmodelled predictors - particularly time and space, if relevant and available).
resid <- resid(dat3a.brm2)[, "Estimate"]
ggplot() +
geom_point(data = NULL, aes(y = resid, x = dat3$x))
Conclusions:
- there does not appear to be any pattern in the residuals
Density overlay
Conclusions:
- the model draws appear to be consistent with the observed data
Error scatter
These are plots of the observed values against the average residuals. Similar to a residual plot, we do not want to see any patterns in this plot.
Using all posterior draws for ppc type 'error_scatter_avg' by default.
Conclusions:
- there is no obvious pattern in the residuals.
Intervals
These are plots of the observed data overlaid on posterior predictions associated with each level of the predictor. Ideally, the observed data should all fall within the predictive intervals.
Using all posterior draws for ppc type 'intervals' by default.
Conclusions:
- the posterior predictions are not inconsistent with the observed data
Ribbon
These are just an alternative way of expressing the interval plot.
Using all posterior draws for ppc type 'ribbon' by default.
Conclusions:
- the posterior predictions are not inconsistent with the observed data
DHARMa residuals provide very useful diagnostics. Unfortunately, we cannot directly use the simulateResiduals() function to generate the simulated residuals. However, if we are willing to calculate some of the components ourselves, we can still obtain the simulated residuals from the fitted Stan model.
We need to supply:
- simulated (predicted) responses associated with each observation.
- observed values
- fitted (predicted) responses (averaged) associated with each observation
dat3a.resids <- make_brms_dharma_res(dat3a.brm2, integerResponse = FALSE)
wrap_elements(~ testUniformity(dat3a.resids)) +
wrap_elements(~ plotResiduals(dat3a.resids)) +
wrap_elements(~ testDispersion(dat3a.resids)) +
plot_layout(nrow = 1)
If you are using RStudio (particularly with a quarto document), the above code may produce a graphic that is too large for the currently available device. The document should still render without problems; it is just the live display within RStudio that cannot accommodate the figure.
To address this, you can either:
- break the multi-panel figure up into separate figures by removing the outer wrap_elements() and plot_layout() functions
- copy the above code into the console and view it in a larger graphics device
Conclusions:
- the Q-Q plot looks reasonable (points broadly follow the angled line)
- there are no flagged issues with any of the tests:
- KS test: conformity to the nominated distribution (family)
- Dispersion test: would not normally expect this to be an issue for a Gaussian family unless there are other issues with the residuals
- Outlier test: the influence of each observation
- there does not appear to be any pattern in the residuals - each of the three quantile trends is approximately flat and centered around 0.25, 0.5 and 0.75
- the observed dispersion is well within the simulated range, indicating that there is no issue with dispersion (again, this is expected for a Gaussian model)
Conclusions:
- there is no evidence of a lack of fit
- the model is likely to be reliable
resid <- resid(dat3b.brm2)[, "Estimate"]
fit <- fitted(dat3b.brm2)[, "Estimate"]
ggplot() +
geom_point(data = NULL, aes(y = resid, x = fit))
We should also plot the residuals against each of the predictor variables (as well as any other important unmodelled predictors - particularly time and space, if relevant and available).
resid <- resid(dat3b.brm2)[, "Estimate"]
ggplot() +
geom_point(data = NULL, aes(y = resid, x = dat3$x))
Conclusions:
- there does not appear to be any pattern in the residuals
Density overlay
Conclusions:
- the model draws appear to be consistent with the observed data
Error scatter
These are plots of the observed values against the average residuals. Similar to a residual plot, we do not want to see any patterns in this plot.
Using all posterior draws for ppc type 'error_scatter_avg' by default.
Conclusions:
- there is no obvious pattern in the residuals.
Intervals
These are plots of the observed data overlaid on posterior predictions associated with each level of the predictor. Ideally, the observed data should all fall within the predictive intervals.
Using all posterior draws for ppc type 'intervals' by default.
Conclusions:
- the posterior predictions are not inconsistent with the observed data
Ribbon
These are just an alternative way of expressing the interval plot.
Using all posterior draws for ppc type 'ribbon' by default.
Conclusions:
- the posterior predictions are not inconsistent with the observed data
DHARMa residuals provide very useful diagnostics. Unfortunately, we cannot directly use the simulateResiduals() function to generate the simulated residuals. However, if we are willing to calculate some of the components ourselves, we can still obtain the simulated residuals from the fitted Stan model.
We need to supply:
- simulated (predicted) responses associated with each observation.
- observed values
- fitted (predicted) responses (averaged) associated with each observation
dat3b.resids <- make_brms_dharma_res(dat3b.brm2, integerResponse = FALSE)
wrap_elements(~ testUniformity(dat3b.resids)) +
wrap_elements(~ plotResiduals(dat3b.resids)) +
wrap_elements(~ testDispersion(dat3b.resids)) +
plot_layout(nrow = 1)
If you are using RStudio (particularly with a quarto document), the above code may produce a graphic that is too large for the currently available device. The document should still render without problems; it is just the live display within RStudio that cannot accommodate the figure.
To address this, you can either:
- break the multi-panel figure up into separate figures by removing the outer wrap_elements() and plot_layout() functions
- copy the above code into the console and view it in a larger graphics device
Conclusions:
- the Q-Q plot looks reasonable (points broadly follow the angled line)
- there are no flagged issues with any of the tests:
- KS test: conformity to the nominated distribution (family)
- Dispersion test: would not normally expect this to be an issue for a Gaussian family unless there are other issues with the residuals
- Outlier test: the influence of each observation
- there does not appear to be any pattern in the residuals - each of the three quantile trends is approximately flat and centered around 0.25, 0.5 and 0.75
- the observed dispersion is well within the simulated range, indicating that there is no issue with dispersion (again, this is expected for a Gaussian model)
Conclusions:
- there is no evidence of a lack of fit
- the model is likely to be reliable
resid <- resid(dat3c.brm2)[, "Estimate"]
fit <- fitted(dat3c.brm2)[, "Estimate"]
ggplot() +
geom_point(data = NULL, aes(y = resid, x = fit))
We should also plot the residuals against each of the predictor variables (as well as any other important unmodelled predictors - particularly time and space, if relevant and available).
resid <- resid(dat3c.brm2)[, "Estimate"]
ggplot() +
geom_point(data = NULL, aes(y = resid, x = dat3$x))
Conclusions:
- there does not appear to be any pattern in the residuals
Density overlay
Conclusions:
- the model draws appear to be consistent with the observed data
Error scatter
These are plots of the observed values against the average residuals. Similar to a residual plot, we do not want to see any patterns in this plot.
Using all posterior draws for ppc type 'error_scatter_avg' by default.
Conclusions:
- there is no obvious pattern in the residuals.
Intervals
These are plots of the observed data overlaid on posterior predictions associated with each level of the predictor. Ideally, the observed data should all fall within the predictive intervals.
Using all posterior draws for ppc type 'intervals' by default.
Conclusions:
- the posterior predictions are not inconsistent with the observed data
Ribbon
These are just an alternative way of expressing the interval plot.
Using all posterior draws for ppc type 'ribbon' by default.
Conclusions:
- the posterior predictions are not inconsistent with the observed data
DHARMa residuals provide very useful diagnostics. Unfortunately, we cannot directly use the simulateResiduals() function to generate the simulated residuals. However, if we are willing to calculate some of the components ourselves, we can still obtain the simulated residuals from the fitted Stan model.
We need to supply:
- simulated (predicted) responses associated with each observation.
- observed values
- fitted (predicted) responses (averaged) associated with each observation
dat3c.resids <- make_brms_dharma_res(dat3c.brm2, integerResponse = FALSE)
wrap_elements(~ testUniformity(dat3c.resids)) +
wrap_elements(~ plotResiduals(dat3c.resids)) +
wrap_elements(~ testDispersion(dat3c.resids)) +
plot_layout(nrow = 1)
If you are using RStudio (particularly with a quarto document), the above code may produce a graphic that is too large for the currently available device. The document should still render without problems; it is just the live display within RStudio that cannot accommodate the figure.
To address this, you can either:
- break the multi-panel figure up into separate figures by removing the outer wrap_elements() and plot_layout() functions
- copy the above code into the console and view it in a larger graphics device
Conclusions:
- the Q-Q plot looks reasonable (points broadly follow the angled line)
- there are no flagged issues with any of the tests:
- KS test: conformity to the nominated distribution (family)
- Dispersion test: would not normally expect this to be an issue for a Gaussian family unless there are other issues with the residuals
- Outlier test: the influence of each observation
- there does not appear to be any pattern in the residuals - each of the three quantile trends is approximately flat and centered around 0.25, 0.5 and 0.75
- the observed dispersion is well within the simulated range, indicating that there is no issue with dispersion (again, this is expected for a Gaussian model)
Conclusions:
- there is no evidence of a lack of fit
- the model is likely to be reliable
resid <- resid(dat4a.brm2)[, "Estimate"]
fit <- fitted(dat4a.brm2)[, "Estimate"]
ggplot() +
geom_point(data = NULL, aes(y = resid, x = fit))
We should also plot the residuals against each of the predictor variables (as well as any other important unmodelled predictors - particularly time and space, if relevant and available).
resid <- resid(dat4a.brm2)[, "Estimate"]
ggplot() +
geom_point(data = NULL, aes(y = resid, x = dat4$x))
Conclusions:
- there does not appear to be any pattern in the residuals
Density overlay
Conclusions:
- the model draws appear to be consistent with the observed data
Error scatter
These are plots of the observed values against the average residuals. Similar to a residual plot, we do not want to see any patterns in this plot.
Using all posterior draws for ppc type 'error_scatter_avg' by default.
Conclusions:
- there is no obvious pattern in the residuals.
Intervals
These are plots of the observed data overlaid on posterior predictions associated with each level of the predictor. Ideally, the observed data should all fall within the predictive intervals.
Using all posterior draws for ppc type 'intervals' by default.
Conclusions:
- the posterior predictions are not inconsistent with the observed data
Ribbon
These are just an alternative way of expressing the interval plot.
Using all posterior draws for ppc type 'ribbon' by default.
Conclusions:
- the posterior predictions are not inconsistent with the observed data
DHARMa residuals provide very useful diagnostics. Unfortunately, we cannot directly use the simulateResiduals() function to generate the simulated residuals. However, if we are willing to calculate some of the components ourselves, we can still obtain the simulated residuals from the fitted Stan model.
We need to supply:
- simulated (predicted) responses associated with each observation.
- observed values
- fitted (predicted) responses (averaged) associated with each observation
dat4a.resids <- make_brms_dharma_res(dat4a.brm2, integerResponse = FALSE)
wrap_elements(~ testUniformity(dat4a.resids)) +
wrap_elements(~ plotResiduals(dat4a.resids)) +
wrap_elements(~ testDispersion(dat4a.resids)) +
plot_layout(nrow = 1)
If you are using RStudio (particularly with a quarto document), the above code may produce a graphic that is too large for the currently available device. The document should still render without problems; it is just the live display within RStudio that cannot accommodate the figure.
To address this, you can either:
- break the multi-panel figure up into separate figures by removing the outer wrap_elements() and plot_layout() functions
- copy the above code into the console and view it in a larger graphics device
Conclusions:
- the Q-Q plot looks reasonable (points broadly follow the angled line)
- there are no flagged issues with any of the tests:
- KS test: conformity to the nominated distribution (family)
- Dispersion test: would not normally expect this to be an issue for a Gaussian family unless there are other issues with the residuals
- Outlier test: the influence of each observation
- there does not appear to be any pattern in the residuals - each of the three quantile trends is approximately flat and centered around 0.25, 0.5 and 0.75
- the observed dispersion is well within the simulated range, indicating that there is no issue with dispersion (again, this is expected for a Gaussian model)
Conclusions:
- there is no evidence of a lack of fit
- the model is likely to be reliable
resid <- resid(dat4b.brm2)[, "Estimate"]
fit <- fitted(dat4b.brm2)[, "Estimate"]
ggplot() +
geom_point(data = NULL, aes(y = resid, x = fit))
We should also plot the residuals against each of the predictor variables (as well as any other important unmodelled predictors - particularly time and space, if relevant and available).
resid <- resid(dat4b.brm2)[, "Estimate"]
ggplot() +
geom_point(data = NULL, aes(y = resid, x = dat4$x))
Conclusions:
- there does not appear to be any pattern in the residuals
Density overlay
Conclusions:
- the model draws appear to be consistent with the observed data
Error scatter
These are plots of the observed values against the average residuals. Similar to a residual plot, we do not want to see any patterns in this plot.
Using all posterior draws for ppc type 'error_scatter_avg' by default.
Conclusions:
- there is no obvious pattern in the residuals.
Intervals
These are plots of the observed data overlaid on posterior predictions associated with each level of the predictor. Ideally, the observed data should all fall within the predictive intervals.
Using all posterior draws for ppc type 'intervals' by default.
Conclusions:
- the posterior predictions are not inconsistent with the observed data
Ribbon
These are just an alternative way of expressing the interval plot.
Using all posterior draws for ppc type 'ribbon' by default.
Conclusions:
- the posterior predictions are not inconsistent with the observed data
DHARMa residuals provide very useful diagnostics. Unfortunately, we cannot directly use the simulateResiduals() function to generate the simulated residuals. However, if we are willing to calculate some of the components ourselves, we can still obtain the simulated residuals from the fitted Stan model.
We need to supply:
- simulated (predicted) responses associated with each observation.
- observed values
- fitted (predicted) responses (averaged) associated with each observation
dat4b.resids <- make_brms_dharma_res(dat4b.brm2, integerResponse = FALSE)
wrap_elements(~ testUniformity(dat4b.resids)) +
wrap_elements(~ plotResiduals(dat4b.resids)) +
wrap_elements(~ testDispersion(dat4b.resids)) +
plot_layout(nrow = 1)
If you are using RStudio (particularly with a quarto document), the above code may produce a graphic that is too large for the currently available device. The document should still render without problems; it is just the live display within RStudio that cannot accommodate the figure.
To address this, you can either:
- break the multi-panel figure up into separate figures by removing the outer wrap_elements() and plot_layout() functions
- copy the above code into the console and view it in a larger graphics device
Conclusions:
- the Q-Q plot looks reasonable (points broadly follow the angled line)
- there are no flagged issues with any of the tests:
- KS test: conformity to the nominated distribution (family)
- Dispersion test: would not normally expect this to be an issue for a Gaussian family unless there are other issues with the residuals
- Outlier test: the influence of each observation
- there does not appear to be any pattern in the residuals - each of the three quantile trends is approximately flat and centered around 0.25, 0.5 and 0.75
- the observed dispersion is well within the simulated range, indicating that there is no issue with dispersion (again, this is expected for a Gaussian model)
Conclusions:
- there is no evidence of a lack of fit
- the model is likely to be reliable
resid <- resid(dat4c.brm2)[, "Estimate"]
fit <- fitted(dat4c.brm2)[, "Estimate"]
ggplot() +
geom_point(data = NULL, aes(y = resid, x = fit))
We should also plot the residuals against each of the predictor variables (as well as any other important unmodelled predictors - particularly time and space, if relevant and available).
resid <- resid(dat4c.brm2)[, "Estimate"]
ggplot() +
geom_point(data = NULL, aes(y = resid, x = dat4$x))
Conclusions:
- there does not appear to be any pattern in the residuals
Density overlay
Conclusions:
- the model draws appear to be consistent with the observed data
Error scatter
These are plots of the observed values against the average residuals. Similar to a residual plot, we do not want to see any patterns in this plot.
Using all posterior draws for ppc type 'error_scatter_avg' by default.
Conclusions:
- there is no obvious pattern in the residuals.
Intervals
These are plots of the observed data overlaid on posterior predictions associated with each level of the predictor. Ideally, the observed data should all fall within the predictive intervals.
Using all posterior draws for ppc type 'intervals' by default.
Conclusions:
- the posterior predictions are not inconsistent with the observed data
Ribbon
These are just an alternative way of expressing the interval plot.
Using all posterior draws for ppc type 'ribbon' by default.
Conclusions:
- the posterior predictions are not inconsistent with the observed data
DHARMa residuals provide very useful diagnostics. Unfortunately, we cannot directly use the simulateResiduals() function to generate the simulated residuals. However, if we are willing to calculate some of the components ourselves, we can still obtain the simulated residuals from the fitted Stan model.
We need to supply:
- simulated (predicted) responses associated with each observation.
- observed values
- fitted (predicted) responses (averaged) associated with each observation
dat4c.resids <- make_brms_dharma_res(dat4c.brm2, integerResponse = FALSE)
wrap_elements(~ testUniformity(dat4c.resids)) +
wrap_elements(~ plotResiduals(dat4c.resids)) +
wrap_elements(~ testDispersion(dat4c.resids)) +
plot_layout(nrow = 1)
If you are using RStudio (particularly with a quarto document), the above code may produce a graphic that is too large for the currently available device. The document should still render without problems; it is just the live display within RStudio that cannot accommodate the figure.
To address this, you can either:
- break the multi-panel figure up into separate figures by removing the outer wrap_elements() and plot_layout() functions
- copy the above code into the console and view it in a larger graphics device
Conclusions:
- the Q-Q plot looks reasonable (points broadly follow the angled line)
- there are no flagged issues with any of the tests:
- KS test: conformity to the nominated distribution (family)
- Dispersion test: would not normally expect this to be an issue for a Gaussian family unless there are other issues with the residuals
- Outlier test: the influence of each observation
- there does not appear to be any pattern in the residuals - each of the three quantile trends is approximately flat and centered around 0.25, 0.5 and 0.75
- the observed dispersion is well within the simulated range, indicating that there is no issue with dispersion (again, this is expected for a Gaussian model)
Conclusions:
- there is no evidence of a lack of fit
- the model is likely to be reliable
resid <- resid(dat5a.brm2)[, "Estimate"]
fit <- fitted(dat5a.brm2)[, "Estimate"]
ggplot() +
geom_point(data = NULL, aes(y = resid, x = fit))
We should also plot the residuals against each of the predictor variables (as well as any other important unmodelled predictors - particularly time and space, if relevant and available).
resid <- resid(dat5a.brm2)[, "Estimate"]
ggplot() +
geom_point(data = NULL, aes(y = resid, x = dat5$x))
Conclusions:
- the above plots are almost impossible to interpret for binary data.
- they will always feature two curved lines (one for the zeros, the other for the ones)
- it is virtually impossible to diagnose any issues from such plots.
Density overlay
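The density overlay itself can be produced via brms's pp_check() wrapper around bayesplot. A minimal sketch, assuming the fitted model object dat5a.brm2 from earlier:

```r
library(brms)
# overlay the density of the observed response on densities from
# (here) 100 posterior predictive draws
pp_check(dat5a.brm2, type = "dens_overlay", ndraws = 100)
```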
Conclusions:
- the model draws appear to be consistent with the observed data
- note that these density plots are going to be too crude to be completely useful
- all the mass should be at either 0 or 1
Error scatter
These are plots of the observed values against the average residuals. Similar to a residual plot, we do not want to see any patterns in this plot.
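This plot is also available through pp_check(); a sketch, again assuming the dat5a.brm2 model object:

```r
library(brms)
# observed values against the average posterior predictive error
pp_check(dat5a.brm2, type = "error_scatter_avg")
```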
Using all posterior draws for ppc type 'error_scatter_avg' by default.
Conclusions:
- there is no obvious pattern in the residuals.
- this sort of plot is of very little value for binary data
Intervals
These are plots of the observed data overlaid on top of the posterior predictions associated with each level of the predictor. Ideally, the observed data should all fall within the predictive intervals.
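A sketch of how such an intervals plot can be requested (assuming the dat5a.brm2 model object and that the predictor is named x):

```r
library(brms)
# observed data over posterior predictive medians and intervals, ordered by x
pp_check(dat5a.brm2, type = "intervals", x = "x")
```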
Using all posterior draws for ppc type 'intervals' by default.
Conclusions:
- the posterior predictions are not inconsistent with the observed data
- this sort of plot is of very little value for binary data
Ribbon
This is just an alternative way of expressing the interval plot.
Using all posterior draws for ppc type 'ribbon' by default.
Conclusions:
- the posterior predictions are not inconsistent with the observed data
- this sort of plot is of very little value for binary data
DHARMa residuals provide very useful diagnostics. Unfortunately, we cannot directly use the simulateResiduals() function to generate the simulated residuals. However, if we are willing to calculate some of the components ourselves, we can still obtain the simulated residuals from the fitted Stan model.
We need to supply:
- simulated (predicted) responses associated with each observation.
- observed values
- fitted (predicted) responses (averaged) associated with each observation
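The helper used below (make_brms_dharma_res(), sourced from helperfunctions.R) assembles these three components. An illustrative reconstruction of such a helper - a sketch, not the actual implementation - might look like:

```r
# sketch: build DHARMa residuals from a fitted brms model
# (illustrative only - not the actual make_brms_dharma_res() from helperfunctions.R)
make_dharma_res <- function(fit, integerResponse = FALSE) {
  preds <- brms::posterior_predict(fit)            # draws x observations
  DHARMa::createDHARMa(
    simulatedResponse = t(preds),                  # observations x draws
    observedResponse = brms::standata(fit)$Y,      # observed response values
    fittedPredictedResponse = colMeans(brms::posterior_epred(fit)),
    integerResponse = integerResponse
  )
}
```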
In the code below, I have instructed the residual plot not to apply quantile regression to the residuals due to a lack of unique data.
Only 2 levels detected so that family 'bernoulli' might be a more efficient choice.
# create the DHARMa residuals (binary data, so integerResponse = TRUE)
dat5a.resids <- make_brms_dharma_res(dat5a.brm2, integerResponse = TRUE)
wrap_elements(~ testUniformity(dat5a.resids)) +
wrap_elements(~ plotResiduals(dat5a.resids, quantreg = FALSE)) +
wrap_elements(~ testDispersion(dat5a.resids)) +
plot_layout(nrow = 1)

If you are using RStudio (particularly with a Quarto document), the above code may produce a graphic that is too large for the currently available device. The document should still render without problems; it is just the live display within RStudio that cannot accommodate the figure.
To address this, you can either:
- break the multi-panel figure up into four separate figures by removing the outer wrap_elements() and plot_layout() functions
- copy the above code into the console and view it in a larger graphics device
Conclusions:
- the Q-Q plot looks reasonable (points broadly follow the angled line)
- there are no flagged issues with the:
  - KS test: conformity to the nominated distribution (family)
  - Dispersion test: we would not normally expect this to be an issue here unless there are other issues with the residuals
  - Outlier test: the influence of each observation
- there does not appear to be any pattern in the residuals
- the observed dispersion is well within the simulated range, indicating that there is no issue with dispersion

Conclusions:
- there is no evidence of a lack of fit
- the model is likely to be reliable
resid <- resid(dat5b.brm2)[, "Estimate"]
fit <- fitted(dat5b.brm2)[, "Estimate"]
ggplot() +
geom_point(data = NULL, aes(y = resid, x = fit))

We should also plot the residuals against each of the predictor variables (as well as any other important unmodelled predictors - particularly time and space, if relevant and available).
resid <- resid(dat5b.brm2)[, "Estimate"]
ggplot() +
geom_point(data = NULL, aes(y = resid, x = dat5$x))

Conclusions:
- the above plots are almost impossible to interpret for binary data
- they will always feature two curved lines (one for the zeros, the other for the ones)
- it is virtually impossible to diagnose any issues from such plots
Density overlay
Conclusions:
- the model draws appear to be consistent with the observed data
- note that these density plots are going to be too crude to be completely useful
- all the mass should be at either 0 or 1
Error scatter
These are plots of the observed values against the average residuals. Similar to a residual plot, we do not want to see any patterns in this plot.
Using all posterior draws for ppc type 'error_scatter_avg' by default.
Conclusions:
- there is no obvious pattern in the residuals.
- this sort of plot is of very little value for binary data
Intervals
These are plots of the observed data overlaid on top of the posterior predictions associated with each level of the predictor. Ideally, the observed data should all fall within the predictive intervals.
Using all posterior draws for ppc type 'intervals' by default.
Conclusions:
- the posterior predictions are not inconsistent with the observed data
- this sort of plot is of very little value for binary data
Ribbon
This is just an alternative way of expressing the interval plot.
Using all posterior draws for ppc type 'ribbon' by default.
Conclusions:
- the posterior predictions are not inconsistent with the observed data
- this sort of plot is of very little value for binary data
DHARMa residuals provide very useful diagnostics. Unfortunately, we cannot directly use the simulateResiduals() function to generate the simulated residuals. However, if we are willing to calculate some of the components ourselves, we can still obtain the simulated residuals from the fitted Stan model.
We need to supply:
- simulated (predicted) responses associated with each observation.
- observed values
- fitted (predicted) responses (averaged) associated with each observation
In the code below, I have instructed the residual plot not to apply quantile regression to the residuals due to a lack of unique data.
Only 2 levels detected so that family 'bernoulli' might be a more efficient choice.
# create the DHARMa residuals (binary data, so integerResponse = TRUE)
dat5b.resids <- make_brms_dharma_res(dat5b.brm2, integerResponse = TRUE)
wrap_elements(~ testUniformity(dat5b.resids)) +
wrap_elements(~ plotResiduals(dat5b.resids, quantreg = FALSE)) +
wrap_elements(~ testDispersion(dat5b.resids)) +
plot_layout(nrow = 1)

If you are using RStudio (particularly with a Quarto document), the above code may produce a graphic that is too large for the currently available device. The document should still render without problems; it is just the live display within RStudio that cannot accommodate the figure.
To address this, you can either:
- break the multi-panel figure up into four separate figures by removing the outer wrap_elements() and plot_layout() functions
- copy the above code into the console and view it in a larger graphics device
Conclusions:
- the Q-Q plot looks reasonable (points broadly follow the angled line)
- there are no flagged issues with the:
  - KS test: conformity to the nominated distribution (family)
  - Dispersion test: we would not normally expect this to be an issue here unless there are other issues with the residuals
  - Outlier test: the influence of each observation
- there does not appear to be any pattern in the residuals
- the observed dispersion is well within the simulated range, indicating that there is no issue with dispersion

Conclusions:
- there is no evidence of a lack of fit
- the model is likely to be reliable
resid <- resid(dat5c.brm2)[, "Estimate"]
fit <- fitted(dat5c.brm2)[, "Estimate"]
ggplot() +
geom_point(data = NULL, aes(y = resid, x = fit))

We should also plot the residuals against each of the predictor variables (as well as any other important unmodelled predictors - particularly time and space, if relevant and available).
resid <- resid(dat5c.brm2)[, "Estimate"]
ggplot() +
geom_point(data = NULL, aes(y = resid, x = dat5$x))

Conclusions:
- the above plots are almost impossible to interpret for binary data
- they will always feature two curved lines (one for the zeros, the other for the ones)
- it is virtually impossible to diagnose any issues from such plots
Density overlay
Conclusions:
- the model draws appear to be consistent with the observed data
- note that these density plots are going to be too crude to be completely useful
- all the mass should be at either 0 or 1
Error scatter
These are plots of the observed values against the average residuals. Similar to a residual plot, we do not want to see any patterns in this plot.
Using all posterior draws for ppc type 'error_scatter_avg' by default.
Conclusions:
- there is no obvious pattern in the residuals.
- this sort of plot is of very little value for binary data
Intervals
These are plots of the observed data overlaid on top of the posterior predictions associated with each level of the predictor. Ideally, the observed data should all fall within the predictive intervals.
Using all posterior draws for ppc type 'intervals' by default.
Conclusions:
- the posterior predictions are not inconsistent with the observed data
- this sort of plot is of very little value for binary data
Ribbon
This is just an alternative way of expressing the interval plot.
Using all posterior draws for ppc type 'ribbon' by default.
Conclusions:
- the posterior predictions are not inconsistent with the observed data
- this sort of plot is of very little value for binary data
DHARMa residuals provide very useful diagnostics. Unfortunately, we cannot directly use the simulateResiduals() function to generate the simulated residuals. However, if we are willing to calculate some of the components ourselves, we can still obtain the simulated residuals from the fitted Stan model.
We need to supply:
- simulated (predicted) responses associated with each observation.
- observed values
- fitted (predicted) responses (averaged) associated with each observation
In the code below, I have instructed the residual plot not to apply quantile regression to the residuals due to a lack of unique data.
Only 2 levels detected so that family 'bernoulli' might be a more efficient choice.
# create the DHARMa residuals (binary data, so integerResponse = TRUE)
dat5c.resids <- make_brms_dharma_res(dat5c.brm2, integerResponse = TRUE)
wrap_elements(~ testUniformity(dat5c.resids)) +
wrap_elements(~ plotResiduals(dat5c.resids, quantreg = FALSE)) +
wrap_elements(~ testDispersion(dat5c.resids)) +
plot_layout(nrow = 1)

If you are using RStudio (particularly with a Quarto document), the above code may produce a graphic that is too large for the currently available device. The document should still render without problems; it is just the live display within RStudio that cannot accommodate the figure.
To address this, you can either:
- break the multi-panel figure up into four separate figures by removing the outer wrap_elements() and plot_layout() functions
- copy the above code into the console and view it in a larger graphics device
Conclusions:
- the Q-Q plot looks reasonable (points broadly follow the angled line)
- there are no flagged issues with the:
  - KS test: conformity to the nominated distribution (family)
  - Dispersion test: we would not normally expect this to be an issue here unless there are other issues with the residuals
  - Outlier test: the influence of each observation
- there does not appear to be any pattern in the residuals
- the observed dispersion is well within the simulated range, indicating that there is no issue with dispersion

Conclusions:
- there is no evidence of a lack of fit
- the model is likely to be reliable
resid <- resid(dat6a.brm2)[, "Estimate"]
fit <- fitted(dat6a.brm2)[, "Estimate"]
ggplot() +
geom_point(data = NULL, aes(y = resid, x = fit))

We should also plot the residuals against each of the predictor variables (as well as any other important unmodelled predictors - particularly time and space, if relevant and available).
resid <- resid(dat6a.brm2)[, "Estimate"]
ggplot() +
geom_point(data = NULL, aes(y = resid, x = dat6$x))

Conclusions:
- there does not appear to be any pattern in the residuals
Density overlay
Conclusions:
- the model draws appear to be consistent with the observed data
Error scatter
These are plots of the observed values against the average residuals. Similar to a residual plot, we do not want to see any patterns in this plot.
Using all posterior draws for ppc type 'error_scatter_avg' by default.
Conclusions:
- there is no obvious pattern in the residuals.
Intervals
These are plots of the observed data overlaid on top of the posterior predictions associated with each level of the predictor. Ideally, the observed data should all fall within the predictive intervals.
Using all posterior draws for ppc type 'intervals' by default.
Conclusions:
- the posterior predictions are not inconsistent with the observed data
Ribbon
This is just an alternative way of expressing the interval plot.
Using all posterior draws for ppc type 'ribbon' by default.
Conclusions:
- the posterior predictions are not inconsistent with the observed data
DHARMa residuals provide very useful diagnostics. Unfortunately, we cannot directly use the simulateResiduals() function to generate the simulated residuals. However, if we are willing to calculate some of the components ourselves, we can still obtain the simulated residuals from the fitted Stan model.
We need to supply:
- simulated (predicted) responses associated with each observation.
- observed values
- fitted (predicted) responses (averaged) associated with each observation
dat6a.resids <- make_brms_dharma_res(dat6a.brm2, integerResponse = FALSE)
wrap_elements(~ testUniformity(dat6a.resids)) +
wrap_elements(~ plotResiduals(dat6a.resids)) +
wrap_elements(~ testDispersion(dat6a.resids)) +
plot_layout(nrow = 1)

If you are using RStudio (particularly with a Quarto document), the above code may produce a graphic that is too large for the currently available device. The document should still render without problems; it is just the live display within RStudio that cannot accommodate the figure.
To address this, you can either:
- break the multi-panel figure up into four separate figures by removing the outer wrap_elements() and plot_layout() functions
- copy the above code into the console and view it in a larger graphics device
Conclusions:
- the Q-Q plot looks reasonable (points broadly follow the angled line)
- there are no flagged issues with the:
  - KS test: conformity to the nominated distribution (family)
  - Dispersion test: we would not normally expect this to be an issue here unless there are other issues with the residuals
  - Outlier test: the influence of each observation
- there does not appear to be any pattern in the residuals - each of the three quantile trends is approximately flat and centered around 0.25, 0.5 and 0.75
- the observed dispersion is well within the simulated range, indicating that there is no issue with dispersion

Conclusions:
- there is no evidence of a lack of fit
- the model is likely to be reliable
resid <- resid(dat6b.brm2)[, "Estimate"]
fit <- fitted(dat6b.brm2)[, "Estimate"]
ggplot() +
geom_point(data = NULL, aes(y = resid, x = fit))

We should also plot the residuals against each of the predictor variables (as well as any other important unmodelled predictors - particularly time and space, if relevant and available).
resid <- resid(dat6b.brm2)[, "Estimate"]
ggplot() +
geom_point(data = NULL, aes(y = resid, x = dat6$x))

Conclusions:
- there does not appear to be any pattern in the residuals
Density overlay
Conclusions:
- the model draws appear to be consistent with the observed data
Error scatter
These are plots of the observed values against the average residuals. Similar to a residual plot, we do not want to see any patterns in this plot.
Using all posterior draws for ppc type 'error_scatter_avg' by default.
Conclusions:
- there is no obvious pattern in the residuals.
Intervals
These are plots of the observed data overlaid on top of the posterior predictions associated with each level of the predictor. Ideally, the observed data should all fall within the predictive intervals.
Using all posterior draws for ppc type 'intervals' by default.
Conclusions:
- the posterior predictions are not inconsistent with the observed data
Ribbon
This is just an alternative way of expressing the interval plot.
Using all posterior draws for ppc type 'ribbon' by default.
Conclusions:
- the posterior predictions are not inconsistent with the observed data
DHARMa residuals provide very useful diagnostics. Unfortunately, we cannot directly use the simulateResiduals() function to generate the simulated residuals. However, if we are willing to calculate some of the components ourselves, we can still obtain the simulated residuals from the fitted Stan model.
We need to supply:
- simulated (predicted) responses associated with each observation.
- observed values
- fitted (predicted) responses (averaged) associated with each observation
dat6b.resids <- make_brms_dharma_res(dat6b.brm2, integerResponse = FALSE)
wrap_elements(~ testUniformity(dat6b.resids)) +
wrap_elements(~ plotResiduals(dat6b.resids)) +
wrap_elements(~ testDispersion(dat6b.resids)) +
plot_layout(nrow = 1)

If you are using RStudio (particularly with a Quarto document), the above code may produce a graphic that is too large for the currently available device. The document should still render without problems; it is just the live display within RStudio that cannot accommodate the figure.
To address this, you can either:
- break the multi-panel figure up into four separate figures by removing the outer wrap_elements() and plot_layout() functions
- copy the above code into the console and view it in a larger graphics device
Conclusions:
- the Q-Q plot looks reasonable (points broadly follow the angled line)
- there are no flagged issues with the:
  - KS test: conformity to the nominated distribution (family)
  - Dispersion test: we would not normally expect this to be an issue here unless there are other issues with the residuals
  - Outlier test: the influence of each observation
- there does not appear to be any pattern in the residuals - each of the three quantile trends is approximately flat and centered around 0.25, 0.5 and 0.75
- the observed dispersion is well within the simulated range, indicating that there is no issue with dispersion

Conclusions:
- there is no evidence of a lack of fit
- the model is likely to be reliable
resid <- resid(dat6c.brm2)[, "Estimate"]
fit <- fitted(dat6c.brm2)[, "Estimate"]
ggplot() +
geom_point(data = NULL, aes(y = resid, x = fit))

We should also plot the residuals against each of the predictor variables (as well as any other important unmodelled predictors - particularly time and space, if relevant and available).
resid <- resid(dat6c.brm2)[, "Estimate"]
ggplot() +
geom_point(data = NULL, aes(y = resid, x = dat6$x))

Conclusions:
- there does not appear to be any pattern in the residuals
Density overlay
Conclusions:
- the model draws appear to be consistent with the observed data
Error scatter
These are plots of the observed values against the average residuals. Similar to a residual plot, we do not want to see any patterns in this plot.
Using all posterior draws for ppc type 'error_scatter_avg' by default.
Conclusions:
- there is no obvious pattern in the residuals.
Intervals
These are plots of the observed data overlaid on top of the posterior predictions associated with each level of the predictor. Ideally, the observed data should all fall within the predictive intervals.
Using all posterior draws for ppc type 'intervals' by default.
Conclusions:
- the posterior predictions are not inconsistent with the observed data
Ribbon
This is just an alternative way of expressing the interval plot.
Using all posterior draws for ppc type 'ribbon' by default.
Conclusions:
- the posterior predictions are not inconsistent with the observed data
DHARMa residuals provide very useful diagnostics. Unfortunately, we cannot directly use the simulateResiduals() function to generate the simulated residuals. However, if we are willing to calculate some of the components ourselves, we can still obtain the simulated residuals from the fitted Stan model.
We need to supply:
- simulated (predicted) responses associated with each observation.
- observed values
- fitted (predicted) responses (averaged) associated with each observation
dat6c.resids <- make_brms_dharma_res(dat6c.brm2, integerResponse = FALSE)
wrap_elements(~ testUniformity(dat6c.resids)) +
wrap_elements(~ plotResiduals(dat6c.resids)) +
wrap_elements(~ testDispersion(dat6c.resids)) +
plot_layout(nrow = 1)

If you are using RStudio (particularly with a Quarto document), the above code may produce a graphic that is too large for the currently available device. The document should still render without problems; it is just the live display within RStudio that cannot accommodate the figure.
To address this, you can either:
- break the multi-panel figure up into four separate figures by removing the outer wrap_elements() and plot_layout() functions
- copy the above code into the console and view it in a larger graphics device
Conclusions:
- the Q-Q plot looks reasonable (points broadly follow the angled line)
- there are no flagged issues with the:
  - KS test: conformity to the nominated distribution (family)
  - Dispersion test: we would not normally expect this to be an issue here unless there are other issues with the residuals
  - Outlier test: the influence of each observation
- there does not appear to be any pattern in the residuals - each of the three quantile trends is approximately flat and centered around 0.25, 0.5 and 0.75
- the observed dispersion is well within the simulated range, indicating that there is no issue with dispersion

Conclusions:
- there is no evidence of a lack of fit
- the model is likely to be reliable
10 Partial effects plots
Prior to exploring the modelled numerical estimates, it is worth reviewing simple plots of the predicted trends associated with each predictor. Importantly, these typically express the trends on the scale of the response, although for some it is possible to force the trends to be expressed on the link scale. Such plots provide a final visual check of whether the model has yielded sensible outcomes. Furthermore, they usually assist in the interpretation of the major estimated parameters.
Notice that although we had centered the predictor, because we did so in the model formula, conditional_effects is able to backtransform \(x\) onto the original scale when producing the partial plot.
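For example, the centering can be embedded in the model formula itself; a sketch with assumed object names (not code from this tutorial):

```r
library(brms)
# hypothetical refit: center x inside the formula so that conditional_effects()
# can display the partial effect on the original x scale
mod <- brm(y ~ scale(x, scale = FALSE), data = dat, family = gaussian(),
  backend = "cmdstanr")
conditional_effects(mod) |> plot(points = TRUE)
```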
dat6b.brm2 |>
conditional_effects(conditions = data.frame(total = dat6$total)) |>
plot(points = TRUE)
# OR
dat6b.brm2 |>
conditional_effects(conditions = data.frame(total = dat6$total),
spaghetti = TRUE, ndraws = 200) |>
plot(points = TRUE)

Notice that although we had centered the predictor, because we did so in the model formula, conditional_effects is able to backtransform \(x\) onto the original scale when producing the partial plot.
dat6c.brm2 |>
conditional_effects(conditions = data.frame(total = dat6$total)) |>
plot(points = TRUE)
# OR
dat6c.brm2 |>
conditional_effects(conditions = data.frame(total = dat6$total),
spaghetti = TRUE, ndraws = 200) |>
plot(points = TRUE)

Notice that although we had centered and scaled the predictor, because we did so in the model formula, conditional_effects is able to backtransform \(x\) onto the original scale when producing the partial plot.
11 Model investigation
Rather than simply return point estimates of each of the model parameters, Bayesian analyses capture the full posterior of each parameter. These are typically stored within the list structure of the output object.
As with most statistical routines, the overloaded summary() function provides an overall summary of the model parameters. Typically, the summaries will include the means / medians along with credibility intervals and perhaps convergence diagnostics (such as R hat). However, more thorough investigation and analysis of the parameter posteriors requires access to the full posteriors.
There is currently a plethora of functions for extracting the full posteriors from models. In part, this reflects a rapidly evolving space in which numerous packages provide near-equivalent functionality (it should also be noted that, over time, many of the functions have been deprecated due to inconsistencies in their names). Broadly speaking, the functions focus on draws from the posterior of either the parameters (intercept, slope, standard deviation etc.), the linear predictor, expected values or predicted values. The distinction between the latter three is highlighted in the following table.
| Property | Description |
|---|---|
| linear predictors | values predicted on the link scale |
| expected values | predictions (on response scale) without residual error (predicting expected mean outcome(s)) |
| predicted values | predictions (on response scale) that incorporate residual error |
| fitted values | predictions on the response scale |
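In brms, these three types of draws correspond to three extractor functions; a sketch using the dat1a model fitted earlier:

```r
library(brms)
# each returns a draws x observations matrix
linpred <- posterior_linpred(dat1a.brm2)  # linear predictors (link scale)
epred   <- posterior_epred(dat1a.brm2)    # expected values (response scale, no residual error)
pred    <- posterior_predict(dat1a.brm2)  # predicted values (response scale, with residual error)
```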
Family: gaussian
Links: mu = identity; sigma = identity
Formula: y ~ x
Data: dat (Number of observations: 10)
Draws: 3 chains, each with iter = 5000; warmup = 1000; thin = 5;
total post-warmup draws = 2400
Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept 0.28 0.38 -0.46 1.05 1.00 2452 2248
x -0.07 0.47 -1.03 0.83 1.00 2438 2225
Family Specific Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma 1.12 0.36 0.65 1.97 1.00 2184 2093
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
Conclusions:
- in the initial block of information, we are reminded of the formula as well as the chain fitting characteristics. We are also informed that the total number of post-warmup MCMC samples is 2400
- the Rhat values for each parameter are all less than 1.01, consistent with convergence of the parameter estimates across the chains
- Bulk_ESS: the effective sample sizes for the bulk of the posterior are a high fraction (> 0.5) of the total number of draws. This suggests that the sampling was reasonably efficient and therefore likely to be reliable
- Tail_ESS: the effective sample sizes for the tails of the posterior. The tails (more extreme regions of the posterior) are where the sampler is most likely to get stuck (have divergent transitions etc). The fraction of effective samples from this region is relatively high, implying that the sampler did not get stuck in areas that are unsupported by data
- Intercept: when \(x=0\), the expected value of \(y\) is 0.276 and we are 95% confident that the true value is between -0.455 and 1.051. Since \(x=0\) is outside the observed range of \(x\), this y-intercept is of very limited value
- x (the slope): the rate of change in \(y\) per unit (=1) change in \(x\). So for every one unit change in \(x\), \(y\) changes by (on average) -0.074 units and we are 95% confident that this change is between -1.027 and 0.834
- sigma is estimated to be 1.12

Note, the estimates are means and quantiles.
As an alternative to the regular summary (which is designed to resemble that traditionally provided by frequentist analyses - and thus be instantly familiar), it is possible to define exactly how each of the parameters is summarised.
In the following, I am nominating that I want to summarise each parameter posterior by:
- the median
- the 95% highest probability density interval (credibility interval)
- Rhat
- total number of draws
- bulk and tail effective sample sizes
dat1a.brm2 |>
posterior::summarise_draws(
median,
HDInterval::hdi,
rhat,
length,
ess_bulk, ess_tail
)

# A tibble: 8 × 8
variable median lower upper rhat length ess_bulk ess_tail
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 b_Intercept 0.278 -0.453 1.05 1.00 2400 2452. 2248.
2 b_x -0.0611 -1.02 0.836 1.00 2400 2438. 2225.
3 sigma 1.05 0.584 1.81 1.00 2400 2184. 2093.
4 prior_Intercept 31.7 2.16 62.6 1.00 2400 2384. 2272.
5 prior_b -0.237 -14.8 11.9 1.00 2400 2400. 2185.
6 prior_sigma 11.1 0.00367 49.0 1.00 2400 2236. 2084.
7 lprior -11.2 -11.3 -11.1 1.00 2400 2467. 2264.
8 lp__ -25.1 -28.5 -23.7 1.00 2400 2160. 2393.
Conclusions:
- the length column confirms that the total number of post-warmup MCMC draws is 2400
- the rhat values for each parameter are all less than 1.01, consistent with convergence of the parameter estimates across the chains
- ess_bulk: the effective sample sizes for the bulk of the posterior are a high fraction (> 0.5) of the total number of draws. This suggests that the sampling was reasonably efficient and therefore likely to be reliable
- ess_tail: the effective sample sizes for the tails of the posterior. The tails (more extreme regions of the posterior) are where the sampler is most likely to get stuck (have divergent transitions etc). The fraction of effective samples from this region is relatively high, implying that the sampler did not get stuck in areas that are unsupported by data
- b_Intercept: when \(x=0\), the expected value of \(y\) is 0.278 and we are 95% confident that the true value is between -0.453 and 1.052. Since \(x=0\) is outside the observed range of \(x\), this y-intercept is of very limited value
- b_x (the slope): the rate of change in \(y\) per unit (=1) change in \(x\). So for every one unit change in \(x\), \(y\) changes by (on average) -0.061 units and we are 95% confident that this change is between -1.024 and 0.836
- sigma is estimated to be 1.05
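Because the full posterior is available, derived probability statements are also straightforward; for example, a sketch of the posterior probability that the slope is negative:

```r
library(brms)
library(dplyr)
# proportion of posterior draws in which the slope (b_x) falls below zero
dat1a.brm2 |>
  as_draws_df() |>
  summarise(Pr_negative = mean(b_x < 0))
```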
As a yet more flexible alternative, if we first extract the full posterior draws (with as_draws_df()), we can then use various tidyverse functions to focus on just the most ecologically interpretable parameters before summarising.
In the following, I will use select() with a regex (regular expression) to match only the columns that:
- start with (^) “b_” followed by any number (*) of any character (.), or
- start with (^) “sigma”
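To see what that regex matches, it can be tried against typical brms parameter names:

```r
# the anchored regex keeps the population-level terms and sigma, but not the priors
nms <- c("b_Intercept", "b_x", "sigma", "prior_Intercept", "prior_sigma", "lp__")
grep("^b_.*|^sigma", nms, value = TRUE)
# → "b_Intercept" "b_x" "sigma"
```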
dat1a.brm2 |>
brms::as_draws_df() |>
dplyr::select(matches("^b_.*|^sigma")) |>
posterior::summarise_draws(
median,
HDInterval::hdi,
rhat,
length,
ess_bulk, ess_tail
)

Warning: Dropping 'draws_df' class as required metadata was removed.
# A tibble: 3 × 8
variable median lower upper rhat length ess_bulk ess_tail
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 b_Intercept 0.278 -0.453 1.05 1.00 2400 2448. 2229.
2 b_x -0.0611 -1.02 0.836 1.00 2400 2435. 2128.
3 sigma 1.05 0.584 1.81 1.00 2400 2138. 2082.
Conclusions:
- in the initial block of information, we are reminded of the formula as well as the chain fitting characteristics. We are also informed that the total number of post-warmup MCMC samples is 2400.
- the Rhat values for each parameter are <1.01, confirming convergence of the parameter estimates across the chains
- Bulk_ESS: the effective sample sizes for the bulk of the posterior are a high fraction (>0.5) of the total number of draws. This suggests that the sampling was reasonably efficient and therefore likely to be reliable
- Tail_ESS: the effective sample sizes of estimates in the tails of the posterior. The tails (more extreme regions of the posterior) are where the sampler is most likely to get stuck (have divergent transitions etc). The fraction of effective samples from this region is relatively high, implying that the sampler did not get stuck in areas that are unsupported by data
- b_Intercept: when \(x=0\), the expected value of \(y\) is 0.278 and we are 95% confident that the true value is between -0.453 and 1.052. Since \(x=0\) is outside the observed range of \(x\), this y-intercept is of very limited value.
- b_x (the slope): the rate of change in \(y\) per unit (=1) change in \(x\). So for every one unit change in \(x\), \(y\) changes by (on average) -0.061 units and we are 95% confident that this change is between -1.024 and 0.836
- sigma is estimated to be 1.05
The gather_draws() function performs the equivalent of an as_draws_df() followed by a pivot_longer() in order to return the full posteriors in long format where they are more suitable for graphing.
dat1a.brm2 |>
gather_draws(b_Intercept, b_x, sigma) |>
ggplot() +
geom_histogram(aes(x = .value)) +
  facet_wrap(~.variable, scales = "free")
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Alternatively, there are various other representations supported by the ggdist package.
y ymin ymax .width .point .interval
1 0.06432815 8.807059e-08 0.3185489 0.95 median hdci
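The summary above is the posterior of the Bayesian \(R^2\). It could be produced with something like the following sketch, which assumes the fitted model object (dat1a.brm2) and uses brms::bayes_R2() with summary = FALSE to retain the full posterior of \(R^2\) before summarising it with ggdist::median_hdci():

```r
# Sketch: posterior of Bayesian R2, summarised by its median and 95% HDCI
dat1a.brm2 |>
  brms::bayes_R2(summary = FALSE) |>  # matrix of R2 draws (one column: R2)
  as.data.frame() |>
  dplyr::pull(R2) |>
  ggdist::median_hdci(.width = 0.95)
```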
Conclusions:
- 6.433% of the total variability in \(y\) can be explained by its relationship to \(x\)
- we are 95% confident that the strength of this relationship is between 0% and 31.855%
Family: gaussian
Links: mu = identity; sigma = identity
Formula: y ~ scale(x, scale = FALSE)
Data: dat (Number of observations: 10)
Draws: 3 chains, each with iter = 5000; warmup = 1000; thin = 5;
total post-warmup draws = 2400
Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept 0.28 0.36 -0.41 0.99 1.00 2160 2036
scalexscaleEQFALSE -0.09 0.45 -0.97 0.81 1.00 2368 2213
Family Specific Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma 1.12 0.36 0.66 1.98 1.00 2457 2298
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
Conclusions:
- in the initial block of information, we are reminded of the formula as well as the chain fitting characteristics. We are also informed that the total number of post-warmup MCMC samples is 2400.
- the Rhat values for each parameter are <1.01, confirming convergence of the parameter estimates across the chains
- Bulk_ESS: the effective sample sizes for the bulk of the posterior are a high fraction (>0.5) of the total number of draws. This suggests that the sampling was reasonably efficient and therefore likely to be reliable
- Tail_ESS: the effective sample sizes of estimates in the tails of the posterior. The tails (more extreme regions of the posterior) are where the sampler is most likely to get stuck (have divergent transitions etc). The fraction of effective samples from this region is relatively high, implying that the sampler did not get stuck in areas that are unsupported by data
- Intercept: when \(x=0\) (its average, since \(x\) is centered), the expected value of \(y\) is 0.283 and we are 95% confident that the true value is between -0.408 and 0.99. So \(y\) is expected to be 0.283 at the average \(x\).
- x (the slope): the rate of change in \(y\) per unit (=1) change in \(x\). So for every one unit change in \(x\), \(y\) changes by (on average) -0.085 units and we are 95% confident that this change is between -0.972 and 0.81
- sigma is estimated to be 1.12
Note, the estimates are means and quantiles.
As an alternative to the regular summary (which is designed to resemble that traditionally provided by frequentist analyses - and thus be instantly familiar), it is possible to define exactly how each of the parameters are summarised.
In the following, I am nominating that I want to summarise each parameter posterior by:
- the median
- the 95% highest probability density interval (credibility interval)
- Rhat
- total number of draws
- bulk and tail effective sample sizes
dat1b.brm2 |>
posterior::summarise_draws(
median,
HDInterval::hdi,
rhat,
length,
ess_bulk, ess_tail
  )
# A tibble: 8 × 8
variable median lower upper rhat length ess_bulk ess_tail
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 b_Intercept 0.279 -0.416 0.970 1.00 2400 2160. 2036.
2 b_scalexscaleEQFALSE -0.0808 -0.977 0.810 1.00 2400 2368. 2213.
3 sigma 1.05 0.591 1.83 1.00 2400 2457. 2298.
4 prior_Intercept 31.0 3.06 61.4 1.00 2400 2443. 2334.
5 prior_b -0.138 -13.4 13.1 1.00 2400 2042. 2308.
6 prior_sigma 11.1 0.00987 44.7 1.00 2400 2223. 2453.
7 lprior -11.2 -11.3 -11.1 1.00 2400 2202. 2375.
8 lp__ -25.0 -28.3 -23.7 1.00 2400 2133. 2191.
Conclusions:
- in the initial block of information, we are reminded of the formula as well as the chain fitting characteristics. We are also informed that the total number of post-warmup MCMC samples is 2400.
- the Rhat values for each parameter are <1.01, confirming convergence of the parameter estimates across the chains
- Bulk_ESS: the effective sample sizes for the bulk of the posterior are a high fraction (>0.5) of the total number of draws. This suggests that the sampling was reasonably efficient and therefore likely to be reliable
- Tail_ESS: the effective sample sizes of estimates in the tails of the posterior. The tails (more extreme regions of the posterior) are where the sampler is most likely to get stuck (have divergent transitions etc). The fraction of effective samples from this region is relatively high, implying that the sampler did not get stuck in areas that are unsupported by data
- b_Intercept: when \(x=0\) (its average, since \(x\) is centered), the expected value of \(y\) is 0.279 and we are 95% confident that the true value is between -0.416 and 0.97. So \(y\) is expected to be 0.279 at the average \(x\).
- b_x (the slope): the rate of change in \(y\) per unit (=1) change in \(x\). So for every one unit change in \(x\), \(y\) changes by (on average) -0.081 units and we are 95% confident that this change is between -0.977 and 0.81
- sigma is estimated to be 1.05
As a yet more flexible alternative, we can first extract the full posterior draws (with as_draws_df()) and then use the various tidyverse functions to focus on just the most ecologically interpretable parameters before summarising.
In the following, I will use select() with a regex (regular expression) to match only the columns that:
- start with (^) “b_” followed by any number (*) of any character (.)
- start with (^) “sigma”
dat1b.brm2 |>
brms::as_draws_df() |>
dplyr::select(matches("^b_.*|^sigma")) |>
posterior::summarise_draws(
median,
HDInterval::hdi,
rhat,
length,
ess_bulk, ess_tail
  )
Warning: Dropping 'draws_df' class as required metadata was removed.
# A tibble: 3 × 8
variable median lower upper rhat length ess_bulk ess_tail
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 b_Intercept 0.279 -0.416 0.970 1.00 2400 2134. 2021.
2 b_scalexscaleEQFALSE -0.0808 -0.977 0.810 1.00 2400 2344. 2196.
3 sigma 1.05 0.591 1.83 1.00 2400 2380. 2250.
Conclusions:
- in the initial block of information, we are reminded of the formula as well as the chain fitting characteristics. We are also informed that the total number of post-warmup MCMC samples is 2400.
- the Rhat values for each parameter are <1.01, confirming convergence of the parameter estimates across the chains
- Bulk_ESS: the effective sample sizes for the bulk of the posterior are a high fraction (>0.5) of the total number of draws. This suggests that the sampling was reasonably efficient and therefore likely to be reliable
- Tail_ESS: the effective sample sizes of estimates in the tails of the posterior. The tails (more extreme regions of the posterior) are where the sampler is most likely to get stuck (have divergent transitions etc). The fraction of effective samples from this region is relatively high, implying that the sampler did not get stuck in areas that are unsupported by data
- b_Intercept: when \(x=0\) (its average, since \(x\) is centered), the expected value of \(y\) is 0.279 and we are 95% confident that the true value is between -0.416 and 0.97. So \(y\) is expected to be 0.279 at the average \(x\).
- b_x (the slope): the rate of change in \(y\) per unit (=1) change in \(x\). So for every one unit change in \(x\), \(y\) changes by (on average) -0.081 units and we are 95% confident that this change is between -0.977 and 0.81
- sigma is estimated to be 1.05
The gather_draws() function performs the equivalent of an as_draws_df() followed by a pivot_longer() in order to return the full posteriors in long format where they are more suitable for graphing. Note that the name of the slope parameter gets very awkward when \(x\) is centered, so it is more convenient to refer to this parameter via a regular expression.
[1] "b_Intercept" "b_scalexscaleEQFALSE" "sigma"
[4] "prior_Intercept" "prior_b" "prior_sigma"
[7] "lprior" "lp__" "accept_stat__"
[10] "stepsize__" "treedepth__" "n_leapfrog__"
[13] "divergent__" "energy__"
dat1b.brm2 |>
gather_draws(`b_Intercept`,`b_.*x.*`, `sigma`, regex = TRUE) |>
ggplot() +
geom_histogram(aes(x = .value)) +
  facet_wrap(~.variable, scales = "free")
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Alternatively, there are various other representations supported by the ggdist package.
y ymin ymax .width .point .interval
1 0.06278833 8.380068e-08 0.3077008 0.95 median hdci
Conclusions:
- 6.279% of the total variability in \(y\) can be explained by its relationship to \(x\)
- we are 95% confident that the strength of this relationship is between 0% and 30.77%
Family: gaussian
Links: mu = identity; sigma = identity
Formula: y ~ scale(x)
Data: dat (Number of observations: 10)
Draws: 3 chains, each with iter = 5000; warmup = 1000; thin = 5;
total post-warmup draws = 2400
Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept 0.29 0.38 -0.46 1.06 1.00 2275 2166
scalex -0.08 0.39 -0.87 0.71 1.00 2304 2150
Family Specific Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma 1.12 0.37 0.67 2.00 1.00 2296 2251
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
Conclusions:
- in the initial block of information, we are reminded of the formula as well as the chain fitting characteristics. We are also informed that the total number of post-warmup MCMC samples is 2400.
- the Rhat values for each parameter are <1.01, confirming convergence of the parameter estimates across the chains
- Bulk_ESS: the effective sample sizes for the bulk of the posterior are a high fraction (>0.5) of the total number of draws. This suggests that the sampling was reasonably efficient and therefore likely to be reliable
- Tail_ESS: the effective sample sizes of estimates in the tails of the posterior. The tails (more extreme regions of the posterior) are where the sampler is most likely to get stuck (have divergent transitions etc). The fraction of effective samples from this region is relatively high, implying that the sampler did not get stuck in areas that are unsupported by data
- Intercept: when \(x=0\) (its average, since \(x\) is standardised), the expected value of \(y\) is 0.289 and we are 95% confident that the true value is between -0.455 and 1.058. So \(y\) is expected to be 0.289 at the average \(x\).
- x (the slope): the rate of change in \(y\) per unit (=1) change in \(x\). Recall that since \(x\) is standardised, 1 unit represents a span of 1 standard deviation of \(x\). So for every one standard deviation change in \(x\), \(y\) changes by (on average) -0.079 units and we are 95% confident that this change is between -0.868 and 0.711
- sigma is estimated to be 1.12
Note, the estimates are means and quantiles.
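If an effect per raw unit of \(x\) is preferred, the standardised slope can be back-transformed by dividing the posterior draws by the standard deviation of \(x\). The following is a sketch under the assumption that the original data frame is called dat and the slope parameter is named b_scalex (as in the summary above):

```r
# Sketch: back-transform the slope from standard-deviation units of x
# to raw units of x, then summarise the resulting posterior
dat1c.brm2 |>
  brms::as_draws_df() |>
  dplyr::mutate(b_x_raw = b_scalex / sd(dat$x)) |>
  dplyr::pull(b_x_raw) |>
  ggdist::median_hdci(.width = 0.95)
```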
As an alternative to the regular summary (which is designed to resemble that traditionally provided by frequentist analyses - and thus be instantly familiar), it is possible to define exactly how each of the parameters are summarised.
In the following, I am nominating that I want to summarise each parameter posterior by:
- the median
- the 95% highest probability density interval (credibility interval)
- Rhat
- total number of draws
- bulk and tail effective sample sizes
dat1c.brm2 |>
posterior::summarise_draws(
median,
HDInterval::hdi,
rhat,
length,
ess_bulk, ess_tail
  )
# A tibble: 8 × 8
variable median lower upper rhat length ess_bulk ess_tail
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 b_Intercept 0.289 -0.478 1.02 1.00 2400 2275. 2166.
2 b_scalex -0.0765 -0.938 0.634 1.00 2400 2304. 2150.
3 sigma 1.04 0.601 1.82 1.00 2400 2296. 2251.
4 prior_Intercept 30.8 0.610 59.6 1.00 2400 2307. 2394.
5 prior_b 0.378 -48.4 43.5 1.00 2400 2354. 2016.
6 prior_sigma 11.4 0.00248 49.6 1.00 2400 2506. 2552.
7 lprior -12.5 -12.6 -12.4 1.00 2400 2300. 2273.
8 lp__ -26.4 -29.7 -25.0 1.00 2400 2383. 2215.
Conclusions:
- in the initial block of information, we are reminded of the formula as well as the chain fitting characteristics. We are also informed that the total number of post-warmup MCMC samples is 2400.
- the Rhat values for each parameter are <1.01, confirming convergence of the parameter estimates across the chains
- Bulk_ESS: the effective sample sizes for the bulk of the posterior are a high fraction (>0.5) of the total number of draws. This suggests that the sampling was reasonably efficient and therefore likely to be reliable
- Tail_ESS: the effective sample sizes of estimates in the tails of the posterior. The tails (more extreme regions of the posterior) are where the sampler is most likely to get stuck (have divergent transitions etc). The fraction of effective samples from this region is relatively high, implying that the sampler did not get stuck in areas that are unsupported by data
- b_Intercept: when \(x=0\) (its average, since \(x\) is standardised), the expected value of \(y\) is 0.289 and we are 95% confident that the true value is between -0.478 and 1.023. So \(y\) is expected to be 0.289 at the average \(x\).
- b_x (the slope): the rate of change in \(y\) per unit (=1) change in \(x\). Recall that since \(x\) is standardised, 1 unit represents a span of 1 standard deviation of \(x\). So for every one standard deviation change in \(x\), \(y\) changes by (on average) -0.076 units and we are 95% confident that this change is between -0.938 and 0.634
- sigma is estimated to be 1.04
As a yet more flexible alternative, we can first extract the full posterior draws (with as_draws_df()) and then use the various tidyverse functions to focus on just the most ecologically interpretable parameters before summarising.
In the following, I will use select() with a regex (regular expression) to match only the columns that:
- start with (^) “b_” followed by any number (*) of any character (.)
- start with (^) “sigma”
dat1c.brm2 |>
brms::as_draws_df() |>
dplyr::select(matches("^b_.*|^sigma")) |>
posterior::summarise_draws(
median,
HDInterval::hdi,
rhat,
length,
ess_bulk, ess_tail
  )
Warning: Dropping 'draws_df' class as required metadata was removed.
# A tibble: 3 × 8
variable median lower upper rhat length ess_bulk ess_tail
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 b_Intercept 0.289 -0.478 1.02 1.00 2400 2215. 2145.
2 b_scalex -0.0765 -0.938 0.634 1.00 2400 2294. 2062.
3 sigma 1.04 0.601 1.82 1.00 2400 2289. 2246.
Conclusions:
- in the initial block of information, we are reminded of the formula as well as the chain fitting characteristics. We are also informed that the total number of post-warmup MCMC samples is 2400.
- the Rhat values for each parameter are <1.01, confirming convergence of the parameter estimates across the chains
- Bulk_ESS: the effective sample sizes for the bulk of the posterior are a high fraction (>0.5) of the total number of draws. This suggests that the sampling was reasonably efficient and therefore likely to be reliable
- Tail_ESS: the effective sample sizes of estimates in the tails of the posterior. The tails (more extreme regions of the posterior) are where the sampler is most likely to get stuck (have divergent transitions etc). The fraction of effective samples from this region is relatively high, implying that the sampler did not get stuck in areas that are unsupported by data
- b_Intercept: when \(x=0\) (its average, since \(x\) is standardised), the expected value of \(y\) is 0.289 and we are 95% confident that the true value is between -0.478 and 1.023. So \(y\) is expected to be 0.289 at the average \(x\).
- b_x (the slope): the rate of change in \(y\) per unit (=1) change in \(x\). Recall that since \(x\) is standardised, 1 unit represents a span of 1 standard deviation of \(x\). So for every one standard deviation change in \(x\), \(y\) changes by (on average) -0.076 units and we are 95% confident that this change is between -0.938 and 0.634
- sigma is estimated to be 1.04
The gather_draws() function performs the equivalent of an as_draws_df() followed by a pivot_longer() in order to return the full posteriors in long format where they are more suitable for graphing. Note that the name of the slope parameter gets very awkward when \(x\) is standardised, so it is more convenient to refer to this parameter via a regular expression.
[1] "b_Intercept" "b_scalex" "sigma" "prior_Intercept"
[5] "prior_b" "prior_sigma" "lprior" "lp__"
[9] "accept_stat__" "stepsize__" "treedepth__" "n_leapfrog__"
[13] "divergent__" "energy__"
dat1c.brm2 |>
gather_draws(`b_Intercept`,`b_.*x`, `sigma`, regex = TRUE) |>
ggplot() +
geom_histogram(aes(x = .value)) +
  facet_wrap(~.variable, scales = "free")
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Alternatively, there are various other representations supported by the ggdist package.
y ymin ymax .width .point .interval
1 0.06634722 6.783788e-08 0.3242866 0.95 median hdci
Conclusions:
- 6.635% of the total variability in \(y\) can be explained by its relationship to \(x\)
- we are 95% confident that the strength of this relationship is between 0% and 32.429%
Family: gaussian
Links: mu = identity; sigma = identity
Formula: y ~ x
Data: dat2 (Number of observations: 12)
Draws: 3 chains, each with iter = 5000; warmup = 1000; thin = 5;
total post-warmup draws = 2400
Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept 20.80 1.92 16.99 24.61 1.00 2468 2449
xmedium -0.77 2.77 -5.97 4.95 1.00 2444 2177
xhigh -8.65 3.09 -14.48 -1.95 1.00 2525 2361
Family Specific Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma 4.46 1.11 2.88 7.04 1.00 2331 2284
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
Conclusions:
- in the initial block of information, we are reminded of the formula as well as the chain fitting characteristics. We are also informed that the total number of post-warmup MCMC samples is 2400.
- the Rhat values for each parameter are <1.01, confirming convergence of the parameter estimates across the chains
- Bulk_ESS: the effective sample sizes for the bulk of the posterior are a high fraction (>0.5) of the total number of draws. This suggests that the sampling was reasonably efficient and therefore likely to be reliable
- Tail_ESS: the effective sample sizes of estimates in the tails of the posterior. The tails (more extreme regions of the posterior) are where the sampler is most likely to get stuck (have divergent transitions etc). The fraction of effective samples from this region is relatively high, implying that the sampler did not get stuck in areas that are unsupported by data
- Intercept: when \(x\) is “control”, the expected value of \(y\) is 20.799 and we are 95% confident that the true value is between 16.993 and 24.612.
- x* (the slopes): the change (effect) in \(y\) between the first (“control”) group and each other \(x\) level.
- xmedium: \(y\) is (on average) 0.767 units less in the “medium” group compared to the “control” group and we are 95% confident that this change is between -5.968 and 4.949.
- xhigh: \(y\) is (on average) 8.649 units less in the “high” group compared to the “control” group and we are 95% confident that this change is between -14.478 and -1.952.
- sigma is estimated to be 4.46
Note, the estimates are means and quantiles.
As an alternative to the regular summary (which is designed to resemble that traditionally provided by frequentist analyses - and thus be instantly familiar), it is possible to define exactly how each of the parameters are summarised.
In the following, I am nominating that I want to summarise each parameter posterior by:
- the median
- the 95% highest probability density interval (credibility interval)
- Rhat
- total number of draws
- bulk and tail effective sample sizes
dat2a.brm2 |>
posterior::summarise_draws(
median,
HDInterval::hdi,
rhat,
length,
ess_bulk, ess_tail
  )
# A tibble: 9 × 8
variable median lower upper rhat length ess_bulk ess_tail
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 b_Intercept 20.8 16.7 24.3 1.00 2400 2468. 2449.
2 b_xmedium -0.795 -6.03 4.69 1.00 2400 2444. 2177.
3 b_xhigh -8.73 -14.8 -2.56 1.00 2400 2525. 2361.
4 sigma 4.29 2.66 6.70 1.00 2400 2332. 2284.
5 prior_Intercept 19.8 16.3 23.5 1.00 2400 2340. 2253.
6 prior_b -0.0285 -19.4 19.8 1.00 2400 2268. 2164.
7 prior_sigma 3.54 0.00326 14.9 1.00 2400 2505. 2216.
8 lprior -11.4 -13.1 -10.1 0.999 2400 2276. 1875.
9 lp__ -44.3 -47.7 -42.5 1.00 2400 2269. 2188.
Conclusions:
- in the initial block of information, we are reminded of the formula as well as the chain fitting characteristics. We are also informed that the total number of post-warmup MCMC samples is 2400.
- the Rhat values for each parameter are <1.01, confirming convergence of the parameter estimates across the chains
- Bulk_ESS: the effective sample sizes for the bulk of the posterior are a high fraction (>0.5) of the total number of draws. This suggests that the sampling was reasonably efficient and therefore likely to be reliable
- Tail_ESS: the effective sample sizes of estimates in the tails of the posterior. The tails (more extreme regions of the posterior) are where the sampler is most likely to get stuck (have divergent transitions etc). The fraction of effective samples from this region is relatively high, implying that the sampler did not get stuck in areas that are unsupported by data
- Intercept: when \(x\) is “control”, the expected value of \(y\) is 20.787 and we are 95% confident that the true value is between 16.71 and 24.281.
- x* (the slopes): the change (effect) in \(y\) between the first (“control”) group and each other \(x\) level.
- xmedium: \(y\) is (on average) 0.795 units less in the “medium” group compared to the “control” group and we are 95% confident that this change is between -6.035 and 4.686.
- xhigh: \(y\) is (on average) 8.733 units less in the “high” group compared to the “control” group and we are 95% confident that this change is between -14.751 and -2.559.
- sigma is estimated to be 4.29
As a yet more flexible alternative, we can first extract the full posterior draws (with as_draws_df()) and then use the various tidyverse functions to focus on just the most ecologically interpretable parameters before summarising.
In the following, I will use select() with a regex (regular expression) to match only the columns that:
- start with (^) “b_” followed by any number (*) of any character (.)
- start with (^) “sigma”
dat2a.brm2 |>
brms::as_draws_df() |>
dplyr::select(matches("^b_.*|^sigma")) |>
posterior::summarise_draws(
median,
HDInterval::hdi,
rhat,
length,
ess_bulk, ess_tail
  )
Warning: Dropping 'draws_df' class as required metadata was removed.
# A tibble: 4 × 8
variable median lower upper rhat length ess_bulk ess_tail
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 b_Intercept 20.8 16.7 24.3 1.00 2400 2454. 2438.
2 b_xmedium -0.795 -6.03 4.69 1.00 2400 2422. 2169.
3 b_xhigh -8.73 -14.8 -2.56 1.00 2400 2491. 2354.
4 sigma 4.29 2.66 6.70 1.00 2400 2321. 2252.
Conclusions:
- in the initial block of information, we are reminded of the formula as well as the chain fitting characteristics. We are also informed that the total number of post-warmup MCMC samples is 2400.
- the Rhat values for each parameter are <1.01, confirming convergence of the parameter estimates across the chains
- Bulk_ESS: the effective sample sizes for the bulk of the posterior are a high fraction (>0.5) of the total number of draws. This suggests that the sampling was reasonably efficient and therefore likely to be reliable
- Tail_ESS: the effective sample sizes of estimates in the tails of the posterior. The tails (more extreme regions of the posterior) are where the sampler is most likely to get stuck (have divergent transitions etc). The fraction of effective samples from this region is relatively high, implying that the sampler did not get stuck in areas that are unsupported by data
- Intercept: when \(x\) is “control”, the expected value of \(y\) is 20.787 and we are 95% confident that the true value is between 16.71 and 24.281.
- x* (the slopes): the change (effect) in \(y\) between the first (“control”) group and each other \(x\) level.
- xmedium: \(y\) is (on average) 0.795 units less in the “medium” group compared to the “control” group and we are 95% confident that this change is between -6.035 and 4.686.
- xhigh: \(y\) is (on average) 8.733 units less in the “high” group compared to the “control” group and we are 95% confident that this change is between -14.751 and -2.559.
- sigma is estimated to be 4.29
The gather_draws() function performs the equivalent of an as_draws_df() followed by a pivot_longer() in order to return the full posteriors in long format where they are more suitable for graphing.
[1] "b_Intercept" "b_xmedium" "b_xhigh" "sigma"
[5] "prior_Intercept" "prior_b" "prior_sigma" "lprior"
[9] "lp__" "accept_stat__" "stepsize__" "treedepth__"
[13] "n_leapfrog__" "divergent__" "energy__"
dat2a.brm2 |>
gather_draws(`b_Intercept`, `b_x.*`, `sigma`, regex = TRUE) |>
ggplot() +
geom_histogram(aes(x = .value)) +
  facet_wrap(~.variable, scales = "free")
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Alternatively, there are various other representations supported by the ggdist package.
y ymin ymax .width .point .interval
1 0.5448169 0.1558967 0.7241144 0.95 median hdci
Conclusions:
- 54.482% of the total variability in \(y\) can be explained by its relationship to \(x\)
- we are 95% confident that the strength of this relationship is between 15.59% and 72.411%
Family: gaussian
Links: mu = identity; sigma = identity
Formula: y ~ -1 + x
Data: dat2 (Number of observations: 12)
Draws: 3 chains, each with iter = 5000; warmup = 1000; thin = 5;
total post-warmup draws = 2400
Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
xcontrol 20.29 2.19 15.73 24.52 1.00 2068 2191
xmedium 18.67 2.12 14.28 22.75 1.00 2330 1996
xhigh 10.96 2.17 6.82 15.54 1.00 2566 2368
Family Specific Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma 4.42 1.09 2.83 7.00 1.00 2100 2328
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
Conclusions:
- in the initial block of information, we are reminded of the formula as well as the chain fitting characteristics. We are also informed that the total number of post-warmup MCMC samples is 2400.
- the Rhat values for each parameter are <1.01, confirming convergence of the parameter estimates across the chains
- Bulk_ESS: the effective sample sizes for the bulk of the posterior are a high fraction (>0.5) of the total number of draws. This suggests that the sampling was reasonably efficient and therefore likely to be reliable
- Tail_ESS: the effective sample sizes of estimates in the tails of the posterior. The tails (more extreme regions of the posterior) are where the sampler is most likely to get stuck (have divergent transitions etc). The fraction of effective samples from this region is relatively high, implying that the sampler did not get stuck in areas that are unsupported by data
- x*: (the group means)
- xcontrol: the expected value of \(y\) in the “control” group is 20.29 (the 95% credibility interval is between 15.726 and 24.516)
- xmedium: the expected value of \(y\) in the “medium” group is 18.67 (the 95% credibility interval is between 14.285 and 22.753)
- xhigh: the expected value of \(y\) in the “high” group is 10.96 (the 95% credibility interval is between 6.817 and 15.544)
- sigma is estimated to be 4.42
Note, the estimates are means and quantiles.
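One advantage of this cell-means parameterisation is that pairwise differences between groups can be derived directly from the posterior draws of the group means. The following is a sketch under the assumption that the parameter names are as shown above (e.g. b_xcontrol, b_xhigh):

```r
# Sketch: posterior of the "high" vs "control" difference from the
# cell-means model, summarised by its median and 95% HDCI
dat2b.brm2 |>
  brms::as_draws_df() |>
  dplyr::mutate(high_vs_control = b_xhigh - b_xcontrol) |>
  dplyr::pull(high_vs_control) |>
  ggdist::median_hdci(.width = 0.95)
```

The same comparisons can also be obtained via emmeans (loaded above), which computes all pairwise contrasts from the fitted model.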
As an alternative to the regular summary (which is designed to resemble that traditionally provided by frequentist analyses - and thus be instantly familiar), it is possible to define exactly how each of the parameters are summarised.
In the following, I am nominating that I want to summarise each parameter posterior by:
- the median
- the 95% highest probability density interval (credibility interval)
- Rhat
- total number of draws
- bulk and tail effective sample sizes
dat2b.brm2 |>
posterior::summarise_draws(
median,
HDInterval::hdi,
rhat,
length,
ess_bulk, ess_tail
  )
# A tibble: 8 × 8
variable median lower upper rhat length ess_bulk ess_tail
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 b_xcontrol 20.3 15.5 24.3 1.00 2400 2069. 2191.
2 b_xmedium 18.7 14.4 22.8 1.00 2400 2330. 1996.
3 b_xhigh 10.9 6.55 15.2 1.00 2400 2566. 2368.
4 sigma 4.26 2.58 6.47 1.00 2400 2100. 2328.
5 prior_b 16.5 -2.63 38.8 1.00 2400 2264. 2112.
6 prior_sigma 3.55 0.00155 14.8 1.00 2400 2128. 2386.
7 lprior -11.8 -12.8 -11.2 1.00 2400 2419. 2278.
8 lp__ -44.6 -48.2 -42.7 1.00 2400 2245. 2352.
Conclusions:
- in the initial block of information, we are reminded of the formula as well as the chain fitting characteristics. We are also informed that the total number of post-warmup MCMC samples is 2400.
- the Rhat values for each parameter are less than 1.01, confirming convergence of the parameter estimates across the chains
- Bulk_ESS: the effective sample sizes for the bulk of the posterior range are a high fraction (> 0.5) of the total number of draws. This suggests that the sampling was reasonably efficient and therefore likely to be reliable
- Tail_ESS: the effective sample sizes of estimates in the tails of the posterior. The tails (more extreme regions of the posterior) are where the sampler is most likely to get stuck (have divergent transitions etc.). The fraction of effective samples from this region is relatively high, implying that the sampler did not get stuck in areas that are unsupported by data
- x*: (the means of each group)
  - xcontrol: the expected value of \(y\) in the “control” group is 20.335 (95% credibility interval between 15.507 and 24.295)
  - xmedium: the expected value of \(y\) in the “medium” group is 18.734 (95% credibility interval between 14.398 and 22.802)
  - xhigh: the expected value of \(y\) in the “high” group is 10.914 (95% credibility interval between 6.552 and 15.2)
- sigma is estimated to be 4.26
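The highest posterior density (HPD) intervals reported by HDInterval::hdi() are the shortest intervals containing the nominated proportion of the posterior draws, in contrast to equal-tailed quantile intervals. The idea can be sketched in base R (hdi_simple() is a toy illustration written for this example only; HDInterval::hdi() should be preferred in practice):

```r
# Minimal sketch of a 95% HPD interval: the shortest window containing 95% of draws
# (hdi_simple() is a toy helper for illustration; use HDInterval::hdi() in real analyses)
hdi_simple <- function(x, credMass = 0.95) {
  x <- sort(x)
  n <- length(x)
  k <- ceiling(credMass * n)            # number of draws the interval must contain
  widths <- x[k:n] - x[1:(n - k + 1)]   # widths of all candidate intervals
  i <- which.min(widths)                # index of the narrowest candidate
  c(lower = x[i], upper = x[i + k - 1])
}

set.seed(1)
draws <- rexp(2400)                     # a deliberately skewed stand-in posterior
quantile(draws, c(0.025, 0.975))        # equal-tailed interval
hdi_simple(draws)                       # HPD interval: shorter, and hugs the mode
```

For a symmetric posterior the two intervals coincide; for skewed posteriors (such as the one simulated above) the HPD interval is narrower.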
For a yet more flexible alternative, we can first extract the full posterior draws (with as_draws_df()) and then use the various tidyverse functions to focus on just the most ecologically interpretable parameters before summarising.
In the following, I will use select() with a regex (regular expression) to match only the columns that:
- start with (^) “b_” followed by any number (*) of any characters (.)
- start with (^) “sigma”
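To see which parameter names such a pattern captures, here is a small stand-alone sketch using the parameter names reported for this model (grepl() applies the same pattern that matches() uses):

```r
# Parameter names as reported for this model
vars <- c("b_xcontrol", "b_xmedium", "b_xhigh", "sigma",
          "prior_b", "prior_sigma", "lprior", "lp__")
# The same pattern used inside matches(): begins with "b_" or begins with "sigma"
keep <- grepl("^b_.*|^sigma", vars)
vars[keep]
# [1] "b_xcontrol" "b_xmedium"  "b_xhigh"    "sigma"
```

Note that "prior_sigma" is not matched because the anchor (^) requires "sigma" to occur at the very start of the name.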
dat2b.brm2 |>
brms::as_draws_df() |>
dplyr::select(matches("^b_.*|^sigma")) |>
posterior::summarise_draws(
median,
HDInterval::hdi,
rhat,
length,
ess_bulk, ess_tail
)
Warning: Dropping 'draws_df' class as required metadata was removed.
# A tibble: 4 × 8
variable median lower upper rhat length ess_bulk ess_tail
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 b_xcontrol 20.3 15.5 24.3 1.00 2400 2064. 2154.
2 b_xmedium 18.7 14.4 22.8 1.00 2400 2323. 1943.
3 b_xhigh 10.9 6.55 15.2 1.00 2400 2553. 2361.
4 sigma 4.26 2.58 6.47 1.00 2400 2026. 2320.
Conclusions:
- in the initial block of information, we are reminded of the formula as well as the chain fitting characteristics. We are also informed that the total number of post-warmup MCMC samples is 2400.
- the Rhat values for each parameter are less than 1.01, confirming convergence of the parameter estimates across the chains
- Bulk_ESS: the effective sample sizes for the bulk of the posterior range are a high fraction (> 0.5) of the total number of draws. This suggests that the sampling was reasonably efficient and therefore likely to be reliable
- Tail_ESS: the effective sample sizes of estimates in the tails of the posterior. The tails (more extreme regions of the posterior) are where the sampler is most likely to get stuck (have divergent transitions etc.). The fraction of effective samples from this region is relatively high, implying that the sampler did not get stuck in areas that are unsupported by data
- x*: (the means of each group)
  - xcontrol: the expected value of \(y\) in the “control” group is 20.335 (95% credibility interval between 15.507 and 24.295)
  - xmedium: the expected value of \(y\) in the “medium” group is 18.734 (95% credibility interval between 14.398 and 22.802)
  - xhigh: the expected value of \(y\) in the “high” group is 10.914 (95% credibility interval between 6.552 and 15.2)
- sigma is estimated to be 4.26
The gather_draws() function performs the equivalent of an as_draws_df() followed by a pivot_longer(), returning the full posteriors in long format, which is more suitable for graphing.
[1] "b_xcontrol" "b_xmedium" "b_xhigh" "sigma"
[5] "prior_b" "prior_sigma" "lprior" "lp__"
[9] "accept_stat__" "stepsize__" "treedepth__" "n_leapfrog__"
[13] "divergent__" "energy__"
dat2b.brm2 |>
gather_draws(`b_x.*`, `sigma`, regex = TRUE) |>
ggplot() +
geom_histogram(aes(x = .value)) +
facet_wrap(~.variable, scales = "free")
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Alternatively, there are various other representations supported by the ggdist package.
y ymin ymax .width .point .interval
1 0.5689382 0.2024998 0.7242002 0.95 median hdci
Conclusions:
- 56.894% of the total variability in \(y\) can be explained by its relationship to \(x\)
- we are 95% confident that the strength of this relationship is between 20.25% and 72.42%
Family: poisson
Links: mu = log
Formula: y ~ x
Data: dat3 (Number of observations: 10)
Draws: 3 chains, each with iter = 5000; warmup = 1000; thin = 5;
total post-warmup draws = 2400
Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept 0.02 0.35 -0.70 0.68 1.00 2167 2049
x 0.34 0.04 0.26 0.43 1.00 2153 2156
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
The results presented by the summary() function are on the link scale (in this case, the log scale). As such, they can be awkward to interpret and are particularly punishing for your audience.
Conclusions:
- in the initial block of information, we are reminded of the formula as well as the chain fitting characteristics. We are also informed that the total number of post-warmup MCMC samples is 2400.
- the Rhat values for each parameter are less than 1.01, confirming convergence of the parameter estimates across the chains
- Bulk_ESS: the effective sample sizes for the bulk of the posterior range are a high fraction (> 0.5) of the total number of draws. This suggests that the sampling was reasonably efficient and therefore likely to be reliable
- Tail_ESS: the effective sample sizes of estimates in the tails of the posterior. The tails (more extreme regions of the posterior) are where the sampler is most likely to get stuck (have divergent transitions etc.). The fraction of effective samples from this region is relatively high, implying that the sampler did not get stuck in areas that are unsupported by data
- Intercept: when \(x=0\), the expected value of \(y\) is 0.024 and we are 95% confident that the true value is between -0.699 and 0.677. Since \(x=0\) is outside the observed range of \(x\), this y-intercept is of very limited value.
- x: (the slope) - the rate of change in \(y\) per unit (=1) change in \(x\). So for every one unit change in \(x\), \(y\) increases by (on average) 0.345 units and we are 95% confident that this change is between 0.263 and 0.431
Note, the estimates are means and quantiles.
As an alternative to the regular summary (which is designed to resemble that traditionally provided by frequentist analyses - and thus be instantly familiar), it is possible to define exactly how each of the parameters is summarised.
In the following, I am nominating that I want to summarise each parameter posterior by:
- the median
- the 95% highest probability density interval (credibility interval)
- Rhat
- total number of draws
- bulk and tail effective sample sizes
dat3a.brm2 |>
posterior::summarise_draws(
median,
HDInterval::hdi,
rhat,
length,
ess_bulk, ess_tail
)
# A tibble: 6 × 8
variable median lower upper rhat length ess_bulk ess_tail
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 b_Intercept 0.0320 -0.705 0.657 1.00 2400 2166. 2049.
2 b_x 0.344 0.261 0.428 1.00 2400 2153. 2156.
3 prior_Intercept 2.03 -0.553 4.69 1.00 2400 2389. 2411.
4 prior_b -0.0116 -6.37 6.38 1.00 2400 2574. 2435.
5 lprior -2.92 -2.95 -2.91 1.00 2400 2184. 2213.
6 lp__ -26.8 -29.0 -26.1 1.00 2400 2197. 2411.
The results presented by the above function are on the link scale (in this case, the log scale). As such, they can be awkward to interpret and are particularly punishing for your audience.
Conclusions:
- in the initial block of information, we are reminded of the formula as well as the chain fitting characteristics. We are also informed that the total number of post-warmup MCMC samples is 2400.
- the Rhat values for each parameter are less than 1.01, confirming convergence of the parameter estimates across the chains
- Bulk_ESS: the effective sample sizes for the bulk of the posterior range are a high fraction (> 0.5) of the total number of draws. This suggests that the sampling was reasonably efficient and therefore likely to be reliable
- Tail_ESS: the effective sample sizes of estimates in the tails of the posterior. The tails (more extreme regions of the posterior) are where the sampler is most likely to get stuck (have divergent transitions etc.). The fraction of effective samples from this region is relatively high, implying that the sampler did not get stuck in areas that are unsupported by data
- b_Intercept: when \(x=0\), the expected value of \(y\) is 0.032 and we are 95% confident that the true value is between -0.705 and 0.657. Since \(x=0\) is outside the observed range of \(x\), this y-intercept is of very limited value.
- b_x: (the slope) - the rate of change in \(y\) per unit (=1) change in \(x\). So for every one unit change in \(x\), \(y\) increases by (on average) 0.344 units and we are 95% confident that this change is between 0.261 and 0.428
For a yet more flexible alternative, we can first extract the full posterior draws (with as_draws_df()) and then use the various tidyverse functions to focus on just the most ecologically interpretable parameters before summarising.
In the following, I will use select() with a regex (regular expression) to match only the columns that:
- start with (^) “b_” followed by any number (*) of any characters (.)
- start with (^) “sigma”
It will also use mutate() to transform the results back onto the scale of the response (by exponentiating the posteriors) before summarising.
dat3a.brm2 |>
brms::as_draws_df() |>
dplyr::select(matches("^b_.*|^sigma")) |>
mutate(across(everything(), exp)) |>
posterior::summarise_draws(
median,
HDInterval::hdi,
rhat,
length,
ess_bulk, ess_tail
)
Warning: Dropping 'draws_df' class as required metadata was removed.
# A tibble: 2 × 8
variable median lower upper rhat length ess_bulk ess_tail
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 b_Intercept 1.03 0.440 1.83 1.00 2400 2160. 2042.
2 b_x 1.41 1.29 1.53 1.00 2400 2147. 2143.
Conclusions:
- in the initial block of information, we are reminded of the formula as well as the chain fitting characteristics. We are also informed that the total number of post-warmup MCMC samples is 2400.
- the Rhat values for each parameter are less than 1.01, confirming convergence of the parameter estimates across the chains
- Bulk_ESS: the effective sample sizes for the bulk of the posterior range are a high fraction (> 0.5) of the total number of draws. This suggests that the sampling was reasonably efficient and therefore likely to be reliable
- Tail_ESS: the effective sample sizes of estimates in the tails of the posterior. The tails (more extreme regions of the posterior) are where the sampler is most likely to get stuck (have divergent transitions etc.). The fraction of effective samples from this region is relatively high, implying that the sampler did not get stuck in areas that are unsupported by data
- b_Intercept: when \(x=0\), the expected value of \(y\) is 1.033 and we are 95% confident that the true value is between 0.44 and 1.83. Since \(x=0\) is outside the observed range of \(x\), this y-intercept is of very limited value.
- b_x: (the slope) - the rate of change in \(y\) per unit (=1) change in \(x\). So for every one unit change in \(x\), \(y\) increases by (on average) a factor of 1.41 and we are 95% confident that this change is between 1.29 and 1.53. This represents a ((1.41 - 1) * 100 =) 41% increase in \(y\) per unit increase in \(x\).
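Percentage-change interpretations like the one above can be computed directly from any log-link coefficient. A minimal sketch (0.344 is the posterior median slope from the table above):

```r
# Converting a log-link slope into a multiplicative and percentage change in y
b_x <- 0.344                 # posterior median of the slope on the log scale
multiplier <- exp(b_x)       # multiplicative change in y per one-unit change in x
percent <- (multiplier - 1) * 100
round(multiplier, 2)         # ~1.41
round(percent, 0)            # ~41
```

The same arithmetic applies when the posterior draws are exponentiated in bulk (as in the mutate(across(everything(), exp)) step): summaries of the exponentiated draws are multipliers, and subtracting 1 and multiplying by 100 converts them to percentage changes.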
The gather_draws() function performs the equivalent of an as_draws_df() followed by a pivot_longer(), returning the full posteriors in long format, which is more suitable for graphing.
dat3a.brm2 |>
gather_draws(b_Intercept, b_x) |>
mutate(across(everything(), exp)) |>
ggplot() +
geom_histogram(aes(x = .value)) +
facet_wrap(~.variable, scales = "free")
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Alternatively, there are various other representations supported by the ggdist package.
y ymin ymax .width .point .interval
1 0.9218802 0.8143995 0.934374 0.95 median hdci
Conclusions:
- 92.188% of the total variability in \(y\) can be explained by its relationship to \(x\)
- we are 95% confident that the strength of this relationship is between 81.44% and 93.437%
Family: poisson
Links: mu = log
Formula: y ~ scale(x, scale = FALSE)
Data: dat3 (Number of observations: 10)
Draws: 3 chains, each with iter = 5000; warmup = 1000; thin = 5;
total post-warmup draws = 2400
Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept 1.92 0.14 1.64 2.18 1.00 2184 2130
scalexscaleEQFALSE 0.34 0.04 0.27 0.43 1.00 2189 2313
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
The results presented by the summary() function are on the link scale (in this case, the log scale). As such, they can be awkward to interpret and are particularly punishing for your audience.
Conclusions:
- in the initial block of information, we are reminded of the formula as well as the chain fitting characteristics. We are also informed that the total number of post-warmup MCMC samples is 2400.
- the Rhat values for each parameter are less than 1.01, confirming convergence of the parameter estimates across the chains
- Bulk_ESS: the effective sample sizes for the bulk of the posterior range are a high fraction (> 0.5) of the total number of draws. This suggests that the sampling was reasonably efficient and therefore likely to be reliable
- Tail_ESS: the effective sample sizes of estimates in the tails of the posterior. The tails (more extreme regions of the posterior) are where the sampler is most likely to get stuck (have divergent transitions etc.). The fraction of effective samples from this region is relatively high, implying that the sampler did not get stuck in areas that are unsupported by data
- Intercept: when \(x=0\), the expected value of \(y\) is 1.923 and we are 95% confident that the true value is between 1.643 and 2.183. Since \(x\) is centred, \(x=0\) corresponds to the mean of the observed \(x\) values, so this y-intercept is directly interpretable.
- x: (the slope) - the rate of change in \(y\) per unit (=1) change in \(x\). So for every one unit change in \(x\), \(y\) increases by (on average) 0.345 units and we are 95% confident that this change is between 0.265 and 0.43
Note, the estimates are means and quantiles.
As an alternative to the regular summary (which is designed to resemble that traditionally provided by frequentist analyses - and thus be instantly familiar), it is possible to define exactly how each of the parameters is summarised.
In the following, I am nominating that I want to summarise each parameter posterior by:
- the median
- the 95% highest probability density interval (credibility interval)
- Rhat
- total number of draws
- bulk and tail effective sample sizes
dat3b.brm2 |>
posterior::summarise_draws(
median,
HDInterval::hdi,
rhat,
length,
ess_bulk, ess_tail
)
# A tibble: 6 × 8
variable median lower upper rhat length ess_bulk ess_tail
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 b_Intercept 1.93 1.65 2.19 1.00 2400 2183. 2130.
2 b_scalexscaleEQFALSE 0.343 0.261 0.424 1.00 2400 2188. 2313.
3 prior_Intercept 1.98 -0.669 4.42 1.00 2400 2401. 2301.
4 prior_b 0.0452 -5.55 6.33 0.999 2400 2288. 2311.
5 lprior -2.92 -2.95 -2.91 1.00 2400 2064. 2330.
6 lp__ -26.8 -29.0 -26.1 1.00 2400 2376. 2275.
The results presented by the above function are on the link scale (in this case, the log scale). As such, they can be awkward to interpret and are particularly punishing for your audience.
Conclusions:
- in the initial block of information, we are reminded of the formula as well as the chain fitting characteristics. We are also informed that the total number of post-warmup MCMC samples is 2400.
- the Rhat values for each parameter are less than 1.01, confirming convergence of the parameter estimates across the chains
- Bulk_ESS: the effective sample sizes for the bulk of the posterior range are a high fraction (> 0.5) of the total number of draws. This suggests that the sampling was reasonably efficient and therefore likely to be reliable
- Tail_ESS: the effective sample sizes of estimates in the tails of the posterior. The tails (more extreme regions of the posterior) are where the sampler is most likely to get stuck (have divergent transitions etc.). The fraction of effective samples from this region is relatively high, implying that the sampler did not get stuck in areas that are unsupported by data
- b_Intercept: when \(x=0\), the expected value of \(y\) is 1.93 and we are 95% confident that the true value is between 1.65 and 2.19. Since \(x\) is centred, \(x=0\) corresponds to the mean of the observed \(x\) values, so this y-intercept is directly interpretable.
- b_x: (the slope) - the rate of change in \(y\) per unit (=1) change in \(x\). So for every one unit change in \(x\), \(y\) increases by (on average) 0.343 units and we are 95% confident that this change is between 0.261 and 0.424
For a yet more flexible alternative, we can first extract the full posterior draws (with as_draws_df()) and then use the various tidyverse functions to focus on just the most ecologically interpretable parameters before summarising.
In the following, I will use select() with a regex (regular expression) to match only the columns that:
- start with (^) “b_” followed by any number (*) of any characters (.)
- start with (^) “sigma”
It will also use mutate() to transform the results back onto the scale of the response (by exponentiating the posteriors) before summarising.
dat3b.brm2 |>
brms::as_draws_df() |>
dplyr::select(matches("^b_.*|^sigma")) |>
mutate(across(everything(), exp)) |>
posterior::summarise_draws(
median,
HDInterval::hdi,
rhat,
length,
ess_bulk, ess_tail
)
Warning: Dropping 'draws_df' class as required metadata was removed.
# A tibble: 2 × 8
variable median lower upper rhat length ess_bulk ess_tail
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 b_Intercept 6.86 5.15 8.84 1.00 2400 2176. 2093.
2 b_scalexscaleEQFALSE 1.41 1.30 1.53 1.00 2400 2165. 2303.
Conclusions:
- in the initial block of information, we are reminded of the formula as well as the chain fitting characteristics. We are also informed that the total number of post-warmup MCMC samples is 2400.
- the Rhat values for each parameter are less than 1.01, confirming convergence of the parameter estimates across the chains
- Bulk_ESS: the effective sample sizes for the bulk of the posterior range are a high fraction (> 0.5) of the total number of draws. This suggests that the sampling was reasonably efficient and therefore likely to be reliable
- Tail_ESS: the effective sample sizes of estimates in the tails of the posterior. The tails (more extreme regions of the posterior) are where the sampler is most likely to get stuck (have divergent transitions etc.). The fraction of effective samples from this region is relatively high, implying that the sampler did not get stuck in areas that are unsupported by data
- b_Intercept: when \(x=0\), the expected value of \(y\) is 6.864 and we are 95% confident that the true value is between 5.15 and 8.84. Since \(x\) is centred, \(x=0\) corresponds to the mean of the observed \(x\) values, so this y-intercept is directly interpretable.
- b_x: (the slope) - the rate of change in \(y\) per unit (=1) change in \(x\). So for every one unit change in \(x\), \(y\) increases by (on average) a factor of 1.41 and we are 95% confident that this change is between 1.30 and 1.53. This represents a ((1.41 - 1) * 100 =) 41% increase in \(y\) per unit increase in \(x\).
The gather_draws() function performs the equivalent of an as_draws_df() followed by a pivot_longer(), returning the full posteriors in long format, which is more suitable for graphing.
dat3b.brm2 |>
gather_draws(`b_Intercept`, `b_.*x.*`, regex = TRUE) |>
mutate(across(everything(), exp)) |>
ggplot() +
geom_histogram(aes(x = .value)) +
facet_wrap(~.variable, scales = "free")
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Alternatively, there are various other representations supported by the ggdist package.
y ymin ymax .width .point .interval
1 0.9223029 0.8208962 0.934376 0.95 median hdci
Conclusions:
- 92.23% of the total variability in \(y\) can be explained by its relationship to \(x\)
- we are 95% confident that the strength of this relationship is between 82.09% and 93.438%
Family: poisson
Links: mu = log
Formula: y ~ scale(x)
Data: dat3 (Number of observations: 10)
Draws: 3 chains, each with iter = 5000; warmup = 1000; thin = 5;
total post-warmup draws = 2400
Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept 1.93 0.14 1.65 2.20 1.00 2271 1957
scalex 1.03 0.13 0.78 1.30 1.00 2258 2145
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
The results presented by the summary() function are on the link scale (in this case, the log scale). As such, they can be awkward to interpret and are particularly punishing for your audience.
Conclusions:
- in the initial block of information, we are reminded of the formula as well as the chain fitting characteristics. We are also informed that the total number of post-warmup MCMC samples is 2400.
- the Rhat values for each parameter are less than 1.01, confirming convergence of the parameter estimates across the chains
- Bulk_ESS: the effective sample sizes for the bulk of the posterior range are a high fraction (> 0.5) of the total number of draws. This suggests that the sampling was reasonably efficient and therefore likely to be reliable
- Tail_ESS: the effective sample sizes of estimates in the tails of the posterior. The tails (more extreme regions of the posterior) are where the sampler is most likely to get stuck (have divergent transitions etc.). The fraction of effective samples from this region is relatively high, implying that the sampler did not get stuck in areas that are unsupported by data
- Intercept: when \(x=0\) (its average, since \(x\) is standardised), the expected value of \(y\) is 1.929 and we are 95% confident that the true value is between 1.646 and 2.203. Since \(x=0\) corresponds to the mean of the observed \(x\) values, this y-intercept is directly interpretable.
- x: (the slope) - the rate of change in \(y\) per unit (=1) change in \(x\). Recall that since \(x\) is standardised, one unit represents a span of one standard deviation of \(x\). So for every one standard deviation change in \(x\), \(y\) increases by (on average) 1.035 units and we are 95% confident that this change is between 0.781 and 1.295
Note, the estimates are means and quantiles.
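The difference between centring and standardising a predictor (and hence why one unit of scale(x) spans one standard deviation of \(x\)) can be illustrated with a small stand-alone sketch (the x values here are made up for illustration):

```r
# scale(x, scale = FALSE) centres x; scale(x) centres it and divides by its sd
x <- c(2, 4, 6, 8, 10)
centred  <- as.numeric(scale(x, scale = FALSE))  # x - mean(x)
standard <- as.numeric(scale(x))                 # (x - mean(x)) / sd(x)
centred                      # -4 -2  0  2  4
sd(x)                        # ~3.162
standard                     # centred / sd(x): one unit now equals one sd of x
```

Under both transformations \(x=0\) corresponds to the original mean of \(x\); standardising additionally rescales the slope so that it describes the change in \(y\) per standard deviation of \(x\).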
As an alternative to the regular summary (which is designed to resemble that traditionally provided by frequentist analyses - and thus be instantly familiar), it is possible to define exactly how each of the parameters is summarised.
In the following, I am nominating that I want to summarise each parameter posterior by:
- the median
- the 95% highest probability density interval (credibility interval)
- Rhat
- total number of draws
- bulk and tail effective sample sizes
dat3c.brm2 |>
posterior::summarise_draws(
median,
HDInterval::hdi,
rhat,
length,
ess_bulk, ess_tail
)
# A tibble: 6 × 8
variable median lower upper rhat length ess_bulk ess_tail
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 b_Intercept 1.93 1.65 2.20 1.00 2400 2270. 1957.
2 b_scalex 1.03 0.776 1.29 1.00 2400 2258. 2145.
3 prior_Intercept 2.01 -0.473 4.66 1.00 2400 2407. 2499.
4 prior_b -0.0503 -4.16 4.29 1.00 2400 2458. 2369.
5 lprior -2.86 -3.06 -2.71 1.00 2400 2253. 2236.
6 lp__ -26.8 -29.2 -26.0 1.00 2400 1935. 1941.
Conclusions:
- in the initial block of information, we are reminded of the formula as well as the chain fitting characteristics. We are also informed that the total number of post-warmup MCMC samples is 2400.
- the Rhat values for each parameter are less than 1.01, confirming convergence of the parameter estimates across the chains
- Bulk_ESS: the effective sample sizes for the bulk of the posterior range are a high fraction (> 0.5) of the total number of draws. This suggests that the sampling was reasonably efficient and therefore likely to be reliable
- Tail_ESS: the effective sample sizes of estimates in the tails of the posterior. The tails (more extreme regions of the posterior) are where the sampler is most likely to get stuck (have divergent transitions etc.). The fraction of effective samples from this region is relatively high, implying that the sampler did not get stuck in areas that are unsupported by data
- b_Intercept: when \(x=0\) (its average, since \(x\) is standardised), the expected value of \(y\) is 1.934 and we are 95% confident that the true value is between 1.647 and 2.204. Since \(x=0\) corresponds to the mean of the observed \(x\) values, this y-intercept is directly interpretable.
- b_x: (the slope) - the rate of change in \(y\) per unit (=1) change in \(x\). Recall that since \(x\) is standardised, one unit represents a span of one standard deviation of \(x\). So for every one standard deviation change in \(x\), \(y\) increases by (on average) 1.031 units and we are 95% confident that this change is between 0.776 and 1.289
For a yet more flexible alternative, we can first extract the full posterior draws (with as_draws_df()) and then use the various tidyverse functions to focus on just the most ecologically interpretable parameters before summarising.
In the following, I will use select() with a regex (regular expression) to match only the columns that:
- start with (^) “b_” followed by any number (*) of any characters (.)
- start with (^) “sigma”
^) “sigma”
It will also use mutate() to transform the results back onto the scale of the response (by exponentiating the posteriors) before summarising.
dat3c.brm2 |>
brms::as_draws_df() |>
dplyr::select(matches("^b_.*|^sigma")) |>
mutate(across(everything(), exp)) |>
posterior::summarise_draws(
median,
HDInterval::hdi,
rhat,
length,
ess_bulk, ess_tail
)
Warning: Dropping 'draws_df' class as required metadata was removed.
# A tibble: 2 × 8
variable median lower upper rhat length ess_bulk ess_tail
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 b_Intercept 6.92 5.04 8.86 1.00 2400 2265. 1919.
2 b_scalex 2.80 2.17 3.62 1.00 2400 2252. 2138.
Conclusions:
- in the initial block of information, we are reminded of the formula as well as the chain fitting characteristics. We are also informed that the total number of post-warmup MCMC samples is 2400.
- the Rhat values for each parameter are less than 1.01, confirming convergence of the parameter estimates across the chains
- Bulk_ESS: the effective sample sizes for the bulk of the posterior range are a high fraction (> 0.5) of the total number of draws. This suggests that the sampling was reasonably efficient and therefore likely to be reliable
- Tail_ESS: the effective sample sizes of estimates in the tails of the posterior. The tails (more extreme regions of the posterior) are where the sampler is most likely to get stuck (have divergent transitions etc.). The fraction of effective samples from this region is relatively high, implying that the sampler did not get stuck in areas that are unsupported by data
- b_Intercept: when \(x=0\) (its average, since \(x\) is standardised), the expected value of \(y\) is 6.918 and we are 95% confident that the true value is between 5.04 and 8.86. Since \(x=0\) corresponds to the mean of the observed \(x\) values, this y-intercept is directly interpretable.
- b_x: (the slope) - the rate of change in \(y\) per unit (=1) change in \(x\). Recall that since \(x\) is standardised, one unit represents a span of one standard deviation of \(x\). So for every one standard deviation change in \(x\), \(y\) increases by (on average) a factor of 2.804 and we are 95% confident that this change is between 2.174 and 3.628. This represents a ((2.804 - 1) * 100 =) 180.4% increase in \(y\) per standard deviation increase in \(x\).
The gather_draws() function performs the equivalent of an as_draws_df() followed by a pivot_longer(), returning the full posteriors in long format, which is more suitable for graphing. Note that the name of the slope parameter gets very awkward when \(x\) is standardised, so it is more convenient to refer to this parameter via a regular expression.
[1] "b_Intercept" "b_scalex" "prior_Intercept" "prior_b"
[5] "lprior" "lp__" "accept_stat__" "stepsize__"
[9] "treedepth__" "n_leapfrog__" "divergent__" "energy__"
dat3c.brm2 |>
gather_draws(`b_Intercept`,`b_.*x`, regex = TRUE) |>
mutate(across(everything(), exp)) |>
ggplot() +
geom_histogram(aes(x = .value)) +
facet_wrap(~.variable, scales = "free")
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Alternatively, there are various other representations supported by the ggdist package.
y ymin ymax .width .point .interval
1 0.9204325 0.8031199 0.9343758 0.95 median hdci
Conclusions:
- 92.043% of the total variability in \(y\) can be explained by its relationship to \(x\)
- we are 95% confident that the strength of this relationship is between 80.312% and 93.438%
Family: negbinomial
Links: mu = log; shape = identity
Formula: y ~ x
Data: dat4 (Number of observations: 10)
Draws: 3 chains, each with iter = 5000; warmup = 1000; thin = 5;
total post-warmup draws = 2400
Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept 0.37 0.39 -0.40 1.12 1.00 2493 2257
x 0.28 0.05 0.18 0.39 1.00 2568 2231
Family Specific Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
shape 52.94 61.13 3.06 233.36 1.00 2500 2383
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
The results presented by the summary() function are on the link scale (in this case, the log scale). As such, they can be awkward to interpret and are particularly punishing for your audience.
Conclusions:
- in the initial block of information, we are reminded of the formula as well as the chain fitting characteristics. We are also informed that the total number of post-warmup MCMC samples is 2400.
- the Rhat values for each parameter are less than 1.01, confirming convergence of the parameter estimates across the chains
- Bulk_ESS: the effective sample sizes for the bulk of the posterior range are a high fraction (> 0.5) of the total number of draws. This suggests that the sampling was reasonably efficient and therefore likely to be reliable
- Tail_ESS: the effective sample sizes of estimates in the tails of the posterior. The tails (more extreme regions of the posterior) are where the sampler is most likely to get stuck (have divergent transitions etc.). The fraction of effective samples from this region is relatively high, implying that the sampler did not get stuck in areas that are unsupported by data
- Intercept: when \(x=0\), the expected value of \(y\) is 0.372 and we are 95% confident that the true value is between -0.399 and 1.116. Since \(x=0\) is outside the observed range of \(x\), this y-intercept is of very limited value.
- x: (the slope) - the rate of change in \(y\) per unit (=1) change in \(x\). So for every one unit change in \(x\), \(y\) increases by (on average) 0.282 units and we are 95% confident that this change is between 0.181 and 0.387
Note, the estimates are means and quantiles.
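The shape parameter reported in the family-specific table above governs how much more variable the response is allowed to be than under a Poisson: for a negative binomial, variance = mu + mu^2/shape, so as shape grows the distribution approaches the Poisson (variance = mu). A quick numeric sketch (the mu value is arbitrary, chosen only for illustration):

```r
# Negative binomial mean-variance relationship: var = mu + mu^2 / shape
mu <- 10
shape <- c(1, 10, 100, 1000)
variance <- mu + mu^2 / shape
data.frame(shape, variance)   # variance shrinks towards mu (the Poisson case)
```

This is why a large, poorly constrained shape estimate (as here) suggests the data show little overdispersion beyond the Poisson.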
As an alternative to the regular summary (which is designed to resemble that traditionally provided by frequentist analyses - and thus be instantly familiar), it is possible to define exactly how each of the parameters is summarised.
In the following, I am nominating that I want to summarise each parameter posterior by:
- the median
- the 95% highest probability density interval (credibility interval)
- Rhat
- total number of draws
- bulk and tail effective sample sizes
dat4a.brm2 |>
posterior::summarise_draws(
median,
HDInterval::hdi,
rhat,
length,
ess_bulk, ess_tail
)
# A tibble: 8 × 8
variable median lower upper rhat length ess_bulk ess_tail
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 b_Intercept 3.78e- 1 -0.399 1.12 1.00 2400 2493. 2257.
2 b_x 2.81e- 1 0.176 0.383 1.00 2400 2568. 2231.
3 shape 3.11e+ 1 0.501 177. 1.00 2400 2500. 2383.
4 prior_Intercept 1.99e+ 0 -0.0160 3.90 1.00 2400 2200. 2449.
5 prior_b -5.65e- 3 -4.98 4.29 1.00 2400 2353. 2300.
6 prior_shape 4.42e-29 0 0.325 1.00 2400 2221. 2451.
7 lprior -1.07e+ 1 -14.4 -8.05 1.00 2400 2501. 2383.
8 lp__ -3.18e+ 1 -34.8 -30.6 1.00 2400 2110. 2035.
The results presented by the above function are on the link scale (in this case, the log scale). As such, they can be awkward to interpret and are particularly punishing for your audience.
Conclusions:
- in the initial block of information, we are reminded of the formula as well as the chain fitting characteristics. We are also informed that the total number of post-warmup MCMC samples is 2400.
- the Rhat values for each parameter are less than 1.01, confirming convergence of the parameter estimates across the chains
- Bulk_ESS: the effective sample sizes for the bulk of the posterior range are a high fraction (> 0.5) of the total number of draws. This suggests that the sampling was reasonably efficient and therefore likely to be reliable
- Tail_ESS: the effective sample sizes of estimates in the tails of the posterior. The tails (more extreme regions of the posterior) are where the sampler is most likely to get stuck (have divergent transitions etc.). The fraction of effective samples from this region is relatively high, implying that the sampler did not get stuck in areas that are unsupported by data
- b_Intercept: when \(x=0\), the expected value of \(y\) is 0.378 and we are 95% confident that the true value is between -0.399 and 1.117. Since \(x=0\) is outside the observed range of \(x\), this y-intercept is of very limited value.
- b_x: (the slope) - the rate of change in \(y\) per unit (=1) change in \(x\). So for every one unit change in \(x\), \(y\) increases by (on average) 0.281 units and we are 95% confident that this change is between 0.176 and 0.383
As a yet more flexible alternative, if we first extract the full posterior draws (with as_draws_df()), we can then use the various tidyverse functions to focus on just the most ecologically interpretable parameters before summarising.
In the following, I will use select() with a regex (regular expression) to match only the columns that:
- start with (^) “b_” followed by any amount (*) of any character (.)
- start with (^) “sigma”
It will also use mutate() to back-transform the results onto the scale of the response (by exponentiating the posteriors) before summarising.
dat4a.brm2 |>
brms::as_draws_df() |>
dplyr::select(matches("^b_.*|^sigma")) |>
mutate(across(everything(), exp)) |>
posterior::summarise_draws(
median,
HDInterval::hdi,
rhat,
length,
ess_bulk, ess_tail
)
Warning: Dropping 'draws_df' class as required metadata was removed.
# A tibble: 2 × 8
variable median lower upper rhat length ess_bulk ess_tail
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 b_Intercept 1.46 0.584 2.83 1.00 2400 2480. 2242.
2 b_x 1.32 1.19 1.47 1.00 2400 2559. 2219.
Conclusions:
- in the initial block of information, we are reminded of the formula as well as the chain fitting characteristics. We are also informed that the total number of post-warmup MCMC samples is 2400.
- the Rhat values for each parameter are < 1.01, confirming convergence of the parameter estimates across the chains
- Bulk_ESS: the effective sample sizes for the bulk of the posterior are a high fraction (> 0.5) of the total number of draws. This suggests that the sampling was reasonably efficient and therefore likely to be reliable
- Tail_ESS: the effective sample sizes for estimates in the tails of the posterior. The tails (the more extreme regions of the posterior) are where the sampler is most likely to get stuck (have divergent transitions etc.). The fraction of effective samples from this region is relatively high, implying that the sampler did not get stuck in areas that are unsupported by data
- b_Intercept: when \(x=0\), the expected value of \(y\) is 1.459 and we are 95% confident that the true value is between 0.671 and 3.056. Since \(x=0\) is outside the observed range of \(x\), this y-intercept is of very limited value
- b_x (the slope): the rate of change in \(y\) per unit (=1) change in \(x\). So for every one unit change in \(x\), \(y\) increases by (on average) a factor of 1.324 and we are 95% confident that this change is between 1.193 and 1.467. This represents a ((value - 1) * 100 =) 32.4% increase in \(y\) per unit increase in \(x\)
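The back-transformation arithmetic used above is worth making explicit. A minimal sketch (the value 0.281 is a hypothetical log-scale slope like the one in the summaries above):

```r
## Back-transform a log-link slope into a multiplicative factor and a
## percentage change in y per unit of x
b <- 0.281                        ## slope on the log (link) scale
factor_change <- exp(b)           ## multiplicative change in y per unit x
percent_change <- (factor_change - 1) * 100

round(factor_change, 3)   ## ~1.324
round(percent_change, 1)  ## ~32.4
```

The same arithmetic can be applied to every posterior draw (as the mutate(across(everything(), exp)) step does), so that the summaries themselves are on the response scale.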
The gather_draws() function performs the equivalent of an as_draws_df() followed by a pivot_longer() in order to return the full posteriors in long format where they are more suitable for graphing.
dat4a.brm2 |>
gather_draws(b_Intercept, b_x) |>
mutate(across(everything(), exp)) |>
ggplot() +
geom_histogram(aes(x = .value)) +
facet_wrap(~.variable, scales = "free")
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Alternatively, there are various other representations supported by the ggdist package.
y ymin ymax .width .point .interval
1 0.8965354 0.6728177 0.9214414 0.95 median hdci
Conclusions:
- 89.654% of the total variability in \(y\) can be explained by its relationship to \(x\)
- we are 95% confident that the strength of this relationship is between 67.282% and 92.144%
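A table like the one above can be produced by summarising the posterior draws of R² with a point estimate and highest posterior density interval. A hedged sketch, using simulated draws in place of brms::bayes_R2(dat4a.brm2, summary = FALSE) so that it runs stand-alone:

```r
library(ggdist)

## Stand-in draws for a posterior of R2; in practice something like:
## r2 <- brms::bayes_R2(fit, summary = FALSE)[, 1]
set.seed(1)
r2 <- rbeta(2000, 20, 3)

## Median and 95% highest posterior density (credibility) interval
median_hdci(r2, .width = 0.95)
```

The result is a small data frame with columns y, ymin, ymax, .width, .point and .interval, matching the layout of the output above.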
Family: negbinomial
Links: mu = log; shape = identity
Formula: y ~ scale(x, scale = FALSE)
Data: dat4 (Number of observations: 10)
Draws: 3 chains, each with iter = 5000; warmup = 1000; thin = 5;
total post-warmup draws = 2400
Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept 1.92 0.16 1.61 2.22 1.00 2278 2390
scalexscaleEQFALSE 0.28 0.05 0.18 0.39 1.00 2399 2127
Family Specific Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
shape 50.51 58.60 3.15 213.22 1.00 2362 2252
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
The results presented by the summary() function are on the link scale (in this case, the log scale). As such, they can be awkward to interpret and are particularly punishing for your audience.
Conclusions:
- in the initial block of information, we are reminded of the formula as well as the chain fitting characteristics. We are also informed that the total number of post-warmup MCMC samples is 2400.
- the Rhat values for each parameter are < 1.01, confirming convergence of the parameter estimates across the chains
- Bulk_ESS: the effective sample sizes for the bulk of the posterior are a high fraction (> 0.5) of the total number of draws. This suggests that the sampling was reasonably efficient and therefore likely to be reliable
- Tail_ESS: the effective sample sizes for estimates in the tails of the posterior. The tails (the more extreme regions of the posterior) are where the sampler is most likely to get stuck (have divergent transitions etc.). The fraction of effective samples from this region is relatively high, implying that the sampler did not get stuck in areas that are unsupported by data
- Intercept: when \(x=0\), the expected value of \(y\) is 1.922 and we are 95% confident that the true value is between 1.612 and 2.217. Since \(x\) has been centred, \(x=0\) corresponds to its mean, so this y-intercept is now meaningful
- x (the slope): the rate of change in \(y\) per unit (=1) change in \(x\). So for every one unit change in \(x\), \(y\) increases by (on average) 0.281 units and we are 95% confident that this change is between 0.18 and 0.39
Note, the estimates are means and quantiles.
As an alternative to the regular summary (which is designed to resemble that traditionally provided by frequentist analyses - and thus be instantly familiar), it is possible to define exactly how each of the parameters is summarised.
In the following, I am nominating that I want to summarise each parameter posterior by:
- the median
- the 95% highest probability density interval (credibility interval)
- Rhat
- total number of draws
- bulk and tail effective sample sizes
dat4b.brm2 |>
posterior::summarise_draws(
median,
HDInterval::hdi,
rhat,
length,
ess_bulk, ess_tail
)
# A tibble: 8 × 8
variable median lower upper rhat length ess_bulk ess_tail
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 b_Intercept 1.92e+ 0 1.62 2.22 1.00 2400 2277. 2390.
2 b_scalexscaleEQFALSE 2.80e- 1 0.181 0.392 0.999 2400 2399. 2127.
3 shape 2.97e+ 1 0.733 160. 1.00 2400 2362. 2252.
4 prior_Intercept 2.09e+ 0 0.216 3.88 1.00 2400 2531. 2375.
5 prior_b 7.68e- 2 -4.30 4.48 1.00 2400 2245. 2147.
6 prior_shape 6.52e-28 0 0.372 0.999 2400 2300. 2366.
7 lprior -1.06e+ 1 -13.9 -7.80 1.00 2400 2365. 2368.
8 lp__ -3.18e+ 1 -34.7 -30.5 0.999 2400 2473. 2308.
The results presented by the above function are on the link scale (in this case, the log scale). As such, they can be awkward to interpret and are particularly punishing for your audience.
Conclusions:
- in the initial block of information, we are reminded of the formula as well as the chain fitting characteristics. We are also informed that the total number of post-warmup MCMC samples is 2400.
- the Rhat values for each parameter are < 1.01, confirming convergence of the parameter estimates across the chains
- Bulk_ESS: the effective sample sizes for the bulk of the posterior are a high fraction (> 0.5) of the total number of draws. This suggests that the sampling was reasonably efficient and therefore likely to be reliable
- Tail_ESS: the effective sample sizes for estimates in the tails of the posterior. The tails (the more extreme regions of the posterior) are where the sampler is most likely to get stuck (have divergent transitions etc.). The fraction of effective samples from this region is relatively high, implying that the sampler did not get stuck in areas that are unsupported by data
- b_Intercept: when \(x=0\), the expected value of \(y\) is 1.92 and we are 95% confident that the true value is between 1.616 and 2.22. Since \(x\) has been centred, \(x=0\) corresponds to its mean, so this y-intercept is now meaningful
- b_x (the slope): the rate of change in \(y\) per unit (=1) change in \(x\). So for every one unit change in \(x\), \(y\) increases by (on average) 0.28 units and we are 95% confident that this change is between 0.181 and 0.392
As a yet more flexible alternative, if we first extract the full posterior draws (with as_draws_df()), we can then use the various tidyverse functions to focus on just the most ecologically interpretable parameters before summarising.
In the following, I will use select() with a regex (regular expression) to match only the columns that:
- start with (^) “b_” followed by any amount (*) of any character (.)
- start with (^) “sigma”
It will also use mutate() to back-transform the results onto the scale of the response (by exponentiating the posteriors) before summarising.
dat4b.brm2 |>
brms::as_draws_df() |>
dplyr::select(matches("^b_.*|^sigma")) |>
mutate(across(everything(), exp)) |>
posterior::summarise_draws(
median,
HDInterval::hdi,
rhat,
length,
ess_bulk, ess_tail
)
Warning: Dropping 'draws_df' class as required metadata was removed.
# A tibble: 2 × 8
variable median lower upper rhat length ess_bulk ess_tail
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 b_Intercept 6.85 4.90 9.04 1.00 2400 2199. 2364.
2 b_scalexscaleEQFALSE 1.32 1.19 1.46 1.00 2400 2389. 2108.
Conclusions:
- in the initial block of information, we are reminded of the formula as well as the chain fitting characteristics. We are also informed that the total number of post-warmup MCMC samples is 2400.
- the Rhat values for each parameter are < 1.01, confirming convergence of the parameter estimates across the chains
- Bulk_ESS: the effective sample sizes for the bulk of the posterior are a high fraction (> 0.5) of the total number of draws. This suggests that the sampling was reasonably efficient and therefore likely to be reliable
- Tail_ESS: the effective sample sizes for estimates in the tails of the posterior. The tails (the more extreme regions of the posterior) are where the sampler is most likely to get stuck (have divergent transitions etc.). The fraction of effective samples from this region is relatively high, implying that the sampler did not get stuck in areas that are unsupported by data
- b_Intercept: when \(x=0\), the expected value of \(y\) is 6.85 and we are 95% confident that the true value is between 5.031 and 9.207. Since \(x\) has been centred, \(x=0\) corresponds to its mean, so this y-intercept is now meaningful
- b_x (the slope): the rate of change in \(y\) per unit (=1) change in \(x\). So for every one unit change in \(x\), \(y\) increases by (on average) a factor of 1.323 and we are 95% confident that this change is between 1.199 and 1.479. This represents a ((value - 1) * 100 =) 32.3% increase in \(y\) per unit increase in \(x\)
The gather_draws() function performs the equivalent of an as_draws_df() followed by a pivot_longer() in order to return the full posteriors in long format where they are more suitable for graphing.
dat4b.brm2 |>
gather_draws(`b_Intercept`, `b_.*x.*`, regex = TRUE) |>
mutate(across(everything(), exp)) |>
ggplot() +
geom_histogram(aes(x = .value)) +
facet_wrap(~.variable, scales = "free")
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Alternatively, there are various other representations supported by the ggdist package.
y ymin ymax .width .point .interval
1 0.8960203 0.6336949 0.9214524 0.95 median hdci
Conclusions:
- 89.602% of the total variability in \(y\) can be explained by its relationship to \(x\)
- we are 95% confident that the strength of this relationship is between 63.369% and 92.145%
Family: negbinomial
Links: mu = log; shape = identity
Formula: y ~ scale(x)
Data: dat4 (Number of observations: 10)
Draws: 3 chains, each with iter = 5000; warmup = 1000; thin = 5;
total post-warmup draws = 2400
Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept 1.93 0.16 1.62 2.24 1.00 2267 2346
scalex 0.84 0.16 0.53 1.19 1.00 2500 2288
Family Specific Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
shape 49.75 58.60 3.05 202.46 1.00 2560 2322
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
The results presented by the summary() function are on the link scale (in this case, the log scale). As such, they can be awkward to interpret and are particularly punishing for your audience.
Conclusions:
- in the initial block of information, we are reminded of the formula as well as the chain fitting characteristics. We are also informed that the total number of post-warmup MCMC samples is 2400.
- the Rhat values for each parameter are < 1.01, confirming convergence of the parameter estimates across the chains
- Bulk_ESS: the effective sample sizes for the bulk of the posterior are a high fraction (> 0.5) of the total number of draws. This suggests that the sampling was reasonably efficient and therefore likely to be reliable
- Tail_ESS: the effective sample sizes for estimates in the tails of the posterior. The tails (the more extreme regions of the posterior) are where the sampler is most likely to get stuck (have divergent transitions etc.). The fraction of effective samples from this region is relatively high, implying that the sampler did not get stuck in areas that are unsupported by data
- Intercept: when \(x=0\) (its average, since \(x\) is standardised), the expected value of \(y\) is 1.926 and we are 95% confident that the true value is between 1.622 and 2.238. Since \(x=0\) corresponds to the mean of the observed \(x\), this y-intercept is now meaningful
- x (the slope): the rate of change in \(y\) per unit (=1) change in \(x\). Recall that since \(x\) is standardised, 1 unit represents a span of 1 standard deviation of \(x\). So for every one standard deviation change in \(x\), \(y\) increases by (on average) 0.842 units and we are 95% confident that this change is between 0.526 and 1.193
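The relationship between the slope for raw \(x\) and the slope for standardised \(x\) is a simple rescaling: the standardised slope equals the raw slope multiplied by the standard deviation of \(x\). A quick sketch with toy data and ordinary least squares (the same arithmetic applies to each posterior draw):

```r
## Toy data illustrating how slopes rescale under standardisation of x
set.seed(1)
x <- runif(10, 2, 12)
y <- 2 + 0.8 * x + rnorm(10, sd = 0.5)

b_raw    <- coef(lm(y ~ x))[["x"]]                ## slope per raw unit of x
b_scaled <- coef(lm(y ~ scale(x)))[["scale(x)"]]  ## slope per sd of x

all.equal(b_scaled, b_raw * sd(x))  ## TRUE
```

This is why the standardised slope here (0.842) is substantially larger than the raw-unit slope reported for the earlier fits: one standard deviation of \(x\) spans several raw units.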
Note, the estimates are means and quantiles.
As an alternative to the regular summary (which is designed to resemble that traditionally provided by frequentist analyses - and thus be instantly familiar), it is possible to define exactly how each of the parameters is summarised.
In the following, I am nominating that I want to summarise each parameter posterior by:
- the median
- the 95% highest probability density interval (credibility interval)
- Rhat
- total number of draws
- bulk and tail effective sample sizes
dat4c.brm2 |>
posterior::summarise_draws(
median,
HDInterval::hdi,
rhat,
length,
ess_bulk, ess_tail
)
# A tibble: 8 × 8
variable median lower upper rhat length ess_bulk ess_tail
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 b_Intercept 1.92e+ 0 1.61e+ 0 2.23 0.999 2400 2267. 2346.
2 b_scalex 8.38e- 1 5.09e- 1 1.15 1.00 2400 2500. 2288.
3 shape 3.00e+ 1 8.68e- 1 164. 1.00 2400 2561. 2322.
4 prior_Intercept 2.06e+ 0 1.67e- 1 3.86 1.00 2400 2509. 2456.
5 prior_b -5.53e- 2 -4.52e+ 0 4.72 1.00 2400 2406. 2234.
6 prior_shape 1.39e-29 2.99e-300 0.448 1.00 2400 2263. 2369.
7 lprior -1.08e+ 1 -1.40e+ 1 -7.98 1.00 2400 2555. 2328.
8 lp__ -3.19e+ 1 -3.49e+ 1 -30.7 1.00 2400 2368. 2278.
Conclusions:
- in the initial block of information, we are reminded of the formula as well as the chain fitting characteristics. We are also informed that the total number of post-warmup MCMC samples is 2400.
- the Rhat values for each parameter are < 1.01, confirming convergence of the parameter estimates across the chains
- Bulk_ESS: the effective sample sizes for the bulk of the posterior are a high fraction (> 0.5) of the total number of draws. This suggests that the sampling was reasonably efficient and therefore likely to be reliable
- Tail_ESS: the effective sample sizes for estimates in the tails of the posterior. The tails (the more extreme regions of the posterior) are where the sampler is most likely to get stuck (have divergent transitions etc.). The fraction of effective samples from this region is relatively high, implying that the sampler did not get stuck in areas that are unsupported by data
- b_Intercept: when \(x=0\) (its average, since \(x\) is standardised), the expected value of \(y\) is 1.925 and we are 95% confident that the true value is between 1.614 and 2.227. Since \(x=0\) corresponds to the mean of the observed \(x\), this y-intercept is now meaningful
- b_x (the slope): the rate of change in \(y\) per unit (=1) change in \(x\). Recall that since \(x\) is standardised, 1 unit represents a span of 1 standard deviation of \(x\). So for every one standard deviation change in \(x\), \(y\) increases by (on average) 0.838 units and we are 95% confident that this change is between 0.509 and 1.155
As a yet more flexible alternative, if we first extract the full posterior draws (with as_draws_df()), we can then use the various tidyverse functions to focus on just the most ecologically interpretable parameters before summarising.
In the following, I will use select() with a regex (regular expression) to match only the columns that:
- start with (^) “b_” followed by any amount (*) of any character (.)
- start with (^) “sigma”
It will also use mutate() to back-transform the results onto the scale of the response (by exponentiating the posteriors) before summarising.
dat4c.brm2 |>
brms::as_draws_df() |>
dplyr::select(matches("^b_.*|^sigma")) |>
mutate(across(everything(), exp)) |>
posterior::summarise_draws(
median,
HDInterval::hdi,
rhat,
length,
ess_bulk, ess_tail
)
Warning: Dropping 'draws_df' class as required metadata was removed.
# A tibble: 2 × 8
variable median lower upper rhat length ess_bulk ess_tail
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 b_Intercept 6.85 4.84 9.06 1.00 2400 2274. 2344.
2 b_scalex 2.31 1.65 3.16 1.00 2400 2501. 2255.
Conclusions:
- in the initial block of information, we are reminded of the formula as well as the chain fitting characteristics. We are also informed that the total number of post-warmup MCMC samples is 2400.
- the Rhat values for each parameter are < 1.01, confirming convergence of the parameter estimates across the chains
- Bulk_ESS: the effective sample sizes for the bulk of the posterior are a high fraction (> 0.5) of the total number of draws. This suggests that the sampling was reasonably efficient and therefore likely to be reliable
- Tail_ESS: the effective sample sizes for estimates in the tails of the posterior. The tails (the more extreme regions of the posterior) are where the sampler is most likely to get stuck (have divergent transitions etc.). The fraction of effective samples from this region is relatively high, implying that the sampler did not get stuck in areas that are unsupported by data
- b_Intercept: when \(x=0\) (its average, since \(x\) is standardised), the expected value of \(y\) is 6.855 and we are 95% confident that the true value is between 5.022 and 9.272. Since \(x=0\) corresponds to the mean of the observed \(x\), this y-intercept is now meaningful
- b_x (the slope): the rate of change in \(y\) per unit (=1) change in \(x\). Recall that since \(x\) is standardised, 1 unit represents a span of 1 standard deviation of \(x\). So for every one standard deviation change in \(x\), \(y\) increases by (on average) a factor of 2.311 and we are 95% confident that this change is between 1.663 and 3.173. This represents a ((value - 1) * 100 =) 131.1% increase in \(y\) per one standard deviation increase in \(x\)
The gather_draws() function performs the equivalent of an as_draws_df() followed by a pivot_longer() in order to return the full posteriors in long format, where they are more suitable for graphing. Note that the name of the slope parameter gets very awkward when \(x\) is centered, so it is more convenient to refer to this parameter via a regular expression.
[1] "b_Intercept" "b_scalex" "shape" "prior_Intercept"
[5] "prior_b" "prior_shape" "lprior" "lp__"
[9] "accept_stat__" "stepsize__" "treedepth__" "n_leapfrog__"
[13] "divergent__" "energy__"
dat4c.brm2 |>
gather_draws(`b_Intercept`,`b_.*x`, regex = TRUE) |>
mutate(across(everything(), exp)) |>
ggplot() +
geom_histogram(aes(x = .value)) +
facet_wrap(~.variable, scales = "free")
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Alternatively, there are various other representations supported by the ggdist package.
y ymin ymax .width .point .interval
1 0.8959637 0.6455751 0.9214557 0.95 median hdci
Conclusions:
- 89.596% of the total variability in \(y\) can be explained by its relationship to \(x\)
- we are 95% confident that the strength of this relationship is between 64.558% and 92.146%
Family: binomial
Links: mu = logit
Formula: y | trials(1) ~ x
Data: dat5 (Number of observations: 10)
Draws: 3 chains, each with iter = 5000; warmup = 1000; thin = 5;
total post-warmup draws = 2400
Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept -5.72 3.24 -13.80 -0.92 1.00 2311 2184
x 1.13 0.59 0.28 2.57 1.00 2289 2216
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
The results presented by the summary() function are on the link scale (in this case, the logit, or log-odds scale). As such, they can be awkward to interpret and are particularly punishing for your audience.
Conclusions:
- in the initial block of information, we are reminded of the formula as well as the chain fitting characteristics. We are also informed that the total number of post-warmup MCMC samples is 2400.
- the Rhat values for each parameter are < 1.01, confirming convergence of the parameter estimates across the chains
- Bulk_ESS: the effective sample sizes for the bulk of the posterior are a high fraction (> 0.5) of the total number of draws. This suggests that the sampling was reasonably efficient and therefore likely to be reliable
- Tail_ESS: the effective sample sizes for estimates in the tails of the posterior. The tails (the more extreme regions of the posterior) are where the sampler is most likely to get stuck (have divergent transitions etc.). The fraction of effective samples from this region is relatively high, implying that the sampler did not get stuck in areas that are unsupported by data
- Intercept: when \(x=0\), the expected value of \(y\) is -5.724 and we are 95% confident that the true value is between -13.797 and -0.916. Since \(x=0\) is outside the observed range of \(x\), this y-intercept is of very limited value
- x (the slope): the rate of change in \(y\) per unit (=1) change in \(x\). So for every one unit change in \(x\), \(y\) increases by (on average) 1.13 units and we are 95% confident that this change is between 0.284 and 2.569
Note, the estimates are means and quantiles.
As an alternative to the regular summary (which is designed to resemble that traditionally provided by frequentist analyses - and thus be instantly familiar), it is possible to define exactly how each of the parameters is summarised.
In the following, I am nominating that I want to summarise each parameter posterior by:
- the median
- the 95% highest probability density interval (credibility interval)
- Rhat
- total number of draws
- bulk and tail effective sample sizes
dat5a.brm2 |>
posterior::summarise_draws(
median,
HDInterval::hdi,
rhat,
length,
ess_bulk, ess_tail
)
# A tibble: 6 × 8
variable median lower upper rhat length ess_bulk ess_tail
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 b_Intercept -5.14 -12.2 -0.0456 1.00 2400 2311. 2184.
2 b_x 1.02 0.253 2.45 0.999 2400 2289. 2216.
3 prior_Intercept 0.0111 -2.00 1.86 1.00 2400 2223. 2289.
4 prior_b 0.0158 -3.02 3.20 1.00 2400 2373. 1947.
5 lprior -2.85 -4.74 -1.92 1.00 2400 2361. 2388.
6 lp__ -6.07 -8.56 -5.26 1.00 2400 2253. 2188.
The results presented by the above function are on the link scale (in this case, the logit, or log-odds scale). As such, they can be awkward to interpret and are particularly punishing for your audience.
Conclusions:
- in the initial block of information, we are reminded of the formula as well as the chain fitting characteristics. We are also informed that the total number of post-warmup MCMC samples is 2400.
- the Rhat values for each parameter are < 1.01, confirming convergence of the parameter estimates across the chains
- Bulk_ESS: the effective sample sizes for the bulk of the posterior are a high fraction (> 0.5) of the total number of draws. This suggests that the sampling was reasonably efficient and therefore likely to be reliable
- Tail_ESS: the effective sample sizes for estimates in the tails of the posterior. The tails (the more extreme regions of the posterior) are where the sampler is most likely to get stuck (have divergent transitions etc.). The fraction of effective samples from this region is relatively high, implying that the sampler did not get stuck in areas that are unsupported by data
- b_Intercept: when \(x=0\), the expected value of \(y\) is -5.14 and we are 95% confident that the true value is between -12.204 and -0.046. Since \(x=0\) is outside the observed range of \(x\), this y-intercept is of very limited value
- b_x (the slope): the rate of change in \(y\) per unit (=1) change in \(x\). So for every one unit change in \(x\), \(y\) increases by (on average) 1.02 units and we are 95% confident that this change is between 0.253 and 2.451
As a yet more flexible alternative, if we first extract the full posterior draws (with as_draws_df()), we can then use the various tidyverse functions to focus on just the most ecologically interpretable parameters before summarising.
In the following, I will use select() with a regex (regular expression) to match only the columns that:
- start with (^) “b_” followed by any amount (*) of any character (.)
- start with (^) “sigma”
It will also use mutate() to back-transform the results onto the scale of the response (by exponentiating the posteriors) before summarising.
dat5a.brm2 |>
brms::as_draws_df() |>
dplyr::select(matches("^b_.*|^sigma")) |>
mutate(across(everything(), exp)) |>
posterior::summarise_draws(
median,
HDInterval::hdi,
rhat,
length,
ess_bulk, ess_tail
)
Warning: Dropping 'draws_df' class as required metadata was removed.
# A tibble: 2 × 8
variable median lower upper rhat length ess_bulk ess_tail
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 b_Intercept 0.00583 1.70e-10 0.223 1.00 2400 2307. 2177.
2 b_x 2.77 9.45e- 1 9.60 1.00 2400 2283. 2209.
Conclusions:
- in the initial block of information, we are reminded of the formula as well as the chain fitting characteristics. We are also informed that the total number of post-warmup MCMC samples is 2400.
- the Rhat values for each parameter are < 1.01, confirming convergence of the parameter estimates across the chains
- Bulk_ESS: the effective sample sizes for the bulk of the posterior are a high fraction (> 0.5) of the total number of draws. This suggests that the sampling was reasonably efficient and therefore likely to be reliable
- Tail_ESS: the effective sample sizes for estimates in the tails of the posterior. The tails (the more extreme regions of the posterior) are where the sampler is most likely to get stuck (have divergent transitions etc.). The fraction of effective samples from this region is relatively high, implying that the sampler did not get stuck in areas that are unsupported by data
- b_Intercept: when \(x=0\), the expected odds of \(y\) are 0.006 and we are 95% confident that the true value is between 0 and 0.955. Since \(x=0\) is outside the observed range of \(x\), this y-intercept is of very limited value
- b_x (the slope): the rate of change in the odds of \(y\) per unit (=1) change in \(x\). So for every one unit change in \(x\), the odds of \(y\) increase by (on average) a factor of 2.773 and we are 95% confident that this change is between 1.288 and 11.595. This represents a ((value - 1) * 100 =) 177.3% increase in the odds of \(y\) per unit increase in \(x\)
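For the logit link, exponentiating the slope yields an odds ratio, and predictions can be pushed all the way back to the probability scale with the inverse-logit function (plogis()). A sketch using hypothetical point estimates similar to those above:

```r
## Back-transforming logit-scale (log-odds) estimates
b0 <- -5.14   ## intercept: log-odds of y = 1 when x = 0
b1 <-  1.02   ## slope: change in log-odds per unit of x

exp(b1)               ## odds ratio: multiplicative change in odds per unit x
(exp(b1) - 1) * 100   ## percentage change in the odds per unit x

## Predicted probability of y = 1 at a chosen value of x (say x = 6)
plogis(b0 + b1 * 6)
```

Note that exponentiating a logit-scale coefficient yields a change in odds, not in probability; probabilities require the full linear predictor and the inverse link.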
The gather_draws() function performs the equivalent of an as_draws_df() followed by a pivot_longer() in order to return the full posteriors in long format where they are more suitable for graphing.
dat5a.brm2 |>
gather_draws(b_Intercept, b_x) |>
mutate(across(everything(), exp)) |>
ggplot() +
geom_histogram(aes(x = .value)) +
facet_wrap(~.variable, scales = "free")
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Alternatively, there are various other representations supported by the ggdist package.
y ymin ymax .width .point .interval
1 0.6247095 0.2347775 0.6971989 0.95 median hdci
Conclusions:
- 62.471% of the total variability in \(y\) can be explained by its relationship to \(x\)
- we are 95% confident that the strength of this relationship is between 23.478% and 69.72%
Family: binomial
Links: mu = logit
Formula: y | trials(1) ~ scale(x, scale = FALSE)
Data: dat5 (Number of observations: 10)
Draws: 3 chains, each with iter = 5000; warmup = 1000; thin = 5;
total post-warmup draws = 2400
Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept 0.52 0.72 -0.85 1.93 1.00 2374 2178
scalexscaleEQFALSE 1.13 0.57 0.30 2.51 1.00 2500 2232
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
The results presented by the summary() function are on the link scale (in this case, the logit, or log-odds scale). As such, they can be awkward to interpret and are particularly punishing for your audience.
Conclusions:
- in the initial block of information, we are reminded of the formula as well as the chain fitting characteristics. We are also informed that the total number of post-warmup MCMC samples is 2400.
- the Rhat values for each parameter are < 1.01, confirming convergence of the parameter estimates across the chains
- Bulk_ESS: the effective sample sizes for the bulk of the posterior are a high fraction (> 0.5) of the total number of draws. This suggests that the sampling was reasonably efficient and therefore likely to be reliable
- Tail_ESS: the effective sample sizes for estimates in the tails of the posterior. The tails (the more extreme regions of the posterior) are where the sampler is most likely to get stuck (have divergent transitions etc.). The fraction of effective samples from this region is relatively high, implying that the sampler did not get stuck in areas that are unsupported by data
- Intercept: when \(x=0\), the expected value of \(y\) is 0.516 and we are 95% confident that the true value is between -0.854 and 1.93. Since \(x\) has been centred, \(x=0\) corresponds to its mean, so this y-intercept is now meaningful
- x (the slope): the rate of change in \(y\) per unit (=1) change in \(x\). So for every one unit change in \(x\), \(y\) increases by (on average) 1.131 units and we are 95% confident that this change is between 0.301 and 2.51
Note, the point estimates are posterior means and the intervals are quantile-based.
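If needed, those link-scale estimates can be back-transformed by hand with base R: plogis() (the inverse logit) turns log-odds into a probability, and exp() turns a log-odds slope into an odds ratio. A minimal sketch using the rounded posterior means from the summary above:

```r
# Back-transforming the link-scale estimates above (a base R sketch).
# plogis() is the inverse logit; exp() of a log-odds slope gives an odds ratio.
b_intercept <- 0.52   # posterior mean of the intercept (log-odds scale)
b_slope     <- 1.13   # posterior mean of the slope (log-odds scale)

p0 <- plogis(b_intercept)   # probability that y = 1 when x is at its mean
or <- exp(b_slope)          # multiplicative change in the odds per unit of x
round(c(probability = p0, odds_ratio = or), 3)
```

This is only a point-estimate shortcut; the full-posterior approach used later in this tutorial (exponentiating every draw before summarising) is preferable.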
As an alternative to the regular summary (which is designed to resemble that traditionally provided by frequentist analyses - and thus be instantly familiar), it is possible to define exactly how each of the parameters are summarised.
In the following, I am nominating that I want to summarise each parameter posterior by:
- the median
- the 95% highest probability density interval (credibility interval)
- Rhat
- total number of draws
- bulk and tail effective sample sizes
dat5b.brm2 |>
posterior::summarise_draws(
median,
HDInterval::hdi,
rhat,
length,
ess_bulk, ess_tail
)
# A tibble: 6 × 8
variable median lower upper rhat length ess_bulk ess_tail
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 b_Intercept 0.519 -0.863 1.90 0.999 2400 2374. 2178.
2 b_scalexscaleEQFALSE 1.04 0.249 2.33 1.00 2400 2499. 2232.
3 prior_Intercept -0.0224 -1.99 1.84 1.00 2400 2306. 2257.
4 prior_b 0.0387 -2.97 3.32 1.00 2400 2171. 2205.
5 lprior -2.86 -4.68 -1.95 1.00 2400 2505. 2042.
6 lp__ -6.04 -8.50 -5.26 1.00 2400 2428. 2327.
The results presented by the above function are on the link scale (in this case, the logit, or log-odds scale). As such, they can be awkward to interpret and are particularly punishing for your audience.
Conclusions:
- the length column confirms that the total number of post-warmup MCMC draws is 2400.
- the rhat values for each parameter are less than 1.01, confirming convergence of the parameter estimates across the chains
- ess_bulk: the effective sample sizes for the bulk of the posterior are a high fraction (> 0.5) of the total number of draws. This suggests that the sampling was reasonably efficient and therefore likely to be reliable
- ess_tail: the effective sample sizes for estimates in the tails of the posterior. The tails (the more extreme regions of the posterior) are where the sampler is most likely to get stuck (have divergent transitions etc). The fraction of effective samples from this region is relatively high, implying that the sampler did not get stuck in areas that are poorly supported by the data
- b_Intercept: when \(x=0\) (the mean of \(x\), since \(x\) has been centred), the expected log-odds of \(y\) is 0.519 and we are 95% confident that the true value is between -0.863 and 1.898
- b_x (the slope): the rate of change in the log-odds of \(y\) per unit (=1) change in \(x\). So for every one unit change in \(x\), the log-odds of \(y\) increases by (on average) 1.041 units and we are 95% confident that this change is between 0.249 and 2.332
As a yet more flexible alternative, if we first extract the full posterior draws (with as_draws_df()), we can then use the various tidyverse functions to focus on just the most ecologically interpretable parameters before summarising.
In the following, I will use select() with a regex (regular expression) to match only the columns that:
- start with (^) “b_” followed by any number (*) of any characters (.)
- start with (^) “sigma”
It will also use mutate() to back-transform the results (by exponentiating the posteriors, which for a logit link yields odds and odds ratios) before summarising.
dat5b.brm2 |>
brms::as_draws_df() |>
dplyr::select(matches("^b_.*|^sigma")) |>
mutate(across(everything(), exp)) |>
posterior::summarise_draws(
median,
HDInterval::hdi,
rhat,
length,
ess_bulk, ess_tail
)
Warning: Dropping 'draws_df' class as required metadata was removed.
# A tibble: 2 × 8
variable median lower upper rhat length ess_bulk ess_tail
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 b_Intercept 1.68 0.215 5.54 1.00 2400 2370. 2171.
2 b_scalexscaleEQFALSE 2.83 1.06 9.25 1.00 2400 2489. 2183.
Conclusions:
- the length column confirms that the total number of post-warmup MCMC draws is 2400.
- the rhat values for each parameter are less than 1.01, confirming convergence of the parameter estimates across the chains
- ess_bulk: the effective sample sizes for the bulk of the posterior are a high fraction (> 0.5) of the total number of draws. This suggests that the sampling was reasonably efficient and therefore likely to be reliable
- ess_tail: the effective sample sizes for estimates in the tails of the posterior. The tails (the more extreme regions of the posterior) are where the sampler is most likely to get stuck (have divergent transitions etc). The fraction of effective samples from this region is relatively high, implying that the sampler did not get stuck in areas that are poorly supported by the data
- b_Intercept: when \(x=0\) (the mean of \(x\), since \(x\) has been centred), the expected odds that \(y=1\) are 1.681 and we are 95% confident that the true value is between 0.215 and 5.54
- b_x (the slope): for every one unit change in \(x\), the odds that \(y=1\) increase by (on average) a factor of 2.832 (the odds ratio) and we are 95% confident that this factor is between 1.06 and 9.25. This represents a ((value - 1) * 100 =) 183.2% increase in the odds per unit increase in \(x\)
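The "(value - 1) * 100" conversion above is plain arithmetic on the odds ratio. As a quick base R check (2.83 is the median exponentiated slope from the table above):

```r
# Converting an exponentiated slope (an odds ratio) into a percentage change
# in the odds, per the "(value - 1) * 100" arithmetic used above.
odds_ratio <- 2.83                     # median exponentiated slope from the table above
pct_change <- (odds_ratio - 1) * 100   # percent increase in the odds per unit of x
round(pct_change)
# [1] 183
```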
The gather_draws() function performs the equivalent of an as_draws_df() followed by a pivot_longer() in order to return the full posteriors in long format where they are more suitable for graphing.
dat5b.brm2 |>
gather_draws(`b_Intercept`, `b_.*x.*`, regex = TRUE) |>
mutate(across(everything(), exp)) |>
ggplot() +
geom_histogram(aes(x = .value)) +
facet_wrap(~.variable, scales = "free")
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
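The wide-to-long reshape that gather_draws() performs can be illustrated on a toy data frame of fabricated draws with base R's stack() (column names mirror the model's, but the numbers are made up):

```r
# A toy illustration of the wide-to-long reshape that gather_draws() performs,
# using base R's stack() on fabricated draws (not the model's actual posterior).
draws <- data.frame(
  b_Intercept = c(0.5, 0.6, 0.4),
  b_x         = c(1.1, 1.2, 1.0)
)
long <- stack(draws)                    # two columns: values, ind
names(long) <- c(".value", ".variable") # mirror tidybayes' naming
head(long)
```

In long format, every draw of every parameter is a row, which is exactly what ggplot() needs for the faceted histograms above.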
Alternatively, there are various other representations supported by the ggdist package.
y ymin ymax .width .point .interval
1 0.6256043 0.2779363 0.6936146 0.95 median hdci
Conclusions:
- 62.56% of the total variability in \(y\) can be explained by its relationship to \(x\)
- we are 95% confident that the strength of this relationship is between 27.794% and 69.361%
Family: binomial
Links: mu = logit
Formula: y | trials(1) ~ scale(x)
Data: dat5 (Number of observations: 10)
Draws: 3 chains, each with iter = 5000; warmup = 1000; thin = 5;
total post-warmup draws = 2400
Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept 0.43 0.65 -0.77 1.76 1.00 2276 2122
scalex 2.06 1.27 0.22 5.09 1.00 2319 2114
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
The results presented by the summary() function are on the link scale (in this case, the logit, or log-odds scale). As such, they can be awkward to interpret and are particularly punishing for your audience.
Conclusions:
- in the initial block of information, we are reminded of the formula as well as the chain fitting characteristics. We are also informed that the total number of post-warmup MCMC samples is 2400.
- the Rhat values for each parameter are less than 1.01, confirming convergence of the parameter estimates across the chains
- Bulk_ESS: the effective sample sizes for the bulk of the posterior are a high fraction (> 0.5) of the total number of draws. This suggests that the sampling was reasonably efficient and therefore likely to be reliable
- Tail_ESS: the effective sample sizes for estimates in the tails of the posterior. The tails (the more extreme regions of the posterior) are where the sampler is most likely to get stuck (have divergent transitions etc). The fraction of effective samples from this region is relatively high, implying that the sampler did not get stuck in areas that are poorly supported by the data
- Intercept: when \(x=0\) (the average of \(x\), since it is standardised), the expected log-odds of \(y\) is 0.435 and we are 95% confident that the true value is between -0.772 and 1.76
- x (the slope): the rate of change in the log-odds of \(y\) per unit change in \(x\). Recall that since \(x\) is standardised, 1 unit represents a span of 1 standard deviation of \(x\). So for every one standard deviation change in \(x\), the log-odds of \(y\) increases by (on average) 2.063 units and we are 95% confident that this change is between 0.216 and 5.088
Note, the point estimates are posterior means and the intervals are quantile-based.
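Because \(x\) was standardised, the slope above is per standard deviation of \(x\). To express it per raw unit of \(x\), divide by sd(x). A base R sketch (sd_x = 2 is an assumed value for illustration; the real value comes from the data, which are not reproduced here):

```r
# Converting a slope estimated per standard deviation of x back to the raw
# scale of x (a sketch; sd_x = 2 is an assumed value, not from the data).
b_per_sd <- 2.06   # slope per 1 SD of x (from the summary above)
sd_x     <- 2      # hypothetical standard deviation of x
b_per_unit <- b_per_sd / sd_x
b_per_unit
# [1] 1.03
```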
As an alternative to the regular summary (which is designed to resemble that traditionally provided by frequentist analyses - and thus be instantly familiar), it is possible to define exactly how each of the parameters are summarised.
In the following, I am nominating that I want to summarise each parameter posterior by:
- the median
- the 95% highest probability density interval (credibility interval)
- Rhat
- total number of draws
- bulk and tail effective sample sizes
dat5c.brm2 |>
posterior::summarise_draws(
median,
HDInterval::hdi,
rhat,
length,
ess_bulk, ess_tail
)
# A tibble: 6 × 8
variable median lower upper rhat length ess_bulk ess_tail
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 b_Intercept 0.427 -0.737 1.80 1.00 2400 2276. 2122.
2 b_scalex 1.84 -0.127 4.64 1.00 2400 2319. 2114.
3 prior_Intercept -0.0229 -1.92 2.04 1.00 2400 2480. 2417.
4 prior_b 0.00783 -3.59 2.79 1.00 2400 2184. 2214.
5 lprior -3.70 -6.56 -1.93 0.999 2400 2345. 2120.
6 lp__ -7.38 -10.0 -6.56 1.00 2400 2221. 2135.
Conclusions:
- the length column confirms that the total number of post-warmup MCMC draws is 2400.
- the rhat values for each parameter are less than 1.01, confirming convergence of the parameter estimates across the chains
- ess_bulk: the effective sample sizes for the bulk of the posterior are a high fraction (> 0.5) of the total number of draws. This suggests that the sampling was reasonably efficient and therefore likely to be reliable
- ess_tail: the effective sample sizes for estimates in the tails of the posterior. The tails (the more extreme regions of the posterior) are where the sampler is most likely to get stuck (have divergent transitions etc). The fraction of effective samples from this region is relatively high, implying that the sampler did not get stuck in areas that are poorly supported by the data
- b_Intercept: when \(x=0\) (the average of \(x\), since it is standardised), the expected log-odds of \(y\) is 0.427 and we are 95% confident that the true value is between -0.737 and 1.795
- b_x (the slope): the rate of change in the log-odds of \(y\) per unit change in \(x\). Recall that since \(x\) is standardised, 1 unit represents a span of 1 standard deviation of \(x\). So for every one standard deviation change in \(x\), the log-odds of \(y\) increases by (on average) 1.839 units and we are 95% confident that this change is between -0.127 and 4.639
As a yet more flexible alternative, if we first extract the full posterior draws (with as_draws_df()), we can then use the various tidyverse functions to focus on just the most ecologically interpretable parameters before summarising.
In the following, I will use select() with a regex (regular expression) to match only the columns that:
- start with (^) “b_” followed by any number (*) of any characters (.)
- start with (^) “sigma”
It will also use mutate() to back-transform the results (by exponentiating the posteriors, which for a logit link yields odds and odds ratios) before summarising.
dat5c.brm2 |>
brms::as_draws_df() |>
dplyr::select(matches("^b_.*|^sigma")) |>
mutate(across(everything(), exp)) |>
posterior::summarise_draws(
median,
HDInterval::hdi,
rhat,
length,
ess_bulk, ess_tail
)
Warning: Dropping 'draws_df' class as required metadata was removed.
# A tibble: 2 × 8
variable median lower upper rhat length ess_bulk ess_tail
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 b_Intercept 1.53 0.212 4.69 1.00 2400 2252. 2089.
2 b_scalex 6.29 0.332 82.1 1.00 2400 2305. 2108.
Conclusions:
- the length column confirms that the total number of post-warmup MCMC draws is 2400.
- the rhat values for each parameter are less than 1.01, confirming convergence of the parameter estimates across the chains
- ess_bulk: the effective sample sizes for the bulk of the posterior are a high fraction (> 0.5) of the total number of draws. This suggests that the sampling was reasonably efficient and therefore likely to be reliable
- ess_tail: the effective sample sizes for estimates in the tails of the posterior. The tails (the more extreme regions of the posterior) are where the sampler is most likely to get stuck (have divergent transitions etc). The fraction of effective samples from this region is relatively high, implying that the sampler did not get stuck in areas that are poorly supported by the data
- b_Intercept: when \(x=0\) (the average of \(x\), since it is standardised), the expected odds that \(y=1\) are 1.532 and we are 95% confident that the true value is between 0.212 and 4.69
- b_x (the slope): recall that since \(x\) is standardised, 1 unit represents a span of 1 standard deviation of \(x\). So for every one standard deviation change in \(x\), the odds that \(y=1\) increase by (on average) a factor of 6.289 (the odds ratio) and we are 95% confident that this factor is between 0.332 and 82.1. This represents a ((value - 1) * 100 =) 528.9% increase in the odds per standard deviation increase in \(x\)
The gather_draws() function performs the equivalent of an as_draws_df() followed by a pivot_longer() in order to return the full posteriors in long format where they are more suitable for graphing. Note that the name of the slope parameter gets very awkward when \(x\) is centred, so it is more convenient to refer to this parameter via a regular expression.
[1] "b_Intercept" "b_scalex" "prior_Intercept" "prior_b"
[5] "lprior" "lp__" "accept_stat__" "stepsize__"
[9] "treedepth__" "n_leapfrog__" "divergent__" "energy__"
dat5c.brm2 |>
gather_draws(`b_Intercept`,`b_.*x`, regex = TRUE) |>
mutate(across(everything(), exp)) |>
ggplot() +
geom_histogram(aes(x = .value)) +
facet_wrap(~.variable, scales = "free")
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Alternatively, there are various other representations supported by the ggdist package.
y ymin ymax .width .point .interval
1 0.4896412 0.03609659 0.6705246 0.95 median hdci
Conclusions:
- 48.964% of the total variability in \(y\) can be explained by its relationship to \(x\)
- we are 95% confident that the strength of this relationship is between 3.61% and 67.052%
Family: binomial
Links: mu = logit
Formula: count | trials(total) ~ x
Data: dat6 (Number of observations: 10)
Draws: 3 chains, each with iter = 5000; warmup = 1000; thin = 5;
total post-warmup draws = 2400
Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept -3.29 0.95 -5.27 -1.55 1.00 2463 2287
x 0.65 0.17 0.35 1.01 1.00 2431 2452
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
The results presented by the summary() function are on the link scale (in this case, the logit, or log-odds scale). As such, they can be awkward to interpret and are particularly punishing for your audience.
Conclusions:
- in the initial block of information, we are reminded of the formula as well as the chain fitting characteristics. We are also informed that the total number of post-warmup MCMC samples is 2400.
- the Rhat values for each parameter are less than 1.01, confirming convergence of the parameter estimates across the chains
- Bulk_ESS: the effective sample sizes for the bulk of the posterior are a high fraction (> 0.5) of the total number of draws. This suggests that the sampling was reasonably efficient and therefore likely to be reliable
- Tail_ESS: the effective sample sizes for estimates in the tails of the posterior. The tails (the more extreme regions of the posterior) are where the sampler is most likely to get stuck (have divergent transitions etc). The fraction of effective samples from this region is relatively high, implying that the sampler did not get stuck in areas that are poorly supported by the data
- Intercept: when \(x=0\), the expected log-odds of success is -3.29 and we are 95% confident that the true value is between -5.271 and -1.554. Since \(x=0\) is outside the observed range of \(x\), this y-intercept is of very limited value
- x (the slope): the rate of change in the log-odds of success per unit (=1) change in \(x\). So for every one unit change in \(x\), the log-odds increases by (on average) 0.653 units and we are 95% confident that this change is between 0.35 and 1.015
Note, the point estimates are posterior means and the intervals are quantile-based.
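The summaries above use equal-tailed quantile intervals, whereas the alternative summaries in this tutorial use highest posterior density intervals (via HDInterval::hdi). For a symmetric posterior the two are nearly identical, but for a skewed posterior they differ. A hand-rolled base R sketch of the HDI idea (HDInterval::hdi() implements this properly):

```r
# A minimal sketch of a 95% highest-density interval (HDI): the narrowest
# window that contains 95% of the draws. HDInterval::hdi() implements this
# properly; this hand-rolled version just illustrates the idea.
hdi_95 <- function(x, width = 0.95) {
  x <- sort(x)
  n <- ceiling(width * length(x))      # draws each candidate window must cover
  starts <- seq_len(length(x) - n + 1)
  w <- x[starts + n - 1] - x[starts]   # width of every candidate window
  i <- which.min(w)                    # the narrowest window is the HDI
  c(lower = x[i], upper = x[i + n - 1])
}
set.seed(1)
draws <- rexp(10000)              # a deliberately skewed "posterior"
hdi_95(draws)                     # hugs zero (the mode)
quantile(draws, c(0.025, 0.975))  # equal-tailed interval sits away from zero
```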
As an alternative to the regular summary (which is designed to resemble that traditionally provided by frequentist analyses - and thus be instantly familiar), it is possible to define exactly how each of the parameters are summarised.
In the following, I am nominating that I want to summarise each parameter posterior by:
- the median
- the 95% highest probability density interval (credibility interval)
- Rhat
- total number of draws
- bulk and tail effective sample sizes
dat6a.brm2 |>
posterior::summarise_draws(
median,
HDInterval::hdi,
rhat,
length,
ess_bulk, ess_tail
)
# A tibble: 6 × 8
variable median lower upper rhat length ess_bulk ess_tail
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 b_Intercept -3.24 -5.10 -1.42 1.00 2400 2463. 2287.
2 b_x 0.641 0.330 0.989 1.00 2400 2430. 2452.
3 prior_Intercept -0.228 -0.872 0.362 1.00 2400 2392. 2370.
4 prior_b 0.0107 -1.51 1.54 1.00 2400 2466. 2293.
5 lprior -2.25 -5.12 -0.535 1.00 2400 2123. 1984.
6 lp__ -14.4 -16.9 -13.7 1.00 2400 2250. 2287.
The results presented by the above function are on the link scale (in this case, the logit, or log-odds scale). As such, they can be awkward to interpret and are particularly punishing for your audience.
Conclusions:
- the length column confirms that the total number of post-warmup MCMC draws is 2400.
- the rhat values for each parameter are less than 1.01, confirming convergence of the parameter estimates across the chains
- ess_bulk: the effective sample sizes for the bulk of the posterior are a high fraction (> 0.5) of the total number of draws. This suggests that the sampling was reasonably efficient and therefore likely to be reliable
- ess_tail: the effective sample sizes for estimates in the tails of the posterior. The tails (the more extreme regions of the posterior) are where the sampler is most likely to get stuck (have divergent transitions etc). The fraction of effective samples from this region is relatively high, implying that the sampler did not get stuck in areas that are poorly supported by the data
- b_Intercept: when \(x=0\), the expected log-odds of success is -3.24 and we are 95% confident that the true value is between -5.103 and -1.424. Since \(x=0\) is outside the observed range of \(x\), this y-intercept is of very limited value
- b_x (the slope): the rate of change in the log-odds of success per unit (=1) change in \(x\). So for every one unit change in \(x\), the log-odds increases by (on average) 0.641 units and we are 95% confident that this change is between 0.33 and 0.989
As a yet more flexible alternative, if we first extract the full posterior draws (with as_draws_df()), we can then use the various tidyverse functions to focus on just the most ecologically interpretable parameters before summarising.
In the following, I will use select() with a regex (regular expression) to match only the columns that:
- start with (^) “b_” followed by any number (*) of any characters (.)
- start with (^) “sigma”
It will also use mutate() to back-transform the results (by exponentiating the posteriors, which for a logit link yields odds and odds ratios) before summarising.
dat6a.brm2 |>
brms::as_draws_df() |>
dplyr::select(matches("^b_.*|^sigma")) |>
mutate(across(everything(), exp)) |>
posterior::summarise_draws(
median,
HDInterval::hdi,
rhat,
length,
ess_bulk, ess_tail
)
Warning: Dropping 'draws_df' class as required metadata was removed.
# A tibble: 2 × 8
variable median lower upper rhat length ess_bulk ess_tail
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 b_Intercept 0.0391 0.000904 0.156 1.00 2400 2451. 2279.
2 b_x 1.90 1.35 2.64 1.00 2400 2421. 2445.
Conclusions:
- the length column confirms that the total number of post-warmup MCMC draws is 2400.
- the rhat values for each parameter are less than 1.01, confirming convergence of the parameter estimates across the chains
- ess_bulk: the effective sample sizes for the bulk of the posterior are a high fraction (> 0.5) of the total number of draws. This suggests that the sampling was reasonably efficient and therefore likely to be reliable
- ess_tail: the effective sample sizes for estimates in the tails of the posterior. The tails (the more extreme regions of the posterior) are where the sampler is most likely to get stuck (have divergent transitions etc). The fraction of effective samples from this region is relatively high, implying that the sampler did not get stuck in areas that are poorly supported by the data
- b_Intercept: when \(x=0\), the expected odds of success are 0.039 and we are 95% confident that the true value is between 0.001 and 0.156. Since \(x=0\) is outside the observed range of \(x\), this y-intercept is of very limited value
- b_x (the slope): for every one unit change in \(x\), the odds of success increase by (on average) a factor of 1.898 (the odds ratio) and we are 95% confident that this factor is between 1.35 and 2.64. This represents a ((value - 1) * 100 =) 89.8% increase in the odds per unit increase in \(x\)
The gather_draws() function performs the equivalent of an as_draws_df() followed by a pivot_longer() in order to return the full posteriors in long format where they are more suitable for graphing.
dat6a.brm2 |>
gather_draws(b_Intercept, b_x) |>
mutate(across(everything(), exp)) |>
ggplot() +
geom_histogram(aes(x = .value)) +
facet_wrap(~.variable, scales = "free")
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Alternatively, there are various other representations supported by the ggdist package.
y ymin ymax .width .point .interval
1 0.7601021 0.5664887 0.8230451 0.95 median hdci
Conclusions:
- 76.01% of the total variability in \(y\) can be explained by its relationship to \(x\)
- we are 95% confident that the strength of this relationship is between 56.649% and 82.305%
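The variance-explained figures in these conclusions follow the Bayesian R² idea: for each posterior draw, R² = var(fitted) / (var(fitted) + var(residuals)) (this is what brms::bayes_R2() computes from the model, draw by draw). A toy base R sketch with fabricated fitted values and residuals:

```r
# The Bayesian R2 idea behind the variance-explained summaries:
# per posterior draw, R2 = var(fitted) / (var(fitted) + var(residuals)).
# Fitted values and residuals below are fabricated for illustration only;
# brms::bayes_R2() computes this from the model's actual posterior.
fitted_draw    <- c(0.2, 0.8, 0.6, 0.9, 0.3)
residuals_draw <- c(0.1, -0.2, 0.05, -0.1, 0.15)
r2 <- var(fitted_draw) / (var(fitted_draw) + var(residuals_draw))
round(r2, 3)
```

Doing this for every draw yields a full posterior for R², which is then summarised by its median and HPD interval as above.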
Family: binomial
Links: mu = logit
Formula: count | trials(total) ~ scale(x, scale = FALSE)
Data: dat6 (Number of observations: 10)
Draws: 3 chains, each with iter = 5000; warmup = 1000; thin = 5;
total post-warmup draws = 2400
Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept 0.31 0.25 -0.16 0.80 1.00 2332 2304
scalexscaleEQFALSE 0.65 0.16 0.36 1.01 1.00 1869 2325
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
The results presented by the summary() function are on the link scale (in this case, the logit, or log-odds scale). As such, they can be awkward to interpret and are particularly punishing for your audience.
Conclusions:
- in the initial block of information, we are reminded of the formula as well as the chain fitting characteristics. We are also informed that the total number of post-warmup MCMC samples is 2400.
- the Rhat values for each parameter are less than 1.01, confirming convergence of the parameter estimates across the chains
- Bulk_ESS: the effective sample sizes for the bulk of the posterior are a high fraction (> 0.5) of the total number of draws. This suggests that the sampling was reasonably efficient and therefore likely to be reliable
- Tail_ESS: the effective sample sizes for estimates in the tails of the posterior. The tails (the more extreme regions of the posterior) are where the sampler is most likely to get stuck (have divergent transitions etc). The fraction of effective samples from this region is relatively high, implying that the sampler did not get stuck in areas that are poorly supported by the data
- Intercept: when \(x=0\) (the mean of \(x\), since \(x\) has been centred), the expected log-odds of success is 0.311 and we are 95% confident that the true value is between -0.164 and 0.796
- x (the slope): the rate of change in the log-odds of success per unit (=1) change in \(x\). So for every one unit change in \(x\), the log-odds increases by (on average) 0.654 units and we are 95% confident that this change is between 0.364 and 1.011
Note, the point estimates are posterior means and the intervals are quantile-based.
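The 2400 post-warmup draws reported throughout these summaries follow directly from the sampler settings (3 chains, iter = 5000, warmup = 1000, thin = 5):

```r
# Post-warmup draws = chains * (iter - warmup) / thin, per the settings above.
chains <- 3; iter <- 5000; warmup <- 1000; thin <- 5
chains * (iter - warmup) / thin
# [1] 2400
```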
As an alternative to the regular summary (which is designed to resemble that traditionally provided by frequentist analyses - and thus be instantly familiar), it is possible to define exactly how each of the parameters are summarised.
In the following, I am nominating that I want to summarise each parameter posterior by:
- the median
- the 95% highest probability density interval (credibility interval)
- Rhat
- total number of draws
- bulk and tail effective sample sizes
dat6b.brm2 |>
posterior::summarise_draws(
median,
HDInterval::hdi,
rhat,
length,
ess_bulk, ess_tail
)
# A tibble: 6 × 8
variable median lower upper rhat length ess_bulk ess_tail
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 b_Intercept 0.305 -0.171 0.786 1.00 2400 2334. 2304.
2 b_scalexscaleEQFALSE 0.642 0.363 1.01 1.00 2400 1870. 2325.
3 prior_Intercept -0.221 -0.898 0.412 1.00 2400 2463. 1931.
4 prior_b 0.0186 -1.74 1.65 1.00 2400 2314. 2331.
5 lprior -2.32 -5.36 -0.538 1.00 2400 2221. 2234.
6 lp__ -14.4 -16.6 -13.7 1.00 2400 2330. 1926.
The results presented by the above function are on the link scale (in this case, the logit, or log-odds scale). As such, they can be awkward to interpret and are particularly punishing for your audience.
Conclusions:
- the length column confirms that the total number of post-warmup MCMC draws is 2400.
- the rhat values for each parameter are less than 1.01, confirming convergence of the parameter estimates across the chains
- ess_bulk: the effective sample sizes for the bulk of the posterior are a high fraction (> 0.5) of the total number of draws. This suggests that the sampling was reasonably efficient and therefore likely to be reliable
- ess_tail: the effective sample sizes for estimates in the tails of the posterior. The tails (the more extreme regions of the posterior) are where the sampler is most likely to get stuck (have divergent transitions etc). The fraction of effective samples from this region is relatively high, implying that the sampler did not get stuck in areas that are poorly supported by the data
- b_Intercept: when \(x=0\) (the mean of \(x\), since \(x\) has been centred), the expected log-odds of success is 0.305 and we are 95% confident that the true value is between -0.171 and 0.786
- b_x (the slope): the rate of change in the log-odds of success per unit (=1) change in \(x\). So for every one unit change in \(x\), the log-odds increases by (on average) 0.642 units and we are 95% confident that this change is between 0.363 and 1.006
As a yet more flexible alternative, if we first extract the full posterior draws (with as_draws_df()), we can then use the various tidyverse functions to focus on just the most ecologically interpretable parameters before summarising.
In the following, I will use select() with a regex (regular expression) to match only the columns that:
- start with (^) “b_” followed by any number (*) of any characters (.)
- start with (^) “sigma”
It will also use mutate() to back-transform the results (by exponentiating the posteriors, which for a logit link yields odds and odds ratios) before summarising.
dat6b.brm2 |>
brms::as_draws_df() |>
dplyr::select(matches("^b_.*|^sigma")) |>
mutate(across(everything(), exp)) |>
posterior::summarise_draws(
median,
HDInterval::hdi,
rhat,
length,
ess_bulk, ess_tail
)
Warning: Dropping 'draws_df' class as required metadata was removed.
# A tibble: 2 × 8
variable median lower upper rhat length ess_bulk ess_tail
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 b_Intercept 1.36 0.809 2.14 1.00 2400 2261. 2263.
2 b_scalexscaleEQFALSE 1.90 1.35 2.62 1.00 2400 1801. 2323.
Conclusions:
- the length column confirms that the total number of post-warmup MCMC draws is 2400.
- the rhat values for each parameter are less than 1.01, confirming convergence of the parameter estimates across the chains
- ess_bulk: the effective sample sizes for the bulk of the posterior are a high fraction (> 0.5) of the total number of draws. This suggests that the sampling was reasonably efficient and therefore likely to be reliable
- ess_tail: the effective sample sizes for estimates in the tails of the posterior. The tails (the more extreme regions of the posterior) are where the sampler is most likely to get stuck (have divergent transitions etc). The fraction of effective samples from this region is relatively high, implying that the sampler did not get stuck in areas that are poorly supported by the data
- b_Intercept: when \(x=0\) (the mean of \(x\), since \(x\) has been centred), the expected odds of success are 1.357 and we are 95% confident that the true value is between 0.809 and 2.14
- b_x (the slope): for every one unit change in \(x\), the odds of success increase by (on average) a factor of 1.901 (the odds ratio) and we are 95% confident that this factor is between 1.35 and 2.62. This represents a ((value - 1) * 100 =) 90.1% increase in the odds per unit increase in \(x\)
The gather_draws() function performs the equivalent of an as_draws_df() followed by a pivot_longer() in order to return the full posteriors in long format where they are more suitable for graphing.
dat6b.brm2 |>
gather_draws(`b_Intercept`, `b_.*x.*`, regex = TRUE) |>
mutate(across(everything(), exp)) |>
ggplot() +
geom_histogram(aes(x = .value)) +
facet_wrap(~.variable, scales = "free")
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Alternatively, there are various other representations supported by the ggdist package.
y ymin ymax .width .point .interval
1 0.76149 0.5781493 0.8210215 0.95 median hdci
Conclusions:
- 76.149% of the total variability in \(y\) can be explained by its relationship to \(x\)
- we are 95% confident that the strength of this relationship is between 57.815% and 82.102%
Family: binomial
Links: mu = logit
Formula: count | trials(total) ~ scale(x)
Data: dat6 (Number of observations: 10)
Draws: 3 chains, each with iter = 5000; warmup = 1000; thin = 5;
total post-warmup draws = 2400
Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept 0.28 0.25 -0.20 0.77 1.00 2456 2500
scalex 1.62 0.50 0.75 2.67 1.00 2502 2145
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
The results presented by the summary() function are on the link scale (in this case, the logit, or log-odds scale). As such, they can be awkward to interpret and are particularly punishing for your audience.
Conclusions:
- in the initial block of information, we are reminded of the formula as well as the chain fitting characteristics. We are also informed that the total number of post-warmup MCMC samples is 2400.
- the Rhat values for each parameter are all less than 1.01, confirming convergence of the parameter estimates across the chains
- Bulk_ESS: the effective sample sizes for the bulk of the posterior are a high fraction (> 0.5) of the total number of draws. This suggests that the sampling was reasonably efficient and therefore likely to be reliable
- Tail_ESS: the effective sample sizes of estimates in the tails of the posterior. The tails (more extreme regions of the posterior) are where the sampler is most likely to get stuck (have divergent transitions etc). The fraction of effective samples from this region is relatively high, implying that the sampler did not get stuck in areas that are poorly supported by data
- Intercept: when \(x=0\) (the average of \(x\), since \(x\) is standardised), the expected value of \(y\) (on the logit scale) is 0.283 and we are 95% confident that the true value is between -0.204 and 0.774. Since \(x\) is standardised, \(x=0\) lies at the centre of the observed data, so this intercept is readily interpretable
- x (the slope): the rate of change in \(y\) per unit (=1) change in \(x\). Recall that since \(x\) is standardised, one unit represents a span of one standard deviation of \(x\). So for every one standard deviation change in \(x\), \(y\) (on the logit scale) increases by (on average) 1.624 units and we are 95% confident that this change is between 0.754 and 2.674
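To make the logit-scale estimates more digestible, they can be back-transformed by hand. The point estimates below are taken from the summary above; the `plogis()`/`exp()` arithmetic is standard for a logit link:

```r
# Sketch: back-transforming logit (log-odds) scale estimates by hand.
# Point estimates are taken from the model summary above.
intercept <- 0.28   # log-odds of success at the mean of x (x is standardised)
slope     <- 1.62   # change in log-odds per one SD of x
plogis(intercept)   # expected probability of success at the mean of x (~0.57)
exp(slope)          # odds ratio: odds multiply by ~5.05 per SD of x
```

Note that for proper credible intervals you would apply these transformations to the full posterior draws, not just to the summary statistics.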
Note, the point estimates are posterior means and the intervals are quantile-based.
As an alternative to the regular summary (which is designed to resemble that traditionally provided by frequentist analyses - and thus be instantly familiar), it is possible to define exactly how each of the parameters is summarised.
In the following, I am nominating that I want to summarise each parameter posterior by:
- the median
- the 95% highest probability density interval (credibility interval)
- Rhat
- total number of draws
- bulk and tail effective sample sizes
dat6c.brm2 |>
posterior::summarise_draws(
median,
HDInterval::hdi,
rhat,
length,
ess_bulk, ess_tail
)
# A tibble: 6 × 8
variable median lower upper rhat length ess_bulk ess_tail
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 b_Intercept 0.281 -0.184 0.784 0.999 2400 2456. 2500.
2 b_scalex 1.60 0.673 2.55 1.00 2400 2503. 2145.
3 prior_Intercept -0.220 -0.814 0.439 1.00 2400 2399. 2364.
4 prior_b -0.0201 -0.871 1.04 1.00 2400 2391. 2454.
5 lprior -5.30 -8.92 -2.10 1.00 2400 2629. 2395.
6 lp__ -17.9 -20.2 -17.1 1.00 2400 2330. 2313.
Conclusions:
- the length column confirms that the total number of post-warmup MCMC draws is 2400
- the rhat values for each parameter are all less than 1.01, confirming convergence of the parameter estimates across the chains
- ess_bulk: the effective sample sizes for the bulk of the posterior are a high fraction (> 0.5) of the total number of draws. This suggests that the sampling was reasonably efficient and therefore likely to be reliable
- ess_tail: the effective sample sizes of estimates in the tails of the posterior. The tails (more extreme regions of the posterior) are where the sampler is most likely to get stuck (have divergent transitions etc). The fraction of effective samples from this region is relatively high, implying that the sampler did not get stuck in areas that are poorly supported by data
- b_Intercept: when \(x=0\) (the average of \(x\), since \(x\) is standardised), the expected value of \(y\) (on the logit scale) is 0.281 and we are 95% confident that the true value is between -0.184 and 0.784
- b_scalex (the slope): the rate of change in \(y\) per unit (=1) change in \(x\). Recall that since \(x\) is standardised, one unit represents a span of one standard deviation of \(x\). So for every one standard deviation change in \(x\), \(y\) (on the logit scale) increases by (on average) 1.605 units and we are 95% confident that this change is between 0.673 and 2.548
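If a slope expressed per standard deviation of \(x\) is hard to communicate, it can be converted back to raw units of \(x\) by dividing by the standard deviation used in the scaling. The sketch below uses an illustrative value for that standard deviation; in practice you would use sd() of the observed predictor:

```r
# Sketch: converting a slope on standardised x back to raw units of x.
# sd_x is a hypothetical illustrative value; in practice use sd() of the raw predictor.
b_scalex <- 1.605   # slope per one SD of x (median from the summary above)
sd_x     <- 3.0     # hypothetical standard deviation of the raw x
b_scalex / sd_x     # change in the linear predictor per raw unit of x (0.535)
```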
As a yet more flexible alternative, if we first extract the full posterior draws (with as_draws_df()), we can then use various tidyverse functions to focus on just the most ecologically interpretable parameters before summarising.
In the following, I will use select() with a regex (regular expression) to match only the columns that:
- start with (^) “b_” followed by any amount (*) of any character (.)
- start with (^) “sigma”
It will also use mutate() to transform the results back onto the scale of the response (by exponentiating the posteriors) before summarising.
dat6c.brm2 |>
brms::as_draws_df() |>
dplyr::select(matches("^b_.*|^sigma")) |>
mutate(across(everything(), exp)) |>
posterior::summarise_draws(
median,
HDInterval::hdi,
rhat,
length,
ess_bulk, ess_tail
)
Warning: Dropping 'draws_df' class as required metadata was removed.
# A tibble: 2 × 8
variable median lower upper rhat length ess_bulk ess_tail
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 b_Intercept 1.32 0.732 2.03 1.00 2400 2461. 2491.
2 b_scalex 4.98 1.82 12.2 1.00 2400 2459. 2121.
Conclusions:
- the summary now contains only the ecologically interpretable parameters, back-transformed (exponentiated) onto the odds scale, based on the full 2400 post-warmup MCMC draws
- the rhat values for each parameter are all less than 1.01, confirming convergence of the parameter estimates across the chains
- ess_bulk and ess_tail: the effective sample sizes are a high fraction of the total number of draws, suggesting that the sampling was efficient both in the bulk and in the tails of the posterior
- b_Intercept: when \(x=0\) (the average of \(x\), since \(x\) is standardised), the expected odds of success are 1.32 and we are 95% confident that the true value is between 0.732 and 2.03
- b_scalex (the slope): for every one standard deviation change in \(x\), the odds of success increase by (on average) a factor of 4.98 (an odds ratio) and we are 95% confident that this factor is between 1.82 and 12.2. This represents a ((value - 1) * 100 =) 398% increase in the odds per standard deviation increase in \(x\)
The gather_draws() function performs the equivalent of an as_draws_df() followed by a pivot_longer() in order to return the full posteriors in long format where they are more suitable for graphing. Note that the name of the slope parameter gets very awkward when \(x\) is centered, so it is more convenient to refer to this parameter via a regular expression.
[1] "b_Intercept" "b_scalex" "prior_Intercept" "prior_b"
[5] "lprior" "lp__" "accept_stat__" "stepsize__"
[9] "treedepth__" "n_leapfrog__" "divergent__" "energy__"
dat6c.brm2 |>
gather_draws(`b_Intercept`,`b_.*x`, regex = TRUE) |>
mutate(across(everything(), exp)) |>
ggplot() +
geom_histogram(aes(x = .value)) +
facet_wrap(~.variable, scales = "free")
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Alternatively, there are various other representations supported by the ggdist package.
y ymin ymax .width .point .interval
1 0.708533 0.3843623 0.8229809 0.95 median hdci
Conclusions:
- 70.853% of the total variability in \(y\) can be explained by its relationship to \(x\)
- we are 95% confident that the strength of this relationship is between 38.436% and 82.298%
12 Predictions
Whilst linear models are useful for estimating effects (relative differences), because they are low dimensional (only focus on a small number of covariates) they are not good at absolute predictions. Nevertheless, predicting values from linear models provides the basis for investigating/estimating additional effects and generating various graphics to visualise the estimates.
There are a large number of candidate routines for performing prediction. We will go through some of these. It is worth noting that in this context prediction is technically the act of estimating what we expect to get if we were to collect a single new observation from a particular population (e.g. a specific level of fertilizer concentration). Often this is not what we want. Often we want the fitted values - estimates of what we expect to get if we were to collect multiple new observations and average them.
So while fitted values represent the expected underlying processes occurring in the system, predicted values represent our expectations from sampling from such processes.
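This distinction can be sketched with simulated draws (all quantities below are illustrative, not taken from any fitted model): the posterior predictive adds observation-level noise (sigma) on top of the uncertainty in the mean, so its intervals are necessarily wider than those of the fitted values.

```r
# Sketch: why prediction intervals are wider than fitted-value intervals.
# All quantities are simulated for illustration, not taken from a fitted model.
set.seed(1)
mu_draws    <- rnorm(4000, mean = 0.1, sd = 1.3)  # posterior of the mean at some x
sigma_draws <- rep(2, 4000)                       # residual sd (fixed for simplicity)
pred_draws  <- rnorm(4000, mean = mu_draws, sd = sigma_draws)  # posterior predictive
c(fitted_sd = sd(mu_draws), predictive_sd = sd(pred_draws))
# predictive sd ~ sqrt(1.3^2 + 2^2) = 2.39, larger than the fitted sd of ~1.3
```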
| Package | Function | Description | Summarise with |
|---|---|---|---|
| emmeans | emmeans | Estimated marginal means from which posteriors can be drawn (via tidy_draws() or gather_emmeans_draws()) | median_hdci() |
| rstantools | posterior_predict | Draw from the posterior of a prediction (includes sigma) - predicts single observations | summarise_draws() |
| rstantools | posterior_linpred | Draw from the posterior of the fitted values (on the link scale) - predicts average observations | summarise_draws() |
| rstantools | posterior_epred | Draw from the posterior of the fitted values (on the response scale) - predicts average observations | summarise_draws() |
| tidybayes | predicted_draws | Extract the posterior of prediction values | median_hdci() |
| tidybayes | epred_draws | Extract the posterior of expected values | median_hdci() |
| tidybayes | fitted_draws | Extract the posterior of fitted values (deprecated in favour of epred_draws) | median_hdci() |
| tidybayes | add_predicted_draws | Adds draws from the posterior of predictions to a data frame (of prediction data) | median_hdci() |
| tidybayes | add_fitted_draws | Adds draws from the posterior of fitted values to a data frame (of prediction data) | median_hdci() |
For simple models, prediction is essentially taking the model formula, complete with parameter (coefficient) estimates, and solving it for new values of the predictor. To explore this, we will use each fitted model to predict \(y\) at new values of its predictor.
We will therefore start by establishing this prediction domain as a data frame to use across all of the prediction routines.
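A minimal sketch of both steps - the prediction grid and the manual "solving" of the linear predictor - assuming hypothetical coefficient draws (the real draws would come from as_draws_df() on the fitted model):

```r
# Sketch: a prediction grid plus manual solving of the linear predictor.
# The coefficient draws below are hypothetical, for illustration only.
newdata <- data.frame(x = c(2.5, 5))
b0 <- c(0.48, 0.52, 0.50)    # hypothetical posterior draws of the intercept
b1 <- c(-0.15, -0.16, -0.14) # hypothetical posterior draws of the slope
# one fitted value per draw, for each new value of x
fitted <- sapply(newdata$x, function(x) b0 + b1 * x)
apply(fitted, 2, median)     # medians at x = 2.5 and x = 5 -> 0.12 and -0.27
```

The routines below automate exactly this: they evaluate the model at each posterior draw and then summarise the resulting posteriors.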
In each case, we will predict \(y\) when \(x\) is 2.5 and 5
Note, for a Gaussian model emmeans, posterior_epred and posterior_linpred will all yield the same outputs. posterior_predict will yield wider credible intervals as it is predicting for individual observations rather than predicting means of large samples.
x emmean lower.HPD upper.HPD
2.5 0.1109 -2.35 2.85
5.0 -0.0474 -4.82 5.08
Point estimate displayed: median
HPD interval probability: 0.95
# OR with more control over the way posteriors are summarised
dat1a.brm2 |>
emmeans(~x, at = list(x = c(2.5, 5))) |>
gather_emmeans_draws() |>
median_hdci()
# A tibble: 2 × 7
x .value .lower .upper .width .point .interval
<dbl> <dbl> <dbl> <dbl> <dbl> <chr> <chr>
1 2.5 0.111 -2.35 2.85 0.95 median hdci
2 5 -0.0474 -4.82 5.08 0.95 median hdci
Conclusions:
- the predicted (estimated) mean \(y\) associated with an \(x\) of 2.5 is 0.111
- the predicted (estimated) mean \(y\) associated with an \(x\) of 5 is -0.047
- 95% HPD intervals also given
dat1a.brm2 |>
posterior_predict(newdata = data.frame(x = c(2.5, 5))) |>
summarise_draws(median, HDInterval::hdi)
# A tibble: 2 × 4
variable median lower upper
<chr> <dbl> <dbl> <dbl>
1 ...1 0.0853 -3.48 3.52
2 ...2 -0.0277 -5.71 5.57
# Or for even more control and ability to add other summaries
dat1a.brm2 |>
add_predicted_draws(newdata = data.frame(x = c(2.5, 5))) |>
dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
summarise_draws(
median,
HDInterval::hdi
)
# A tibble: 2 × 6
# Groups: x, .row [2]
x .row variable median lower upper
<dbl> <int> <chr> <dbl> <dbl> <dbl>
1 2.5 1 .prediction 0.110 -3.55 3.67
2 5 2 .prediction -0.0849 -5.53 5.42
Conclusions:
- the predicted \(y\) associated with an \(x\) of 2.5 is approximately 0.09
- the predicted \(y\) associated with an \(x\) of 5 is approximately -0.05
- 95% HPD intervals also given (posterior predictions involve fresh random draws, so the two routines above differ slightly)
dat1a.brm2 |>
posterior_epred(newdata = data.frame(x = c(2.5, 5))) |>
summarise_draws(median, HDInterval::hdi)
# A tibble: 2 × 4
variable median lower upper
<chr> <dbl> <dbl> <dbl>
1 ...1 0.111 -2.35 2.85
2 ...2 -0.0474 -4.82 5.08
# Or for even more control and ability to add other summaries
dat1a.brm2 |>
add_epred_draws(newdata = data.frame(x = c(2.5, 5))) |>
dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
summarise_draws(
median,
HDInterval::hdi
)
# A tibble: 2 × 6
# Groups: x, .row [2]
x .row variable median lower upper
<dbl> <int> <chr> <dbl> <dbl> <dbl>
1 2.5 1 .epred 0.111 -2.35 2.85
2 5 2 .epred -0.0474 -4.82 5.08
Conclusions:
- the fitted mean \(y\) associated with an \(x\) of 2.5 is 0.111
- the fitted mean \(y\) associated with an \(x\) of 5 is -0.047
- 95% HPD intervals also given
dat1a.brm2 |>
posterior_linpred(newdata = data.frame(x = c(2.5, 5))) |>
summarise_draws(median, HDInterval::hdi)
# A tibble: 2 × 4
variable median lower upper
<chr> <dbl> <dbl> <dbl>
1 ...1 0.111 -2.35 2.85
2 ...2 -0.0474 -4.82 5.08
# Or for even more control and ability to add other summaries
dat1a.brm2 |>
add_linpred_draws(newdata = data.frame(x = c(2.5, 5))) |>
dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
summarise_draws(
median,
HDInterval::hdi
)
# A tibble: 2 × 6
# Groups: x, .row [2]
x .row variable median lower upper
<dbl> <int> <chr> <dbl> <dbl> <dbl>
1 2.5 1 .linpred 0.111 -2.35 2.85
2 5 2 .linpred -0.0474 -4.82 5.08
Conclusions:
- the fitted mean \(y\) associated with an \(x\) of 2.5 is 0.111
- the fitted mean \(y\) associated with an \(x\) of 5 is -0.047
- 95% HPD intervals also given
In each case, we will predict \(y\) when \(x\) is 2.5 and 5
Note, for a Gaussian model emmeans, posterior_epred and posterior_linpred will all yield the same outputs. posterior_predict will yield wider credible intervals as it is predicting for individual observations rather than predicting means of large samples.
x emmean lower.HPD upper.HPD
2.5 0.0698 -2.29 2.72
5.0 -0.1310 -4.46 5.02
Point estimate displayed: median
HPD interval probability: 0.95
# OR with more control over the way posteriors are summarised
dat1b.brm2 |>
emmeans(~x, at = list(x = c(2.5, 5))) |>
gather_emmeans_draws() |>
median_hdci()
# A tibble: 2 × 7
x .value .lower .upper .width .point .interval
<dbl> <dbl> <dbl> <dbl> <dbl> <chr> <chr>
1 2.5 0.0698 -2.29 2.72 0.95 median hdci
2 5 -0.131 -4.46 5.02 0.95 median hdci
Conclusions:
- the predicted (estimated) mean \(y\) associated with an \(x\) of 2.5 is 0.07
- the predicted (estimated) mean \(y\) associated with an \(x\) of 5 is -0.131
- 95% HPD intervals also given
dat1b.brm2 |>
posterior_predict(newdata = data.frame(x = c(2.5, 5))) |>
summarise_draws(median, HDInterval::hdi)
# A tibble: 2 × 4
variable median lower upper
<chr> <dbl> <dbl> <dbl>
1 ...1 0.0727 -3.44 3.31
2 ...2 -0.132 -5.52 5.25
# Or for even more control and ability to add other summaries
dat1b.brm2 |>
add_predicted_draws(newdata = data.frame(x = c(2.5, 5))) |>
dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
summarise_draws(
median,
HDInterval::hdi
)
# A tibble: 2 × 6
# Groups: x, .row [2]
x .row variable median lower upper
<dbl> <int> <chr> <dbl> <dbl> <dbl>
1 2.5 1 .prediction 0.0527 -3.35 3.36
2 5 2 .prediction -0.124 -5.39 5.31
Conclusions:
- the predicted \(y\) associated with an \(x\) of 2.5 is approximately 0.06
- the predicted \(y\) associated with an \(x\) of 5 is approximately -0.13
- 95% HPD intervals also given (posterior predictions involve fresh random draws, so the two routines above differ slightly)
dat1b.brm2 |>
posterior_epred(newdata = data.frame(x = c(2.5, 5))) |>
summarise_draws(median, HDInterval::hdi)
# A tibble: 2 × 4
variable median lower upper
<chr> <dbl> <dbl> <dbl>
1 ...1 0.0698 -2.29 2.72
2 ...2 -0.131 -4.46 5.02
# Or for even more control and ability to add other summaries
dat1b.brm2 |>
add_epred_draws(newdata = data.frame(x = c(2.5, 5))) |>
dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
summarise_draws(
median,
HDInterval::hdi
)
# A tibble: 2 × 6
# Groups: x, .row [2]
x .row variable median lower upper
<dbl> <int> <chr> <dbl> <dbl> <dbl>
1 2.5 1 .epred 0.0698 -2.29 2.72
2 5 2 .epred -0.131 -4.46 5.02
Conclusions:
- the fitted mean \(y\) associated with an \(x\) of 2.5 is 0.07
- the fitted mean \(y\) associated with an \(x\) of 5 is -0.131
- 95% HPD intervals also given
y ymin ymax .width .point .interval
1 -0.003159382 -4.074285 3.68445 0.95 median hdci
# Or for even more control and ability to add other summaries
dat1b.brm2 |>
add_linpred_draws(newdata = data.frame(x = c(2.5, 5))) |>
dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
summarise_draws(
median,
HDInterval::hdi
)
# A tibble: 2 × 6
# Groups: x, .row [2]
x .row variable median lower upper
<dbl> <int> <chr> <dbl> <dbl> <dbl>
1 2.5 1 .linpred 0.0698 -2.29 2.72
2 5 2 .linpred -0.131 -4.46 5.02
Conclusions:
- the fitted mean \(y\) associated with an \(x\) of 2.5 is 0.07
- the fitted mean \(y\) associated with an \(x\) of 5 is -0.131
- 95% HPD intervals also given
- note that piping the posterior_linpred() matrix directly into median_hdci() collapses both predictions into the single summary row shown first; add_linpred_draws() keeps the two predictions separate
In each case, we will predict \(y\) when \(x\) is 2.5 and 5
Note, for a Gaussian model emmeans, posterior_epred and posterior_linpred will all yield the same outputs. posterior_predict will yield wider credible intervals as it is predicting for individual observations rather than predicting means of large samples.
x emmean lower.HPD upper.HPD
2.5 0.0295 -2.67 2.60
5.0 -0.2027 -5.10 4.65
Point estimate displayed: median
HPD interval probability: 0.95
# OR with more control over the way posteriors are summarised
dat1c.brm2 |>
emmeans(~x, at = list(x = c(2.5, 5))) |>
gather_emmeans_draws() |>
median_hdci()
# A tibble: 2 × 7
x .value .lower .upper .width .point .interval
<dbl> <dbl> <dbl> <dbl> <dbl> <chr> <chr>
1 2.5 0.0295 -2.62 2.66 0.95 median hdci
2 5 -0.203 -5.10 4.65 0.95 median hdci
Conclusions:
- the predicted (estimated) mean \(y\) associated with an \(x\) of 2.5 is 0.029
- the predicted (estimated) mean \(y\) associated with an \(x\) of 5 is -0.203
- 95% HPD intervals also given
dat1c.brm2 |>
posterior_predict(newdata = data.frame(x = c(2.5, 5))) |>
summarise_draws(median, HDInterval::hdi)
# A tibble: 2 × 4
variable median lower upper
<chr> <dbl> <dbl> <dbl>
1 ...1 0.0134 -3.24 3.76
2 ...2 -0.196 -6.17 4.91
# Or for even more control and ability to add other summaries
dat1c.brm2 |>
add_predicted_draws(newdata = data.frame(x = c(2.5, 5))) |>
dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
summarise_draws(
median,
HDInterval::hdi
)
# A tibble: 2 × 6
# Groups: x, .row [2]
x .row variable median lower upper
<dbl> <int> <chr> <dbl> <dbl> <dbl>
1 2.5 1 .prediction 0.0459 -3.51 3.49
2 5 2 .prediction -0.218 -5.36 5.31
Conclusions:
- the predicted \(y\) associated with an \(x\) of 2.5 is approximately 0.03
- the predicted \(y\) associated with an \(x\) of 5 is approximately -0.21
- 95% HPD intervals also given (posterior predictions involve fresh random draws, so the two routines above differ slightly)
dat1c.brm2 |>
posterior_epred(newdata = data.frame(x = c(2.5, 5))) |>
summarise_draws(median, HDInterval::hdi)
# A tibble: 2 × 4
variable median lower upper
<chr> <dbl> <dbl> <dbl>
1 ...1 0.0295 -2.67 2.60
2 ...2 -0.203 -5.10 4.65
# Or for even more control and ability to add other summaries
dat1c.brm2 |>
add_epred_draws(newdata = data.frame(x = c(2.5, 5))) |>
dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
summarise_draws(
median,
HDInterval::hdi
)
# A tibble: 2 × 6
# Groups: x, .row [2]
x .row variable median lower upper
<dbl> <int> <chr> <dbl> <dbl> <dbl>
1 2.5 1 .epred 0.0295 -2.67 2.60
2 5 2 .epred -0.203 -5.10 4.65
Conclusions:
- the fitted mean \(y\) associated with an \(x\) of 2.5 is 0.029
- the fitted mean \(y\) associated with an \(x\) of 5 is -0.203
- 95% HPD intervals also given
y ymin ymax .width .point .interval
1 -0.06446552 -4.31072 3.768188 0.95 median hdci
# Or for even more control and ability to add other summaries
dat1c.brm2 |>
add_linpred_draws(newdata = data.frame(x = c(2.5, 5))) |>
dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
summarise_draws(
median,
HDInterval::hdi
)
# A tibble: 2 × 6
# Groups: x, .row [2]
x .row variable median lower upper
<dbl> <int> <chr> <dbl> <dbl> <dbl>
1 2.5 1 .linpred 0.0295 -2.67 2.60
2 5 2 .linpred -0.203 -5.10 4.65
Conclusions:
- the fitted mean \(y\) associated with an \(x\) of 2.5 is 0.029
- the fitted mean \(y\) associated with an \(x\) of 5 is -0.203
- 95% HPD intervals also given
- note that piping the posterior_linpred() matrix directly into median_hdci() collapses both predictions into the single summary row shown first; add_linpred_draws() keeps the two predictions separate
In each case, we will predict \(y\) when \(x\) is “control”, “medium” and “high”
Note, for a Gaussian model emmeans, posterior_epred and posterior_linpred will all yield the same outputs. posterior_predict will yield wider credible intervals as it is predicting for individual observations rather than predicting means of large samples.
x emmean lower.HPD upper.HPD
control 20.8 16.71 24.3
medium 20.0 16.21 24.0
high 12.1 7.74 16.3
Point estimate displayed: median
HPD interval probability: 0.95
# OR with more control over the way posteriors are summarised
dat2a.brm2 |>
emmeans(~x) |>
gather_emmeans_draws() |>
median_hdci()
# A tibble: 3 × 7
x .value .lower .upper .width .point .interval
<fct> <dbl> <dbl> <dbl> <dbl> <chr> <chr>
1 control 20.8 16.7 24.3 0.95 median hdci
2 medium 20.0 16.2 24.0 0.95 median hdci
3 high 12.1 8.14 16.7 0.95 median hdci
Conclusions:
- the predicted (estimated) mean \(y\) associated with an \(x\) of “control” is 20.787
- the predicted (estimated) mean \(y\) associated with an \(x\) of “medium” is 19.982
- the predicted (estimated) mean \(y\) associated with an \(x\) of “high” is 12.055
- 95% HPD intervals also given
dat2a.brm2 |>
posterior_predict(newdata = data.frame(x =
c("control", "medium", "high"))) |>
summarise_draws(median, HDInterval::hdi)
# A tibble: 3 × 4
variable median lower upper
<chr> <dbl> <dbl> <dbl>
1 ...1 20.8 10.6 30.3
2 ...2 20.1 10.3 30.4
3 ...3 12.0 2.28 23.0
# Or for even more control and ability to add other summaries
dat2a.brm2 |>
add_predicted_draws(newdata = data.frame(x = c("control", "medium", "high"))) |>
dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
summarise_draws(
median,
HDInterval::hdi
)
# A tibble: 3 × 6
# Groups: x, .row [3]
x .row variable median lower upper
<chr> <int> <chr> <dbl> <dbl> <dbl>
1 control 1 .prediction 20.7 11.2 31.0
2 high 3 .prediction 11.9 2.00 21.9
3 medium 2 .prediction 20.0 10.0 30.1
Conclusions:
- the predicted \(y\) associated with an \(x\) of “control” is approximately 20.7
- the predicted \(y\) associated with an \(x\) of “medium” is approximately 20.0
- the predicted \(y\) associated with an \(x\) of “high” is approximately 12.0
- 95% HPD intervals also given (posterior predictions involve fresh random draws, so the two routines above differ slightly)
dat2a.brm2 |>
posterior_epred(newdata = data.frame(x = c("control", "medium", "high"))) |>
summarise_draws(median, HDInterval::hdi)
# A tibble: 3 × 4
variable median lower upper
<chr> <dbl> <dbl> <dbl>
1 ...1 20.8 16.7 24.3
2 ...2 20.0 16.2 24.0
3 ...3 12.1 7.74 16.3
# Or for even more control and ability to add other summaries
dat2a.brm2 |>
add_epred_draws(newdata = data.frame(x = c("control", "medium", "high"))) |>
dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
summarise_draws(
median,
HDInterval::hdi
)
# A tibble: 3 × 6
# Groups: x, .row [3]
x .row variable median lower upper
<chr> <int> <chr> <dbl> <dbl> <dbl>
1 control 1 .epred 20.8 16.7 24.3
2 high 3 .epred 12.1 7.74 16.3
3 medium 2 .epred 20.0 16.2 24.0
Conclusions:
- the predicted (estimated) mean \(y\) associated with an \(x\) of “control” is 20.787
- the predicted (estimated) mean \(y\) associated with an \(x\) of “medium” is 19.982
- the predicted (estimated) mean \(y\) associated with an \(x\) of “high” is 12.055
- 95% HPD intervals also given
dat2a.brm2 |>
posterior_linpred(newdata = data.frame(x = c("control", "medium", "high"))) |>
median_hdci()
y ymin ymax .width .point .interval
1 19.13185 9.432886 24.2459 0.95 median hdci
# Or for even more control and ability to add other summaries
dat2a.brm2 |>
add_linpred_draws(newdata = data.frame(x = c("control", "medium", "high"))) |>
dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
summarise_draws(
median,
HDInterval::hdi
)
# A tibble: 3 × 6
# Groups: x, .row [3]
x .row variable median lower upper
<chr> <int> <chr> <dbl> <dbl> <dbl>
1 control 1 .linpred 20.8 16.7 24.3
2 high 3 .linpred 12.1 7.74 16.3
3 medium 2 .linpred 20.0 16.2 24.0
Conclusions:
- the predicted (estimated) mean \(y\) associated with an \(x\) of “control” is 20.787
- the predicted (estimated) mean \(y\) associated with an \(x\) of “medium” is 19.982
- the predicted (estimated) mean \(y\) associated with an \(x\) of “high” is 12.055
- 95% HPD intervals also given
- note that piping the posterior_linpred() matrix directly into median_hdci() collapses all three groups into the single summary row shown first; add_linpred_draws() keeps the groups separate
In each case, we will predict \(y\) when \(x\) is “control”, “medium” and “high”
Note, for a Gaussian model emmeans, posterior_epred and posterior_linpred will all yield the same outputs. posterior_predict will yield wider credible intervals as it is predicting for individual observations rather than predicting means of large samples.
x emmean lower.HPD upper.HPD
control 20.3 15.51 24.3
medium 18.7 14.40 22.8
high 10.9 6.55 15.2
Point estimate displayed: median
HPD interval probability: 0.95
# OR with more control over the way posteriors are summarised
dat2b.brm2 |>
emmeans(~x) |>
gather_emmeans_draws() |>
median_hdci()
# A tibble: 3 × 7
x .value .lower .upper .width .point .interval
<fct> <dbl> <dbl> <dbl> <dbl> <chr> <chr>
1 control 20.3 15.5 24.4 0.95 median hdci
2 medium 18.7 14.4 22.8 0.95 median hdci
3 high 10.9 6.39 15.1 0.95 median hdci
Conclusions:
- the predicted (estimated) mean \(y\) associated with an \(x\) of “control” is 20.335
- the predicted (estimated) mean \(y\) associated with an \(x\) of “medium” is 18.734
- the predicted (estimated) mean \(y\) associated with an \(x\) of “high” is 10.914
- 95% HPD intervals also given
dat2b.brm2 |>
posterior_predict(newdata = data.frame(x =
c("control", "medium", "high"))) |>
summarise_draws(median, HDInterval::hdi)
# A tibble: 3 × 4
variable median lower upper
<chr> <dbl> <dbl> <dbl>
1 ...1 20.4 10.5 30.6
2 ...2 18.6 8.49 28.7
3 ...3 10.9 0.967 21.4
# Or for even more control and ability to add other summaries
dat2b.brm2 |>
add_predicted_draws(newdata = data.frame(x = c("control", "medium", "high"))) |>
dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
summarise_draws(
median,
HDInterval::hdi
)
# A tibble: 3 × 6
# Groups: x, .row [3]
x .row variable median lower upper
<chr> <int> <chr> <dbl> <dbl> <dbl>
1 control 1 .prediction 20.1 9.93 29.8
2 high 3 .prediction 11.0 0.958 21.2
3 medium 2 .prediction 18.9 9.11 29.7
Conclusions:
- the predicted \(y\) associated with an \(x\) of “control” is approximately 20.2
- the predicted \(y\) associated with an \(x\) of “medium” is approximately 18.7
- the predicted \(y\) associated with an \(x\) of “high” is approximately 11.0
- 95% HPD intervals also given (posterior predictions involve fresh random draws, so the two routines above differ slightly)
dat2b.brm2 |>
posterior_epred(newdata = data.frame(x = c("control", "medium", "high"))) |>
summarise_draws(median, HDInterval::hdi)
# A tibble: 3 × 4
variable median lower upper
<chr> <dbl> <dbl> <dbl>
1 ...1 20.3 15.5 24.3
2 ...2 18.7 14.4 22.8
3 ...3 10.9 6.55 15.2
# Or for even more control and ability to add other summaries
dat2b.brm2 |>
add_epred_draws(newdata = data.frame(x = c("control", "medium", "high"))) |>
dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
summarise_draws(
median,
HDInterval::hdi
)
# A tibble: 3 × 6
# Groups: x, .row [3]
x .row variable median lower upper
<chr> <int> <chr> <dbl> <dbl> <dbl>
1 control 1 .epred 20.3 15.5 24.3
2 high 3 .epred 10.9 6.55 15.2
3 medium 2 .epred 18.7 14.4 22.8
Conclusions:
- the predicted (estimated) mean \(y\) associated with an \(x\) of “control” is 20.335
- the predicted (estimated) mean \(y\) associated with an \(x\) of “medium” is 18.734
- the predicted (estimated) mean \(y\) associated with an \(x\) of “high” is 10.914
- 95% HPD intervals also given
dat2b.brm2 |>
posterior_linpred(newdata = data.frame(x = c("control", "medium", "high"))) |>
median_hdci()
y ymin ymax .width .point .interval
1 17.99514 7.956261 23.53212 0.95 median hdci
# Or for even more control and ability to add other summaries
dat2b.brm2 |>
add_linpred_draws(newdata = data.frame(x = c("control", "medium", "high"))) |>
dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
summarise_draws(
median,
HDInterval::hdi
)
# A tibble: 3 × 6
# Groups: x, .row [3]
x .row variable median lower upper
<chr> <int> <chr> <dbl> <dbl> <dbl>
1 control 1 .linpred 20.3 15.5 24.3
2 high 3 .linpred 10.9 6.55 15.2
3 medium 2 .linpred 18.7 14.4 22.8
Conclusions:
- the predicted (estimated) mean \(y\) associated with an \(x\) of “control” is 20.335
- the predicted (estimated) mean \(y\) associated with an \(x\) of “medium” is 18.734
- the predicted (estimated) mean \(y\) associated with an \(x\) of “high” is 10.914
- 95% HPD intervals also given
- note that piping the posterior_linpred() matrix directly into median_hdci() collapses all three groups into the single summary row shown first; add_linpred_draws() keeps the groups separate
In each case, we will predict \(y\) when \(x\) is 2.5 and 5
Note, for this model (log link), emmeans (which back-transforms by default) and posterior_epred yield results on the response scale, whereas posterior_linpred returns values on the link (log) scale unless exponentiated. posterior_predict will yield wider credible intervals as it predicts individual observations rather than means of large samples.
x rate lower.HPD upper.HPD
2.5 2.45 1.40 3.72
5.0 5.79 4.12 7.52
Point estimate displayed: median
Results are back-transformed from the log scale
HPD interval probability: 0.95
# OR with more control over the way posteriors are summarised
dat3a.brm2 |>
emmeans(~x, at = list(x = c(2.5, 5))) |>
gather_emmeans_draws() |>
mutate(.value = exp(.value)) |>
median_hdci()
# A tibble: 2 × 7
x .value .lower .upper .width .point .interval
<dbl> <dbl> <dbl> <dbl> <dbl> <chr> <chr>
1 2.5 2.45 1.40 3.72 0.95 median hdci
2 5 5.79 4.15 7.57 0.95 median hdci
# OR with yet more control over the way posteriors are summarised
dat3a.brm2 |>
emmeans(~x, at = list(x = c(2.5, 5))) |>
gather_emmeans_draws() |>
dplyr::select(-.chain) |>
summarise_draws(
median,
HDInterval::hdi
)
# A tibble: 2 × 5
# Groups: x [2]
x variable median lower upper
<dbl> <chr> <dbl> <dbl> <dbl>
1 2.5 .value 0.896 0.366 1.33
2 5 .value 1.76 1.45 2.05
Conclusions:
- the predicted (estimated) mean \(y\) associated with an \(x\) of 2.5 is 2.45
- the predicted (estimated) mean \(y\) associated with an \(x\) of 5 is 5.792
- note that the final summarise_draws() table above is on the link (log) scale (0.896 and 1.76), since those draws were not exponentiated
- 95% HPD intervals also given
dat3a.brm2 |>
posterior_predict(newdata = data.frame(x = c(2.5, 5))) |>
summarise_draws(median, HDInterval::hdi)
# A tibble: 2 × 4
variable median lower upper
<chr> <dbl> <dbl> <dbl>
1 ...1 2 0 6
2 ...2 6 1 10
# Or for even more control and ability to add other summaries
dat3a.brm2 |>
add_predicted_draws(newdata = data.frame(x = c(2.5, 5))) |>
dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
summarise_draws(
median,
HDInterval::hdi
)
# A tibble: 2 × 6
# Groups: x, .row [2]
x .row variable median lower upper
<dbl> <int> <chr> <dbl> <dbl> <dbl>
1 2.5 1 .prediction 2 0 6
2 5 2 .prediction 6 1 10
Conclusions:
- the predicted (estimated) mean \(y\) associated with an \(x\) of 2.5 is 2
- the predicted (estimated) mean \(y\) associated with an \(x\) of 5 is 6
- 95% HPD intervals also given
dat3a.brm2 |>
posterior_epred(newdata = data.frame(x = c(2.5, 5))) |>
summarise_draws(median, HDInterval::hdi)
# A tibble: 2 × 4
variable median lower upper
<chr> <dbl> <dbl> <dbl>
1 ...1 2.45 1.40 3.72
2 ...2 5.79 4.12 7.52
# Or for even more control and ability to add other summaries
dat3a.brm2 |>
add_epred_draws(newdata = data.frame(x = c(2.5, 5))) |>
dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
summarise_draws(
median,
HDInterval::hdi
)
# A tibble: 2 × 6
# Groups: x, .row [2]
x .row variable median lower upper
<dbl> <int> <chr> <dbl> <dbl> <dbl>
1 2.5 1 .epred 2.45 1.40 3.72
2 5 2 .epred 5.79 4.12 7.52
Conclusions:
- the fitted mean \(y\) associated with an \(x\) of 2.5 is 2.45
- the fitted mean \(y\) associated with an \(x\) of 5 is 5.792
- 95% HPD intervals also given
dat3a.brm2 |>
posterior_linpred(newdata = data.frame(x = c(2.5, 5))) |>
exp() |>
summarise_draws(median, HDInterval::hdi)
# A tibble: 2 × 4
variable median lower upper
<chr> <dbl> <dbl> <dbl>
1 ...1 2.45 1.40 3.72
2 ...2 5.79 4.12 7.52
# Or for even more control and ability to add other summaries
dat3a.brm2 |>
add_linpred_draws(newdata = data.frame(x = c(2.5, 5))) |>
mutate(.linpred = exp(.linpred)) |>
dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
summarise_draws(
median,
HDInterval::hdi
)
# A tibble: 2 × 6
# Groups: x, .row [2]
x .row variable median lower upper
<dbl> <int> <chr> <dbl> <dbl> <dbl>
1 2.5 1 .linpred 2.45 1.40 3.72
2 5 2 .linpred 5.79 4.12 7.52
Conclusions:
- the fitted mean \(y\) associated with an \(x\) of 2.5 is 2.45
- the fitted mean \(y\) associated with an \(x\) of 5 is 5.792
- 95% HPD intervals also given
In each case, we will predict \(y\) when \(x\) is 2.5 and 5
Note, once back-transformed onto the response scale, emmeans, posterior_epred and posterior_linpred all yield the same outputs. posterior_predict will yield wider credible intervals because it predicts individual observations rather than the mean response.
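The reason the posterior_predict intervals are wider can be sketched with a toy simulation (illustrative numbers only, not taken from the fitted models): draws of a Poisson mean stand in for the parameter uncertainty captured by posterior_epred, and simulating a count from each draw adds the observation-level noise that posterior_predict includes on top.

```r
set.seed(1)
# stand-in for posterior draws of the mean (epred-style uncertainty only)
mu_draws <- exp(rnorm(4000, mean = log(6), sd = 0.1))
# stand-in for posterior predictive draws (mean uncertainty + Poisson noise)
y_draws <- rpois(length(mu_draws), lambda = mu_draws)

diff(quantile(mu_draws, c(0.025, 0.975)))  # narrow interval for the mean
diff(quantile(y_draws, c(0.025, 0.975)))   # much wider predictive interval
```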
x emmean lower.HPD upper.HPD
2.5 0.902 0.376 1.35
5.0 1.755 1.437 2.05
Point estimate displayed: median
Results are given on the log (not the response) scale.
HPD interval probability: 0.95
# OR with more control over the way posteriors are summarised
dat3b.brm2 |>
emmeans(~x, at = list(x = c(2.5, 5))) |>
gather_emmeans_draws() |>
median_hdci()
# A tibble: 2 × 7
x .value .lower .upper .width .point .interval
<dbl> <dbl> <dbl> <dbl> <dbl> <chr> <chr>
1 2.5 0.902 0.376 1.35 0.95 median hdci
2 5 1.76 1.44 2.05 0.95 median hdci
Conclusions:
- the predicted (estimated) mean \(y\) associated with an \(x\) of 2.5 is 0.902 (on the log scale)
- the predicted (estimated) mean \(y\) associated with an \(x\) of 5 is 1.755 (on the log scale)
- 95% HPD intervals also given
dat3b.brm2 |>
posterior_predict(newdata = data.frame(x = c(2.5, 5))) |>
summarise_draws(median, HDInterval::hdi)
# A tibble: 2 × 4
variable median lower upper
<chr> <dbl> <dbl> <dbl>
1 ...1 2 0 5
2 ...2 6 0 10
# Or for even more control and ability to add other summaries
dat3b.brm2 |>
add_predicted_draws(newdata = data.frame(x = c(2.5, 5))) |>
dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
summarise_draws(
median,
HDInterval::hdi
)
# A tibble: 2 × 6
# Groups: x, .row [2]
x .row variable median lower upper
<dbl> <int> <chr> <dbl> <dbl> <dbl>
1 2.5 1 .prediction 2 0 6
2 5 2 .prediction 6 1 10
Conclusions:
- the predicted value of a new \(y\) observation associated with an \(x\) of 2.5 is 2
- the predicted value of a new \(y\) observation associated with an \(x\) of 5 is 6
- 95% HPD intervals also given
dat3b.brm2 |>
posterior_epred(newdata = data.frame(x = c(2.5, 5))) |>
summarise_draws(median, HDInterval::hdi)
# A tibble: 2 × 4
variable median lower upper
<chr> <dbl> <dbl> <dbl>
1 ...1 2.46 1.29 3.65
2 ...2 5.78 4.12 7.61
# Or for even more control and ability to add other summaries
dat3b.brm2 |>
add_epred_draws(newdata = data.frame(x = c(2.5, 5))) |>
dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
summarise_draws(
median,
HDInterval::hdi
)
# A tibble: 2 × 6
# Groups: x, .row [2]
x .row variable median lower upper
<dbl> <int> <chr> <dbl> <dbl> <dbl>
1 2.5 1 .epred 2.46 1.29 3.65
2 5 2 .epred 5.78 4.12 7.61
Conclusions:
- the fitted mean \(y\) associated with an \(x\) of 2.5 is 2.465
- the fitted mean \(y\) associated with an \(x\) of 5 is 5.785
- 95% HPD intervals also given
dat3b.brm2 |>
posterior_linpred(newdata = data.frame(x = c(2.5, 5))) |>
exp() |>
summarise_draws(median, HDInterval::hdi)
# A tibble: 2 × 4
variable median lower upper
<chr> <dbl> <dbl> <dbl>
1 ...1 2.46 1.29 3.65
2 ...2 5.78 4.12 7.61
# Or for even more control and ability to add other summaries
dat3b.brm2 |>
add_linpred_draws(newdata = data.frame(x = c(2.5, 5))) |>
mutate(.linpred = exp(.linpred)) |>
dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
summarise_draws(
median,
HDInterval::hdi
)
# A tibble: 2 × 6
# Groups: x, .row [2]
x .row variable median lower upper
<dbl> <int> <chr> <dbl> <dbl> <dbl>
1 2.5 1 .linpred 2.46 1.29 3.65
2 5 2 .linpred 5.78 4.12 7.61
Conclusions:
- the fitted mean \(y\) associated with an \(x\) of 2.5 is 2.465
- the fitted mean \(y\) associated with an \(x\) of 5 is 5.785
- 95% HPD intervals also given
In each case, we will predict \(y\) when \(x\) is 2.5 and 5
Note, once back-transformed onto the response scale, emmeans, posterior_epred and posterior_linpred all yield the same outputs. posterior_predict will yield wider credible intervals because it predicts individual observations rather than the mean response.
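A related point worth noting: summarising on the link scale and then back-transforming gives the same point estimate as back-transforming each draw first, because the median commutes with monotone transformations such as exp and plogis (the same is not true of HPD intervals, nor of the mean). A small self-contained illustration, using an odd sample size so the median is an exact order statistic:

```r
set.seed(42)
x <- rnorm(1001)  # stand-in for link-scale posterior draws

# exp() is monotone increasing, so the median passes straight through it
stopifnot(all.equal(exp(median(x)), median(exp(x))))

# the same does not hold for the mean (Jensen's inequality)
mean(exp(x)) > exp(mean(x))
```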
x emmean lower.HPD upper.HPD
2.5 0.91 0.394 1.40
5.0 1.76 1.445 2.06
Point estimate displayed: median
Results are given on the log (not the response) scale.
HPD interval probability: 0.95
# OR with more control over the way posteriors are summarised
dat3c.brm2 |>
emmeans(~x, at = list(x = c(2.5, 5))) |>
gather_emmeans_draws() |>
median_hdci()
# A tibble: 2 × 7
x .value .lower .upper .width .point .interval
<dbl> <dbl> <dbl> <dbl> <dbl> <chr> <chr>
1 2.5 0.910 0.436 1.44 0.95 median hdci
2 5 1.76 1.44 2.06 0.95 median hdci
Conclusions:
- the predicted (estimated) mean \(y\) associated with an \(x\) of 2.5 is 0.91 (on the log scale)
- the predicted (estimated) mean \(y\) associated with an \(x\) of 5 is 1.763 (on the log scale)
- 95% HPD intervals also given
dat3c.brm2 |>
posterior_predict(newdata = data.frame(x = c(2.5, 5))) |>
summarise_draws(median, HDInterval::hdi)
# A tibble: 2 × 4
variable median lower upper
<chr> <dbl> <dbl> <dbl>
1 ...1 2 0 6
2 ...2 6 1 11
# Or for even more control and ability to add other summaries
dat3c.brm2 |>
add_predicted_draws(newdata = data.frame(x = c(2.5, 5))) |>
dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
summarise_draws(
median,
HDInterval::hdi
)
# A tibble: 2 × 6
# Groups: x, .row [2]
x .row variable median lower upper
<dbl> <int> <chr> <dbl> <dbl> <dbl>
1 2.5 1 .prediction 2 0 6
2 5 2 .prediction 6 1 11
Conclusions:
- the predicted value of a new \(y\) observation associated with an \(x\) of 2.5 is 2
- the predicted value of a new \(y\) observation associated with an \(x\) of 5 is 6
- 95% HPD intervals also given
dat3c.brm2 |>
posterior_epred(newdata = data.frame(x = c(2.5, 5))) |>
summarise_draws(median, HDInterval::hdi)
# A tibble: 2 × 4
variable median lower upper
<chr> <dbl> <dbl> <dbl>
1 ...1 2.48 1.36 3.83
2 ...2 5.83 4.07 7.69
# Or for even more control and ability to add other summaries
dat3c.brm2 |>
add_epred_draws(newdata = data.frame(x = c(2.5, 5))) |>
dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
summarise_draws(
median,
HDInterval::hdi
)
# A tibble: 2 × 6
# Groups: x, .row [2]
x .row variable median lower upper
<dbl> <int> <chr> <dbl> <dbl> <dbl>
1 2.5 1 .epred 2.48 1.36 3.83
2 5 2 .epred 5.83 4.07 7.69
Conclusions:
- the fitted mean \(y\) associated with an \(x\) of 2.5 is 2.485
- the fitted mean \(y\) associated with an \(x\) of 5 is 5.83
- 95% HPD intervals also given
dat3c.brm2 |>
posterior_linpred(newdata = data.frame(x = c(2.5, 5))) |>
exp() |>
summarise_draws(median, HDInterval::hdi)
# A tibble: 2 × 4
variable median lower upper
<chr> <dbl> <dbl> <dbl>
1 ...1 2.48 1.36 3.83
2 ...2 5.83 4.07 7.69
# Or for even more control and ability to add other summaries
dat3c.brm2 |>
add_linpred_draws(newdata = data.frame(x = c(2.5, 5))) |>
mutate(.linpred = exp(.linpred)) |>
dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
summarise_draws(
median,
HDInterval::hdi
)
# A tibble: 2 × 6
# Groups: x, .row [2]
x .row variable median lower upper
<dbl> <int> <chr> <dbl> <dbl> <dbl>
1 2.5 1 .linpred 2.48 1.36 3.83
2 5 2 .linpred 5.83 4.07 7.69
Conclusions:
- the fitted mean \(y\) associated with an \(x\) of 2.5 is 2.485
- the fitted mean \(y\) associated with an \(x\) of 5 is 5.83
- 95% HPD intervals also given
In each case, we will predict \(y\) when \(x\) is 2.5 and 5
Note, once back-transformed onto the response scale, emmeans, posterior_epred and posterior_linpred all yield the same outputs. posterior_predict will yield wider credible intervals because it predicts individual observations rather than the mean response.
x prob lower.HPD upper.HPD
2.5 2.95 1.59 4.65
5.0 5.97 4.11 8.03
Point estimate displayed: median
Results are back-transformed from the log scale
HPD interval probability: 0.95
# OR with more control over the way posteriors are summarised
dat4a.brm2 |>
emmeans(~x, at = list(x = c(2.5, 5))) |>
gather_emmeans_draws() |>
mutate(.value = exp(.value)) |>
median_hdci()
# A tibble: 2 × 7
x .value .lower .upper .width .point .interval
<dbl> <dbl> <dbl> <dbl> <dbl> <chr> <chr>
1 2.5 2.95 1.61 4.67 0.95 median hdci
2 5 5.97 4.02 7.94 0.95 median hdci
# OR with yet more control over the way posteriors are summarised
dat4a.brm2 |>
emmeans(~x, at = list(x = c(2.5, 5))) |>
gather_emmeans_draws() |>
dplyr::select(-.chain) |>
summarise_draws(
median,
HDInterval::hdi
)
# A tibble: 2 × 5
# Groups: x [2]
x variable median lower upper
<dbl> <chr> <dbl> <dbl> <dbl>
1 2.5 .value 1.08 0.524 1.58
2 5 .value 1.79 1.41 2.08
Conclusions:
- the predicted (estimated) mean \(y\) associated with an \(x\) of 2.5 is 2.953
- the predicted (estimated) mean \(y\) associated with an \(x\) of 5 is 5.97
- 95% HPD intervals also given
dat4a.brm2 |>
posterior_predict(newdata = data.frame(x = c(2.5, 5))) |>
summarise_draws(median, HDInterval::hdi)
# A tibble: 2 × 4
variable median lower upper
<chr> <dbl> <dbl> <dbl>
1 ...1 3 0 7
2 ...2 6 0 11
# Or for even more control and ability to add other summaries
dat4a.brm2 |>
add_predicted_draws(newdata = data.frame(x = c(2.5, 5))) |>
dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
summarise_draws(
median,
HDInterval::hdi
)
# A tibble: 2 × 6
# Groups: x, .row [2]
x .row variable median lower upper
<dbl> <int> <chr> <dbl> <dbl> <dbl>
1 2.5 1 .prediction 3 0 7
2 5 2 .prediction 6 0 11
Conclusions:
- the predicted value of a new \(y\) observation associated with an \(x\) of 2.5 is 3
- the predicted value of a new \(y\) observation associated with an \(x\) of 5 is 6
- 95% HPD intervals also given
dat4a.brm2 |>
posterior_epred(newdata = data.frame(x = c(2.5, 5))) |>
summarise_draws(median, HDInterval::hdi)
# A tibble: 2 × 4
variable median lower upper
<chr> <dbl> <dbl> <dbl>
1 ...1 2.95 1.59 4.65
2 ...2 5.97 4.11 8.03
# Or for even more control and ability to add other summaries
dat4a.brm2 |>
add_epred_draws(newdata = data.frame(x = c(2.5, 5))) |>
dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
summarise_draws(
median,
HDInterval::hdi
)
# A tibble: 2 × 6
# Groups: x, .row [2]
x .row variable median lower upper
<dbl> <int> <chr> <dbl> <dbl> <dbl>
1 2.5 1 .epred 2.95 1.59 4.65
2 5 2 .epred 5.97 4.11 8.03
Conclusions:
- the fitted mean \(y\) associated with an \(x\) of 2.5 is 2.953
- the fitted mean \(y\) associated with an \(x\) of 5 is 5.97
- 95% HPD intervals also given
dat4a.brm2 |>
posterior_linpred(newdata = data.frame(x = c(2.5, 5))) |>
exp() |>
summarise_draws(median, HDInterval::hdi)
# A tibble: 2 × 4
variable median lower upper
<chr> <dbl> <dbl> <dbl>
1 ...1 2.95 1.59 4.65
2 ...2 5.97 4.11 8.03
# Or for even more control and ability to add other summaries
dat4a.brm2 |>
add_linpred_draws(newdata = data.frame(x = c(2.5, 5))) |>
mutate(.linpred = exp(.linpred)) |>
dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
summarise_draws(
median,
HDInterval::hdi
)
# A tibble: 2 × 6
# Groups: x, .row [2]
x .row variable median lower upper
<dbl> <int> <chr> <dbl> <dbl> <dbl>
1 2.5 1 .linpred 2.95 1.59 4.65
2 5 2 .linpred 5.97 4.11 8.03
Conclusions:
- the fitted mean \(y\) associated with an \(x\) of 2.5 is 2.953
- the fitted mean \(y\) associated with an \(x\) of 5 is 5.97
- 95% HPD intervals also given
In each case, we will predict \(y\) when \(x\) is 2.5 and 5
Note, once back-transformed onto the response scale, emmeans, posterior_epred and posterior_linpred all yield the same outputs. posterior_predict will yield wider credible intervals because it predicts individual observations rather than the mean response.
x emmean lower.HPD upper.HPD
2.5 1.09 0.56 1.58
5.0 1.79 1.46 2.10
Point estimate displayed: median
Results are given on the log (not the response) scale.
HPD interval probability: 0.95
# OR with more control over the way posteriors are summarised
dat4b.brm2 |>
emmeans(~x, at = list(x = c(2.5, 5))) |>
gather_emmeans_draws() |>
median_hdci()
# A tibble: 2 × 7
x .value .lower .upper .width .point .interval
<dbl> <dbl> <dbl> <dbl> <dbl> <chr> <chr>
1 2.5 1.09 0.566 1.59 0.95 median hdci
2 5 1.79 1.46 2.10 0.95 median hdci
Conclusions:
- the predicted (estimated) mean \(y\) associated with an \(x\) of 2.5 is 1.089 (on the log scale)
- the predicted (estimated) mean \(y\) associated with an \(x\) of 5 is 1.787 (on the log scale)
- 95% HPD intervals also given
dat4b.brm2 |>
posterior_predict(newdata = data.frame(x = c(2.5, 5))) |>
summarise_draws(median, HDInterval::hdi)
# A tibble: 2 × 4
variable median lower upper
<chr> <dbl> <dbl> <dbl>
1 ...1 3 0 7
2 ...2 6 1 12
# Or for even more control and ability to add other summaries
dat4b.brm2 |>
add_predicted_draws(newdata = data.frame(x = c(2.5, 5))) |>
dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
summarise_draws(
median,
HDInterval::hdi
)
# A tibble: 2 × 6
# Groups: x, .row [2]
x .row variable median lower upper
<dbl> <int> <chr> <dbl> <dbl> <dbl>
1 2.5 1 .prediction 3 0 7
2 5 2 .prediction 6 0 11
Conclusions:
- the predicted value of a new \(y\) observation associated with an \(x\) of 2.5 is 3
- the predicted value of a new \(y\) observation associated with an \(x\) of 5 is 6
- 95% HPD intervals also given
dat4b.brm2 |>
posterior_epred(newdata = data.frame(x = c(2.5, 5))) |>
summarise_draws(median, HDInterval::hdi)
# A tibble: 2 × 4
variable median lower upper
<chr> <dbl> <dbl> <dbl>
1 ...1 2.97 1.55 4.58
2 ...2 5.97 4.20 8.04
# Or for even more control and ability to add other summaries
dat4b.brm2 |>
add_epred_draws(newdata = data.frame(x = c(2.5, 5))) |>
dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
summarise_draws(
median,
HDInterval::hdi
)
# A tibble: 2 × 6
# Groups: x, .row [2]
x .row variable median lower upper
<dbl> <int> <chr> <dbl> <dbl> <dbl>
1 2.5 1 .epred 2.97 1.55 4.58
2 5 2 .epred 5.97 4.20 8.04
Conclusions:
- the fitted mean \(y\) associated with an \(x\) of 2.5 is 2.971
- the fitted mean \(y\) associated with an \(x\) of 5 is 5.97
- 95% HPD intervals also given
dat4b.brm2 |>
posterior_linpred(newdata = data.frame(x = c(2.5, 5))) |>
exp() |>
summarise_draws(median, HDInterval::hdi)
# A tibble: 2 × 4
variable median lower upper
<chr> <dbl> <dbl> <dbl>
1 ...1 2.97 1.55 4.58
2 ...2 5.97 4.20 8.04
# Or for even more control and ability to add other summaries
dat4b.brm2 |>
add_linpred_draws(newdata = data.frame(x = c(2.5, 5))) |>
mutate(.linpred = exp(.linpred)) |>
dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
summarise_draws(
median,
HDInterval::hdi
)
# A tibble: 2 × 6
# Groups: x, .row [2]
x .row variable median lower upper
<dbl> <int> <chr> <dbl> <dbl> <dbl>
1 2.5 1 .linpred 2.97 1.55 4.58
2 5 2 .linpred 5.97 4.20 8.04
Conclusions:
- the fitted mean \(y\) associated with an \(x\) of 2.5 is 2.971
- the fitted mean \(y\) associated with an \(x\) of 5 is 5.97
- 95% HPD intervals also given
In each case, we will predict \(y\) when \(x\) is 2.5 and 5
Note, once back-transformed onto the response scale, emmeans, posterior_epred and posterior_linpred all yield the same outputs. posterior_predict will yield wider credible intervals because it predicts individual observations rather than the mean response.
x emmean lower.HPD upper.HPD
2.5 1.10 0.54 1.64
5.0 1.79 1.44 2.11
Point estimate displayed: median
Results are given on the log (not the response) scale.
HPD interval probability: 0.95
# OR with more control over the way posteriors are summarised
dat4c.brm2 |>
emmeans(~x, at = list(x = c(2.5, 5))) |>
gather_emmeans_draws() |>
median_hdci()
# A tibble: 2 × 7
x .value .lower .upper .width .point .interval
<dbl> <dbl> <dbl> <dbl> <dbl> <chr> <chr>
1 2.5 1.10 0.540 1.64 0.95 median hdci
2 5 1.79 1.44 2.11 0.95 median hdci
Conclusions:
- the predicted (estimated) mean \(y\) associated with an \(x\) of 2.5 is 1.097 (on the log scale)
- the predicted (estimated) mean \(y\) associated with an \(x\) of 5 is 1.789 (on the log scale)
- 95% HPD intervals also given
dat4c.brm2 |>
posterior_predict(newdata = data.frame(x = c(2.5, 5))) |>
summarise_draws(median, HDInterval::hdi)
# A tibble: 2 × 4
variable median lower upper
<chr> <dbl> <dbl> <dbl>
1 ...1 3 0 7
2 ...2 6 1 12
# Or for even more control and ability to add other summaries
dat4c.brm2 |>
add_predicted_draws(newdata = data.frame(x = c(2.5, 5))) |>
dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
summarise_draws(
median,
HDInterval::hdi
)
# A tibble: 2 × 6
# Groups: x, .row [2]
x .row variable median lower upper
<dbl> <int> <chr> <dbl> <dbl> <dbl>
1 2.5 1 .prediction 3 0 7
2 5 2 .prediction 6 1 12
Conclusions:
- the predicted value of a new \(y\) observation associated with an \(x\) of 2.5 is 3
- the predicted value of a new \(y\) observation associated with an \(x\) of 5 is 6
- 95% HPD intervals also given
dat4c.brm2 |>
posterior_epred(newdata = data.frame(x = c(2.5, 5))) |>
summarise_draws(median, HDInterval::hdi)
# A tibble: 2 × 4
variable median lower upper
<chr> <dbl> <dbl> <dbl>
1 ...1 2.99 1.57 4.76
2 ...2 5.98 4.18 8.20
# Or for even more control and ability to add other summaries
dat4c.brm2 |>
add_epred_draws(newdata = data.frame(x = c(2.5, 5))) |>
dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
summarise_draws(
median,
HDInterval::hdi
)
# A tibble: 2 × 6
# Groups: x, .row [2]
x .row variable median lower upper
<dbl> <int> <chr> <dbl> <dbl> <dbl>
1 2.5 1 .epred 2.99 1.57 4.76
2 5 2 .epred 5.98 4.18 8.20
Conclusions:
- the fitted mean \(y\) associated with an \(x\) of 2.5 is 2.994
- the fitted mean \(y\) associated with an \(x\) of 5 is 5.985
- 95% HPD intervals also given
dat4c.brm2 |>
posterior_linpred(newdata = data.frame(x = c(2.5, 5))) |>
exp() |>
summarise_draws(median, HDInterval::hdi)
# A tibble: 2 × 4
variable median lower upper
<chr> <dbl> <dbl> <dbl>
1 ...1 2.99 1.57 4.76
2 ...2 5.98 4.18 8.20
# Or for even more control and ability to add other summaries
dat4c.brm2 |>
add_linpred_draws(newdata = data.frame(x = c(2.5, 5))) |>
mutate(.linpred = exp(.linpred)) |>
dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
summarise_draws(
median,
HDInterval::hdi
)
# A tibble: 2 × 6
# Groups: x, .row [2]
x .row variable median lower upper
<dbl> <int> <chr> <dbl> <dbl> <dbl>
1 2.5 1 .linpred 2.99 1.57 4.76
2 5 2 .linpred 5.98 4.18 8.20
Conclusions:
- the fitted mean \(y\) associated with an \(x\) of 2.5 is 2.994
- the fitted mean \(y\) associated with an \(x\) of 5 is 5.985
- 95% HPD intervals also given
In each case, we will predict \(y\) when \(x\) is 2.5 and 5
Note, once back-transformed onto the response scale, emmeans, posterior_epred and posterior_linpred all yield the same outputs. posterior_predict will yield wider credible intervals because it predicts individual observations rather than the mean response.
x prob lower.HPD upper.HPD
2.5 0.0667 4.30e-06 0.387
5.0 0.4888 1.71e-01 0.778
Point estimate displayed: median
Results are back-transformed from the logit scale
HPD interval probability: 0.95
# OR with more control over the way posteriors are summarised
dat5a.brm2 |>
emmeans(~x, at = list(x = c(2.5, 5))) |>
gather_emmeans_draws() |>
mutate(.value = plogis(.value)) |>
median_hdci()
# A tibble: 2 × 7
x .value .lower .upper .width .point .interval
<dbl> <dbl> <dbl> <dbl> <dbl> <chr> <chr>
1 2.5 0.0667 0.00000430 0.386 0.95 median hdci
2 5 0.489 0.171 0.778 0.95 median hdci
# OR with yet more control over the way posteriors are summarised
dat5a.brm2 |>
emmeans(~x, at = list(x = c(2.5, 5))) |>
gather_emmeans_draws() |>
dplyr::select(-.chain) |>
summarise_draws(
median,
HDInterval::hdi
)
# A tibble: 2 × 5
# Groups: x [2]
x variable median lower upper
<dbl> <chr> <dbl> <dbl> <dbl>
1 2.5 .value -2.64 -6.72 0.180
2 5 .value -0.0448 -1.58 1.25
Conclusions:
- the predicted (estimated) mean \(y\) associated with an \(x\) of 2.5 is 0.067
- the predicted (estimated) mean \(y\) associated with an \(x\) of 5 is 0.489
- 95% HPD intervals also given
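As with the log-link models, these probabilities can be recovered by hand from the raw draws. A minimal sketch, assuming dat5a.brm2 is the logit-link model fitted above and that its coefficients carry the default brms names b_Intercept and b_x:

```r
library(posterior)   # for as_draws_df()
library(HDInterval)  # for hdi()

# extract the posterior draws as a data frame (assumes dat5a.brm2 exists)
draws <- as_draws_df(dat5a.brm2)

# inverse-logit the linear predictor at x = 5 draw by draw
p_5 <- plogis(draws$b_Intercept + draws$b_x * 5)

median(p_5)  # should agree with the emmeans/posterior_epred median
hdi(p_5)     # and with the 95% HPD interval
```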
dat5a.brm2 |>
posterior_predict(newdata = data.frame(x = c(2.5, 5))) |>
summarise_draws(median, HDInterval::hdi)
# A tibble: 2 × 4
variable median lower upper
<chr> <dbl> <dbl> <dbl>
1 ...1 0 0 1
2 ...2 0 0 1
# Or for even more control and ability to add other summaries
dat5a.brm2 |>
add_predicted_draws(newdata = data.frame(x = c(2.5, 5))) |>
dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
summarise_draws(
median,
HDInterval::hdi
)
# A tibble: 2 × 6
# Groups: x, .row [2]
x .row variable median lower upper
<dbl> <int> <chr> <dbl> <dbl> <dbl>
1 2.5 1 .prediction 0 0 1
2 5 2 .prediction 0 0 1
Conclusions:
- the predicted value of a new \(y\) observation associated with an \(x\) of 2.5 is 0
- the predicted value of a new \(y\) observation associated with an \(x\) of 5 is 0
- 95% HPD intervals also given
dat5a.brm2 |>
posterior_epred(newdata = data.frame(x = c(2.5, 5))) |>
summarise_draws(median, HDInterval::hdi)
# A tibble: 2 × 4
variable median lower upper
<chr> <dbl> <dbl> <dbl>
1 ...1 0.0667 0.00000430 0.387
2 ...2 0.489 0.171 0.778
# Or for even more control and ability to add other summaries
dat5a.brm2 |>
add_epred_draws(newdata = data.frame(x = c(2.5, 5))) |>
dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
summarise_draws(
median,
HDInterval::hdi
)
# A tibble: 2 × 6
# Groups: x, .row [2]
x .row variable median lower upper
<dbl> <int> <chr> <dbl> <dbl> <dbl>
1 2.5 1 .epred 0.0667 0.00000430 0.387
2 5 2 .epred 0.489 0.171 0.778
Conclusions:
- the fitted mean \(y\) associated with an \(x\) of 2.5 is 0.067
- the fitted mean \(y\) associated with an \(x\) of 5 is 0.489
- 95% HPD intervals also given
dat5a.brm2 |>
posterior_linpred(newdata = data.frame(x = c(2.5, 5))) |>
plogis() |>
summarise_draws(median, HDInterval::hdi)
# A tibble: 2 × 4
variable median lower upper
<chr> <dbl> <dbl> <dbl>
1 ...1 0.0667 0.00000430 0.387
2 ...2 0.489 0.171 0.778
# Or for even more control and ability to add other summaries
dat5a.brm2 |>
add_linpred_draws(newdata = data.frame(x = c(2.5, 5))) |>
mutate(.linpred = plogis(.linpred)) |>
dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
summarise_draws(
median,
HDInterval::hdi
)
# A tibble: 2 × 6
# Groups: x, .row [2]
x .row variable median lower upper
<dbl> <int> <chr> <dbl> <dbl> <dbl>
1 2.5 1 .linpred 0.0667 0.00000430 0.387
2 5 2 .linpred 0.489 0.171 0.778
Conclusions:
- the fitted mean \(y\) associated with an \(x\) of 2.5 is 0.067
- the fitted mean \(y\) associated with an \(x\) of 5 is 0.489
- 95% HPD intervals also given
In each case, we will predict \(y\) when \(x\) is 2.5 and 5
Note, once back-transformed onto the response scale, emmeans, posterior_epred and posterior_linpred all yield the same outputs. posterior_predict will yield wider credible intervals because it predicts individual observations rather than the mean response.
x emmean lower.HPD upper.HPD
2.5 -2.6388 -6.56 0.0827
5.0 -0.0342 -1.50 1.3946
Point estimate displayed: median
Results are given on the logit (not the response) scale.
HPD interval probability: 0.95
# OR with more control over the way posteriors are summarised
dat5b.brm2 |>
emmeans(~x, at = list(x = c(2.5, 5))) |>
gather_emmeans_draws() |>
median_hdci()
# A tibble: 2 × 7
x .value .lower .upper .width .point .interval
<dbl> <dbl> <dbl> <dbl> <dbl> <chr> <chr>
1 2.5 -2.64 -6.44 0.223 0.95 median hdci
2 5 -0.0342 -1.47 1.44 0.95 median hdci
Conclusions:
- the predicted (estimated) mean \(y\) associated with an \(x\) of 2.5 is -2.639 (on the logit scale)
- the predicted (estimated) mean \(y\) associated with an \(x\) of 5 is -0.034 (on the logit scale)
- 95% HPD intervals also given
dat5b.brm2 |>
posterior_predict(newdata = data.frame(x = c(2.5, 5))) |>
summarise_draws(median, HDInterval::hdi)
# A tibble: 2 × 4
variable median lower upper
<chr> <dbl> <dbl> <dbl>
1 ...1 0 0 1
2 ...2 1 0 1
# Or for even more control and ability to add other summaries
dat5b.brm2 |>
add_predicted_draws(newdata = data.frame(x = c(2.5, 5))) |>
dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
summarise_draws(
median,
HDInterval::hdi
)
# A tibble: 2 × 6
# Groups: x, .row [2]
x .row variable median lower upper
<dbl> <int> <chr> <dbl> <dbl> <dbl>
1 2.5 1 .prediction 0 0 1
2 5 2 .prediction 0 0 1
Conclusions:
- the predicted value of a new \(y\) observation associated with an \(x\) of 2.5 is 0
- the predicted value of a new \(y\) observation associated with an \(x\) of 5 is 0
- 95% HPD intervals also given
dat5b.brm2 |>
posterior_epred(newdata = data.frame(x = c(2.5, 5))) |>
summarise_draws(median, HDInterval::hdi)
# A tibble: 2 × 4
variable median lower upper
<chr> <dbl> <dbl> <dbl>
1 ...1 0.0667 0.00000122 0.372
2 ...2 0.491 0.168 0.786
# Or for even more control and ability to add other summaries
dat5b.brm2 |>
add_epred_draws(newdata = data.frame(x = c(2.5, 5))) |>
dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
summarise_draws(
median,
HDInterval::hdi
)
# A tibble: 2 × 6
# Groups: x, .row [2]
x .row variable median lower upper
<dbl> <int> <chr> <dbl> <dbl> <dbl>
1 2.5 1 .epred 0.0667 0.00000122 0.372
2 5 2 .epred 0.491 0.168 0.786
Conclusions:
- the fitted mean \(y\) associated with an \(x\) of 2.5 is 0.067
- the fitted mean \(y\) associated with an \(x\) of 5 is 0.491
- 95% HPD intervals also given
dat5b.brm2 |>
posterior_linpred(newdata = data.frame(x = c(2.5, 5))) |>
plogis() |>
summarise_draws(median, HDInterval::hdi)
# A tibble: 2 × 4
variable median lower upper
<chr> <dbl> <dbl> <dbl>
1 ...1 0.0667 0.00000122 0.372
2 ...2 0.491 0.168 0.786
# Or for even more control and ability to add other summaries
dat5b.brm2 |>
add_linpred_draws(newdata = data.frame(x = c(2.5, 5))) |>
mutate(.linpred = plogis(.linpred)) |>
dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
summarise_draws(
median,
HDInterval::hdi
)
# A tibble: 2 × 6
# Groups: x, .row [2]
x .row variable median lower upper
<dbl> <int> <chr> <dbl> <dbl> <dbl>
1 2.5 1 .linpred 0.0667 0.00000122 0.372
2 5 2 .linpred 0.491 0.168 0.786
Conclusions:
- the fitted mean \(y\) associated with an \(x\) of 2.5 is 0.067
- the fitted mean \(y\) associated with an \(x\) of 5 is 0.491
- 95% HPD intervals also given
In each case, we will predict \(y\) when \(x\) is 2.5 and 5
Note, once back-transformed onto the response scale, emmeans, posterior_epred and posterior_linpred all yield the same outputs. posterior_predict will yield wider credible intervals because it predicts individual observations rather than the mean response.
x emmean lower.HPD upper.HPD
2.5 -1.448 -4.12 0.675
5.0 0.114 -1.09 1.471
Point estimate displayed: median
Results are given on the logit (not the response) scale.
HPD interval probability: 0.95
# OR with more control over the way posteriors are summarised
dat5c.brm2 |>
emmeans(~x, at = list(x = c(2.5, 5))) |>
gather_emmeans_draws() |>
median_hdci()
# A tibble: 2 × 7
x .value .lower .upper .width .point .interval
<dbl> <dbl> <dbl> <dbl> <dbl> <chr> <chr>
1 2.5 -1.45 -4.12 0.675 0.95 median hdci
2 5 0.114 -1.09 1.47 0.95 median hdci
Conclusions:
- the predicted (estimated) mean \(y\) associated with an \(x\) of 2.5 is -1.448 (on the logit scale)
- the predicted (estimated) mean \(y\) associated with an \(x\) of 5 is 0.114 (on the logit scale)
- 95% HPD intervals also given
dat5c.brm2 |>
posterior_predict(newdata = data.frame(x = c(2.5, 5))) |>
summarise_draws(median, HDInterval::hdi)
# A tibble: 2 × 4
variable median lower upper
<chr> <dbl> <dbl> <dbl>
1 ...1 0 0 1
2 ...2 1 0 1
# Or for even more control and ability to add other summaries
dat5c.brm2 |>
add_predicted_draws(newdata = data.frame(x = c(2.5, 5))) |>
dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
summarise_draws(
median,
HDInterval::hdi
)
# A tibble: 2 × 6
# Groups: x, .row [2]
x .row variable median lower upper
<dbl> <int> <chr> <dbl> <dbl> <dbl>
1 2.5 1 .prediction 0 0 1
2 5 2 .prediction 1 0 1
Conclusions:
- the predicted value of a new \(y\) observation associated with an \(x\) of 2.5 is 0
- the predicted value of a new \(y\) observation associated with an \(x\) of 5 is 1
- 95% HPD intervals also given
dat5c.brm2 |>
posterior_epred(newdata = data.frame(x = c(2.5, 5))) |>
summarise_draws(median, HDInterval::hdi)
# A tibble: 2 × 4
variable median lower upper
<chr> <dbl> <dbl> <dbl>
1 ...1 0.190 0.0000295 0.549
2 ...2 0.529 0.252 0.813
# Or for even more control and ability to add other summaries
dat5c.brm2 |>
add_epred_draws(newdata = data.frame(x = c(2.5, 5))) |>
dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
summarise_draws(
median,
HDInterval::hdi
)
# A tibble: 2 × 6
# Groups: x, .row [2]
x .row variable median lower upper
<dbl> <int> <chr> <dbl> <dbl> <dbl>
1 2.5 1 .epred 0.190 0.0000295 0.549
2 5 2 .epred 0.529 0.252 0.813
Conclusions:
- the fitted mean \(y\) associated with an \(x\) of 2.5 is 0.19
- the fitted mean \(y\) associated with an \(x\) of 5 is 0.529
- 95% HPD intervals also given
dat5c.brm2 |>
posterior_linpred(newdata = data.frame(x = c(2.5, 5))) |>
plogis() |>
summarise_draws(median, HDInterval::hdi)
# A tibble: 2 × 4
variable median lower upper
<chr> <dbl> <dbl> <dbl>
1 ...1 0.190 0.0000295 0.549
2 ...2 0.529 0.252 0.813
# Or for even more control and ability to add other summaries
dat5c.brm2 |>
add_linpred_draws(newdata = data.frame(x = c(2.5, 5))) |>
mutate(.linpred = plogis(.linpred)) |>
dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
summarise_draws(
median,
HDInterval::hdi
)
# A tibble: 2 × 6
# Groups: x, .row [2]
x .row variable median lower upper
<dbl> <int> <chr> <dbl> <dbl> <dbl>
1 2.5 1 .linpred 0.190 0.0000295 0.549
2 5 2 .linpred 0.529 0.252 0.813
Conclusions:
- the fitted mean \(y\) associated with an \(x\) of 2.5 is 0.19
- the fitted mean \(y\) associated with an \(x\) of 5 is 0.529
- 95% HPD intervals also given
In each case, we will predict \(y\) when \(x\) is 2.5 and 5
Note, once back-transformed onto the response scale, emmeans, posterior_epred and posterior_linpred all yield the same outputs. posterior_predict will yield wider credible intervals because it predicts individual observations rather than the mean response.
x prob lower.HPD upper.HPD
2.5 0.163 0.0409 0.313
5.0 0.494 0.3729 0.617
Point estimate displayed: median
Results are back-transformed from the logit scale
HPD interval probability: 0.95
# OR with more control over the way posteriors are summarised
dat6a.brm2 |>
emmeans(~x, at = list(x = c(2.5, 5))) |>
gather_emmeans_draws() |>
mutate(.value = plogis(.value)) |>
median_hdci()
# A tibble: 2 × 7
x .value .lower .upper .width .point .interval
<dbl> <dbl> <dbl> <dbl> <dbl> <chr> <chr>
1 2.5 0.163 0.0478 0.321 0.95 median hdci
2 5 0.494 0.373 0.617 0.95 median hdci
# OR with yet more control over the way posteriors are summarised
dat6a.brm2 |>
emmeans(~x, at = list(x = c(2.5, 5))) |>
gather_emmeans_draws() |>
dplyr::select(-.chain) |>
summarise_draws(
median,
HDInterval::hdi
)# A tibble: 2 × 5
# Groups: x [2]
x variable median lower upper
<dbl> <chr> <dbl> <dbl> <dbl>
1 2.5 .value -1.64 -2.73 -0.604
2 5 .value -0.0247 -0.520 0.477
Conclusions:
- the predicted (estimated) mean \(y\) associated with an \(x\) of 2.5 is 0.163
- the predicted (estimated) mean \(y\) associated with an \(x\) of 5 is 0.494
- 95% HPD intervals also given
dat6a.brm2 |>
posterior_predict(newdata = data.frame(x = c(2.5, 5), total = 1)) |>
summarise_draws(median, HDInterval::hdi)# A tibble: 2 × 4
variable median lower upper
<chr> <dbl> <dbl> <dbl>
1 ...1 0 0 1
2 ...2 0 0 1
# Or for even more control and ability to add other summaries
dat6a.brm2 |>
add_predicted_draws(newdata = data.frame(x = c(2.5, 5), total = 1)) |>
dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
summarise_draws(
median,
HDInterval::hdi
)# A tibble: 2 × 7
# Groups: x, total, .row [2]
x total .row variable median lower upper
<dbl> <dbl> <int> <chr> <dbl> <dbl> <dbl>
1 2.5 1 1 .prediction 0 0 1
2 5 1 2 .prediction 0 0 1
Conclusions:
- the median posterior prediction (an individual observation) of \(y\) at an \(x\) of 2.5 is 0
- the median posterior prediction (an individual observation) of \(y\) at an \(x\) of 5 is 0
- 95% HPD intervals also given
dat6a.brm2 |>
posterior_epred(newdata = data.frame(x = c(2.5, 5), total = 1)) |>
summarise_draws(median, HDInterval::hdi)# A tibble: 2 × 4
variable median lower upper
<chr> <dbl> <dbl> <dbl>
1 ...1 0.163 0.0409 0.313
2 ...2 0.494 0.373 0.617
# Or for even more control and ability to add other summaries
dat6a.brm2 |>
add_epred_draws(newdata = data.frame(x = c(2.5, 5), total = 1)) |>
dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
summarise_draws(
median,
HDInterval::hdi
)# A tibble: 2 × 7
# Groups: x, total, .row [2]
x total .row variable median lower upper
<dbl> <dbl> <int> <chr> <dbl> <dbl> <dbl>
1 2.5 1 1 .epred 0.163 0.0409 0.313
2 5 1 2 .epred 0.494 0.373 0.617
Conclusions:
- the fitted mean \(y\) associated with an \(x\) of 2.5 is 0.163
- the fitted mean \(y\) associated with an \(x\) of 5 is 0.494
- 95% HPD intervals also given
dat6a.brm2 |>
posterior_linpred(newdata = data.frame(x = c(2.5, 5), total = 1)) |>
plogis() |>
summarise_draws(median, HDInterval::hdi)# A tibble: 2 × 4
variable median lower upper
<chr> <dbl> <dbl> <dbl>
1 ...1 0.163 0.0409 0.313
2 ...2 0.494 0.373 0.617
# Or for even more control and ability to add other summaries
dat6a.brm2 |>
add_linpred_draws(newdata = data.frame(x = c(2.5, 5), total = 1)) |>
mutate(.linpred = plogis(.linpred)) |>
dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
summarise_draws(
median,
HDInterval::hdi
)# A tibble: 2 × 7
# Groups: x, total, .row [2]
x total .row variable median lower upper
<dbl> <dbl> <int> <chr> <dbl> <dbl> <dbl>
1 2.5 1 1 .linpred 0.163 0.0409 0.313
2 5 1 2 .linpred 0.494 0.373 0.617
Conclusions:
- the fitted mean \(y\) associated with an \(x\) of 2.5 is 0.163
- the fitted mean \(y\) associated with an \(x\) of 5 is 0.494
- 95% HPD intervals also given
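As a quick check on the note at the start of this panel, the agreement between posterior_epred and the back-transformed linear predictor can be verified directly. The following is a sketch only, reusing the dat6a.brm2 model fitted above; values should match up to numerical precision because this is a logit-link model.

```r
## Sketch: posterior_epred should equal the inverse-logit of the
## linear predictor draws for this logit-link model
nd <- data.frame(x = c(2.5, 5), total = 1)
ep <- posterior_epred(dat6a.brm2, newdata = nd)
lp <- plogis(posterior_linpred(dat6a.brm2, newdata = nd))
all.equal(ep, lp) # should be TRUE (same draws, different scales internally)
```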
In each case, we will predict \(y\) when \(x\) is 2.5 and 5
Note, once back-transformed to the response scale, emmeans, posterior_epred and posterior_linpred will all yield the same outputs. posterior_predict will yield wider credible intervals, as it predicts individual observations rather than expected means.
x emmean lower.HPD upper.HPD
2.5 -1.621 -2.685 -0.654
5.0 -0.018 -0.489 0.500
Point estimate displayed: median
Results are given on the logit (not the response) scale.
HPD interval probability: 0.95
# OR with more control over the way posteriors are summarised
dat6b.brm2 |>
emmeans(~x, at = list(x = c(2.5, 5))) |>
gather_emmeans_draws() |>
median_hdci()# A tibble: 2 × 7
x .value .lower .upper .width .point .interval
<dbl> <dbl> <dbl> <dbl> <dbl> <chr> <chr>
1 2.5 -1.62 -2.69 -0.654 0.95 median hdci
2 5 -0.0180 -0.497 0.499 0.95 median hdci
Conclusions:
- the predicted (estimated) mean \(y\), on the logit scale, associated with an \(x\) of 2.5 is -1.621
- the predicted (estimated) mean \(y\), on the logit scale, associated with an \(x\) of 5 is -0.018
- 95% HPD intervals also given
dat6b.brm2 |>
posterior_predict(newdata = data.frame(x = c(2.5, 5), total = 1)) |>
summarise_draws(median, HDInterval::hdi)# A tibble: 2 × 4
variable median lower upper
<chr> <dbl> <dbl> <dbl>
1 ...1 0 0 1
2 ...2 0 0 1
# Or for even more control and ability to add other summaries
dat6b.brm2 |>
add_predicted_draws(newdata = data.frame(x = c(2.5, 5), total = 1)) |>
dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
summarise_draws(
median,
HDInterval::hdi
)# A tibble: 2 × 7
# Groups: x, total, .row [2]
x total .row variable median lower upper
<dbl> <dbl> <int> <chr> <dbl> <dbl> <dbl>
1 2.5 1 1 .prediction 0 0 1
2 5 1 2 .prediction 1 0 1
Conclusions:
- the median posterior prediction (an individual observation) of \(y\) at an \(x\) of 2.5 is 0
- the median posterior prediction (an individual observation) of \(y\) at an \(x\) of 5 is 1
- 95% HPD intervals also given
dat6b.brm2 |>
posterior_epred(newdata = data.frame(x = c(2.5, 5), total = 1)) |>
summarise_draws(median, HDInterval::hdi)# A tibble: 2 × 4
variable median lower upper
<chr> <dbl> <dbl> <dbl>
1 ...1 0.165 0.0485 0.319
2 ...2 0.495 0.380 0.622
# Or for even more control and ability to add other summaries
dat6b.brm2 |>
add_epred_draws(newdata = data.frame(x = c(2.5, 5), total = 1)) |>
dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
summarise_draws(
median,
HDInterval::hdi
)# A tibble: 2 × 7
# Groups: x, total, .row [2]
x total .row variable median lower upper
<dbl> <dbl> <int> <chr> <dbl> <dbl> <dbl>
1 2.5 1 1 .epred 0.165 0.0485 0.319
2 5 1 2 .epred 0.495 0.380 0.622
Conclusions:
- the fitted mean \(y\) associated with an \(x\) of 2.5 is 0.165
- the fitted mean \(y\) associated with an \(x\) of 5 is 0.495
- 95% HPD intervals also given
dat6b.brm2 |>
posterior_linpred(newdata = data.frame(x = c(2.5, 5), total = 1)) |>
plogis() |>
summarise_draws(median, HDInterval::hdi)# A tibble: 2 × 4
variable median lower upper
<chr> <dbl> <dbl> <dbl>
1 ...1 0.165 0.0485 0.319
2 ...2 0.495 0.380 0.622
# Or for even more control and ability to add other summaries
dat6b.brm2 |>
add_linpred_draws(newdata = data.frame(x = c(2.5, 5), total = 1)) |>
mutate(.linpred = plogis(.linpred)) |>
dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
summarise_draws(
median,
HDInterval::hdi
)# A tibble: 2 × 7
# Groups: x, total, .row [2]
x total .row variable median lower upper
<dbl> <dbl> <int> <chr> <dbl> <dbl> <dbl>
1 2.5 1 1 .linpred 0.165 0.0485 0.319
2 5 1 2 .linpred 0.495 0.380 0.622
Conclusions:
- the fitted mean \(y\) associated with an \(x\) of 2.5 is 0.165
- the fitted mean \(y\) associated with an \(x\) of 5 is 0.495
- 95% HPD intervals also given
In each case, we will predict \(y\) when \(x\) is 2.5 and 5
Note, once back-transformed to the response scale, emmeans, posterior_epred and posterior_linpred will all yield the same outputs. posterior_predict will yield wider credible intervals, as it predicts individual observations rather than expected means.
x emmean lower.HPD upper.HPD
2.5 -1.3094 -2.34 -0.334
5.0 0.0134 -0.48 0.485
Point estimate displayed: median
Results are given on the logit (not the response) scale.
HPD interval probability: 0.95
# OR with more control over the way posteriors are summarised
dat6c.brm2 |>
emmeans(~x, at = list(x = c(2.5, 5))) |>
gather_emmeans_draws() |>
median_hdci()# A tibble: 2 × 7
x .value .lower .upper .width .point .interval
<dbl> <dbl> <dbl> <dbl> <dbl> <chr> <chr>
1 2.5 -1.31 -2.34 -0.334 0.95 median hdci
2 5 0.0134 -0.480 0.485 0.95 median hdci
Conclusions:
- the predicted (estimated) mean \(y\), on the logit scale, associated with an \(x\) of 2.5 is -1.309
- the predicted (estimated) mean \(y\), on the logit scale, associated with an \(x\) of 5 is 0.013
- 95% HPD intervals also given
dat6c.brm2 |>
posterior_predict(newdata = data.frame(x = c(2.5, 5), total = 1)) |>
summarise_draws(median, HDInterval::hdi)# A tibble: 2 × 4
variable median lower upper
<chr> <dbl> <dbl> <dbl>
1 ...1 0 0 1
2 ...2 1 0 1
# Or for even more control and ability to add other summaries
dat6c.brm2 |>
add_predicted_draws(newdata = data.frame(x = c(2.5, 5), total = 1)) |>
dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
summarise_draws(
median,
HDInterval::hdi
)# A tibble: 2 × 7
# Groups: x, total, .row [2]
x total .row variable median lower upper
<dbl> <dbl> <int> <chr> <dbl> <dbl> <dbl>
1 2.5 1 1 .prediction 0 0 1
2 5 1 2 .prediction 1 0 1
Conclusions:
- the median posterior prediction (an individual observation) of \(y\) at an \(x\) of 2.5 is 0
- the median posterior prediction (an individual observation) of \(y\) at an \(x\) of 5 is 1
- 95% HPD intervals also given
dat6c.brm2 |>
posterior_epred(newdata = data.frame(x = c(2.5, 5), total = 1)) |>
summarise_draws(median, HDInterval::hdi)# A tibble: 2 × 4
variable median lower upper
<chr> <dbl> <dbl> <dbl>
1 ...1 0.213 0.0709 0.392
2 ...2 0.503 0.382 0.619
# Or for even more control and ability to add other summaries
dat6c.brm2 |>
add_epred_draws(newdata = data.frame(x = c(2.5, 5), total = 1)) |>
dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
summarise_draws(
median,
HDInterval::hdi
)# A tibble: 2 × 7
# Groups: x, total, .row [2]
x total .row variable median lower upper
<dbl> <dbl> <int> <chr> <dbl> <dbl> <dbl>
1 2.5 1 1 .epred 0.213 0.0709 0.392
2 5 1 2 .epred 0.503 0.382 0.619
Conclusions:
- the fitted mean \(y\) associated with an \(x\) of 2.5 is 0.213
- the fitted mean \(y\) associated with an \(x\) of 5 is 0.503
- 95% HPD intervals also given
dat6c.brm2 |>
posterior_linpred(newdata = data.frame(x = c(2.5, 5), total = 1)) |>
plogis() |>
summarise_draws(median, HDInterval::hdi)# A tibble: 2 × 4
variable median lower upper
<chr> <dbl> <dbl> <dbl>
1 ...1 0.213 0.0709 0.392
2 ...2 0.503 0.382 0.619
# Or for even more control and ability to add other summaries
dat6c.brm2 |>
add_linpred_draws(newdata = data.frame(x = c(2.5, 5), total = 1)) |>
mutate(.linpred = plogis(.linpred)) |>
dplyr::select(-.chain) |> # need to exclude this field as it interferes with as_draws_df()
summarise_draws(
median,
HDInterval::hdi
)# A tibble: 2 × 7
# Groups: x, total, .row [2]
x total .row variable median lower upper
<dbl> <dbl> <int> <chr> <dbl> <dbl> <dbl>
1 2.5 1 1 .linpred 0.213 0.0709 0.392
2 5 1 2 .linpred 0.503 0.382 0.619
Conclusions:
- the fitted mean \(y\) associated with an \(x\) of 2.5 is 0.213
- the fitted mean \(y\) associated with an \(x\) of 5 is 0.503
- 95% HPD intervals also given
13 Further investigations
Since we have the entire posterior, we are able to make probability statements. We simply count up the number of MCMC draws that satisfy a condition (e.g. represent a slope greater than 0) and then divide by the total number of MCMC samples.
Now that we have the full posteriors, we are free to use these to garner evidence on a range of hypotheses. To demonstrate, we will consider the following hypotheses:
- a change in \(x\) is associated with an increase in \(y\)
- a doubling of \(x\) (from 2.5 to 5) is associated with an increase in \(y\) of > 50%
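The first of these hypotheses can be assessed with brms's hypothesis() function, which generates output of the form shown below. The following is a sketch only; the model object name dat1a.brm2 follows the naming used elsewhere in this panel.

```r
## Sketch: evidence that the slope of y on x (b_x) is greater than 0
dat1a.brm2 |> hypothesis("x > 0")
```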
Hypothesis Tests for class b:
Hypothesis Estimate Est.Error CI.Lower CI.Upper Evid.Ratio Post.Prob Star
1 (x) > 0 -0.07 0.47 -0.84 0.68 0.77 0.43
---
'CI': 90%-CI for one-sided and 95%-CI for two-sided hypotheses.
'*': For one-sided hypotheses, the posterior probability exceeds 95%;
for two-sided hypotheses, the value tested against lies outside the 95%-CI.
Posterior probabilities of point hypotheses assume equal prior probabilities.
Conclusions:
- the parameter (b_x) minus 0 is -0.074
- Evid.Ratio: the ratio of evidence for the hypothesis vs the evidence against it; in this case, the evidence ratio is 0.766
- Post.Prob: the posterior probability of the hypothesis is 0.434
- there is little evidence for this hypothesis
Alternatively, we could use gather_draws to achieve a similar outcome.
In the following, in addition to the median and HPD intervals, we will calculate the probability that the slope (b_x) is greater than 0. To calculate such a probability, we could simply count up the number of posterior b_x values that are greater than zero and then divide by the total number of posterior b_x values. In R, we could do this as sum(b_x > 0)/length(b_x) (where b_x > 0 returns a 1 for each case where it is true and a 0 where it is false, so summing is equivalent to counting). Since dividing a sum by its length yields a mean, we can obtain the probability by calculating the mean of b_x > 0.
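The output below can be generated with a pipeline along the following lines (a sketch, mirroring the dat1b and dat1c versions shown later in this section):

```r
dat1a.brm2 |>
  gather_draws(`b_.*x.*`, regex = TRUE) |>
  summarise_draws(
    median,
    HDInterval::hdi,
    P = ~ mean(. > 0) # proportion of draws with slope > 0
  )
```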
# A tibble: 1 × 6
# Groups: .variable [1]
.variable variable median lower upper P
<chr> <chr> <dbl> <dbl> <dbl> <dbl>
1 b_x .value -0.0611 -1.02 0.836 0.434
The summarise_draws() function expects a set of one or more summary or diagnostic functions (such as median). These can be supplied either as the bare name of the function (as for median in the example above) or, if the function requires additional arguments or information, written out in full. In the latter case, the function must be preceded by a ~ and the variable denoted by a . (as in P = ~mean(. > 0) above).
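To make the two calling conventions concrete, here is a minimal, model-free sketch using fabricated draws (the values are arbitrary and chosen only for illustration):

```r
library(posterior)
set.seed(1)
## fabricate a draws object with a single parameter, b_x
draws <- as_draws_df(data.frame(b_x = rnorm(4000, -0.06, 0.47)))
draws |>
  summarise_draws(
    median,           # bare function name
    P = ~ mean(. > 0) # full (formula) form: . stands for the draws
  )
```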
Conclusions:
- the parameter (b_x) minus 0 is -0.061
- P: the probability of the hypothesis is 0.434
- there is little evidence for this hypothesis
dat1a.brm2 |>
emmeans::emmeans(~x, at = list(x = c(2.5, 5))) |>
gather_emmeans_draws() |>
ungroup() |>
group_by(.draw) |>
summarise(ES = 100 * diff(.value) / .value[1]) |>
hypothesis("ES > 50")Hypothesis Tests for class :
Hypothesis Estimate Est.Error CI.Lower CI.Upper Evid.Ratio Post.Prob Star
1 (ES)-(50) > 0 48.31 3054.91 -181.34 233.85 2.7 0.73
---
'CI': 90%-CI for one-sided and 95%-CI for two-sided hypotheses.
'*': For one-sided hypotheses, the posterior probability exceeds 95%;
for two-sided hypotheses, the value tested against lies outside the 95%-CI.
Posterior probabilities of point hypotheses assume equal prior probabilities.
Conclusions:
- the difference between the percentage change in estimated \(y\) as \(x\) increases from 2.5 to 5 and 50% is 48.308
- the evidence ratio in support of the hypothesis that the percentage change exceeds 50% is 2.704
- the probability that the change in \(y\) exceeds 50% is 0.73
- the evidence for such a change is very weak
dat1a.brm2 |>
emmeans::emmeans(~x, at = list(x = c(2.5, 5))) |>
gather_emmeans_draws() |>
ungroup() |>
group_by(.draw) |>
summarise(ES = 100 * diff(.value) / .value[1]) |>
summarise_draws(
mean, median,
HDInterval::hdi,
P = ~ mean(. > 50)
)# A tibble: 1 × 6
variable mean median lower upper P
<chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 ES 98.3 84.1 -344. 429. 0.73
Conclusions:
- the percentage change in estimated \(y\) as \(x\) increases from 2.5 to 5 is 98.308
- the probability that the change in \(y\) exceeds 50% is 0.73
- the evidence for such a change is very weak
The procedure highlighted above for calculating exceedance probabilities evaluates the degree of evidence for an effect in a particular direction. However, there are other instances where there is a desire to evaluate the evidence that something has changed (either increased or decreased). Such pursuits are similar to the Frequentist practice of testing a null hypothesis (e.g. effect = 0).
The Region of Practical Equivalence (ROPE) evaluates evidence that an effect is “practically equivalent” to a value (e.g. 0) by calculating the proportion of effects that are within a nominated range. Kruschke (2018) argued that for standardized parameters, the range of -0.1 to 0.1 would envelop a negligible effect based on Cohen (1988). Kruschke (2018) also suggested that this range could be extended to non-standardized parameters by multiplying by the standard deviation of the response. Accordingly, calculating the proportion of posterior density within this ROPE could act as a form of “null-hypothesis” testing in a Bayesian framework.
- if the HDI of the focal parameter falls completely outside the ROPE, there is strong evidence that there is an effect
- if the HDI of the focal parameter falls completely inside the ROPE, there is strong evidence that there is not an effect
- otherwise there is not clear evidence either way
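These decision rules can also be applied directly to the raw draws. The following sketch computes the proportion of slope draws inside a Kruschke-style ROPE, assuming the dat1a.brm2 model and the dat data frame used elsewhere in this tutorial. Note that this raw-draw proportion will differ slightly from bayestestR's default, which restricts attention to the 95% HDI.

```r
draws <- as_draws_df(dat1a.brm2)
ROPE <- c(-0.1, 0.1) * sd(dat$y) # scale the standardized range by sd of the response
mean(draws$b_x > ROPE[1] & draws$b_x < ROPE[2]) # proportion of draws inside the ROPE
```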
ROPE and equivalence tests are most useful when you conclude that there is not enough evidence to support a hypothesis that there is an effect. Such a “non-significant” result may arise because there genuinely is no effect OR because you do not have enough power to detect the effect. Performing an equivalence test provides a mechanism to tease these two apart.
I provide the following example purely to illustrate how such a test would be performed. In this case, as the earlier tests found little evidence of an effect, the equivalence test helps indicate whether this reflects a genuinely negligible effect or simply insufficient power.
## Calculate ROPE range manually
ROPE <- c(-0.1, 0.1) * with(dat, sd(y))
## OR calculate ROPE range via rope_range function
ROPE <- bayestestR::rope_range(dat1a.brm2)
dat1a.brm2 |> bayestestR::equivalence_test(parameters = "b_x", range = ROPE, ci_method = "HDI")# Test for Practical Equivalence
ROPE: [-0.09 0.09]
Parameter | H0 | inside ROPE | 95% HDI
--------------------------------------------------
x | Undecided | 18.95 % | [-1.03 0.83]
dat1a.brm2 |>
bayestestR::equivalence_test(parameters = "b_x", range = ROPE, ci_method = "HDI") |>
plot()Picking joint bandwidth of 0.0735
Conclusions:
- the percentage of the HPD interval for the slope that falls inside the ROPE is 18.95%
- as the HDI is neither completely inside nor completely outside the ROPE, the test is undecided; there is no clear evidence either way
OR using the rope function.
## Proportion of posterior samples inside the ROPE
dat1a.brm2 |> bayestestR::rope(parameters = "x", range = ROPE, ci_method = "HDI")# Proportion of samples inside the ROPE [-0.09, 0.09]:
Parameter | inside ROPE
-----------------------
x | 18.94 %
The above demonstration was applied to the simple comparison of the slope against 0; however, it can similarly be applied to any other hypothesis (although this is typically only useful when there is no evidence of an effect).
Now that we have the full posteriors, we are free to use these to garner evidence on a range of hypotheses. To demonstrate, we will consider the following hypotheses:
- a change in \(x\) is associated with an increase in \(y\)
- a doubling of \(x\) (from 2.5 to 5) is associated with an increase in \(y\) of > 50%
[1] "b_Intercept" "b_scalexscaleEQFALSE" "sigma"
[4] "prior_Intercept" "prior_b" "prior_sigma"
[7] "lprior" "lp__" "accept_stat__"
[10] "stepsize__" "treedepth__" "n_leapfrog__"
[13] "divergent__" "energy__"
Hypothesis Tests for class b:
Hypothesis Estimate Est.Error CI.Lower CI.Upper Evid.Ratio
1 (scalexscaleEQFALSE) > 0 -0.09 0.45 -0.81 0.61 0.74
Post.Prob Star
1 0.43
---
'CI': 90%-CI for one-sided and 95%-CI for two-sided hypotheses.
'*': For one-sided hypotheses, the posterior probability exceeds 95%;
for two-sided hypotheses, the value tested against lies outside the 95%-CI.
Posterior probabilities of point hypotheses assume equal prior probabilities.
Conclusions:
- the parameter (b_scalexscaleEQFALSE) minus 0 is -0.085
- Evid.Ratio: the ratio of evidence for the hypothesis vs the evidence against it; in this case, the evidence ratio is 0.743
- Post.Prob: the posterior probability of the hypothesis is 0.426
- there is little evidence for this hypothesis
Alternatively, we could use gather_draws to achieve a similar outcome.
In the following, in addition to the median and HPD intervals, we will calculate the probability that the slope (b_x) is greater than 0. To calculate such a probability, we could simply count up the number of posterior b_x values that are greater than zero and then divide by the total number of posterior b_x values. In R, we could do this as sum(b_x > 0)/length(b_x) (where b_x > 0 returns a 1 for each case where it is true and a 0 where it is false, so summing is equivalent to counting). Since dividing a sum by its length yields a mean, we can obtain the probability by calculating the mean of b_x > 0.
dat1b.brm2 |>
gather_draws(`b_.*x.*`, regex = TRUE) |>
summarise_draws(median,
HDInterval::hdi,
P = ~mean(. > 0)
)# A tibble: 1 × 6
# Groups: .variable [1]
.variable variable median lower upper P
<chr> <chr> <dbl> <dbl> <dbl> <dbl>
1 b_scalexscaleEQFALSE .value -0.0808 -0.977 0.810 0.426
The summarise_draws() function expects a set of one or more summary or diagnostic functions (such as median). These can be supplied either as the bare name of the function (as for median in the example above) or, if the function requires additional arguments or information, written out in full. In the latter case, the function must be preceded by a ~ and the variable denoted by a . (as in P = ~mean(. > 0) above).
Conclusions:
- the parameter (b_scalexscaleEQFALSE) minus 0 is -0.081
- P: the probability of the hypothesis is 0.426
- there is little evidence for this hypothesis
dat1b.brm2 |>
emmeans::emmeans(~x, at = list(x = c(2.5, 5))) |>
gather_emmeans_draws() |>
ungroup() |>
group_by(.draw) |>
summarise(ES = 100 * diff(.value) / .value[1]) |>
hypothesis("ES > 50")Hypothesis Tests for class :
Hypothesis Estimate Est.Error CI.Lower CI.Upper Evid.Ratio Post.Prob Star
1 (ES)-(50) > 0 -7.34 975.03 -186.4 249.78 2.82 0.74
---
'CI': 90%-CI for one-sided and 95%-CI for two-sided hypotheses.
'*': For one-sided hypotheses, the posterior probability exceeds 95%;
for two-sided hypotheses, the value tested against lies outside the 95%-CI.
Posterior probabilities of point hypotheses assume equal prior probabilities.
Conclusions:
- the difference between the percentage change in estimated \(y\) as \(x\) increases from 2.5 to 5 and 50% is -7.339
- the evidence ratio in support of the hypothesis that the percentage change exceeds 50% is 2.822
- the probability that the change in \(y\) exceeds 50% is 0.738
- the evidence for such a change is very weak
dat1b.brm2 |>
emmeans::emmeans(~x, at = list(x = c(2.5, 5))) |>
gather_emmeans_draws() |>
ungroup() |>
group_by(.draw) |>
summarise(ES = 100 * diff(.value) / .value[1]) |>
summarise_draws(
mean, median,
HDInterval::hdi,
P = ~ mean(. > 50)
)# A tibble: 1 × 6
variable mean median lower upper P
<chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 ES 42.7 82.4 -315. 456. 0.738
Conclusions:
- the percentage change in estimated \(y\) as \(x\) increases from 2.5 to 5 is 42.661
- the probability that the change in \(y\) exceeds 50% is 0.738
- the evidence for such a change is very weak
The procedure highlighted above for calculating exceedance probabilities evaluates the degree of evidence for an effect in a particular direction. However, there are other instances where there is a desire to evaluate the evidence that something has changed (either increased or decreased). Such pursuits are similar to the Frequentist practice of testing a null hypothesis (e.g. effect = 0).
The Region of Practical Equivalence (ROPE) evaluates evidence that an effect is “practically equivalent” to a value (e.g. 0) by calculating the proportion of effects that are within a nominated range. Kruschke (2018) argued that for standardized parameters, the range of -0.1 to 0.1 would envelop a negligible effect based on Cohen (1988). Kruschke (2018) also suggested that this range could be extended to non-standardized parameters by multiplying by the standard deviation of the response. Accordingly, calculating the proportion of posterior density within this ROPE could act as a form of “null-hypothesis” testing in a Bayesian framework.
- if the HDI of the focal parameter falls completely outside the ROPE, there is strong evidence that there is an effect
- if the HDI of the focal parameter falls completely inside the ROPE, there is strong evidence that there is not an effect
- otherwise there is not clear evidence either way
ROPE and equivalence tests are most useful when you conclude that there is not enough evidence to support a hypothesis that there is an effect. Such a “non-significant” result may arise because there genuinely is no effect OR because you do not have enough power to detect the effect. Performing an equivalence test provides a mechanism to tease these two apart.
I provide the following example purely to illustrate how such a test would be performed. In this case, as the earlier tests found little evidence of an effect, the equivalence test helps indicate whether this reflects a genuinely negligible effect or simply insufficient power.
## Calculate ROPE range manually
ROPE <- c(-0.1, 0.1) * with(dat, sd(y))
## OR calculate ROPE range via rope_range function
ROPE <- bayestestR::rope_range(dat1b.brm2)
dat1b.brm2 |> bayestestR::equivalence_test(parameters = "b_.*x.*", range = ROPE, ci_method = "HDI")# Test for Practical Equivalence
ROPE: [-0.09 0.09]
Parameter | H0 | inside ROPE | 95% HDI
-----------------------------------------------------------
scalexscaleEQFALSE | Undecided | 17.81 % | [-0.97 0.81]
dat1b.brm2 |>
bayestestR::equivalence_test(parameters = "b_.*x.*", range = ROPE, ci_method = "HDI") |>
plot()Picking joint bandwidth of 0.0711
Conclusions:
- the percentage of the HPD interval for the slope that falls inside the ROPE is 17.81%
- as the HDI is neither completely inside nor completely outside the ROPE, the test is undecided; there is no clear evidence either way
OR using the rope function.
## Proportion of posterior samples inside the ROPE
dat1b.brm2 |> bayestestR::rope(parameters = "x", range = ROPE, ci_method = "HDI")# Proportion of samples inside the ROPE [-0.09, 0.09]:
Parameter | inside ROPE
--------------------------------
scalexscaleEQFALSE | 17.80 %
The above demonstration was applied to the simple comparison of the slope against 0; however, it can similarly be applied to any other hypothesis (although this is typically only useful when there is no evidence of an effect).
Now that we have the full posteriors, we are free to use these to garner evidence on a range of hypotheses. To demonstrate, we will consider the following hypotheses:
- a change in \(x\) is associated with an increase in \(y\)
- a doubling of \(x\) (from 2.5 to 5) is associated with an increase in \(y\) of > 50%
[1] "b_Intercept" "b_scalex" "sigma" "prior_Intercept"
[5] "prior_b" "prior_sigma" "lprior" "lp__"
[9] "accept_stat__" "stepsize__" "treedepth__" "n_leapfrog__"
[13] "divergent__" "energy__"
Hypothesis Tests for class b:
Hypothesis Estimate Est.Error CI.Lower CI.Upper Evid.Ratio Post.Prob Star
1 (scalex) > 0 -0.08 0.39 -0.72 0.53 0.7 0.41
---
'CI': 90%-CI for one-sided and 95%-CI for two-sided hypotheses.
'*': For one-sided hypotheses, the posterior probability exceeds 95%;
for two-sided hypotheses, the value tested against lies outside the 95%-CI.
Posterior probabilities of point hypotheses assume equal prior probabilities.
Conclusions:
- the parameter (b_scalex) minus 0 is -0.079
- Evid.Ratio: the ratio of evidence for the hypothesis vs the evidence against it; in this case, the evidence ratio is 0.697
- Post.Prob: the posterior probability of the hypothesis is 0.411
- there is little evidence for this hypothesis
Alternatively, we could use gather_draws to achieve a similar outcome.
In the following, in addition to the median and HPD intervals, we will calculate the probability that the slope (b_x) is greater than 0. To calculate such a probability, we could simply count up the number of posterior b_x values that are greater than zero and then divide by the total number of posterior b_x values. In R, we could do this as sum(b_x > 0)/length(b_x) (where b_x > 0 returns a 1 for each case where it is true and a 0 where it is false, so summing is equivalent to counting). Since dividing a sum by its length yields a mean, we can obtain the probability by calculating the mean of b_x > 0.
dat1c.brm2 |>
gather_draws(`b_.*x`, regex = TRUE) |>
summarise_draws(median,
HDInterval::hdi,
P = ~mean(. > 0)
)# A tibble: 1 × 6
# Groups: .variable [1]
.variable variable median lower upper P
<chr> <chr> <dbl> <dbl> <dbl> <dbl>
1 b_scalex .value -0.0765 -0.938 0.634 0.411
The summarise_draws() function expects a set of one or more summary or diagnostic functions (such as median). These can be supplied either as the bare name of the function (as for median in the example above) or, if the function requires additional arguments or information, written out in full. In the latter case, the function must be preceded by a ~ and the variable denoted by a . (as in P = ~mean(. > 0) above).
Conclusions:
- the parameter (b_scalex) minus 0 is -0.076
- P: the probability of the hypothesis is 0.411
- there is little evidence for this hypothesis
dat1c.brm2 |>
emmeans::emmeans(~x, at = list(x = c(2.5, 5))) |>
gather_emmeans_draws() |>
ungroup() |>
group_by(.draw) |>
summarise(ES = 100 * diff(.value) / .value[1]) |>
hypothesis("ES > 50")Hypothesis Tests for class :
Hypothesis Estimate Est.Error CI.Lower CI.Upper Evid.Ratio Post.Prob Star
1 (ES)-(50) > 0 -183.83 9672.3 -196.97 233.86 2.9 0.74
---
'CI': 90%-CI for one-sided and 95%-CI for two-sided hypotheses.
'*': For one-sided hypotheses, the posterior probability exceeds 95%;
for two-sided hypotheses, the value tested against lies outside the 95%-CI.
Posterior probabilities of point hypotheses assume equal prior probabilities.
Conclusions:
- the difference between the percentage change in estimated \(y\) as \(x\) increases from 2.5 to 5 and 50% is -183.827
- the evidence ratio in support of the hypothesis that the percentage change exceeds 50% is 2.902
- the probability that the change in \(y\) exceeds 50% is 0.744
- the evidence for such a change is very weak
dat1c.brm2 |>
emmeans::emmeans(~x, at = list(x = c(2.5, 5))) |>
gather_emmeans_draws() |>
ungroup() |>
group_by(.draw) |>
summarise(ES = 100 * diff(.value) / .value[1]) |>
summarise_draws(
mean, median,
HDInterval::hdi,
P = ~ mean(. > 50)
)# A tibble: 1 × 6
variable mean median lower upper P
<chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 ES -134. 84.8 -292. 542. 0.744
Conclusions:
- the percentage change in estimated \(y\) as \(x\) increases from 2.5 to 5 is -133.827
- the probability that the change in \(y\) exceeds 50% is 0.744
- the evidence for such a change is very weak
The procedure highlighted above for calculating exceedance probabilities evaluates the degree of evidence for an effect in a particular direction. However, there are other instances where there is a desire to evaluate the evidence that something has changed (either increased or decreased). Such pursuits are similar to the Frequentist practice of testing a null hypothesis (e.g. effect = 0).
The Region of Practical Equivalence (ROPE) evaluates evidence that an effect is “practically equivalent” to a value (e.g. 0) by calculating the proportion of effects that are within a nominated range. Kruschke (2018) argued that for standardized parameters, the range of -0.1 to 0.1 would envelop a negligible effect based on Cohen (1988). Kruschke (2018) also suggested that this range could be extended to non-standardized parameters by multiplying by the standard deviation of the response. Accordingly, calculating the proportion of posterior density within this ROPE could act as a form of “null-hypothesis” testing in a Bayesian framework.
- if the HDI of the focal parameter falls completely outside the ROPE, there is strong evidence that there is an effect
- if the HDI of the focal parameter falls completely inside the ROPE, there is strong evidence that there is not an effect
- otherwise there is not clear evidence either way
ROPE and equivalence tests are most useful when you conclude that there is not enough evidence to support a hypothesis that there is an effect. Such a “non-significant” result may arise because there genuinely is no effect OR because you do not have enough power to detect the effect. Performing an equivalence test provides a mechanism to tease these two apart.
I provide the following example purely to illustrate how such a test would be performed. In this case, as the earlier tests found little evidence of an effect, the equivalence test helps indicate whether this reflects a genuinely negligible effect or simply insufficient power.
## Calculate ROPE range manually
ROPE <- c(-0.1, 0.1) * with(dat, sd(y))
## OR calculate ROPE range via rope_range function
ROPE <- bayestestR::rope_range(dat1c.brm2)
dat1c.brm2 |> bayestestR::equivalence_test(parameters = "b_.*x.*", range = ROPE, ci_method = "HDI")# Test for Practical Equivalence
ROPE: [-0.09 0.09]
Parameter | H0 | inside ROPE | 95% HDI
--------------------------------------------------
scalex | Undecided | 21.40 % | [-0.87 0.71]
dat1c.brm2 |>
bayestestR::equivalence_test(parameters = "b_.*x.*", range = ROPE, ci_method = "HDI") |>
plot()Picking joint bandwidth of 0.0616
Conclusions:
- the percentage of the HPD interval for the slope that falls inside the ROPE is 21.4%
- since the HPD interval is neither completely inside nor completely outside the ROPE, the equivalence test is undecided
OR using the rope function.
## Proportion of the posterior inside the ROPE
dat1c.brm2 |> bayestestR::rope(parameters = "x", range = ROPE, ci_method = "HDI")# Proportion of samples inside the ROPE [-0.09, 0.09]:
Parameter | inside ROPE
-----------------------
scalex | 21.39 %
The above demonstration was applied to the simple hypothesis that the slope is practically equivalent to 0; however, it can similarly be applied to any hypothesis (although typically only when there is no evidence of an effect).
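Under the hood, `rope()` is essentially just reporting a proportion of posterior draws. With mock draws (no model objects assumed), the equivalent base R calculation is:

```r
set.seed(1)
draws <- rnorm(4000, mean = 0.3, sd = 0.5)   # mock posterior for a slope
rope  <- c(-0.09, 0.09)                      # the range used above
## proportion of draws falling inside the ROPE
p_rope <- mean(draws > rope[1] & draws < rope[2])
p_rope
```

(By default, `bayestestR::rope()` additionally restricts attention to the draws within the 95% HDI, so its reported percentage can differ slightly from this raw proportion.)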
Now that we have the full posteriors, we are free to use these to garner evidence on a range of hypotheses. To demonstrate, we will consider the following hypotheses:
- all pairwise comparisons (compare each level of \(x\) to each other level)
- define a specific set of contrasts that include comparing the average of medium and high treatments to the control treatment.
contrast estimate lower.HPD upper.HPD
control - medium 0.795 -4.69 6.03
control - high 8.733 2.56 14.75
medium - high 7.959 1.18 13.64
Point estimate displayed: median
HPD interval probability: 0.95
Or if we want the full posteriors… This option allows us to calculate exceedance probabilities. That is, we can calculate the proportion of each contrast’s posterior draws that exceed a specific value (as a hypothesis). In this case, we will calculate two exceedance probabilities:
- the probability that the effect is negative (the proportion of posterior draws that are less than 0)
- the probability that the effect is positive (the proportion of posterior draws that are greater than 0)
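In plain terms, each of these is just the proportion of MCMC draws on one side of zero. A minimal mock-data sketch (no model objects assumed):

```r
set.seed(1)
draws <- rnorm(4000, mean = 8, sd = 3)  # mock posterior for a contrast
c(Pl = mean(draws < 0),                 # P(effect is negative)
  Pg = mean(draws > 0))                 # P(effect is positive)
```

The `Pl` and `Pg` columns computed from the model’s contrast draws below follow exactly this pattern.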
dat2a.brm2 |>
emmeans(~x) |>
pairs() |>
gather_emmeans_draws() |>
dplyr::select(-.chain) |>
summarise_draws(median,
HDInterval::hdi,
Pl = ~ mean(.x < 0),
Pg = ~ mean(.x > 0)
)# A tibble: 3 × 7
# Groups: contrast [3]
contrast variable median lower upper Pl Pg
<chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 control - high .value 8.73 2.56 14.8 0.00625 0.994
2 control - medium .value 0.795 -4.69 6.03 0.378 0.622
3 medium - high .value 7.96 1.18 13.6 0.0108 0.989
Conclusions:
- the difference in \(y\) between “control” and “medium” is 0.8; however, there is no evidence of this effect (exceedance probability of 0.622)
- the difference in \(y\) between “control” and “high” is 8.73, and there is very strong evidence for this effect
- the difference in \(y\) between “medium” and “high” is 7.96, and there is very strong evidence for this effect
It is also possible to express the magnitude of effect as a percentage change. The trick is to put the emmeans parameters onto a logarithmic scale so that the pairwise comparisons (which are subtractions) are effectively treated as divisions (due to log laws).
contrast ratio lower.HPD upper.HPD
control/medium 1.04 0.783 1.34
control/high 1.73 1.054 2.56
medium/high 1.66 1.047 2.51
Point estimate displayed: median
HPD interval probability: 0.95
The estimates are expressed as fractional changes. A “ratio” of 1 indicates parity, since multiplying something by 1 does not change it. A value of 1.5 indicates a 50% increase and a value of 0.5 indicates a 50% decline.
To calculate percentage change from a fractional value, subtract 1 and multiply the result by 100.
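For example, applying this to the ratios in the table above (values re-typed here by hand):

```r
ratios <- c("control/medium" = 1.04, "control/high" = 1.73, "medium/high" = 1.66)
(ratios - 1) * 100      # percentage change: ~4%, ~73% and ~66%
## the log trick itself: subtraction on the log scale is division
exp(log(10) - log(4))   # = 10/4 = 2.5
```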
If we get the full posteriors, we can also explore whether the change exceeds some ecologically important change (such as 20%)
dat2a.brm2 |>
emmeans(~x) |>
regrid(transform = "log") |>
pairs() |>
regrid() |>
gather_emmeans_draws() |>
dplyr::select(-.chain) |>
summarise_draws(median,
HDInterval::hdi,
Pl = ~ mean(.x < 1),
Pg = ~ mean(.x > 1),
Pl20 = ~ mean(.x < 0.8),
Pg20 = ~ mean(.x > 1.2)
)# A tibble: 3 × 9
# Groups: contrast [3]
contrast variable median lower upper Pl Pg Pl20 Pg20
<chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 control/high .value 1.73 1.05 2.56 0.00625 0.994 0 0.961
2 control/medium .value 1.04 0.783 1.34 0.378 0.622 0.0325 0.139
3 medium/high .value 1.66 1.05 2.51 0.0108 0.989 0.00125 0.937
Conclusions:
- the response \(y\) is 72.843% higher in the control group than in the high group
- there is strong evidence (P = 0.994) for the above
- there is also strong evidence (P = 0.961) that \(y\) is at least 20% higher in the control group than in the high group
- the response \(y\) is 3.959% higher in the control group than in the medium group
- there is no evidence (P = 0.622) for the above
- there is no evidence (P = 0.139) that \(y\) is at least 20% higher in the control group than in the medium group
- the response \(y\) is 66.107% higher in the medium group than in the high group
- there is strong evidence (P = 0.989) for the above
- there is also strong evidence (P = 0.937) that \(y\) is at least 20% higher in the medium group than in the high group
cmat <- cbind(
"Control vs Medium/High" = c(1, -0.5, -0.5),
"Medium vs High" = c(0, 1, -1)
)
dat2a.brm2 |>
emmeans(~x) |>
contrast(method = list(x = cmat)) contrast estimate lower.HPD upper.HPD
x.Control vs Medium/High 4.75 0.128 9.89
x.Medium vs High 7.96 1.177 13.64
Point estimate displayed: median
HPD interval probability: 0.95
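To see what the contrast matrix is doing, apply the same weights by hand to a vector of made-up group means (the means below are purely hypothetical, for illustration only):

```r
cmat <- cbind(c(1, -0.5, -0.5),   # control vs the average of medium and high
              c(0, 1, -1))        # medium vs high
means <- c(10, 6, 2)              # hypothetical means: control, medium, high
drop(t(cmat) %*% means)           # 10 - (6 + 2)/2 = 6 and 6 - 2 = 4
```

Each column of the matrix supplies the weights for one contrast; emmeans performs this weighted sum on every posterior draw of the cell means.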
Or with full posteriors and exceedance probabilities…
dat2a.brm2 |>
emmeans(~x) |>
contrast(method = list(x = cmat)) |>
gather_emmeans_draws() |>
dplyr::select(-.chain) |>
summarise_draws(median,
HDInterval::hdi,
Pl = ~ mean(.x < 0),
Pg = ~ mean(.x > 0)
)# A tibble: 2 × 7
# Groups: contrast [2]
contrast variable median lower upper Pl Pg
<chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 x.Control vs Medium/High .value 4.75 0.128 9.89 0.0338 0.966
2 x.Medium vs High .value 7.96 1.18 13.6 0.0108 0.989
Conclusions:
- on average, \(y\) is 4.749 higher in the “control” group than the average of the “medium” and “high” groups
- the evidence for this effect is very strong (P = 0.966)
We have already seen that there is no evidence of a difference in \(y\) between the “control” and “medium” groups. This could be either because there is not enough power to detect the difference or because the populations genuinely are not different. It would be nice to be able to gain some insights into which of these is most likely. And we can. If we establish the range of values that represents an insubstantial effect, we can then quantify the proportion of the posterior that falls inside this Region of Practical Equivalence (ROPE).
Conventionally, the ROPE spans ±0.1 standard deviations of the response - that is, if the effect is smaller than this range, we might consider it insubstantial.
## Calculate ROPE range manually
ROPE <- c(-0.1, 0.1) * with(dat2, sd(y))
## OR calculate ROPE range via rope_range function
ROPE <- bayestestR::rope_range(dat2a.brm2)
dat2a.brm2 |> bayestestR::equivalence_test(parameters = "b_x", range = ROPE, ci_method = "HDI")# Test for Practical Equivalence
ROPE: [-0.60 0.60]
Parameter | H0 | inside ROPE | 95% HDI
----------------------------------------------------
xmedium | Undecided | 18.46 % | [ -5.97 4.95]
xhigh | Rejected | 0.00 % | [-14.48 -1.95]
dat2a.brm2 |>
bayestestR::equivalence_test(parameters = "b_x", range = ROPE, ci_method = "HDI") |>
plot()Picking joint bandwidth of 0.48
Conclusions:
- there is insufficient evidence to conclude that there is a difference in \(y\) between “control” and “medium” groups
- we cannot conclude that there is evidence of no effect
14 Summary plots
dat1a.grid <- list(x = modelr::seq_range(dat$x, n = 100))
dat1a.brm2 |>
emmeans(~x, at = dat1a.grid) |>
as.data.frame() |>
ggplot(aes(y = emmean, x = x)) +
geom_ribbon(aes(ymin = lower.HPD, ymax = upper.HPD), fill = "orange", alpha = 0.3) +
geom_point(data = dat, aes(y = y)) +
geom_line() +
theme_classic()
As a spaghetti plot
dat1b.grid <- list(x = modelr::seq_range(dat$x, n = 100))
dat1b.brm2 |>
emmeans(~x, at = dat1b.grid) |>
as.data.frame() |>
ggplot(aes(y = emmean, x = x)) +
geom_ribbon(aes(ymin = lower.HPD, ymax = upper.HPD), fill = "orange", alpha = 0.3) +
geom_point(data = dat, aes(y = y)) +
geom_line() +
theme_classic()
As a spaghetti plot
dat1c.grid <- list(x = modelr::seq_range(dat$x, n = 100))
dat1c.brm2 |>
emmeans(~x, at = dat1c.grid) |>
as.data.frame() |>
ggplot(aes(y = emmean, x = x)) +
geom_ribbon(aes(ymin = lower.HPD, ymax = upper.HPD), fill = "orange", alpha = 0.3) +
geom_point(data = dat, aes(y = y)) +
geom_line() +
theme_classic()
As a spaghetti plot
The end