1 Preparations

Load the necessary libraries

library(tidyverse) # for data wrangling
library(car) # for regression diagnostics
library(broom) # for tidy output
library(ggfortify) # for model diagnostics
library(knitr) # for kable
library(emmeans) # for estimating marginal means
library(MASS) # for glm.nb
library(brms)
library(broom.mixed)
library(tidybayes)
library(bayesplot)
library(standist) # for visualizing distributions
library(rstanarm)
library(cmdstanr) # for cmdstan
library(ggeffects)
library(rstan)
library(DHARMa)
library(ggridges)
library(easystats) # framework for stats, modelling and visualisation
library(patchwork)
library(modelsummary)
source("helperFunctions.R")

2 Scenario

In an attempt to understand the effects on marine animals of short-term exposure to toxic substances, such as might occur following a spill, or a major increase in storm water flows, a it was decided to examine the toxicant in question, Copper, as part of a field experiment in Hong Kong. The experiment consisted of small sources of Cu (small, hemispherical plaster blocks, impregnated with copper), which released the metal into sea water over 4 or 5 days. The organism whose response to Cu was being measured was a small, polychaete worm, Hydroides, that attaches to hard surfaces in the sea, and is one of the first species to colonize any surface that is submerged. The biological questions focused on whether the timing of exposure to Cu affects the overall abundance of these worms. The time period of interest was the first or second week after a surface being available.

The experimental setup consisted of sheets of black perspex (settlement plates), which provided good surfaces for these worms. Each plate had a plaster block bolted to its centre, and the dissolving block would create a gradient of [Cu] across the plate. Over the two weeks of the experiment, a given plate would have plain plaster blocks (Control) or a block containing copper in the first week, followed by a plain block, or a plain block in the first week, followed by a dose of copper in the second week. After two weeks in the water, plates were removed and counted back in the laboratory. Without a clear idea of how sensitive these worms are to copper, an effect of the treatments might show up as an overall difference in the density of worms across a plate, or it could show up as a gradient in abundance across the plate, with a different gradient in different treatments. Therefore, on each plate, the density of worms (#/cm²) was recorded at each of four distances from the center of the plate.

**Figure** 2: Sampling design for the norin data set

Table 1: Format of copper.csv data file

COPPER	PLATE	DIST	WORMS
control	200	4	11.50
control	200	3	13.00
..	..	..	..

Table 2: Description of the variables in the copper data file

COPPER	Categorical listing of the copper treatment (control = no copper applied, week 2 = copper treatment applied in second week and week 1= copper treatment applied in first week) applied to whole plates. Factor A (between plot factor).
PLATE	Substrate provided for polychaete worm colonization on which copper treatment applied. These are the plots (Factor B). Numbers in this column represent numerical labels given to each plate.
DIST	Categorical listing for the four concentric distances from the center of the plate (source of copper treatment) with 1 being the closest and 4 the furthest. Factor C (within plot factor)
WORMS	Density (#/cm₂) of worms measured. Response variable.

3 Read in the data

copper <- read_csv("../data/copper.csv", trim_ws = TRUE)

Rows: 60 Columns: 4
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (1): COPPER
dbl (3): PLATE, DIST, WORMS

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

glimpse(copper)

Rows: 60
Columns: 4
$ COPPER <chr> "control", "control", "control", "control", "control", "control…
$ PLATE  <dbl> 200, 200, 200, 200, 39, 39, 39, 39, 34, 34, 34, 34, 36, 36, 36,…
$ DIST   <dbl> 4, 3, 2, 1, 4, 3, 2, 1, 4, 3, 2, 1, 4, 3, 2, 1, 4, 3, 2, 1, 4, …
$ WORMS  <dbl> 11.50, 13.00, 13.50, 12.00, 17.75, 13.75, 12.75, 12.50, 11.50, …

## Explore the first 6 rows of the data
head(copper)

# A tibble: 6 × 4
  COPPER  PLATE  DIST WORMS
  <chr>   <dbl> <dbl> <dbl>
1 control   200     4  11.5
2 control   200     3  13  
3 control   200     2  13.5
4 control   200     1  12  
5 control    39     4  17.8
6 control    39     3  13.8

str(copper)

spc_tbl_ [60 × 4] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
 $ COPPER: chr [1:60] "control" "control" "control" "control" ...
 $ PLATE : num [1:60] 200 200 200 200 39 39 39 39 34 34 ...
 $ DIST  : num [1:60] 4 3 2 1 4 3 2 1 4 3 ...
 $ WORMS : num [1:60] 11.5 13 13.5 12 17.8 ...
 - attr(*, "spec")=
  .. cols(
  ..   COPPER = col_character(),
  ..   PLATE = col_double(),
  ..   DIST = col_double(),
  ..   WORMS = col_double()
  .. )
 - attr(*, "problems")=<externalptr>

copper |> datawizard::data_codebook()

copper (60 rows and 4 variables, 4 shown)

ID | Name   | Type      | Missings |     Values |          N
---+--------+-----------+----------+------------+-----------
1  | COPPER | character | 0 (0.0%) |    control | 20 (33.3%)
   |        |           |          |     Week 1 | 20 (33.3%)
   |        |           |          |     Week 2 | 20 (33.3%)
---+--------+-----------+----------+------------+-----------
2  | PLATE  | numeric   | 0 (0.0%) |  [12, 204] |         60
---+--------+-----------+----------+------------+-----------
3  | DIST   | numeric   | 0 (0.0%) |          1 | 15 (25.0%)
   |        |           |          |          2 | 15 (25.0%)
   |        |           |          |          3 | 15 (25.0%)
   |        |           |          |          4 | 15 (25.0%)
---+--------+-----------+----------+------------+-----------
4  | WORMS  | numeric   | 0 (0.0%) | [0, 17.75] |         60
------------------------------------------------------------

copper |> modelsummary::datasummary_skim()

	Unique	Missing Pct.	Mean	SD	Min	Median	Max
PLATE	15	0	98.7	77.0	12.0	61.0	204.0
DIST	4	0	2.5	1.1	1.0	2.5	4.0
WORMS	36	0	8.0	4.3	0.0	8.8	17.8
COPPER	N	%
control	20	33.3
Week 1	20	33.3
Week 2	20	33.3

copper |> modelsummary::datasummary_skim(by = c("COPPER", "DIST"))

	COPPER	DIST	Unique	Missing Pct.	Mean	SD	Min	Median	Max
PLATE	control	4.0	5	0	97.6	84.2	34.0	39.0	200.0
	control	3.0	5	0	97.6	84.2	34.0	39.0	200.0
	control	2.0	5	0	97.6	84.2	34.0	39.0	200.0
	control	1.0	5	0	97.6	84.2	34.0	39.0	200.0
	Week 2	4.0	5	0	104.2	83.0	16.0	63.0	204.0
	Week 2	3.0	5	0	104.2	83.0	16.0	63.0	204.0
	Week 2	2.0	5	0	104.2	83.0	16.0	63.0	204.0
	Week 2	1.0	5	0	104.2	83.0	16.0	63.0	204.0
	Week 1	4.0	5	0	94.2	88.3	12.0	61.0	199.0
	Week 1	3.0	5	0	94.2	88.3	12.0	61.0	199.0
	Week 1	2.0	5	0	94.2	88.3	12.0	61.0	199.0
	Week 1	1.0	5	0	94.2	88.3	12.0	61.0	199.0
WORMS	control	4.0	4	0	13.6	2.8	11.5	12.0	17.8
	control	3.0	5	0	12.4	1.2	10.5	12.8	13.8
	control	2.0	5	0	12.0	1.2	10.5	12.0	13.5
	control	1.0	5	0	10.8	1.4	9.2	10.5	12.5
	Week 2	4.0	4	0	7.8	1.0	6.5	8.0	8.8
	Week 2	3.0	4	0	4.0	1.2	3.0	3.8	6.0
	Week 2	2.0	5	0	1.4	1.0	0.5	1.0	3.0
	Week 2	1.0	3	0	0.2	0.4	0.0	0.0	1.0
	Week 1	4.0	5	0	10.0	1.6	8.2	9.5	12.5
	Week 1	3.0	5	0	8.5	1.5	7.0	8.2	10.8
	Week 1	2.0	5	0	8.3	1.8	6.5	7.8	10.5
	Week 1	1.0	5	0	7.2	1.2	6.2	6.8	9.2
COPPER	N	%
control	20	33.3
Week 1	20	33.3
Week 2	20	33.3

4 Data preparation

Let start by declaring the categorical variables and random effect as factors.

copper <- copper |> mutate(
  COPPER = factor(COPPER),
  PLATE = factor(PLATE),
  DIST = factor(DIST),
  AREA = 4,
  COUNT = WORMS * AREA
)

5 Exploratory data analysis

Model formula: \[ \begin{align} y_i &\sim{} \mathcal{Pois}(\lambda_i)\\ ln(\lambda_i) &=\boldsymbol{\beta} \bf{X_i} + \boldsymbol{\gamma} \bf{Z_i} \end{align} \]

where $\boldsymbol{\beta}$ and $\boldsymbol{\gamma}$ are vectors of the fixed and random effects parameters respectively and $\bf{X}$ is the model matrix representing the overall intercept and effects of copper, distance and their interaction on the number of number of worms. Area of the place segment was also incorporated as an offset. $\bf{Z}$ represents a cell means model matrix for the random intercepts associated with individual plates.

These data are density of worms. This is not ideal. It would be better to have the actual counts along with the area and then model against a Poisson or Negative Binomial along with having an offset for area. Such an approach would allow us to effectively model density whilst also being able to fit a model with a distribution that closely matches the data generation process.

Unfortunately, we only have the densities. As such, our choice of model families is somewhat restricted. Out choices are:

Gaussian: assuming normality etc
log-normal:
Gamma with a log link: so long as we can address the presence of zeros in the data
Tweedie

ggplot(copper, aes(y = WORMS, x = DIST, fill = COPPER)) +
  geom_boxplot()

ggplot(copper, aes(y = WORMS, x = DIST, fill = COPPER)) +
  geom_boxplot() +
  scale_y_continuous(trans = scales::pseudo_log_trans())

ggplot(copper, aes(y = WORMS, x = DIST, colour = COPPER)) +
  geom_point(aes(x = as.numeric(DIST))) +
  geom_line(aes(x = as.numeric(DIST), group = PLATE)) +
  scale_y_continuous(trans = scales::pseudo_log_trans())

In the event that we attempt to model these data against a Gamma family, we are going to need a way to handle the zero values. A Gamma distribution has no mass at zero. One solution is to replace the zero values with small values, where small value is defined as half the value of the smallest positive value.

copper |>
  filter(WORMS > 0) |>
  summarise(min(WORMS) / 2)

# A tibble: 1 × 1
  `min(WORMS)/2`
           <dbl>
1          0.125

6 Fit the model

brms

In brms, the default priors are designed to be weakly informative. They are chosen to provide moderate regularisation (to help prevent over fitting) and help stabilise the computations.

Unlike rstanarm, brms models must be compiled before they start sampling. For most models, the compilation of the stan code takes around 45 seconds.

copper.form <- bf(WORMS ~ COPPER * DIST + (1|PLATE),
                family=Gamma(link='log'))
options(width=150)
copper.form |> get_prior(data = copper)

                  prior     class              coef group resp dpar nlpar lb ub       source
                 (flat)         b                                                    default
                 (flat)         b       COPPERWeek1                             (vectorized)
                 (flat)         b COPPERWeek1:DIST2                             (vectorized)
                 (flat)         b COPPERWeek1:DIST3                             (vectorized)
                 (flat)         b COPPERWeek1:DIST4                             (vectorized)
                 (flat)         b       COPPERWeek2                             (vectorized)
                 (flat)         b COPPERWeek2:DIST2                             (vectorized)
                 (flat)         b COPPERWeek2:DIST3                             (vectorized)
                 (flat)         b COPPERWeek2:DIST4                             (vectorized)
                 (flat)         b             DIST2                             (vectorized)
                 (flat)         b             DIST3                             (vectorized)
                 (flat)         b             DIST4                             (vectorized)
 student_t(3, 2.2, 2.5) Intercept                                                    default
   student_t(3, 0, 2.5)        sd                                          0         default
   student_t(3, 0, 2.5)        sd                   PLATE                  0    (vectorized)
   student_t(3, 0, 2.5)        sd         Intercept PLATE                  0    (vectorized)
      gamma(0.01, 0.01)     shape                                          0         default

options(width=80)

The following link provides some guidance about defining priors. [https://github.com/stan-dev/stan/wiki/Prior-Choice-Recommendations]

When defining our own priors, we typically do not want them to be scaled.

If we wanted to define our own priors that were less vague, yet still not likely to bias the outcomes, we could try the following priors (mainly plucked out of thin air):

$\beta_0$: normal centred at 3 with a standard deviation of 0.45
- mean of 3: since median(log(copper$WORMS+10.125)) or median(asinh(copper$WORMS/(2*1))/log(exp(1)))
- sd of 0.45: since mad(log(copper$WORMS+0.125)) or mad(asinh(copper$WORMS/(2*1))/log(exp(1)))
$\beta_{1-2}$: normal centred at 0 with a standard deviation of 0.9
- sd of 0.9: since mad(log(copper$WORMS+0.125))/model.matrix(~COPPER*DIST, data = copper) |> apply(2, sd)
$\beta_{3-5}$: normal centred at 0 with a standard deviation of 1
- sd of 1: since mad(log(copper$WORMS+0.125))/model.matrix(~COPPER*DIST, data = copper) |> apply(2, sd)
$\beta_{6-11}$: normal centred at 0 with a standard deviation of 1.5
- sd of 1.5: since mad(log(copper$WORMS+0.125))/model.matrix(~COPPER*DIST, data = copper) |> apply(2, sd)
$\sigma_j$: half-cauchy with parameters 0 and 2.
$\omega$: gamma with parameters 0.01 and 0.01.
$\Sigma$: decov with:
- regularisation: the exponent for a LKJ prior on the correlation matrix. A value of 1 (default) implies a joint uniform prior
- concentration: the concentration parameter for a symmetric Dirichlet distribution. A value of 1 (default) implies a joint uniform distribution
- shape and scale: the shape and scale parameters for a gamma prior on the scale and scale parameters of the decov prior. A value of 1 for both (default) simplifies the gamma prior to a unit-exponential distribution.

copper |>
  group_by(COPPER, DIST) |>
  summarise(
    log(median(WORMS)),
    log(mad(WORMS))
  )

`summarise()` has grouped output by 'COPPER'. You can override using the
`.groups` argument.

# A tibble: 12 × 4
# Groups:   COPPER [3]
   COPPER  DIST  `log(median(WORMS))` `log(mad(WORMS))`
   <fct>   <fct>                <dbl>             <dbl>
 1 control 1                     2.35             0.617
 2 control 2                     2.48             0.106
 3 control 3                     2.55             0.106
 4 control 4                     2.48            -0.299
 5 Week 1  1                     1.91            -0.299
 6 Week 1  2                     2.05             0.617
 7 Week 1  3                     2.11             0.106
 8 Week 1  4                     2.25             0.394
 9 Week 2  1                  -Inf             -Inf    
10 Week 2  2                     0               -0.299
11 Week 2  3                     1.32            -0.992
12 Week 2  4                     2.08             0.106

standist::visualize("normal(3,0.45)", xlim = c(0, 20))

standist::visualize("student_t(3, 0, 2.5)",
  "cauchy(0,2)",
  xlim = c(-10, 25)
)

priors <- prior(normal(2.5, 0.6), class = "Intercept") +
  prior(normal(0, 1.5), class = "b") +
  prior(student_t(3, 0, 1), class = "sd")

copper.form <- bf(COUNT ~ offset(log(AREA)) + COPPER * DIST + (1 | PLATE),
  family = poisson(link = "log")
)
copper.brm2 <- brm(copper.form,
  data = copper,
  prior = priors,
  sample_prior = "only",
  iter = 5000,
  warmup = 2500,
  chains = 3,
  cores = 3,
  thin = 10,
  refresh = 0,
  seed = 123,
  control = list(adapt_delta = 0.99)
)

Compiling Stan program...

Start sampling

copper.brm2 |>
  conditional_effects("COPPER:DIST") |>
  plot(points = TRUE)

copper.brm2 |>
  ggpredict(~ COPPER * DIST) |>
  plot(show_data = TRUE)

The above seem sufficiently wide whilst at the same time not providing any encouragement for the sampler to wander off into very unsupported areas.

copper.brm3 <- update(copper.brm2,
  sample_prior = "yes",
  control = list(adapt_delta = 0.99),
  refresh = 0
)

The desired updates require recompiling the model

Compiling Stan program...

Start sampling

copper.brm3 |>
  conditional_effects("COPPER:DIST") |>
  plot(points = TRUE)

copper.brm3 |>
  ggpredict(~ COPPER * DIST) |>
  plot(show_data = TRUE)

priors <- prior(normal(3, 0.45), class = "Intercept") +
  prior(normal(0, 0.9), class = "b", coef = "COPPERWeek1") +
  prior(normal(0, 0.9), class = "b", coef = "COPPERWeek2") +
  prior(normal(0, 1), class = "b", coef = "DIST2") +
  prior(normal(0, 1), class = "b", coef = "DIST3") +
  prior(normal(0, 1), class = "b", coef = "DIST4") +
  prior(normal(0, 1.5), class = "b") +
  prior(cauchy(0, 2), class = "sd") +
  prior(gamma(2, 1), class = "sigma")

copper.form <- bf(I(WORMS + 0.125) ~ COPPER * DIST + (1 | PLATE),
  family = lognormal()
)
copper.brm2a <- brm(copper.form,
  data = copper,
  prior = priors,
  sample_prior = "only",
  iter = 5000,
  warmup = 2500,
  chains = 3,
  cores = 3,
  thin = 10,
  refresh = 0,
  seed = 123,
  control = list(adapt_delta = 0.99)
)

Compiling Stan program...

Start sampling

copper.brm2a |>
  ggpredict(~ COPPER * DIST) |>
  plot(show_data = TRUE)

The above seem sufficiently wide whilst at the same time not providing any encouragement for the sampler to wander off into very unsupported areas.

copper.brm3a <- update(copper.brm2a,
  sample_prior = "yes",
  control = list(adapt_delta = 0.99),
  refresh = 0
)

The desired updates require recompiling the model

Compiling Stan program...

Start sampling

priors <- prior(normal(3, 0.45), class = "Intercept") +
  prior(normal(0, 0.9), class = "b", coef = "COPPERWeek1") +
  prior(normal(0, 0.9), class = "b", coef = "COPPERWeek2") +
  prior(normal(0, 1), class = "b", coef = "DIST2") +
  prior(normal(0, 1), class = "b", coef = "DIST3") +
  prior(normal(0, 1), class = "b", coef = "DIST4") +
  prior(normal(0, 1.5), class = "b") +
  prior(cauchy(0, 1), class = "sd") +
  prior(gamma(0.01, 0.01), class = "shape")

copper.form <- bf(I(WORMS + 0.125) ~ COPPER * DIST + (DIST | PLATE),
  family = Gamma(link = "log")
)
copper.brm2b <- brm(copper.form,
  data = copper,
  prior = priors,
  sample_prior = "only",
  iter = 5000,
  warmup = 2500,
  chains = 3,
  cores = 3,
  thin = 10,
  refresh = 0,
  seed = 123,
  control = list(adapt_delta = 0.99)
)

Compiling Stan program...

Start sampling

Warning: There were 2 divergent transitions after warmup. See
https://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
to find out why this is a problem and how to eliminate them.

Warning: Examine the pairs() plot to diagnose sampling problems

copper.brm3b <- update(copper.brm2b,
  sample_prior = "yes",
  control = list(adapt_delta = 0.99),
  refresh = 0
)

The desired updates require recompiling the model

Compiling Stan program...

Start sampling

Warning: There were 3 chains where the estimated Bayesian Fraction of Missing Information was low. See
https://mc-stan.org/misc/warnings.html#bfmi-low

Warning: Examine the pairs() plot to diagnose sampling problems

Warning: Bulk Effective Samples Size (ESS) is too low, indicating posterior means and medians may be unreliable.
Running the chains for more iterations may help. See
https://mc-stan.org/misc/warnings.html#bulk-ess

The above seem sufficiently wide whilst at the same time not providing any encouragement for the sampler to wander off into very unsupported areas.

copper.brm3b |>
  ggpredict(~ COPPER * DIST) |>
  plot(show_data = TRUE)

copper.brm3b |>
  ggpredict(~ DIST * COPPER) |>
  plot(show_data = TRUE)

copper.brm3 |> get_variables()

 [1] "b_Intercept"            "b_COPPERWeek1"          "b_COPPERWeek2"         
 [4] "b_DIST2"                "b_DIST3"                "b_DIST4"               
 [7] "b_COPPERWeek1:DIST2"    "b_COPPERWeek2:DIST2"    "b_COPPERWeek1:DIST3"   
[10] "b_COPPERWeek2:DIST3"    "b_COPPERWeek1:DIST4"    "b_COPPERWeek2:DIST4"   
[13] "sd_PLATE__Intercept"    "Intercept"              "r_PLATE[12,Intercept]" 
[16] "r_PLATE[16,Intercept]"  "r_PLATE[21,Intercept]"  "r_PLATE[34,Intercept]" 
[19] "r_PLATE[36,Intercept]"  "r_PLATE[39,Intercept]"  "r_PLATE[57,Intercept]" 
[22] "r_PLATE[61,Intercept]"  "r_PLATE[63,Intercept]"  "r_PLATE[178,Intercept]"
[25] "r_PLATE[179,Intercept]" "r_PLATE[181,Intercept]" "r_PLATE[199,Intercept]"
[28] "r_PLATE[200,Intercept]" "r_PLATE[204,Intercept]" "prior_Intercept"       
[31] "prior_b"                "prior_sd_PLATE"         "lprior"                
[34] "lp__"                   "accept_stat__"          "stepsize__"            
[37] "treedepth__"            "n_leapfrog__"           "divergent__"           
[40] "energy__"

copper.brm3 |>
  hypothesis("COPPERWeek1=0") |>
  plot()

copper.brm3 |>
  hypothesis("DIST2=0") |>
  plot()

copper.brm3 |> SUYR_prior_and_posterior()

copper.brm3a |> get_variables()

 [1] "b_Intercept"               "b_COPPERWeek1"            
 [3] "b_COPPERWeek2"             "b_DIST2"                  
 [5] "b_DIST3"                   "b_DIST4"                  
 [7] "b_COPPERWeek1:DIST2"       "b_COPPERWeek2:DIST2"      
 [9] "b_COPPERWeek1:DIST3"       "b_COPPERWeek2:DIST3"      
[11] "b_COPPERWeek1:DIST4"       "b_COPPERWeek2:DIST4"      
[13] "sd_PLATE__Intercept"       "sigma"                    
[15] "Intercept"                 "r_PLATE[12,Intercept]"    
[17] "r_PLATE[16,Intercept]"     "r_PLATE[21,Intercept]"    
[19] "r_PLATE[34,Intercept]"     "r_PLATE[36,Intercept]"    
[21] "r_PLATE[39,Intercept]"     "r_PLATE[57,Intercept]"    
[23] "r_PLATE[61,Intercept]"     "r_PLATE[63,Intercept]"    
[25] "r_PLATE[178,Intercept]"    "r_PLATE[179,Intercept]"   
[27] "r_PLATE[181,Intercept]"    "r_PLATE[199,Intercept]"   
[29] "r_PLATE[200,Intercept]"    "r_PLATE[204,Intercept]"   
[31] "prior_Intercept"           "prior_b_COPPERWeek1"      
[33] "prior_b_COPPERWeek2"       "prior_b_DIST2"            
[35] "prior_b_DIST3"             "prior_b_DIST4"            
[37] "prior_b_COPPERWeek1:DIST2" "prior_b_COPPERWeek2:DIST2"
[39] "prior_b_COPPERWeek1:DIST3" "prior_b_COPPERWeek2:DIST3"
[41] "prior_b_COPPERWeek1:DIST4" "prior_b_COPPERWeek2:DIST4"
[43] "prior_sigma"               "prior_sd_PLATE"           
[45] "lprior"                    "lp__"                     
[47] "accept_stat__"             "stepsize__"               
[49] "treedepth__"               "n_leapfrog__"             
[51] "divergent__"               "energy__"

copper.brm3a |>
  hypothesis("COPPERWeek1=0") |>
  plot()

copper.brm3a |>
  hypothesis("DIST2=0") |>
  plot()

copper.brm3a |> SUYR_prior_and_posterior()

copper.brm3b |> get_variables()

  [1] "b_Intercept"                 "b_COPPERWeek1"              
  [3] "b_COPPERWeek2"               "b_DIST2"                    
  [5] "b_DIST3"                     "b_DIST4"                    
  [7] "b_COPPERWeek1:DIST2"         "b_COPPERWeek2:DIST2"        
  [9] "b_COPPERWeek1:DIST3"         "b_COPPERWeek2:DIST3"        
 [11] "b_COPPERWeek1:DIST4"         "b_COPPERWeek2:DIST4"        
 [13] "sd_PLATE__Intercept"         "sd_PLATE__DIST2"            
 [15] "sd_PLATE__DIST3"             "sd_PLATE__DIST4"            
 [17] "cor_PLATE__Intercept__DIST2" "cor_PLATE__Intercept__DIST3"
 [19] "cor_PLATE__DIST2__DIST3"     "cor_PLATE__Intercept__DIST4"
 [21] "cor_PLATE__DIST2__DIST4"     "cor_PLATE__DIST3__DIST4"    
 [23] "shape"                       "Intercept"                  
 [25] "r_PLATE[12,Intercept]"       "r_PLATE[16,Intercept]"      
 [27] "r_PLATE[21,Intercept]"       "r_PLATE[34,Intercept]"      
 [29] "r_PLATE[36,Intercept]"       "r_PLATE[39,Intercept]"      
 [31] "r_PLATE[57,Intercept]"       "r_PLATE[61,Intercept]"      
 [33] "r_PLATE[63,Intercept]"       "r_PLATE[178,Intercept]"     
 [35] "r_PLATE[179,Intercept]"      "r_PLATE[181,Intercept]"     
 [37] "r_PLATE[199,Intercept]"      "r_PLATE[200,Intercept]"     
 [39] "r_PLATE[204,Intercept]"      "r_PLATE[12,DIST2]"          
 [41] "r_PLATE[16,DIST2]"           "r_PLATE[21,DIST2]"          
 [43] "r_PLATE[34,DIST2]"           "r_PLATE[36,DIST2]"          
 [45] "r_PLATE[39,DIST2]"           "r_PLATE[57,DIST2]"          
 [47] "r_PLATE[61,DIST2]"           "r_PLATE[63,DIST2]"          
 [49] "r_PLATE[178,DIST2]"          "r_PLATE[179,DIST2]"         
 [51] "r_PLATE[181,DIST2]"          "r_PLATE[199,DIST2]"         
 [53] "r_PLATE[200,DIST2]"          "r_PLATE[204,DIST2]"         
 [55] "r_PLATE[12,DIST3]"           "r_PLATE[16,DIST3]"          
 [57] "r_PLATE[21,DIST3]"           "r_PLATE[34,DIST3]"          
 [59] "r_PLATE[36,DIST3]"           "r_PLATE[39,DIST3]"          
 [61] "r_PLATE[57,DIST3]"           "r_PLATE[61,DIST3]"          
 [63] "r_PLATE[63,DIST3]"           "r_PLATE[178,DIST3]"         
 [65] "r_PLATE[179,DIST3]"          "r_PLATE[181,DIST3]"         
 [67] "r_PLATE[199,DIST3]"          "r_PLATE[200,DIST3]"         
 [69] "r_PLATE[204,DIST3]"          "r_PLATE[12,DIST4]"          
 [71] "r_PLATE[16,DIST4]"           "r_PLATE[21,DIST4]"          
 [73] "r_PLATE[34,DIST4]"           "r_PLATE[36,DIST4]"          
 [75] "r_PLATE[39,DIST4]"           "r_PLATE[57,DIST4]"          
 [77] "r_PLATE[61,DIST4]"           "r_PLATE[63,DIST4]"          
 [79] "r_PLATE[178,DIST4]"          "r_PLATE[179,DIST4]"         
 [81] "r_PLATE[181,DIST4]"          "r_PLATE[199,DIST4]"         
 [83] "r_PLATE[200,DIST4]"          "r_PLATE[204,DIST4]"         
 [85] "prior_Intercept"             "prior_b_COPPERWeek1"        
 [87] "prior_b_COPPERWeek2"         "prior_b_DIST2"              
 [89] "prior_b_DIST3"               "prior_b_DIST4"              
 [91] "prior_b_COPPERWeek1:DIST2"   "prior_b_COPPERWeek2:DIST2"  
 [93] "prior_b_COPPERWeek1:DIST3"   "prior_b_COPPERWeek2:DIST3"  
 [95] "prior_b_COPPERWeek1:DIST4"   "prior_b_COPPERWeek2:DIST4"  
 [97] "prior_shape"                 "prior_sd_PLATE"             
 [99] "prior_cor_PLATE"             "lprior"                     
[101] "lp__"                        "accept_stat__"              
[103] "stepsize__"                  "treedepth__"                
[105] "n_leapfrog__"                "divergent__"                
[107] "energy__"

copper.brm3b |>
  hypothesis("COPPERWeek1=0") |>
  plot()

copper.brm3b |>
  hypothesis("DIST2=0") |>
  plot()

copper.brm3b |> SUYR_prior_and_posterior()

7 MCMC sampling diagnostics

brms

The bayesplot package offers a range of MCMC diagnostics as well as Posterior Probability Checks (PPC), all of which have a convenient plot() interface. Lets start with the MCMC diagnostics.

available_mcmc()

bayesplot MCMC module:
  mcmc_acf
  mcmc_acf_bar
  mcmc_areas
  mcmc_areas_ridges
  mcmc_combo
  mcmc_dens
  mcmc_dens_chains
  mcmc_dens_overlay
  mcmc_hex
  mcmc_hist
  mcmc_hist_by_chain
  mcmc_intervals
  mcmc_neff
  mcmc_neff_hist
  mcmc_nuts_acceptance
  mcmc_nuts_divergence
  mcmc_nuts_energy
  mcmc_nuts_stepsize
  mcmc_nuts_treedepth
  mcmc_pairs
  mcmc_parcoord
  mcmc_rank_ecdf
  mcmc_rank_hist
  mcmc_rank_overlay
  mcmc_recover_hist
  mcmc_recover_intervals
  mcmc_recover_scatter
  mcmc_rhat
  mcmc_rhat_hist
  mcmc_scatter
  mcmc_trace
  mcmc_trace_highlight
  mcmc_violin

Of these, we will focus on:

trace: this plots the estimates of each parameter over the post-warmup length of each MCMC chain. Each chain is plotted in a different shade of blue, with each parameter in its own facet. Ideally, each trace should just look like noise without any discernible drift and each of the traces for a specific parameter should look the same (i.e, should not be displaced above or below any other trace for that parameter).

pars <- copper.brm3 |> get_variables()
pars <- pars |>
  str_extract("^b.Intercept|^b_COPPER.*|^b_DIST.*|[sS]igma|^sd.*") |>
  na.omit()
pars

 [1] "b_Intercept"         "b_COPPERWeek1"       "b_COPPERWeek2"      
 [4] "b_DIST2"             "b_DIST3"             "b_DIST4"            
 [7] "b_COPPERWeek1:DIST2" "b_COPPERWeek2:DIST2" "b_COPPERWeek1:DIST3"
[10] "b_COPPERWeek2:DIST3" "b_COPPERWeek1:DIST4" "b_COPPERWeek2:DIST4"
[13] "sd_PLATE__Intercept"
attr(,"na.action")
 [1] 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38
[26] 39 40
attr(,"class")
[1] "omit"

copper.brm3 |> mcmc_plot(type = "trace", variable = pars)

No divergences to plot.

## OR
copper.brm3 |> mcmc_plot(
  type = "trace",
  variable = "^b.Intercept|^b_COPPER.*|^b_DIST.*|[sS]igma|^sd.*",
  regex = TRUE
)

No divergences to plot.

The chains appear well mixed and very similar

acf_bar (auto-correlation function): plots the auto-correlation between successive MCMC sample lags for each parameter and each chain

copper.brm3 |> mcmc_plot(type = "acf_bar", variable = pars)

## OR
copper.brm3 |> mcmc_plot(
  type = "acf_bar",
  variable = "^b.Intercept|^b_COPPER.*|^b_DIST.*|[sS]igma|^sd.*",
  regex = TRUE
)

There is no evidence of auto-correlation in the MCMC samples

rhat_hist: Rhat is a scale reduction factor measure of convergence between the chains. The closer the values are to 1, the more the chains have converged. Values greater than 1.05 indicate a lack of convergence. There will be an Rhat value for each parameter estimated.

copper.brm3 |> mcmc_plot(type = "rhat_hist")

`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

All Rhat values are below 1.05, suggesting the chains have converged.

neff_hist (number of effective samples): the ratio of the number of effective samples (those not rejected by the sampler) to the number of samples provides an indication of the effectiveness (and efficiency) of the MCMC sampler. Ratios that are less than 0.5 for a parameter suggest that the sampler spent considerable time in difficult areas of the sampling domain and rejected more than half of the samples (replacing them with the previous effective sample).

If the ratios are low, tightening the priors may help.

copper.brm3 |> mcmc_plot(type = "neff_hist")

`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Ratios all very high.

More diagnostics

copper.brm3 |> mcmc_plot(type = "combo", pars = pars)

Warning: Argument 'pars' is deprecated. Please use 'variable' instead.

copper.brm3 |> mcmc_plot(type = "violin", pars = pars)

Warning: Argument 'pars' is deprecated. Please use 'variable' instead.

Of these, we will focus on:

trace: this plots the estimates of each parameter over the post-warmup length of each MCMC chain. Each chain is plotted in a different shade of blue, with each parameter in its own facet. Ideally, each trace should just look like noise without any discernible drift and each of the traces for a specific parameter should look the same (i.e, should not be displaced above or below any other trace for that parameter).

pars <- copper.brm3a |> get_variables()
pars <- pars |>
  str_extract("^b.Intercept|^b_COPPER.*|^b_DIST.*|[sS]igma|^sd.*") |>
  na.omit()
pars

 [1] "b_Intercept"         "b_COPPERWeek1"       "b_COPPERWeek2"      
 [4] "b_DIST2"             "b_DIST3"             "b_DIST4"            
 [7] "b_COPPERWeek1:DIST2" "b_COPPERWeek2:DIST2" "b_COPPERWeek1:DIST3"
[10] "b_COPPERWeek2:DIST3" "b_COPPERWeek1:DIST4" "b_COPPERWeek2:DIST4"
[13] "sd_PLATE__Intercept" "sigma"               "sigma"              
attr(,"na.action")
 [1] 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39
[26] 40 41 42 44 45 46 47 48 49 50 51 52
attr(,"class")
[1] "omit"

copper.brm3a |> mcmc_plot(type = "trace", variable = pars)

No divergences to plot.

## OR
copper.brm3a |> mcmc_plot(
  type = "trace",
  variable = "^b.Intercept|^b_COPPER.*|^b_DIST.*|[sS]igma|^sd.*",
  regex = TRUE
)

No divergences to plot.

The chains appear well mixed and very similar

acf_bar (auto-correlation function): plots the auto-correlation between successive MCMC sample lags for each parameter and each chain

copper.brm3a |> mcmc_plot(type = "acf_bar", variable = pars)

## OR
copper.brm3a |> mcmc_plot(
  type = "acf_bar",
  variable = "^b.Intercept|^b_COPPER.*|^b_DIST.*|[sS]igma|^sd.*",
  regex = TRUE
)

There is no evidence of auto-correlation in the MCMC samples

rhat_hist: Rhat is a scale reduction factor measure of convergence between the chains. The closer the values are to 1, the more the chains have converged. Values greater than 1.05 indicate a lack of convergence. There will be an Rhat value for each parameter estimated.

copper.brm3a |> mcmc_plot(type = "rhat_hist")

`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

All Rhat values are below 1.05, suggesting the chains have converged.

neff_hist (number of effective samples): the ratio of the number of effective samples (those not rejected by the sampler) to the number of samples provides an indication of the effectiveness (and efficiency) of the MCMC sampler. Ratios that are less than 0.5 for a parameter suggest that the sampler spent considerable time in difficult areas of the sampling domain and rejected more than half of the samples (replacing them with the previous effective sample).

If the ratios are low, tightening the priors may help.

copper.brm3a |> mcmc_plot(type = "neff_hist")

`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Ratios all very high.

More diagnostics

copper.brm3a |> mcmc_plot(type = "combo", pars = pars)

Warning: Argument 'pars' is deprecated. Please use 'variable' instead.

copper.brm3a |> mcmc_plot(type = "violin", pars = pars)

Warning: Argument 'pars' is deprecated. Please use 'variable' instead.

Of these, we will focus on:

trace: this plots the estimates of each parameter over the post-warmup length of each MCMC chain. Each chain is plotted in a different shade of blue, with each parameter in its own facet. Ideally, each trace should just look like noise without any discernible drift and each of the traces for a specific parameter should look the same (i.e, should not be displaced above or below any other trace for that parameter).

pars <- copper.brm3b |> get_variables()
pars <- pars |>
  str_extract("^b.Intercept|^b_COPPER.*|^b_DIST.*|[sS]igma|^sd.*") |>
  na.omit()
pars

 [1] "b_Intercept"         "b_COPPERWeek1"       "b_COPPERWeek2"      
 [4] "b_DIST2"             "b_DIST3"             "b_DIST4"            
 [7] "b_COPPERWeek1:DIST2" "b_COPPERWeek2:DIST2" "b_COPPERWeek1:DIST3"
[10] "b_COPPERWeek2:DIST3" "b_COPPERWeek1:DIST4" "b_COPPERWeek2:DIST4"
[13] "sd_PLATE__Intercept" "sd_PLATE__DIST2"     "sd_PLATE__DIST3"    
[16] "sd_PLATE__DIST4"    
attr(,"na.action")
 [1]  17  18  19  20  21  22  23  24  25  26  27  28  29  30  31  32  33  34  35
[20]  36  37  38  39  40  41  42  43  44  45  46  47  48  49  50  51  52  53  54
[39]  55  56  57  58  59  60  61  62  63  64  65  66  67  68  69  70  71  72  73
[58]  74  75  76  77  78  79  80  81  82  83  84  85  86  87  88  89  90  91  92
[77]  93  94  95  96  97  98  99 100 101 102 103 104 105 106 107
attr(,"class")
[1] "omit"

copper.brm3b |> mcmc_plot(type = "trace", variable = pars)

No divergences to plot.

## OR
copper.brm3b |> mcmc_plot(
  type = "trace",
  variable = "^b.Intercept|^b_COPPER.*|^b_DIST.*|[sS]igma|^sd.*",
  regex = TRUE
)

No divergences to plot.

The chains appear well mixed and very similar

acf_bar (auto-correlation function): plots the auto-correlation between successive MCMC sample lags for each parameter and each chain

copper.brm3b |> mcmc_plot(type = "acf_bar", variable = pars)

## OR
copper.brm3b |> mcmc_plot(
  type = "acf_bar",
  variable = "^b.Intercept|^b_COPPER.*|^b_DIST.*|[sS]igma|^sd.*",
  regex = TRUE
)

There is no evidence of auto-correlation in the MCMC samples

rhat_hist: Rhat is a scale reduction factor measure of convergence between the chains. The closer the values are to 1, the more the chains have converged. Values greater than 1.05 indicate a lack of convergence. There will be an Rhat value for each parameter estimated.

copper.brm3b |> mcmc_plot(type = "rhat_hist")

`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

All Rhat values are below 1.05, suggesting the chains have converged.

neff_hist (number of effective samples): the ratio of the number of effective samples (those not rejected by the sampler) to the number of samples provides an indication of the effectiveness (and efficiency) of the MCMC sampler. Ratios that are less than 0.5 for a parameter suggest that the sampler spent considerable time in difficult areas of the sampling domain and rejected more than half of the samples (replacing them with the previous effective sample).

If the ratios are low, tightening the priors may help.

copper.brm3b |> mcmc_plot(type = "neff_hist")

`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Ratios all very high.

More diagnostics

copper.brm3b |> mcmc_plot(type = "combo", pars = pars)

Warning: Argument 'pars' is deprecated. Please use 'variable' instead.

copper.brm3b |> mcmc_plot(type = "violin", pars = pars)

Warning: Argument 'pars' is deprecated. Please use 'variable' instead.

The rstan package offers a range of MCMC diagnostics. Lets start with the MCMC diagnostics.

Of these, we will focus on:

stan_trace: this plots the estimates of each parameter over the post-warmup length of each MCMC chain. Each chain is plotted in a different colour, with each parameter in its own facet. Ideally, each trace should just look like noise without any discernible drift and each of the traces for a specific parameter should look the same (i.e, should not be displaced above or below any other trace for that parameter).

copper.brm3 |> get_variables()

 [1] "b_Intercept"            "b_COPPERWeek1"          "b_COPPERWeek2"         
 [4] "b_DIST2"                "b_DIST3"                "b_DIST4"               
 [7] "b_COPPERWeek1:DIST2"    "b_COPPERWeek2:DIST2"    "b_COPPERWeek1:DIST3"   
[10] "b_COPPERWeek2:DIST3"    "b_COPPERWeek1:DIST4"    "b_COPPERWeek2:DIST4"   
[13] "sd_PLATE__Intercept"    "Intercept"              "r_PLATE[12,Intercept]" 
[16] "r_PLATE[16,Intercept]"  "r_PLATE[21,Intercept]"  "r_PLATE[34,Intercept]" 
[19] "r_PLATE[36,Intercept]"  "r_PLATE[39,Intercept]"  "r_PLATE[57,Intercept]" 
[22] "r_PLATE[61,Intercept]"  "r_PLATE[63,Intercept]"  "r_PLATE[178,Intercept]"
[25] "r_PLATE[179,Intercept]" "r_PLATE[181,Intercept]" "r_PLATE[199,Intercept]"
[28] "r_PLATE[200,Intercept]" "r_PLATE[204,Intercept]" "prior_Intercept"       
[31] "prior_b"                "prior_sd_PLATE"         "lprior"                
[34] "lp__"                   "accept_stat__"          "stepsize__"            
[37] "treedepth__"            "n_leapfrog__"           "divergent__"           
[40] "energy__"

pars <- copper.brm3 |> get_variables()
pars <- str_extract(pars, "^b_.*|^sigma$|^sd.*") |> na.omit()

copper.brm3$fit |>
  stan_trace(pars = pars)

The chains appear well mixed and very similar

stan_acf (auto-correlation function): plots the auto-correlation between successive MCMC sample lags for each parameter and each chain

copper.brm3$fit |>
  stan_ac(pars = pars)

There is no evidence of auto-correlation in the MCMC samples

stan_rhat: Rhat is a scale reduction factor measure of convergence between the chains. The closer the values are to 1, the more the chains have converged. Values greater than 1.05 indicate a lack of convergence. There will be an Rhat value for each parameter estimated.

copper.brm3$fit |> stan_rhat()

`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

All Rhat values are below 1.05, suggesting the chains have converged.

stan_ess (number of effective samples): the ratio of the number of effective samples (those not rejected by the sampler) to the number of samples provides an indication of the effectiveness (and efficiency) of the MCMC sampler. Ratios that are less than 0.5 for a parameter suggest that the sampler spent considerable time in difficult areas of the sampling domain and rejected more than half of the samples (replacing them with the previous effective sample).

If the ratios are low, tightening the priors may help.

copper.brm3$fit |> stan_ess()

`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Ratios all very high.

copper.brm3$fit |>
  stan_dens(separate_chains = TRUE, pars = pars)

The rstan package offers a range of MCMC diagnostics. Lets start with the MCMC diagnostics.

Of these, we will focus on:

stan_trace: this plots the estimates of each parameter over the post-warmup length of each MCMC chain. Each chain is plotted in a different colour, with each parameter in its own facet. Ideally, each trace should just look like noise without any discernible drift and each of the traces for a specific parameter should look the same (i.e, should not be displaced above or below any other trace for that parameter).

copper.brm3a |> get_variables()

 [1] "b_Intercept"               "b_COPPERWeek1"            
 [3] "b_COPPERWeek2"             "b_DIST2"                  
 [5] "b_DIST3"                   "b_DIST4"                  
 [7] "b_COPPERWeek1:DIST2"       "b_COPPERWeek2:DIST2"      
 [9] "b_COPPERWeek1:DIST3"       "b_COPPERWeek2:DIST3"      
[11] "b_COPPERWeek1:DIST4"       "b_COPPERWeek2:DIST4"      
[13] "sd_PLATE__Intercept"       "sigma"                    
[15] "Intercept"                 "r_PLATE[12,Intercept]"    
[17] "r_PLATE[16,Intercept]"     "r_PLATE[21,Intercept]"    
[19] "r_PLATE[34,Intercept]"     "r_PLATE[36,Intercept]"    
[21] "r_PLATE[39,Intercept]"     "r_PLATE[57,Intercept]"    
[23] "r_PLATE[61,Intercept]"     "r_PLATE[63,Intercept]"    
[25] "r_PLATE[178,Intercept]"    "r_PLATE[179,Intercept]"   
[27] "r_PLATE[181,Intercept]"    "r_PLATE[199,Intercept]"   
[29] "r_PLATE[200,Intercept]"    "r_PLATE[204,Intercept]"   
[31] "prior_Intercept"           "prior_b_COPPERWeek1"      
[33] "prior_b_COPPERWeek2"       "prior_b_DIST2"            
[35] "prior_b_DIST3"             "prior_b_DIST4"            
[37] "prior_b_COPPERWeek1:DIST2" "prior_b_COPPERWeek2:DIST2"
[39] "prior_b_COPPERWeek1:DIST3" "prior_b_COPPERWeek2:DIST3"
[41] "prior_b_COPPERWeek1:DIST4" "prior_b_COPPERWeek2:DIST4"
[43] "prior_sigma"               "prior_sd_PLATE"           
[45] "lprior"                    "lp__"                     
[47] "accept_stat__"             "stepsize__"               
[49] "treedepth__"               "n_leapfrog__"             
[51] "divergent__"               "energy__"

pars <- copper.brm3a |> get_variables()
pars <- str_extract(pars, "^b_.*|^sigma$|^sd.*") |> na.omit()

copper.brm3a$fit |>
  stan_trace(pars = pars)

The chains appear well mixed and very similar

stan_acf (auto-correlation function): plots the auto-correlation between successive MCMC sample lags for each parameter and each chain

copper.brm3a$fit |>
  stan_ac(pars = pars)

There is no evidence of auto-correlation in the MCMC samples

stan_rhat: Rhat is a scale reduction factor measure of convergence between the chains. The closer the values are to 1, the more the chains have converged. Values greater than 1.05 indicate a lack of convergence. There will be an Rhat value for each parameter estimated.

copper.brm3a$fit |> stan_rhat()

`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

All Rhat values are below 1.05, suggesting the chains have converged.

stan_ess (number of effective samples): the ratio of the number of effective samples (those not rejected by the sampler) to the number of samples provides an indication of the effectiveness (and efficiency) of the MCMC sampler. Ratios that are less than 0.5 for a parameter suggest that the sampler spent considerable time in difficult areas of the sampling domain and rejected more than half of the samples (replacing them with the previous effective sample).

If the ratios are low, tightening the priors may help.

copper.brm3a$fit |> stan_ess()

`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Ratios all very high.

copper.brm3a$fit |>
  stan_dens(separate_chains = TRUE, pars = pars)

The rstan package offers a range of MCMC diagnostics. Lets start with the MCMC diagnostics.

Of these, we will focus on:

stan_trace: this plots the estimates of each parameter over the post-warmup length of each MCMC chain. Each chain is plotted in a different colour, with each parameter in its own facet. Ideally, each trace should just look like noise without any discernible drift and each of the traces for a specific parameter should look the same (i.e, should not be displaced above or below any other trace for that parameter).

copper.brm3b |> get_variables()

  [1] "b_Intercept"                 "b_COPPERWeek1"              
  [3] "b_COPPERWeek2"               "b_DIST2"                    
  [5] "b_DIST3"                     "b_DIST4"                    
  [7] "b_COPPERWeek1:DIST2"         "b_COPPERWeek2:DIST2"        
  [9] "b_COPPERWeek1:DIST3"         "b_COPPERWeek2:DIST3"        
 [11] "b_COPPERWeek1:DIST4"         "b_COPPERWeek2:DIST4"        
 [13] "sd_PLATE__Intercept"         "sd_PLATE__DIST2"            
 [15] "sd_PLATE__DIST3"             "sd_PLATE__DIST4"            
 [17] "cor_PLATE__Intercept__DIST2" "cor_PLATE__Intercept__DIST3"
 [19] "cor_PLATE__DIST2__DIST3"     "cor_PLATE__Intercept__DIST4"
 [21] "cor_PLATE__DIST2__DIST4"     "cor_PLATE__DIST3__DIST4"    
 [23] "shape"                       "Intercept"                  
 [25] "r_PLATE[12,Intercept]"       "r_PLATE[16,Intercept]"      
 [27] "r_PLATE[21,Intercept]"       "r_PLATE[34,Intercept]"      
 [29] "r_PLATE[36,Intercept]"       "r_PLATE[39,Intercept]"      
 [31] "r_PLATE[57,Intercept]"       "r_PLATE[61,Intercept]"      
 [33] "r_PLATE[63,Intercept]"       "r_PLATE[178,Intercept]"     
 [35] "r_PLATE[179,Intercept]"      "r_PLATE[181,Intercept]"     
 [37] "r_PLATE[199,Intercept]"      "r_PLATE[200,Intercept]"     
 [39] "r_PLATE[204,Intercept]"      "r_PLATE[12,DIST2]"          
 [41] "r_PLATE[16,DIST2]"           "r_PLATE[21,DIST2]"          
 [43] "r_PLATE[34,DIST2]"           "r_PLATE[36,DIST2]"          
 [45] "r_PLATE[39,DIST2]"           "r_PLATE[57,DIST2]"          
 [47] "r_PLATE[61,DIST2]"           "r_PLATE[63,DIST2]"          
 [49] "r_PLATE[178,DIST2]"          "r_PLATE[179,DIST2]"         
 [51] "r_PLATE[181,DIST2]"          "r_PLATE[199,DIST2]"         
 [53] "r_PLATE[200,DIST2]"          "r_PLATE[204,DIST2]"         
 [55] "r_PLATE[12,DIST3]"           "r_PLATE[16,DIST3]"          
 [57] "r_PLATE[21,DIST3]"           "r_PLATE[34,DIST3]"          
 [59] "r_PLATE[36,DIST3]"           "r_PLATE[39,DIST3]"          
 [61] "r_PLATE[57,DIST3]"           "r_PLATE[61,DIST3]"          
 [63] "r_PLATE[63,DIST3]"           "r_PLATE[178,DIST3]"         
 [65] "r_PLATE[179,DIST3]"          "r_PLATE[181,DIST3]"         
 [67] "r_PLATE[199,DIST3]"          "r_PLATE[200,DIST3]"         
 [69] "r_PLATE[204,DIST3]"          "r_PLATE[12,DIST4]"          
 [71] "r_PLATE[16,DIST4]"           "r_PLATE[21,DIST4]"          
 [73] "r_PLATE[34,DIST4]"           "r_PLATE[36,DIST4]"          
 [75] "r_PLATE[39,DIST4]"           "r_PLATE[57,DIST4]"          
 [77] "r_PLATE[61,DIST4]"           "r_PLATE[63,DIST4]"          
 [79] "r_PLATE[178,DIST4]"          "r_PLATE[179,DIST4]"         
 [81] "r_PLATE[181,DIST4]"          "r_PLATE[199,DIST4]"         
 [83] "r_PLATE[200,DIST4]"          "r_PLATE[204,DIST4]"         
 [85] "prior_Intercept"             "prior_b_COPPERWeek1"        
 [87] "prior_b_COPPERWeek2"         "prior_b_DIST2"              
 [89] "prior_b_DIST3"               "prior_b_DIST4"              
 [91] "prior_b_COPPERWeek1:DIST2"   "prior_b_COPPERWeek2:DIST2"  
 [93] "prior_b_COPPERWeek1:DIST3"   "prior_b_COPPERWeek2:DIST3"  
 [95] "prior_b_COPPERWeek1:DIST4"   "prior_b_COPPERWeek2:DIST4"  
 [97] "prior_shape"                 "prior_sd_PLATE"             
 [99] "prior_cor_PLATE"             "lprior"                     
[101] "lp__"                        "accept_stat__"              
[103] "stepsize__"                  "treedepth__"                
[105] "n_leapfrog__"                "divergent__"                
[107] "energy__"

pars <- copper.brm3b |> get_variables()
pars <- str_extract(pars, "^b_.*|^sigma$|^sd.*") |> na.omit()

copper.brm3b$fit |>
  stan_trace(pars = pars)

The chains appear well mixed and very similar

stan_acf (auto-correlation function): plots the auto-correlation between successive MCMC sample lags for each parameter and each chain

copper.brm3b$fit |>
  stan_ac(pars = pars)

There is no evidence of auto-correlation in the MCMC samples

stan_rhat: Rhat is a scale reduction factor measure of convergence between the chains. The closer the values are to 1, the more the chains have converged. Values greater than 1.05 indicate a lack of convergence. There will be an Rhat value for each parameter estimated.

copper.brm3b$fit |> stan_rhat()

`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

All Rhat values are below 1.05, suggesting the chains have converged.

stan_ess (number of effective samples): the ratio of the number of effective samples (those not rejected by the sampler) to the number of samples provides an indication of the effectiveness (and efficiency) of the MCMC sampler. Ratios that are less than 0.5 for a parameter suggest that the sampler spent considerable time in difficult areas of the sampling domain and rejected more than half of the samples (replacing them with the previous effective sample).

If the ratios are low, tightening the priors may help.

copper.brm3b$fit |> stan_ess()

`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Ratios all very high.

copper.brm3b$fit |>
  stan_dens(separate_chains = TRUE, pars = pars)

The ggmean package also as a set of MCMC diagnostic functions. Lets start with the MCMC diagnostics.

Of these, we will focus on:

ggs_traceplot: this plots the estimates of each parameter over the post-warmup length of each MCMC chain. Each chain is plotted in a different colour, with each parameter in its own facet. Ideally, each trace should just look like noise without any discernible drift and each of the traces for a specific parameter should look the same (i.e, should not be displaced above or below any other trace for that parameter).

## owls.ggs <- copper.brm3 |> ggs(burnin = FALSE, inc_warmup = FALSE)
## copper.ggs |> ggs_traceplot()

The chains appear well mixed and very similar

gss_autocorrelation (autocorrelation function): plots the autocorrelation between successive MCMC sample lags for each parameter and each chain

## ggs_autocorrelation(copper.ggs)

There is no evidence of auto-correlation in the MCMC samples

stan_rhat: Rhat is a scale reduction factor measure of convergence between the chains. The closer the values are to 1, the more the chains have converged. Values greater than 1.05 indicate a lack of convergence. There will be an Rhat value for each parameter estimated.

## ggs_Rhat(copper.ggs)

All Rhat values are below 1.05, suggesting the chains have converged.

stan_ess (number of effective samples): the ratio of the number of effective samples (those not rejected by the sampler) to the number of samples provides an indication of the effectiveness (and efficiency) of the MCMC sampler. Ratios that are less than 0.5 for a parameter suggest that the sampler spent considerable time in difficult areas of the sampling domain and rejected more than half of the samples (replacing them with the previous effective sample).

If the ratios are low, tightening the priors may help.

## ggs_effective(copper.ggs)

Ratios all very high.

More diagnostics

## ggs_crosscorrelation(owls.ggs)

## ggs_grb(owls.ggs)

8 Model validation

brms

Post predictive checks provide additional diagnostics about the fit of the model. Specifically, they provide a comparison between predictions drawn from the model and the observed data used to train the model.

available_ppc()

dens_overlay: plots the density distribution of the observed data (black line) overlayed on top of 50 density distributions generated from draws from the model (light blue). Ideally, the 50 realisations should be roughly consistent with the observed data.

copper.brm3 |> pp_check(type = "dens_overlay", ndraws = 100)

The model draws appear to differ substantially from the observed data.

error_scatter_avg: this plots the observed values against the average residuals. Similar to a residual plot, we do not want to see any patterns in this plot. Note, this is not really that useful for models that involve a binomial response

## copper.brm3 |> pp_check(type = 'error_scatter_avg')

This is not really interpretable

intervals: plots the observed data overlayed on top of posterior predictions associated with each level of the predictor. Ideally, the observed data should all fall within the predictive intervals.

copper.brm3 |> pp_check(group = "Nest", type = "intervals")

Using all posterior draws for ppc type 'intervals' by default.

The shinystan package allows the full suite of MCMC diagnostics and posterior predictive checks to be accessed via a web interface.

# library(shinystan)
# launch_shinystan(copper.brm2)

Post predictive checks provide additional diagnostics about the fit of the model. Specifically, they provide a comparison between predictions drawn from the model and the observed data used to train the model.

available_ppc()

dens_overlay: plots the density distribution of the observed data (black line) overlayed on top of 50 density distributions generated from draws from the model (light blue). Ideally, the 50 realisations should be roughly consistent with the observed data.

copper.brm3a |> pp_check(type = "dens_overlay", ndraws = 100)

The model draws appear to differ substantially from the observed data.

error_scatter_avg: this plots the observed values against the average residuals. Similar to a residual plot, we do not want to see any patterns in this plot. Note, this is not really that useful for models that involve a binomial response

## copper.brm3a |> pp_check(type = 'error_scatter_avg')

This is not really interpretable

intervals: plots the observed data overlayed on top of posterior predictions associated with each level of the predictor. Ideally, the observed data should all fall within the predictive intervals.

copper.brm3a |> pp_check(group = "Nest", type = "intervals")

Using all posterior draws for ppc type 'intervals' by default.

The shinystan package allows the full suite of MCMC diagnostics and posterior predictive checks to be accessed via a web interface.

# library(shinystan)
# launch_shinystan(copper.brm2)

Post predictive checks provide additional diagnostics about the fit of the model. Specifically, they provide a comparison between predictions drawn from the model and the observed data used to train the model.

available_ppc()

dens_overlay: plots the density distribution of the observed data (black line) overlayed on top of 50 density distributions generated from draws from the model (light blue). Ideally, the 50 realisations should be roughly consistent with the observed data.

copper.brm3b |> pp_check(type = "dens_overlay", ndraws = 100)

The model draws appear to differ substantially from the observed data.

error_scatter_avg: this plots the observed values against the average residuals. Similar to a residual plot, we do not want to see any patterns in this plot. Note, this is not really that useful for models that involve a binomial response

## copper.brm3b |> pp_check(type = 'error_scatter_avg')

This is not really interpretable

intervals: plots the observed data overlayed on top of posterior predictions associated with each level of the predictor. Ideally, the observed data should all fall within the predictive intervals.

copper.brm3b |> pp_check(group = "Nest", type = "intervals")

Using all posterior draws for ppc type 'intervals' by default.

The shinystan package allows the full suite of MCMC diagnostics and posterior predictive checks to be accessed via a web interface.

# library(shinystan)
# launch_shinystan(copper.brm2)

DHARMa residuals provide very useful diagnostics. Unfortunately, we cannot directly use the simulateResiduals() function to generate the simulated residuals. However, if we are willing to calculate some of the components yourself, we can still obtain the simulated residuals from the fitted stan model.

We need to supply:

simulated (predicted) responses associated with each observation.
observed values
fitted (predicted) responses (averaged) associated with each observation

copper.resids <- make_brms_dharma_res(copper.brm3, integerResponse = TRUE)
wrap_elements(~ testUniformity(copper.resids)) +
  wrap_elements(~ plotResiduals(copper.resids, form = factor(rep(1, nrow(copper))))) +
  wrap_elements(~ plotResiduals(copper.resids, quantreg = TRUE)) +
  wrap_elements(~ testDispersion(copper.resids))

Conclusions:

the model appears to be a good fit
the Q-Q plot is a straight line
there are no outliers

Perhaps we should specifically explore zero-inflation.

copper.resids |> testZeroInflation()


    DHARMa zero-inflation test via comparison to expected zeros with
    simulation under H0 = fitted model

data:  simulationOutput
ratioObsSim = 2.9644, p-value = 0.192
alternative hypothesis: two.sided

Conclusions:

there is no evidence of zero-inflation

## preds <- copper.brm3a |> posterior_predict(nsamples = 250,  summary = FALSE)
## copper.resids <- createDHARMa(simulatedResponse = t(preds),
##                             observedResponse = copper$NCalls,
##                             fittedPredictedResponse = apply(preds, 2, median),
##                             integerResponse = TRUE)
## plot(copper.resids)

copper.resids <- make_brms_dharma_res(copper.brm3a, integerResponse = TRUE)
wrap_elements(~ testUniformity(copper.resids)) +
  wrap_elements(~ plotResiduals(copper.resids, form = factor(rep(1, nrow(copper))))) +
  wrap_elements(~ plotResiduals(copper.resids, quantreg = TRUE)) +
  wrap_elements(~ testDispersion(copper.resids))

Conclusions:

the model does not appear to be a very good fit
the Q-Q plot deviates substantially from a straight line
there are outliers

Perhaps we should specifically explore zero-inflation.

copper.resids |> testZeroInflation()


    DHARMa zero-inflation test via comparison to expected zeros with
    simulation under H0 = fitted model

data:  simulationOutput
ratioObsSim = NaN, p-value = 1
alternative hypothesis: two.sided

Conclusions:

there is strong evidence of zero-inflation

## preds <- copper.brm3b |> posterior_predict(nsamples = 250,  summary = FALSE)
## copper.resids <- createDHARMa(simulatedResponse = t(preds),
##                             observedResponse = copper$NCalls,
##                             fittedPredictedResponse = apply(preds, 2, median),
##                             integerResponse = TRUE)
## plot(copper.resids)

copper.resids <- make_brms_dharma_res(copper.brm3b, integerResponse = TRUE)
wrap_elements(~ testUniformity(copper.resids)) +
  wrap_elements(~ plotResiduals(copper.resids, form = factor(rep(1, nrow(copper))))) +
  wrap_elements(~ plotResiduals(copper.resids, quantreg = TRUE)) +
  wrap_elements(~ testDispersion(copper.resids))

Conclusions:

the model does not appear to be a very good fit
the Q-Q plot deviates substantially from a straight line
there are outliers

Perhaps we should specifically explore zero-inflation.

copper.resids |> testZeroInflation()


    DHARMa zero-inflation test via comparison to expected zeros with
    simulation under H0 = fitted model

data:  simulationOutput
ratioObsSim = NaN, p-value = 1
alternative hypothesis: two.sided

Conclusions:

there is strong evidence of zero-inflation

Conclusions:

there is evidence that the model does not fit that well. It is evidently zero inflated and possibly also over-dispersed.
it would seem that a zero-inflated Poisson or even a zero-inflated Negative Binomial would be a sensible next step.
zero-inflated models cannot be fit in glmer(), so we will proceed with glmmTMB() only.

We will go with the Poisson model

9 Partial effects plots

brms

Note the scale of the predictions are in units of counts NOT density!

copper.brm3 |>
  conditional_effects("COPPER:DIST") |>
  plot(points = TRUE, jitter_width = 0.25)

Warning: 'jitter_width' is deprecated. Please use 'point_args = list(width =
<width>)' instead.

copper.brm3 |>
  ggpredict(~ COPPER * DIST) |>
  plot(show_data = TRUE, jitter = c(0.25, 0))

Note: uncertainty of error terms are not taken into account. Consider
  setting `interval` to "prediction". This will call `posterior_predict()`
  instead of `posterior_epred()`.

ggemmeans() can accommodate the offset correctly. There are two sensible choices:

copper.brm3 |>
  ggemmeans(~ COPPER * DIST) |>
  plot()

10 Model investigation

brms

The summary() method generates simple summaries (mean, standard deviation as well as 10, 50 and 90 percentiles).

copper.brm3 |> summary()

 Family: poisson 
  Links: mu = log 
Formula: COUNT ~ offset(log(AREA)) + COPPER * DIST + (1 | PLATE) 
   Data: copper (Number of observations: 60) 
  Draws: 3 chains, each with iter = 5000; warmup = 2500; thin = 10;
         total post-warmup draws = 750

Multilevel Hyperparameters:
~PLATE (Number of levels: 15) 
              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept)     0.06      0.04     0.00     0.15 1.00      725      783

Regression Coefficients:
                  Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept             2.36      0.08     2.21     2.52 1.00      797      722
COPPERWeek1          -0.38      0.11    -0.61    -0.15 1.00      798      849
COPPERWeek2          -3.19      0.31    -3.82    -2.63 1.00      723      740
DIST2                 0.12      0.10    -0.06     0.31 1.00      751      661
DIST3                 0.16      0.10    -0.03     0.35 1.01      875      624
DIST4                 0.25      0.09     0.06     0.41 1.00      785      717
COPPERWeek1:DIST2     0.01      0.14    -0.27     0.29 1.00      767      757
COPPERWeek2:DIST2     1.05      0.36     0.40     1.78 1.00      763      692
COPPERWeek1:DIST3    -0.00      0.15    -0.28     0.29 1.00      849      659
COPPERWeek2:DIST3     2.03      0.34     1.42     2.76 1.00      761      725
COPPERWeek1:DIST4     0.08      0.14    -0.17     0.36 1.00      840      679
COPPERWeek2:DIST4     2.63      0.32     2.04     3.33 1.00      740      687

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

Conclusions:

the average density of worms in the Dist 1 position on control plates 2.36 (on the link scale). When back-transformed, this is 10.57.
on average, there are 0 (link scale) fewer worms in the Dist 1 position of Week 1 plates (than control plates). This equates to 0.68 fold fewer worms and represents a 32% decline.
on average, there are -3 (link scale) fewer worms in the Dist 1 position of Week 2 plates (than control plates). This equates to 0.04 fold fewer worms and represents a 96% decline.
on average, there are 0 (link scale) more worms on Dist 2 (than Dist 1). This equates to 1.13 fold increase and represents a -13% increase (although this is not significant).
there is no evidence that the Week1 patterns differ from those of the control however, there is evidence that Week 2 patterns are different from those of the control plates.

# A draws_df: 250 iterations, 3 chains, and 34 variables
   b_Intercept b_COPPERWeek1 b_COPPERWeek2 b_DIST2 b_DIST3 b_DIST4
1          2.2         -0.12          -2.8   0.241   0.403    0.47
2          2.4         -0.49          -3.4   0.119   0.027    0.30
3          2.3         -0.29          -3.5   0.222   0.070    0.33
4          2.4         -0.38          -3.0   0.016   0.049    0.17
5          2.3         -0.46          -2.6   0.149   0.314    0.40
6          2.3         -0.23          -3.2   0.223   0.117    0.33
7          2.4         -0.35          -3.0   0.171   0.056    0.14
8          2.3         -0.30          -3.2   0.239   0.202    0.15
9          2.2         -0.13          -2.4   0.285   0.339    0.45
10         2.2         -0.31          -3.2   0.311   0.294    0.29
   b_COPPERWeek1:DIST2 b_COPPERWeek2:DIST2
1               -0.060                0.79
2                0.109                1.26
3               -0.163                1.58
4               -0.095                0.84
5                0.173                0.95
6               -0.108                0.77
7               -0.107                0.45
8               -0.232                0.98
9               -0.164                0.40
10              -0.036                0.81
# ... with 740 more draws, and 26 more variables
# ... hidden reserved variables {'.chain', '.iteration', '.draw'}

# A tibble: 34 × 10
   variable    median  lower   upper     Pl    Pg  rhat length ess_bulk ess_tail
   <chr>        <dbl>  <dbl>   <dbl>  <dbl> <dbl> <dbl>  <dbl>    <dbl>    <dbl>
 1 b_Interce… 10.5    8.96   12.2    0      1     1.00     750     797.     722.
 2 b_COPPERW…  0.684  0.557   0.866  1      0     1.000    750     798.     849.
 3 b_COPPERW…  0.0415 0.0215  0.0706 1      0     1.000    750     723.     740.
 4 b_DIST2     1.13   0.935   1.35   0.0987 0.901 1.00     750     752.     661.
 5 b_DIST3     1.18   0.960   1.39   0.0507 0.949 1.01     750     875.     624.
 6 b_DIST4     1.28   1.07    1.50   0.004  0.996 0.998    750     785.     717.
 7 b_COPPERW…  1.01   0.746   1.30   0.487  0.513 1.00     750     767.     757.
 8 b_COPPERW…  2.80   1.29    5.20   0      1     0.999    750     763.     692.
 9 b_COPPERW…  1.00   0.757   1.33   0.496  0.504 1.00     750     849.     659.
10 b_COPPERW…  7.53   3.86   14.1    0      1     0.999    750     760.     725.
# ℹ 24 more rows

copper.brm3$fit |>
  tidyMCMC(
    estimate.method = "median",
    conf.int = TRUE, conf.method = "HPDinterval",
    rhat = TRUE, ess = TRUE
  )

# A tibble: 33 × 7
   term                 estimate std.error conf.low conf.high  rhat   ess
   <chr>                   <dbl>     <dbl>    <dbl>     <dbl> <dbl> <int>
 1 b_Intercept          2.36        0.0771   2.20       2.51  1.00    790
 2 b_COPPERWeek1       -0.379       0.114   -0.584     -0.144 1.000   788
 3 b_COPPERWeek2       -3.19        0.307   -3.80      -2.61  0.999   726
 4 b_DIST2              0.125       0.0952  -0.0427     0.322 0.998   716
 5 b_DIST3              0.160       0.0958  -0.0409     0.330 1.00    878
 6 b_DIST4              0.245       0.0910   0.0633     0.406 0.998   800
 7 b_COPPERWeek1:DIST2  0.0125      0.139   -0.292      0.265 1.00    751
 8 b_COPPERWeek2:DIST2  1.05        0.355    0.386      1.77  0.998   742
 9 b_COPPERWeek1:DIST3 -0.000773    0.151   -0.275      0.289 1.00    841
10 b_COPPERWeek2:DIST3  2.03        0.337    1.35       2.65  0.999   761
# ℹ 23 more rows

Conclusions:

see above

Due to the presence of a log transform in the predictor, it is better to use the regex version.

copper.brm3 |> get_variables()

 [1] "b_Intercept"            "b_COPPERWeek1"          "b_COPPERWeek2"         
 [4] "b_DIST2"                "b_DIST3"                "b_DIST4"               
 [7] "b_COPPERWeek1:DIST2"    "b_COPPERWeek2:DIST2"    "b_COPPERWeek1:DIST3"   
[10] "b_COPPERWeek2:DIST3"    "b_COPPERWeek1:DIST4"    "b_COPPERWeek2:DIST4"   
[13] "sd_PLATE__Intercept"    "Intercept"              "r_PLATE[12,Intercept]" 
[16] "r_PLATE[16,Intercept]"  "r_PLATE[21,Intercept]"  "r_PLATE[34,Intercept]" 
[19] "r_PLATE[36,Intercept]"  "r_PLATE[39,Intercept]"  "r_PLATE[57,Intercept]" 
[22] "r_PLATE[61,Intercept]"  "r_PLATE[63,Intercept]"  "r_PLATE[178,Intercept]"
[25] "r_PLATE[179,Intercept]" "r_PLATE[181,Intercept]" "r_PLATE[199,Intercept]"
[28] "r_PLATE[200,Intercept]" "r_PLATE[204,Intercept]" "prior_Intercept"       
[31] "prior_b"                "prior_sd_PLATE"         "lprior"                
[34] "lp__"                   "accept_stat__"          "stepsize__"            
[37] "treedepth__"            "n_leapfrog__"           "divergent__"           
[40] "energy__"

copper.draw <- copper.brm3 |>
  gather_draws(`b.Intercept.*|b_COPPER.*|b_DIST.*`, regex = TRUE)
copper.draw

# A tibble: 9,000 × 5
# Groups:   .variable [12]
   .chain .iteration .draw .variable   .value
    <int>      <int> <int> <chr>        <dbl>
 1      1          1     1 b_Intercept   2.16
 2      1          2     2 b_Intercept   2.44
 3      1          3     3 b_Intercept   2.31
 4      1          4     4 b_Intercept   2.42
 5      1          5     5 b_Intercept   2.29
 6      1          6     6 b_Intercept   2.30
 7      1          7     7 b_Intercept   2.40
 8      1          8     8 b_Intercept   2.32
 9      1          9     9 b_Intercept   2.16
10      1         10    10 b_Intercept   2.19
# ℹ 8,990 more rows

We can then summarise this

copper.draw |> median_hdci()

# A tibble: 12 × 7
   .variable             .value  .lower .upper .width .point .interval
   <chr>                  <dbl>   <dbl>  <dbl>  <dbl> <chr>  <chr>    
 1 b_COPPERWeek1       -0.380   -0.572  -0.127   0.95 median hdci     
 2 b_COPPERWeek1:DIST2  0.00622 -0.281   0.282   0.95 median hdci     
 3 b_COPPERWeek1:DIST3  0.00132 -0.277   0.289   0.95 median hdci     
 4 b_COPPERWeek1:DIST4  0.0730  -0.176   0.361   0.95 median hdci     
 5 b_COPPERWeek2       -3.18    -3.80   -2.61    0.95 median hdci     
 6 b_COPPERWeek2:DIST2  1.03     0.386   1.77    0.95 median hdci     
 7 b_COPPERWeek2:DIST3  2.02     1.46    2.79    0.95 median hdci     
 8 b_COPPERWeek2:DIST4  2.62     2.01    3.27    0.95 median hdci     
 9 b_DIST2              0.126   -0.0428  0.322   0.95 median hdci     
10 b_DIST3              0.165   -0.0409  0.330   0.95 median hdci     
11 b_DIST4              0.247    0.0626  0.406   0.95 median hdci     
12 b_Intercept          2.35     2.20    2.51    0.95 median hdci

## On a fractional scale
copper.draw |>
  mutate(.value = exp(.value)) |>
  median_hdci()

# A tibble: 12 × 7
   .variable            .value .lower  .upper .width .point .interval
   <chr>                 <dbl>  <dbl>   <dbl>  <dbl> <chr>  <chr>    
 1 b_COPPERWeek1        0.684  0.557   0.866    0.95 median hdci     
 2 b_COPPERWeek1:DIST2  1.01   0.737   1.30     0.95 median hdci     
 3 b_COPPERWeek1:DIST3  1.00   0.757   1.33     0.95 median hdci     
 4 b_COPPERWeek1:DIST4  1.08   0.803   1.40     0.95 median hdci     
 5 b_COPPERWeek2        0.0415 0.0215  0.0708   0.95 median hdci     
 6 b_COPPERWeek2:DIST2  2.80   1.27    5.20     0.95 median hdci     
 7 b_COPPERWeek2:DIST3  7.53   3.81   14.1      0.95 median hdci     
 8 b_COPPERWeek2:DIST4 13.8    7.37   26.4      0.95 median hdci     
 9 b_DIST2              1.13   0.931   1.35     0.95 median hdci     
10 b_DIST3              1.18   0.955   1.39     0.95 median hdci     
11 b_DIST4              1.28   1.06    1.50     0.95 median hdci     
12 b_Intercept         10.5    9.05   12.3      0.95 median hdci

Lets plot the parameter posteriors (on the link scale - since we are including the intercept).

copper.brm3 |>
  gather_draws(`b_Intercept.*|b_COPPER.*|b_DIST.*`, regex = TRUE) |>
  ## mutate(.value = exp(.value)) |>
  ggplot() +
  geom_vline(xintercept = 0, linetype = "dashed") +
  stat_slab(aes(
    x = .value, y = .variable,
    fill = stat(ggdist::cut_cdf_qi(cdf,
      .width = c(0.5, 0.8, 0.95),
      labels = scales::percent_format()
    ))
  ), color = "black") +
  scale_fill_brewer("Interval", direction = -1, na.translate = FALSE)

Warning: `stat(ggdist::cut_cdf_qi(cdf, .width = c(0.5, 0.8, 0.95), labels =
scales::percent_format()))` was deprecated in ggplot2 3.4.0.
ℹ Please use `after_stat(ggdist::cut_cdf_qi(cdf, .width = c(0.5, 0.8, 0.95),
  labels = scales::percent_format()))` instead.

copper.brm3 |>
  gather_draws(`.Intercept.*|b_COPPER.*|b_DIST.*`, regex = TRUE) |>
  ggplot() +
  geom_vline(xintercept = 0, linetype = "dashed") +
  stat_halfeye(aes(x = .value, y = .variable)) +
  theme_classic()

copper.brm3$fit |> plot(type = "intervals")

'pars' not specified. Showing first 10 parameters by default.

ci_level: 0.8 (80% intervals)

outer_level: 0.95 (95% intervals)

copper.brm3 |>
  gather_draws(`^b_.*`, regex = TRUE) |>
  filter(.variable != "b_Intercept") |>
  ggplot() +
  stat_halfeye(aes(x = .value, y = .variable)) +
  facet_wrap(~.variable, scales = "free") +
  theme(axis.text.y = element_blank())

copper.brm3 |>
  gather_draws(`^b_.*`, regex = TRUE) |>
  filter(.variable != "b_Intercept") |>
  ggplot() +
  stat_halfeye(aes(x = .value, y = .variable)) +
  geom_vline(xintercept = 0, linetype = "dashed")

copper.brm3 |>
  gather_draws(`^b_.*`, regex = TRUE) |>
  filter(str_detect(.variable, "b_.*Intercept", negate = TRUE)) |>
  ggplot() +
  geom_density_ridges(aes(x = .value, y = .variable), alpha = 0.4) +
  geom_vline(xintercept = 0, linetype = "dashed")

Picking joint bandwidth of 0.0458

## Or in colour
copper.brm3 |>
  gather_draws(`^b_.*`, regex = TRUE) |>
  filter(str_detect(.variable, "b_.*Intercept", negate = TRUE)) |>
  ggplot() +
  geom_density_ridges_gradient(
    aes(
      x = (.value),
      y = .variable,
      fill = stat(x)
    ),
    alpha = 0.4, colour = "white",
    quantile_lines = TRUE,
    quantiles = c(0.025, 0.975)
  ) +
  geom_vline(xintercept = 1, linetype = "dashed") +
  scale_x_continuous() +
  scale_fill_viridis_c(option = "C")

Picking joint bandwidth of 0.0458

## Fractional scale
copper.brm3 |>
  gather_draws(`^b_.*`, regex = TRUE) |>
  filter(str_detect(.variable, "b_.*Intercept", negate = TRUE)) |>
  ggplot() +
  geom_density_ridges_gradient(
    aes(
      x = exp(.value),
      y = .variable,
      fill = stat(x)
    ),
    alpha = 0.4, colour = "white",
    quantile_lines = TRUE,
    quantiles = c(0.025, 0.975)
  ) +
  geom_vline(xintercept = 1, linetype = "dashed") +
  scale_x_continuous(trans = scales::log2_trans()) +
  scale_fill_viridis_c(option = "C")

Picking joint bandwidth of 0.0661

This is purely a graphical depiction on the posteriors.

copper.brm3 |> tidy_draws()

# A tibble: 750 × 43
   .chain .iteration .draw b_Intercept b_COPPERWeek1 b_COPPERWeek2 b_DIST2
    <int>      <int> <int>       <dbl>         <dbl>         <dbl>   <dbl>
 1      1          1     1        2.16        -0.123         -2.78  0.241 
 2      1          2     2        2.44        -0.491         -3.42  0.119 
 3      1          3     3        2.31        -0.287         -3.46  0.222 
 4      1          4     4        2.42        -0.383         -2.98  0.0157
 5      1          5     5        2.29        -0.455         -2.61  0.149 
 6      1          6     6        2.30        -0.232         -3.24  0.223 
 7      1          7     7        2.40        -0.347         -3.02  0.171 
 8      1          8     8        2.32        -0.301         -3.23  0.239 
 9      1          9     9        2.16        -0.127         -2.45  0.285 
10      1         10    10        2.19        -0.313         -3.24  0.311 
# ℹ 740 more rows
# ℹ 36 more variables: b_DIST3 <dbl>, b_DIST4 <dbl>,
#   `b_COPPERWeek1:DIST2` <dbl>, `b_COPPERWeek2:DIST2` <dbl>,
#   `b_COPPERWeek1:DIST3` <dbl>, `b_COPPERWeek2:DIST3` <dbl>,
#   `b_COPPERWeek1:DIST4` <dbl>, `b_COPPERWeek2:DIST4` <dbl>,
#   sd_PLATE__Intercept <dbl>, Intercept <dbl>, `r_PLATE[12,Intercept]` <dbl>,
#   `r_PLATE[16,Intercept]` <dbl>, `r_PLATE[21,Intercept]` <dbl>, …

copper.brm3 |> spread_draws(`.*Intercept.*|b_COPPER.*|b_DIST.*`, regex = TRUE)

# A tibble: 750 × 33
   .chain .iteration .draw b_Intercept b_COPPERWeek1 b_COPPERWeek2 b_DIST2
    <int>      <int> <int>       <dbl>         <dbl>         <dbl>   <dbl>
 1      1          1     1        2.16        -0.123         -2.78  0.241 
 2      1          2     2        2.44        -0.491         -3.42  0.119 
 3      1          3     3        2.31        -0.287         -3.46  0.222 
 4      1          4     4        2.42        -0.383         -2.98  0.0157
 5      1          5     5        2.29        -0.455         -2.61  0.149 
 6      1          6     6        2.30        -0.232         -3.24  0.223 
 7      1          7     7        2.40        -0.347         -3.02  0.171 
 8      1          8     8        2.32        -0.301         -3.23  0.239 
 9      1          9     9        2.16        -0.127         -2.45  0.285 
10      1         10    10        2.19        -0.313         -3.24  0.311 
# ℹ 740 more rows
# ℹ 26 more variables: b_DIST3 <dbl>, b_DIST4 <dbl>,
#   `b_COPPERWeek1:DIST2` <dbl>, `b_COPPERWeek2:DIST2` <dbl>,
#   `b_COPPERWeek1:DIST3` <dbl>, `b_COPPERWeek2:DIST3` <dbl>,
#   `b_COPPERWeek1:DIST4` <dbl>, `b_COPPERWeek2:DIST4` <dbl>,
#   sd_PLATE__Intercept <dbl>, Intercept <dbl>, `r_PLATE[12,Intercept]` <dbl>,
#   `r_PLATE[16,Intercept]` <dbl>, `r_PLATE[21,Intercept]` <dbl>, …

copper.brm3 |>
  posterior_samples() |>
  as_tibble()

Warning: Method 'posterior_samples' is deprecated. Please see ?as_draws for
recommended alternatives.

# A tibble: 750 × 34
   b_Intercept b_COPPERWeek1 b_COPPERWeek2 b_DIST2 b_DIST3 b_DIST4
         <dbl>         <dbl>         <dbl>   <dbl>   <dbl>   <dbl>
 1        2.16        -0.123         -2.78  0.241   0.403    0.473
 2        2.44        -0.491         -3.42  0.119   0.0274   0.305
 3        2.31        -0.287         -3.46  0.222   0.0696   0.335
 4        2.42        -0.383         -2.98  0.0157  0.0491   0.175
 5        2.29        -0.455         -2.61  0.149   0.314    0.399
 6        2.30        -0.232         -3.24  0.223   0.117    0.327
 7        2.40        -0.347         -3.02  0.171   0.0555   0.143
 8        2.32        -0.301         -3.23  0.239   0.202    0.149
 9        2.16        -0.127         -2.45  0.285   0.339    0.450
10        2.19        -0.313         -3.24  0.311   0.294    0.290
# ℹ 740 more rows
# ℹ 28 more variables: `b_COPPERWeek1:DIST2` <dbl>,
#   `b_COPPERWeek2:DIST2` <dbl>, `b_COPPERWeek1:DIST3` <dbl>,
#   `b_COPPERWeek2:DIST3` <dbl>, `b_COPPERWeek1:DIST4` <dbl>,
#   `b_COPPERWeek2:DIST4` <dbl>, sd_PLATE__Intercept <dbl>, Intercept <dbl>,
#   `r_PLATE[12,Intercept]` <dbl>, `r_PLATE[16,Intercept]` <dbl>,
#   `r_PLATE[21,Intercept]` <dbl>, `r_PLATE[34,Intercept]` <dbl>, …

copper.brm3 |>
  bayes_R2(re.form = NA, summary = FALSE) |>
  median_hdci()

          y    ymin    ymax .width .point .interval
1 0.8889454 0.85973 0.90625   0.95 median      hdci

copper.brm3 |>
  bayes_R2(re.form = ~ (1 | PLATE), summary = FALSE) |>
  median_hdci()

         y      ymin      ymax .width .point .interval
1 0.901064 0.8752712 0.9231022   0.95 median      hdci

copper.brm3 |>
  bayes_R2(re.form = ~ (COPPER * DIST | PLATE), summary = FALSE) |>
  median_hdci()

          y    ymin    ymax .width .point .interval
1 0.8889454 0.85973 0.90625   0.95 median      hdci

copper.brm3 |> modelsummary(
  statistic = c("conf.low", "conf.high"),
  shape = term ~ statistic,
  exponentiate = TRUE
)

copper.brm3 |> modelplot(exponentiate = TRUE)

11 Further investigations

brms

newdata <- copper.brm3 |>
  emmeans(~ COPPER | DIST, type = "response") |>
  as.data.frame()

Warning: Method 'parse_bf' is deprecated. Please use 'brmsterms' instead.

head(newdata)

 COPPER  DIST      rate lower.HPD upper.HPD
 control 1    10.537241  8.961209 12.167349
 Week 1  1     7.250791  6.078546  8.585132
 Week 2  1     0.444549  0.216191  0.723366
 control 2    11.984834 10.311718 13.780619
 Week 1  2     8.270779  7.073078  9.647055
 Week 2  2     1.421635  0.894452  1.963382

Point estimate displayed: median 
Results are back-transformed from the log scale 
HPD interval probability: 0.95

ggplot(newdata) +
  geom_pointrange(
    aes(
      y = rate, x = COPPER, color = DIST,
      ymin = lower.HPD, ymax = upper.HPD
    ),
    position = position_dodge(width = 0.2)
  ) +
  theme_classic()

copper.brm3 |>
  emmeans(~ COPPER | DIST, type = "response") |>
  pairs()

Warning: Method 'parse_bf' is deprecated. Please use 'brmsterms' instead.

DIST = 1:
 contrast         ratio lower.HPD upper.HPD
 control / Week 1  1.46      1.14      1.77
 control / Week 2 24.10     12.90     43.31
 Week 1 / Week 2  16.28      8.52     29.95

DIST = 2:
 contrast         ratio lower.HPD upper.HPD
 control / Week 1  1.44      1.15      1.75
 control / Week 2  8.40      5.27     12.27
 Week 1 / Week 2   5.87      3.87      8.47

DIST = 3:
 contrast         ratio lower.HPD upper.HPD
 control / Week 1  1.47      1.17      1.79
 control / Week 2  3.17      2.39      4.08
 Week 1 / Week 2   2.18      1.62      2.83

DIST = 4:
 contrast         ratio lower.HPD upper.HPD
 control / Week 1  1.35      1.06      1.62
 control / Week 2  1.74      1.43      2.15
 Week 1 / Week 2   1.29      1.03      1.58

Point estimate displayed: median 
Results are back-transformed from the log scale 
HPD interval probability: 0.95

copper.brm3 |>
  emmeans(~ COPPER | DIST, type = "response") |>
  pairs() |>
  tidy_draws() |>
  mutate(across(everything(), exp)) |>
  summarise_draws(median, HDInterval::hdi,
    Pl = ~ mean(.x < 1), Pg = ~ mean(.x > 1)
  )

Warning: Method 'parse_bf' is deprecated. Please use 'brmsterms' instead.

# A tibble: 12 × 6
   variable                         median lower upper      Pl    Pg
   <chr>                             <dbl> <dbl> <dbl>   <dbl> <dbl>
 1 contrast control - Week 1 DIST 1   1.46  1.14  1.77 0       1    
 2 contrast control - Week 2 DIST 1  24.1  12.9  43.3  0       1    
 3 contrast Week 1 - Week 2 DIST 1   16.3   8.52 30.0  0       1    
 4 contrast control - Week 1 DIST 2   1.44  1.15  1.75 0       1    
 5 contrast control - Week 2 DIST 2   8.40  5.27 12.3  0       1    
 6 contrast Week 1 - Week 2 DIST 2    5.87  3.87  8.47 0       1    
 7 contrast control - Week 1 DIST 3   1.47  1.17  1.79 0       1    
 8 contrast control - Week 2 DIST 3   3.17  2.39  4.08 0       1    
 9 contrast Week 1 - Week 2 DIST 3    2.18  1.62  2.83 0       1    
10 contrast control - Week 1 DIST 4   1.35  1.06  1.62 0.00267 0.997
11 contrast control - Week 2 DIST 4   1.74  1.43  2.15 0       1    
12 contrast Week 1 - Week 2 DIST 4    1.29  1.03  1.58 0.012   0.988

copper.brm3 |>
  emmeans(~ DIST | COPPER, type = "response") |>
  pairs()

Warning: Method 'parse_bf' is deprecated. Please use 'brmsterms' instead.

COPPER = control:
 contrast       ratio lower.HPD upper.HPD
 DIST1 / DIST2 0.8814    0.7250     1.044
 DIST1 / DIST3 0.8477    0.7185     1.041
 DIST1 / DIST4 0.7811    0.6666     0.939
 DIST2 / DIST3 0.9587    0.8089     1.150
 DIST2 / DIST4 0.8906    0.7515     1.052
 DIST3 / DIST4 0.9192    0.7714     1.076

COPPER = Week 1:
 contrast       ratio lower.HPD upper.HPD
 DIST1 / DIST2 0.8782    0.7051     1.061
 DIST1 / DIST3 0.8487    0.6658     1.051
 DIST1 / DIST4 0.7276    0.5689     0.866
 DIST2 / DIST3 0.9809    0.7765     1.175
 DIST2 / DIST4 0.8252    0.6686     0.988
 DIST3 / DIST4 0.8518    0.6799     1.031

COPPER = Week 2:
 contrast       ratio lower.HPD upper.HPD
 DIST1 / DIST2 0.3176    0.1354     0.557
 DIST1 / DIST3 0.1145    0.0490     0.189
 DIST1 / DIST4 0.0566    0.0243     0.093
 DIST2 / DIST3 0.3640    0.2394     0.533
 DIST2 / DIST4 0.1827    0.1134     0.262
 DIST3 / DIST4 0.5015    0.3944     0.648

Point estimate displayed: median 
Results are back-transformed from the log scale 
HPD interval probability: 0.95

copper.brm3 |>
  emmeans(~ DIST | COPPER, type = "response") |>
  pairs() |>
  tidy_draws() |>
  mutate(across(everything(), exp)) |>
  summarise_draws(median, HDInterval::hdi,
    Pl = ~ mean(.x < 1), Pg = ~ mean(.x > 1)
  )

Warning: Method 'parse_bf' is deprecated. Please use 'brmsterms' instead.

# A tibble: 18 × 6
   variable                              median  lower  upper    Pl     Pg
   <chr>                                  <dbl>  <dbl>  <dbl> <dbl>  <dbl>
 1 contrast DIST1 - DIST2 COPPER control 0.881  0.725  1.04   0.901 0.0987
 2 contrast DIST1 - DIST3 COPPER control 0.848  0.719  1.04   0.949 0.0507
 3 contrast DIST1 - DIST4 COPPER control 0.781  0.667  0.939  0.996 0.004 
 4 contrast DIST2 - DIST3 COPPER control 0.959  0.809  1.15   0.641 0.359 
 5 contrast DIST2 - DIST4 COPPER control 0.891  0.752  1.05   0.924 0.076 
 6 contrast DIST3 - DIST4 COPPER control 0.919  0.771  1.08   0.831 0.169 
 7 contrast DIST1 - DIST2 COPPER Week 1  0.878  0.705  1.06   0.903 0.0973
 8 contrast DIST1 - DIST3 COPPER Week 1  0.849  0.666  1.05   0.912 0.088 
 9 contrast DIST1 - DIST4 COPPER Week 1  0.728  0.569  0.866  1     0     
10 contrast DIST2 - DIST3 COPPER Week 1  0.981  0.777  1.17   0.571 0.429 
11 contrast DIST2 - DIST4 COPPER Week 1  0.825  0.669  0.988  0.971 0.0293
12 contrast DIST3 - DIST4 COPPER Week 1  0.852  0.680  1.03   0.94  0.06  
13 contrast DIST1 - DIST2 COPPER Week 2  0.318  0.135  0.557  1     0     
14 contrast DIST1 - DIST3 COPPER Week 2  0.114  0.0490 0.189  1     0     
15 contrast DIST1 - DIST4 COPPER Week 2  0.0566 0.0243 0.0930 1     0     
16 contrast DIST2 - DIST3 COPPER Week 2  0.364  0.239  0.533  1     0     
17 contrast DIST2 - DIST4 COPPER Week 2  0.183  0.113  0.262  1     0     
18 contrast DIST3 - DIST4 COPPER Week 2  0.501  0.394  0.648  1     0

Bayesian GLMM Part6

1 Preparations

2 Scenario

3 Read in the data

4 Data preparation

5 Exploratory data analysis

6 Fit the model

7 MCMC sampling diagnostics

8 Model validation

9 Partial effects plots

10 Model investigation

11 Further investigations

12 References