rm(list=ls())
library("brms")
library("ggplot2")
library("arm")
library("emmeans")
library("performance")
library("Data4Ecologists")
library("cowplot")
try(dev.off())
Practical 5: Generalized linear models
We learn how to use some classic GLMs, and a distributional model. We focus on prior specifications and appropriate posterior predictive checks. Sometimes we have to apply ggplot tricks for good conditional effects plots.
Poisson regression
We start with a simple comparison of 2 group means (frequentists would use a t-test), but the response are counts (number of slugs found per plot, 2 field types, from Dta4Ecologists package).
Deterministic part: \(\mu=\mu(field)\) –> \(\mu_1\) or \(\mu_2\)
Stochastic part: \(slugs \sim \text{Poisson}(\mu)\)
No log-link is used for this simple model, just “identity link”.
data(slugs)
ggplot(slugs, aes(field, slugs)) +
geom_jitter(alpha=0.5, width=0.1, height=0.05)
It’s always good to check brms default priors in a GLM:
default_prior(slugs ~ field,
family = poisson(link=identity),
data = slugs)
prior class coef group resp dpar nlpar lb ub source
(flat) b default
(flat) b fieldRookery (vectorized)
student_t(3, 1, 2.5) Intercept default
We provide a vague prior for effect (difference in means, dummy-coded) and ensure that the intercept is positive with lower boundary (lb=0
)
= brm(slugs ~ field,
fit.slugs family = poisson(link=identity),
prior =
prior(normal(0,2), class=b) +
prior(student_t(3, 1, 2.5), class=Intercept, lb=0),
data = slugs)
Check convergence
summary(fit.slugs, prior=TRUE)
Family: poisson
Links: mu = identity
Formula: slugs ~ field
Data: slugs (Number of observations: 80)
Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
total post-warmup draws = 4000
Priors:
b ~ normal(0, 2)
<lower=0> Intercept ~ student_t(3, 1, 2.5)
Regression Coefficients:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept 1.31 0.18 0.98 1.70 1.00 3355 2765
fieldRookery 0.97 0.29 0.40 1.58 1.00 2634 2123
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
plot(fit.slugs)
Posterior predictions
plot(conditional_effects(fit.slugs),
points=T,
point_args=list(alpha=0.3, width=0.1, height=0.05)
)
In this simple example of comparing two group-means, the summary table already tells us that field=Rookery
has on average +0.97 more slugs (95% CI [0.40,1.58]).
Importantly, we get quite a different effect size if we had just fitted an LM (effect=0.94, but 95% CI covering zero: [-0.05,1.90]). The Poisson model for count data is the correct model here, especially when observations are close or equal to zero!
= brm(slugs ~ field,
fit.slugs.lm prior =
prior(normal(0,2), class=b) +
prior(student_t(3, 1, 2.5), class=Intercept, lb=0),
data = slugs)
fixef(fit.slugs.lm) |> round(3)
Estimate Est.Error Q2.5 Q97.5
Intercept 1.303 0.352 0.612 1.990
fieldRookery 0.935 0.494 -0.054 1.904
Binomial regression
This experiment is about mice infected with parasites of the genus Cryptosporidium. Each replicate has a certain number of mice inoculated by some Dose of parasite oocysts. Then the number of infected mice is counted. (from Quian (2016) Environmental and Ecological Statistics with R)
Fit a dose-response model.
Question: What parasite dose is required to infect 75% of a mouse population?
Deterministic part: \(\text{logit}(\mu)=a+b\cdot Dose\)
Stochastic part: \(Y_{infected} \sim \text{Binomial}(Y_{inoculated},\mu)\)
= read.csv("https://raw.githubusercontent.com/songsqian/eesR/refs/heads/master/R/Data/cryptoDATA.csv",
df sep=" ")
= subset(df, Source=="Finch")
df.sub $Y = as.integer(df.sub$Y)
df.subggplot(df.sub, aes(Dose, Y/N)) + geom_point(alpha=0.5)
Again, we want to scale the predictor Dose
. Here, we directly write scale(Dose)
in the model formula. The advantage is that priors and parameters on z-scale, but plots / conditional_effects are on original scale.
default_prior(Y | trials(N) ~ scale(Dose),
family = binomial(link=logit),
data = df.sub )
prior class coef group resp dpar nlpar lb ub source
(flat) b default
(flat) b scaleDose (vectorized)
student_t(3, 0, 2.5) Intercept default
We add a vaguely informative prior on the (linear scale) slope. The effect is expected to be positive.
= brm(Y | trials(N) ~ scale(Dose),
fit.crypto family = binomial(link=logit),
prior = prior(normal(1,1), class=b),
data = df.sub )
Check convergence
summary(fit.crypto, prior=T)
Family: binomial
Links: mu = logit
Formula: Y | trials(N) ~ scale(Dose)
Data: df.sub (Number of observations: 43)
Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
total post-warmup draws = 4000
Priors:
b ~ normal(1, 1)
Intercept ~ student_t(3, 0, 2.5)
Regression Coefficients:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept 0.41 0.14 0.15 0.69 1.00 3019 2729
scaleDose 1.22 0.19 0.86 1.59 1.00 2677 2494
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
plot(fit.crypto)
conditional_effects()
plots predictions for N=1 trials only. points=TRUE
just adds Y values, but we need Y/N values!
plot(conditional_effects(fit.crypto, effects="Dose"), points=TRUE)
Setting all 'trials' variables to 1 by default if not specified otherwise.
Some ggplot-tricks to the rescue!! Note that conditional_effects()
now plots everything on the predictor’s original scale, not on the z-scale.
= plot(conditional_effects(fit.crypto, effects="Dose"),
p1 points=FALSE, plot=FALSE)
1]] + geom_point(data=df.sub,
p1[[aes(Dose, Y/N),
alpha=0.6, size=1.5,
inherit.aes=FALSE)
Posterior predictive checks are not very pretty, but we didn’t measure any other predictors to improve it.
= pp_check(fit.crypto, ndraws=100)
p1 = pp_check(fit.crypto, type="scatter_avg")
p2 plot_grid(p1,p2)
= fitted(fit.crypto)
fitted = residuals(fit.crypto)
residuals binnedplot(fitted, residuals)
What parasite dose is needed to get 75% of mice infected?
= plot(conditional_effects(fit.crypto, effects="Dose"),
p1 points=FALSE, plot=FALSE)
1]] + geom_hline(yintercept=0.75, linetype=2, col="red") p1[[
For Dose=191, the mean regression curve crosses the 75%-infected line, on average 75% are infected. But this means that we are only 50% certain that at least 75% are infected, \(P(\mu>0.75)\approx0.5\).
fitted(fit.crypto,
newdata = data.frame(Dose=191, N=1))
Estimate Est.Error Q2.5 Q97.5
[1,] 0.7513291 0.03655965 0.6768719 0.8215464
.191 = fitted(fit.crypto,
fittednewdata = data.frame(Dose=191, N=1),
summary=FALSE)
hist(fitted.191)
abline(v=0.75, col="red", lwd=2)
Rather, we’d have to supply a dose where the 75%-line crosses the fitted lower quantile (Q2.5), which happens around Dose=229. Then we are 97.5% certain that at least 75% of mice are infected, \(P(\mu>0.75)=0.975\).
fitted(fit.crypto,
newdata=data.frame(Dose=229, N=1),
probs=0.025)
Estimate Est.Error Q2.5
[1,] 0.8269148 0.03625958 0.7504071
.229 = fitted(fit.crypto,
fittednewdata = data.frame(Dose=229, N=1),
summary=FALSE)
hist(fitted.229)
abline(v=0.75, col="red", lwd=2)
Overdispersion
We model the daily growth of young chicken (from datasets package). We use only the early exponential growth phase (looks linear on the logscale). Later, growth slows down and a the nonlinear von Bertalanffy growth function is more appropriate.
data(ChickWeight)
= ggplot(ChickWeight, aes(Time, weight)) +
p1 geom_jitter(alpha=0.5, width=0.1, height=0.05)
= ggplot(ChickWeight, aes(Time, log(weight))) +
p2 geom_jitter(alpha=0.5, width=0.1, height=0.05)
plot_grid(p1,p2)
= subset(ChickWeight, Time<15) ChickWeight
Poisson
The response weight is measured in integers, so we could use Poisson here. Using a log-link, we fit a linear model.
Deterministic part: \(\log(\mu)=a+b\cdot time\)
Stochastic part: \(weight \sim \text{Poisson}(\mu)\)
default_prior(weight ~ Time,
family = poisson(link=log),
data = ChickWeight)
prior class coef group resp dpar nlpar lb ub source
(flat) b default
(flat) b Time (vectorized)
student_t(3, 4.3, 2.5) Intercept default
We add a weak prior for the slope, but growth must be positive (chickens don’t shrink).
.1 = brm(weight ~ Time,
fit.growthfamily = poisson(link=log),
prior = prior(normal(0,1), class=b, lb=0),
data = ChickWeight)
Check convergence
summary(fit.growth.1)
Family: poisson
Links: mu = log
Formula: weight ~ Time
Data: ChickWeight (Number of observations: 393)
Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
total post-warmup draws = 4000
Regression Coefficients:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept 3.75 0.01 3.72 3.77 1.00 1994 1903
Time 0.09 0.00 0.09 0.09 1.00 2180 1779
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
plot(fit.growth.1)
Posterior predictions for fitted values (deterministic part) look OK, but the predicted values don’t correspond to that variance in the data (95% prediction intervals should contain ~95% of the data)
= plot(conditional_effects(fit.growth.1),
p1 points=T,
point_args=list(alpha=0.25),
plot=F)
= plot(conditional_effects(fit.growth.1, method="posterior_predict"),
p2 points=T,
point_args=list(alpha=0.25),
plot=F)
plot_grid(p1[[1]],p2[[1]])
= pp_check(fit.growth.1, ndraws=100)
p1 = pp_check(fit.growth.1, type="scatter_avg")
p2 plot_grid(p1,p2)
Negative Binomial
Deterministic part: \(\log(\mu)=a+b\cdot time\)
Stochastic part: \(weight \sim \text{NegativeBinomial}(\mu,\phi)\)
\(\phi\) is an additional shape parameter modeling overdispersion, meaning the variance of the response does not scale linearly with the response itself (as it does in Poisson).
default_prior(weight ~ Time,
family = negbinomial(link=log),
data = ChickWeight)
prior class coef group resp dpar nlpar lb ub source
(flat) b default
(flat) b Time (vectorized)
student_t(3, 4.3, 2.5) Intercept default
inv_gamma(0.4, 0.3) shape 0 default
Again, we add a weak by strictly positive prior on the slope, brms provides default values also for \(\phi\)
.2 = brm(weight ~ Time,
fit.growthfamily = negbinomial(link=log),
prior = prior(normal(0,1), class=b, lb=0),
data = ChickWeight)
Check convergence
summary(fit.growth.2)
Family: negbinomial
Links: mu = log; shape = identity
Formula: weight ~ Time
Data: ChickWeight (Number of observations: 393)
Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
total post-warmup draws = 4000
Regression Coefficients:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept 3.74 0.02 3.70 3.77 1.00 4080 2769
Time 0.09 0.00 0.09 0.10 1.00 4537 3190
Further Distributional Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
shape 36.25 3.46 30.03 43.87 1.00 4312 2861
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
plot(fit.growth.2)
Looking at the same plot as before, the negative binomial model is more appropriate and replicates the variance of the observations much better.
= plot(conditional_effects(fit.growth.2),
p1 points=T,
point_args=list(alpha=0.25),
plot=F)
= plot(conditional_effects(fit.growth.2, method="posterior_predict"),
p2 points=T,
point_args=list(alpha=0.25),
plot=F)
plot_grid(p1[[1]],p2[[1]])
= pp_check(fit.growth.2, ndraws=100)
p1 = pp_check(fit.growth.2, type="scatter_avg")
p2 plot_grid(p1,p2)
Finally, model comparison reveals that the negative binomial model fits the data better.
LOO(fit.growth.1, fit.growth.2)
elpd_diff se_diff
fit.growth.2 0.0 0.0
fit.growth.1 -327.9 66.2
Question: What is the average daily growth rate?
\(\log(\mu)=a+b\cdot time\) –> \(\mu=\exp(a)\cdot \exp(b)^{time}\)
\(\exp(b)\) is average daily growth rate, but we must use the whole posterior distribution! Don’t just use mean \(\bar b\) and compute \(\exp(\bar b)\). Jensen’s inequality says you might get a biased value with nonlinear parameter transformations!
= as_draws_df(fit.growth.2)
post $rate = exp(post$b_Time)
posthist(post$rate)
mean(post$rate)
[1] 1.096641
quantile(post$rate, prob=c(0.05, 0.95))
5% 95%
1.092519 1.100806
The daily growth rate is 1.096, this means an 9.6% increase every day.
Distributional model
We use the same data, but treat the response as a continuous variable. We want to perform a simple linear regression on log(weight) first.
Deterministic part: \(\mu=a+b\cdot time\)
Stochastic part: \(\log(weight) \sim \text{Normal}(\mu,\sigma)\)
.3 = brm(log(weight) ~ Time,
fit.growthprior = prior(normal(0,1), class=b),
data = ChickWeight)
Again, we see that prediction intervals don’t represent the variance of data well.
= plot(conditional_effects(fit.growth.3),
p1 points=T,
point_args=list(alpha=0.25),
plot=F)
= plot(conditional_effects(fit.growth.3, method="posterior_predict"),
p2 points=T,
point_args=list(alpha=0.25),
plot=F)
plot_grid(p1[[1]],p2[[1]])
= pp_check(fit.growth.3, ndraws=100)
p1 = pp_check(fit.growth.3, type="scatter_avg")
p2 plot_grid(p1,p2)
Linear model assumptions are not satisfied
check_model(fit.growth.3, check=c("linearity","homogeneity","qq","normality"))
We introduce a distributional model, where not only \(\mu\), but also \(\sigma\) is a function of the predictor \(time\).
Deterministic part: \(\mu=a+b\cdot time\)
\(\log(\sigma) = c + d \cdot time\) (log-link to keep \(\sigma>0\))
Stochastic part: \(\log(weight) \sim \text{Normal}(\mu,\sigma)\)
Both model parts for \(\mu\) and \(\sigma\) are combined with bf()
(short for brmsformula). For sigma, the log-link is provided automatically
default_prior(bf(log(weight) ~ Time,
~ Time),
sigma data = ChickWeight)
prior class coef group resp dpar nlpar lb ub source
(flat) b default
(flat) b Time (vectorized)
student_t(3, 4.3, 2.5) Intercept default
(flat) b sigma default
(flat) b Time sigma (vectorized)
student_t(3, 0, 2.5) Intercept sigma default
We provide weak priors for both effects (slope for \(\mu\) and \(\sigma\) with time)
.4 = brm(bf(log(weight) ~ Time,
fit.growth~ Time),
sigma prior =
prior(normal(0,1), class=b, coef=Time) +
prior(normal(0,1), class=b, dpar=sigma),
data = ChickWeight)
Check convergence
summary(fit.growth.4, prior=T)
Family: gaussian
Links: mu = identity; sigma = log
Formula: log(weight) ~ Time
sigma ~ Time
Data: ChickWeight (Number of observations: 393)
Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
total post-warmup draws = 4000
Priors:
b_Time ~ normal(0, 1)
b_sigma ~ normal(0, 1)
Intercept ~ student_t(3, 4.3, 2.5)
Intercept_sigma ~ student_t(3, 0, 2.5)
Regression Coefficients:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept 3.71 0.01 3.70 3.72 1.00 4494 3323
sigma_Intercept -3.06 0.07 -3.19 -2.92 1.00 2997 2371
Time 0.09 0.00 0.09 0.10 1.00 3013 2094
sigma_Time 0.15 0.01 0.14 0.17 1.00 3155 2368
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
plot(fit.growth.4)
Posterior predictions are looking much better now. We don’t use the check_model()
here, because it’s not a linear model anymore.
= plot(conditional_effects(fit.growth.4),
p1 points=T,
point_args=list(alpha=0.25),
plot=F)
= plot(conditional_effects(fit.growth.4, method="posterior_predict"),
p2 points=T,
point_args=list(alpha=0.25),
plot=F)
plot_grid(p1[[1]],p2[[1]])
= pp_check(fit.growth.4, ndraws=100)
p1 = pp_check(fit.growth.4, type="scatter_avg")
p2 plot_grid(p1,p2)
Model comparison reveals that the distributional model is preferred.
LOO(fit.growth.3, fit.growth.4)
elpd_diff se_diff
fit.growth.4 0.0 0.0
fit.growth.3 -131.1 22.3
Attention: Do not use LOOIC / WAIC / AIC / BIC / DIC for comparing models for continuous responses (Normal, Lognormal, Gamma, Beta, …) to models for discrete responses (Binomial, Poisson, Negative Binomial, …)
Continuous distributions are based on probability density functions, while discrete distributions have probability mass functions, which you can’t compare.
Exercise: Logistic regression
From Qian, S. (2016) Environmental and Ecological Statistics with R
This field experiment is about seed predation by rodents (response \(Predation=\{0,1\}\)). Seeds were placed in gauze bags in 4 topographic locations (predictor \(topo\)). Sites were visited in 6 sampling campaigns (predictor \(time\)). Also control for continuous predictor \(seed.weight\).
Deterministic part: \(\text{logit}(\mu)=\dots\)
Stochastic part: \(Predation \sim \text{Bernoulli}(\mu)\)
Start with
Predation~log(seed.weight)
Test if there is a difference between locations (
+topo
)Also use time as factorial predictor (
+time
)
Question: Does predation rate reach a stable value in the last three time periods?
Hint: Use emmeans
for predicting average predation rates for all the different levels of time
and also their contrasts.
= read.csv("https://raw.githubusercontent.com/songsqian/eesR/refs/heads/master/R/Data/seedbank.csv",
df sep=",")
= subset(df, Predation %in% 0:1)
df $Predation = as.integer(df$Predation)
df$time = as.factor(df$time)
df$topo = as.factor(df$topo)
df$seed = scale(log(df$seed.weight))
dfhead(df)
species seed.weight time topo ground Predation seed
1 8 4250.0 3 1 2 0 2.10606568
2 5 28.5 1 1 2 0 0.01142359
3 5 28.5 3 1 2 0 0.01142359
4 2 4.5 1 1 2 0 -0.76110864
5 2 4.5 2 1 2 0 -0.76110864
6 2 4.5 3 1 2 0 -0.76110864
ggplot(df, aes(log(seed.weight), Predation)) +
geom_jitter(height=0.05, width=0.05, alpha=0.2)
Model 1
We start with a simple logistic regression, just with the continuous predictor seed
. Note that priors for intercept and slope are on linear scale (\(\eta\) in the lecture), not on response scale (\(\mu\) in the lecture).
.1 = brm(Predation ~ seed,
df.seedfamily = bernoulli(link=logit),
prior = prior(normal(1,1), class=b),
data = df )
Check convergence
summary(df.seed.1, prior=T)
Family: bernoulli
Links: mu = logit
Formula: Predation ~ seed
Data: df (Number of observations: 1142)
Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
total post-warmup draws = 4000
Priors:
b ~ normal(1, 1)
Intercept ~ student_t(3, 0, 2.5)
Regression Coefficients:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept -2.14 0.11 -2.35 -1.93 1.00 1743 2201
seed 1.02 0.09 0.86 1.19 1.00 1876 2271
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
plot(df.seed.1)
Posterior predictive checks
plot(conditional_effects(df.seed.1),
points = T,
point_args = list(alpha=0.1, width=0.02, height=0.05)
)
We use the binned residuals plot, which is appropriate for 0/1 responses. It doesn’t look great, but we will add more predictors
= fitted(df.seed.1)
fitted = residuals(df.seed.1)
residuals binnedplot(fitted, residuals)
Model 2
Now we add the categorical predictor topo
to account for differences in location.
default_prior(Predation ~ seed + topo,
family = bernoulli(link=logit),
data = df )
prior class coef group resp dpar nlpar lb ub source
(flat) b default
(flat) b seed (vectorized)
(flat) b topo2 (vectorized)
(flat) b topo3 (vectorized)
(flat) b topo4 (vectorized)
student_t(3, 0, 2.5) Intercept default
We assign priors for seed slope (continuous) and topo-differences (categorical)
.2 = brm(Predation ~ seed + topo,
df.seedfamily = bernoulli(link=logit),
prior =
prior(normal(0,2), class=b) + # for all effects
prior(normal(1,1), class=b, coef=seed), # just for seed
data = df )
Check convergence
summary(df.seed.2, prior=T)
Family: bernoulli
Links: mu = logit
Formula: Predation ~ seed + topo
Data: df (Number of observations: 1142)
Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
total post-warmup draws = 4000
Priors:
b ~ normal(0, 2)
b_seed ~ normal(1, 1)
Intercept ~ student_t(3, 0, 2.5)
Regression Coefficients:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept -1.02 0.15 -1.32 -0.74 1.00 4301 3184
seed 1.15 0.10 0.97 1.35 1.00 2927 2778
topo2 -1.94 0.28 -2.52 -1.39 1.00 2927 2120
topo3 -1.57 0.26 -2.09 -1.07 1.00 3231 3086
topo4 -2.00 0.29 -2.58 -1.45 1.00 3392 2937
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
plot(df.seed.2)
We see there is substantial variation between locations (topo
)
= plot(conditional_effects(df.seed.2,
p1 effect="seed:topo",
prob=0),
plot=F)
= plot(conditional_effects(df.seed.2,
p2 effect="topo:seed"),
plot=F)
plot_grid(p1[[1]],p2[[1]])
This ~seed+topo
(generalized) ANCOVA without interaction produces parallel fitted values (intercept varies with topo
) on the linear scale (method="posterior_linpred"
).
= plot(conditional_effects(df.seed.2,
p1 effect="seed:topo",
prob=0,
method="posterior_linpred"),
plot=F)
= plot(conditional_effects(df.seed.2,
p2 effect="topo:seed",
method="posterior_linpred"),
plot=F)
plot_grid(p1[[1]],p2[[1]])
= fitted(df.seed.2)
fitted = residuals(df.seed.2)
residuals binnedplot(fitted, residuals)
Residuals still not looking great, but model comparison suggests that including topo
improves the model.
LOO(df.seed.1, df.seed.2)
elpd_diff se_diff
df.seed.2 0.0 0.0
df.seed.1 -40.3 9.3
Model 3
Now we add sampling campaign (time
) as a factorial predictor.
.3 = brm(Predation ~ seed + topo + time,
df.seedfamily = bernoulli(link=logit),
prior =
prior(normal(0,2), class=b) + # for all effects
prior(normal(1,1), class=b, coef=seed), # just for seed
data = df )
Check convergence
summary(df.seed.3, prior=T)
Family: bernoulli
Links: mu = logit
Formula: Predation ~ seed + topo + time
Data: df (Number of observations: 1142)
Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
total post-warmup draws = 4000
Priors:
b ~ normal(0, 2)
b_seed ~ normal(1, 1)
Intercept ~ student_t(3, 0, 2.5)
Regression Coefficients:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept -3.19 0.43 -4.07 -2.39 1.00 1418 1671
seed 1.25 0.10 1.05 1.45 1.00 2017 2646
topo2 -2.11 0.29 -2.69 -1.56 1.00 2774 2972
topo3 -1.70 0.26 -2.23 -1.20 1.00 3010 3333
topo4 -2.17 0.29 -2.75 -1.61 1.00 2671 2714
time2 1.85 0.49 0.92 2.84 1.00 1642 1785
time3 2.22 0.48 1.32 3.19 1.00 1594 1777
time4 2.69 0.48 1.78 3.66 1.00 1497 1952
time5 2.54 0.48 1.65 3.49 1.00 1550 1941
time6 2.74 0.48 1.85 3.70 1.00 1541 2034
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
plot(df.seed.3)
With 3 predictors, plotting conditional effects is getting more complicated, but the function can handle it.
The same plot as in Model 2, but when we don’t specify the 3rd predictor, fitted values are shown at its reference level (time=1
)
= plot(conditional_effects(df.seed.3,
p1 effect="seed:topo", # time at reference level
prob=0),
plot=F)
= plot(conditional_effects(df.seed.3,
p2 effect="topo:seed"), # time at reference level
plot=F)
plot_grid(p1[[1]],p2[[1]])
We’re actually interested in prediction for different time
-levels. Plotting time:seed
and time:topo
, the omitted 3rd predictor is held at its reference level.
= plot(conditional_effects(df.seed.3,
p1 effect="time:seed"), # topo at reference level
plot=F)
= plot(conditional_effects(df.seed.3,
p2 effect="time:topo"), # seed at mean=0
plot=F)
plot_grid(p1[[1]],p2[[1]])
It already looks like predation rate is pretty constant in the last 3 campaigns.
By specifying conditions=
, we can redo these plots for all levels of the 3rd predictor.
plot(conditional_effects(df.seed.3,
effect="time:seed",
conditions = make_conditions(df.seed.3,vars=c("topo"))))
plot(conditional_effects(df.seed.3,
effect="time:topo",
conditions = make_conditions(df.seed.3,vars=c("seed"))))
= fitted(df.seed.3)
fitted = residuals(df.seed.3)
residuals binnedplot(fitted, residuals)
Model comparison reveals that predation rate changes over time.
LOO(df.seed.1, df.seed.2, df.seed.3)
elpd_diff se_diff
df.seed.3 0.0 0.0
df.seed.2 -28.4 5.3
df.seed.1 -68.7 10.8
But it still doesn’t answer the question if predation rate reached a stable limit in the last 3 campaigns.
With emmeans, we can extract mean fitted values versus time
, averaged over the remaing predictors. By default, the output is on linear scale, but it can also be displayed on response scale
emmeans(df.seed.3, ~time)
time emmean lower.HPD upper.HPD
1 -4.66 -5.57 -3.86
2 -2.83 -3.37 -2.30
3 -2.46 -2.94 -1.95
4 -1.99 -2.43 -1.53
5 -2.13 -2.60 -1.70
6 -1.94 -2.38 -1.45
Results are averaged over the levels of: topo
Point estimate displayed: median
Results are given on the logit (not the response) scale.
HPD interval probability: 0.95
emmeans(df.seed.3, ~time, type="response")
time response lower.HPD upper.HPD
1 0.00934 0.00291 0.0188
2 0.05570 0.03103 0.0873
3 0.07878 0.04880 0.1218
4 0.12035 0.07660 0.1713
5 0.10606 0.06341 0.1488
6 0.12567 0.07929 0.1803
Results are averaged over the levels of: topo
Point estimate displayed: median
Results are back-transformed from the logit scale
HPD interval probability: 0.95
We could look at pairwise difference between all levels of time
(all campaigns), but emmeans also can display only consecutive contrasts with consec~...
emmeans(df.seed.3, consec~time)
$emmeans
time emmean lower.HPD upper.HPD
1 -4.66 -5.57 -3.86
2 -2.83 -3.37 -2.30
3 -2.46 -2.94 -1.95
4 -1.99 -2.43 -1.53
5 -2.13 -2.60 -1.70
6 -1.94 -2.38 -1.45
Results are averaged over the levels of: topo
Point estimate displayed: median
Results are given on the logit (not the response) scale.
HPD interval probability: 0.95
$contrasts
contrast estimate lower.HPD upper.HPD
time2 - time1 1.840 0.902 2.812
time3 - time2 0.378 -0.287 1.045
time4 - time3 0.469 -0.132 1.147
time5 - time4 -0.150 -0.770 0.451
time6 - time5 0.190 -0.424 0.817
Results are averaged over the levels of: topo
Point estimate displayed: median
Results are given on the log odds ratio (not the response) scale.
HPD interval probability: 0.95
The contrasts between the last 3 campaigns are not different from 0 (95% CIs widely cover 0), so we can answer the research question with yes.
Alternatively, you could fit a model Predation ~ seed + topo + time.combined
, where time.combined
is the same as time
, but levels 4,5,6 are combined in 1 level. Then do a model comparison versus the full model.