9  Multivariate models

These lecture notes cover topics from:

The reading here is short for two reasons, the second more important than the first:

Related readings:

Topics:

9.1 Preliminaries

Load libraries we will need:

library(brms)
library(lme4)
library(arm)
library(tidyverse)

library(tidybayes)

library(bayestestR)

library(phonTools) # TODO

library(patchwork)

# avoids bug where select from MASS is used
select <- dplyr::select
  1. If you have loaded rethinking, you need to detach it before using brms. See Kurz (2023) Sec. 4.3.1.

  2. I use the file argument when fitting brms models to make compiling this document easier (so the models don’t refit every time I compile). You may or may not want to do this for your own models. See file and file_refit arguments in ?brm.

  3. Here I set the file_refit option so “brms will refit the model if model, data or algorithm as passed to Stan differ from what is stored in the file.”

options(brms.file_refit = "on_change")
  1. I use chains = 4, cores = 4 when fitting brm models below—this means 4 chains, each to be run on one core on my laptop. cores = 4 may need to be adjusted for your computer. (You may have fewer or more cores; I have 8 cores, so this leaves 50% free.) You should figure out how to use multiple cores on your machine.

  2. Make numbers be printed only to 3 digits, for neater output:

options(digits = 3)

9.2 Data

9.2.1 American English vowels

Install the joeysvowels package (Stanley 2020), if you haven’t done so already:

remotes::install_github("joeystanley/joeysvowels")

The joeysvowels package provides a handful of datasets, some subsets of others, that contain formant measurements and other information about the vowels in my own speech. The purpose of the package is to make vowel data easily accessible for demonstrating code snippets when demonstrating working with sociophonetic data.

library(joeysvowels)

midpoints <- mutate(midpoints, dur = end - start)

We’ll use the midpoints dataset: these are measures of F1 and F2 at vowel midpoint for one speaker’s vowels from many words in a controlled context (surrounding coronal consonants): “odd”, “dad”, “sod”, “Todd”, and so on. Standard F1/F2 vowel plot of all data, with 95% ellipses:1

Code
midpoints %>% ggplot(aes(x = F2, y = F1)) +
  geom_point(aes(color = vowel), size = 0.2) +
  scale_x_reverse() +
  scale_y_reverse() +
  stat_ellipse(level = 0.95, aes(color = vowel))

Note how vowel distributions differ both in location (in F1/F2 space) and shape: the direction and size of the ellipse.

Let’s further restrict to just the THOUGHT and LOT vowels for a simple example:

twovowels <- filter(midpoints, vowel %in% c("LOT", "THOUGHT")) %>% droplevels()

RQs for this data could be:

  1. Are LOT and THOUGHT pronounced differently?
  2. If so, how?

These two vowels are merged for many North American English speakers, but (by self-report) not for this speaker.

Their data for just these vowels looks like:

Code
twovowels %>% ggplot(aes(x = F2, y = F1)) +
  geom_point(aes(color = vowel), size = 0.2) +
  scale_x_reverse() +
  scale_y_reverse() +
  stat_ellipse(level = 0.95, aes(color = vowel))

9.2.2 Canadian English voice quality

This is data from a project by Jeanne Brown, a PhD student at McGill (thanks, Jeanne)!

vq_data <- readRDS("jb_creak_data.rds")

This is a greatly simplified subset of the data from from Jeanne’s submitted (paper)[https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4960807] on acoustic measures of “creaky voice” in Canadian English-French bilingual speakers.

This subset is just:

  • A couple acoustic measures
  • English speech only (the dataset also contains French)
  • Only one utterance position (66-99% through utterance)

There are 4884 observations.

Every row corresponds to two acoustic measures that correlate with creaky voice, measured for a single vowel, in a corpus of podcast speech:

  • CPP: cepstral peak prominence
    • Continuous, where a lower value is expected to correlate with creakiness.
  • bad_f0_track: whether f0 could not be measured for this vowel
    • Binary (0/1), where 1 is expected to correlate with creakiness.

Other columns:

  • Sex, Sex_c: factor and (centered) numeric versions of speaker gender (higher = male)
  • YOB, YOB_c: raw and normalized (mean/2 SD) year of birth of speaker.
  • dev_rate: a measure of speaking rate, relative to the speaker’s mean.
  • prev_seg, foll_seg: what kind of segment precede and follow the vowel:
    • vowel, or various consonants (voiceless stop, etc.), or none (= word boundary)

The actual research questions are:

  1. Does speaker Sex affect creakiness?
  2. Does speaker YOB affect creakiness?

Jeanne’s paper examines these questions on one acoustic measure at a time. We’ll consider a couple additional questions, made possible by jointing modeling the two acoustic measures.

  1. Do speakers with higher CPP have a higher probability of bad_f0_track?
  2. Do contexts ?

Either one would give insight into whether there is one underlying aspect of voice quality being captured by both measures.

Empirical plots:

p1 <- vq_data %>% group_by(Speaker) %>% mutate(CPP = mean(CPP)) %>% 
  ggplot(aes(x = Sex, y = CPP)) + geom_boxplot() + labs(y = "Speaker average CPP")

p2 <- vq_data %>% group_by(Speaker) %>% mutate(CPP = mean(CPP)) %>% 
  ggplot(aes(x = YOB, y = CPP)) + geom_smooth(method = 'lm') + geom_point() + labs(y = "Speaker average CPP")

p1 + p2
## `geom_smooth()` using formula = 'y ~ x'


p3 <- vq_data %>% group_by(Speaker) %>% mutate(bt = mean(bad_f0_track)) %>% 
  ggplot(aes(x = Sex, y = bt)) + geom_boxplot() + labs(y = "Speaker % bad f0 tracks")

p4 <- vq_data %>% group_by(Speaker) %>% mutate(bt = mean(bad_f0_track)) %>% 
  ggplot(aes(x = YOB, y = bt)) + geom_smooth(method = 'lm') + geom_point() + labs(y = "Speaker % bad f0 tracks")

p3 + p4
## `geom_smooth()` using formula = 'y ~ x'

TODO: make similar plots by context

9.4 Assessing relatedness of \(y_1\) and \(y_2\)

We fit a multivariate model of bad_f0_track and CPP for the vq_data dataset, accounting for:

  • Both \(y_1\) and \(y_2\) can depend on Sex, YOB_c (RQs 1, 2)

  • Speakers can vary in \(y_1\) and \(y_2\)

  • The way speakers vary may be correlated between \(y_1\) and \(y_2\) (RQ 3)

  • Contexts can vary in \(y_1\) and \(y_2\), i.e., prev_seg and foll_seg

  • The way contexts vary may be correlated between \(y_1\) and \(y_2\) (RQ 4)

Model formula:

## just a basic prior for random-effect correlation matrices
prior_1 <- c(
  prior(lkj(1.5), class=cor) 
)


cpp_model <- brmsformula(
  CPP ~  Sex_c + YOB_c+dev_rate  +
    (1 | p | Speaker) +
    (1 | q | prev_seg) +
    (1 | r | foll_seg),
  family = gaussian()
)

bad_f0_track_model <-  brmsformula(bad_f0_track ~  Sex_c + YOB_c+dev_rate  +
  (1 | p | Speaker) +
  (1 | q | prev_seg) +
  (1 | r | foll_seg), family = bernoulli()
) 

joint_model <- cpp_model + bad_f0_track_model + set_rescor(FALSE)

The full model is:

joint_model
## CPP ~ Sex_c + YOB_c + dev_rate + (1 | p | Speaker) + (1 | q | prev_seg) + (1 | r | foll_seg) 
## bad_f0_track ~ Sex_c + YOB_c + dev_rate + (1 | p | Speaker) + (1 | q | prev_seg) + (1 | r | foll_seg)

In this formula, note the | p | |q|, etc. in the random effects. This allows for correlations across random effects for different response variables. For example, (1 | p | Speaker): allows for correlation between by-speaker random intercepts for cpp and bad_track_f0. Its interpretation is, “how correlated are a speaker’s values for these two acoustic cues, after accounting for their Sex and YOB?”

There are four such terms “tying” the two models (for CPP and bad_f0_track) together.

Fit the model:

cpp_badtrack_joint_m1 <- brm(formula = formula_4, 
                             data = PS_sub2 ,
                             chains = 4, cores = 4, iter = 2000,
                             prior = prior_1, 
                             save_pars = save_pars(all = TRUE),
                             file = 'models/cpp_badtrack_joint_m1.brm')

Model summary:

cpp_badtrack_joint_m1
##  Family: MV(gaussian, bernoulli) 
##   Links: mu = identity; sigma = identity
##          mu = logit 
## Formula: CPP ~ Sex_c + YOB_c + dev_rate + (1 | p | Speaker) + (1 | q | prev_seg) + (1 | r | foll_seg) 
##          bad_f0_track ~ Sex_c + YOB_c + dev_rate + (1 | p | Speaker) + (1 | q | prev_seg) + (1 | r | foll_seg) 
##    Data: PS_sub2 (Number of observations: 4884) 
##   Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
##          total post-warmup draws = 4000
## 
## Multilevel Hyperparameters:
## ~foll_seg (Number of levels: 9) 
##                                         Estimate Est.Error l-95% CI u-95% CI
## sd(CPP_Intercept)                           0.77      0.25     0.43     1.36
## sd(badf0track_Intercept)                    0.63      0.21     0.35     1.15
## cor(CPP_Intercept,badf0track_Intercept)    -0.74      0.22    -0.97    -0.17
##                                         Rhat Bulk_ESS Tail_ESS
## sd(CPP_Intercept)                       1.00     1118     1968
## sd(badf0track_Intercept)                1.00     1338     1923
## cor(CPP_Intercept,badf0track_Intercept) 1.00     1313     1681
## 
## ~prev_seg (Number of levels: 9) 
##                                         Estimate Est.Error l-95% CI u-95% CI
## sd(CPP_Intercept)                           1.07      0.28     0.66     1.76
## sd(badf0track_Intercept)                    1.22      0.32     0.75     2.00
## cor(CPP_Intercept,badf0track_Intercept)    -0.94      0.08    -1.00    -0.69
##                                         Rhat Bulk_ESS Tail_ESS
## sd(CPP_Intercept)                       1.00     1173     1743
## sd(badf0track_Intercept)                1.00     1241     1583
## cor(CPP_Intercept,badf0track_Intercept) 1.00     1347     2259
## 
## ~Speaker (Number of levels: 49) 
##                                         Estimate Est.Error l-95% CI u-95% CI
## sd(CPP_Intercept)                           1.75      0.20     1.40     2.17
## sd(badf0track_Intercept)                    0.71      0.09     0.56     0.91
## cor(CPP_Intercept,badf0track_Intercept)    -0.07      0.15    -0.36     0.23
##                                         Rhat Bulk_ESS Tail_ESS
## sd(CPP_Intercept)                       1.01      622      987
## sd(badf0track_Intercept)                1.00     1206     2052
## cor(CPP_Intercept,badf0track_Intercept) 1.00     1063     1706
## 
## Regression Coefficients:
##                      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS
## CPP_Intercept           20.51      0.52    19.51    21.54 1.01      692
## badf0track_Intercept    -0.36      0.49    -1.30     0.60 1.00      708
## CPP_Sex_c               -0.99      0.51    -1.97     0.02 1.00      668
## CPP_YOB_c                1.15      0.53     0.08     2.20 1.01      631
## CPP_dev_rate            -0.15      0.04    -0.23    -0.08 1.00     4965
## badf0track_Sex_c         1.14      0.22     0.70     1.57 1.01     1040
## badf0track_YOB_c        -0.32      0.22    -0.77     0.09 1.00     1232
## badf0track_dev_rate     -0.08      0.03    -0.15    -0.03 1.00     5514
##                      Tail_ESS
## CPP_Intercept            1085
## badf0track_Intercept     1237
## CPP_Sex_c                1283
## CPP_YOB_c                1137
## CPP_dev_rate             2877
## badf0track_Sex_c         1511
## badf0track_YOB_c         1686
## badf0track_dev_rate      3238
## 
## Further Distributional Parameters:
##           Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma_CPP     2.80      0.03     2.74     2.86 1.00     5244     2597
## 
## Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).

A new type of term here are the random-effect correlations corresponding to different models.

cor(CPP_Intercept,badf0track_Intercept) is the correlation between by-speaker random intercepts for cpp and bad_track_f0.

Exercise 9.3  

  1. Which terms in the model address RQs 1 and 2? What answers does it suggest?

  2. Which terms in the model address RQs 3 and 4? What answers do they suggest?

  3. Consider the effect of foll_seg. Which following contexts give the most and least creakiness (= higher bad_f0_track, lower CPP)? Can you think of why these make (phonetic) sense?


  1. If you’re not familiar with these kinds of plot: \(x\) and \(y\) axes go from larger to smaller.↩︎

  2. Note that these data actually come from different words (column word), so our model should include random effects. I haven’t included random effects just to get the models to fit faster, for pedagogical purposes, and because there are very few observations here anyway (127) relative to model complexity↩︎