Revisiting depression incidence by county and vote for Trump

Just before Christmas I blogged about the positive correlation between depression incidence in US counties and their vote for Trump in the 2024 presidential election. In addition to my casual interest in the topic, I used it as a case study in multilevel modelling while adjusting for spatial correlation. I explicitly said that I didn’t think it likely that the depression-vote relationship was a causal link; I suspected that most likely, some underlying variable that caused depression was also related to voting behaviour.

I’m coming back to the issue because on reflection, I have three bits of unfinished business:

I had a nagging thought that with 50+ counties per state (and hence some degrees of freedom to spare), I perhaps should have allowed random slopes for depression incidence in each state, rather than just random intercepts
I thought my spatial statistics workaround, of just adding a rubbery mat to space to suck up the spatial autocorrelation between observations, perhaps was a bit slap-dash and I should be as a matter of course modelling the spatial autocorrelation explicitly
An alert reader, Jonathan Spring, pointed out that in the USA there are very marked racial differences between depression diagnoses and suggested that perhaps in my model the depression incidence was standing in as a proxy for “whiteness”.

Of these, the first two felt like bits of probably-immaterial-to-the-question methodological details that I should polish up, whereas number 3 seemed likely to be the explanation of the whole phenomenon. In other words, race is likely a confounder of the depression-voting relationship, as per this directed acyclic graph:

Which means that if we want to actually understand the causal relationship of depression on voting we would need to control for race in the regression. Now, there are other things we’d need to do too; in particular to identify and control for the various “other factors”. I’m not up for that right now - this is someone else’s job - but I’m interested enough to go part-way into things to at least check out the degree to which race makes the depression relationship go away.

This is a long post and parts of it are likely to be of interest only to people concerned with the minutiae of multilevel modelling with spatial autocorrelation. If you just want to see what including race in the model does to the depression effect on voting for Trump, you can skip down to the final section.

All the code for this post assumes that the code from the previous post has already been run. If you want a version of the whole script that just runs, it’s here in the source repository of the whole blog.

But here’s the code for just that DAG diagram:

library(GGally)
library(ggdag)
library(patchwork)
library(kableExtra)

# you need to have run the code from the previous blog first, line below will 
# work only for those who are running this from a clone of my whole repository:
if(!exists("combined2")){
  source("0286-voting-and-depression.R")
}

#-----------diagram--------------
dag <- dagify(
  Vote ~ Race + Depression + 'Other factors',
  Depression ~ Race + 'Other factors'
)

set.seed(125)
ggdag(dag, edge_type = "link", node = FALSE) +
  theme_dag_blank()  +
  geom_dag_node(colour = "lightgreen", shape = c(19, 19, 19, 17)) +
  geom_dag_edges(edge_colour = rep(c("lightblue", "lightblue", "grey", "black", "black"), each = 100),
                 edge_width = rep(c(1.5, 3, 0.8, 1.5, 3), each = 100),
                 edge_linetype = rep(c(1,1, 3, 1, 1), each = 100)) +
  geom_dag_text(colour = "steelblue")

OK, on to fixing up the loose ends with my modelling approach.

Move from `gam` to `gamm`

I decided that the simplest material improvement to my approach to the spatial autocorrelation would be to explicitly model the spatial co-variance of the residuals. One way of doing this that is the minimal change from my approach so far would be to move the model-fitting from mgcv::gam() to mgcv::gamm(). gamm() fits models by iterating between calls to nlme::lme() and a generalized additive model until convergence. Basically this gives us the ability to use the correlation structures for residuals and mixed random and fixed effects of lme() while still using the splines and response distributions of gam(). This is what I need as I want to continue modelling my response with a quasi-binomial family, and I am probably going to want to keep my “rubber sheet” nuisance effect over the US space modelled with a two-dimensional spline (s(x, y)), even while using the correlation features of lme().

So the first thing I do is create a model6b, as similar as possible to the model6 that was the best and final model from the last blog post, but just changes the estimation method. So here it is, a straight transfer from gam() to gamm()

#---------------moving from gam to gamm--------------
# start with the same model as our final one in the last post, but estimated differently:
model6b <- gamm(per_gop ~ cpe + s(x, y) + s(state_name, bs = "re"), 
                weights = total_votes, family = quasibinomial, data = combined2)

# we can't compare the AIC of models created with gam and gamm, see
# https://stats.stackexchange.com/questions/70512/huge-%CE%94aic-between-gam-and-gamm-models

# some differences eg effective degrees of freedom less in the GAMM. But the
# main conclusions (significance of cpe) the same
summary(model6)
summary(model6b$gam)

… which gives us these two different results:

> summary(model6)

Family: quasibinomial 
Link function: logit 

Formula:
per_gop ~ cpe + s(x, y) + s(state_name, bs = "re")

Parametric coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  -2.7972    20.7488  -0.135    0.893    
cpe          15.5908     0.6778  23.001   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Approximate significance of smooth terms:
                edf Ref.df     F p-value    
s(x,y)        28.39  28.94 7.995  <2e-16 ***
s(state_name) 49.00  49.00 7.949  <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

R-sq.(adj) =  0.386   Deviance explained = 44.8%
GCV = 3138.5  Scale est. = 3170.7    n = 3107
> summary(model6b$gam)

Family: quasibinomial 
Link function: logit 

Formula:
per_gop ~ cpe + s(x, y) + s(state_name, bs = "re")

Parametric coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  -2.8878     0.1481  -19.50   <2e-16 ***
cpe          15.0166     0.6271   23.95   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Approximate significance of smooth terms:
                edf Ref.df     F p-value    
s(x,y)        25.41  25.41 6.438  <2e-16 ***
s(state_name) 39.91  49.00 7.154  <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

R-sq.(adj) =  0.382   
  Scale est. = 3005.3    n = 3107

There’s some differences coming from the different estimation methods. The gamm() model uses less effective degrees of freedom for both the s(x,y) rubber mat and the s(state_name, bs="re") state level random intercept. The fixed effect coefficient for cpe (‘crude prevalence estimate’ of county-level depression) is a little different - 15.0 versus 15.6. But we can see we’re fitting the same model and getting substantively the same results.

Next small change I make is to move the state level random intercept from the gam specification into lme. Again, this is an identical model, just changing how the fitting is done:

# move the state random effect into the things to be estimated by nlme:
model6c <- gamm(per_gop ~ cpe + s(x, y),
                random = list(state_name = ~1),
                weights = total_votes, family = quasibinomial, data = combined2)

# fixed coefficients are identical to 6b, but fit was much faster
summary(model6c$gam) # not shown

This is a big performance speed up (which we’re going to need as the models start getting more complex) for materially the same results.

Random slopes

Now I’m ready to take advantage of the move to the gamm syntax and I let the slopes, not just the intercepts, vary for each state:

#----------------random slopes--------------------
model7 <- gamm(per_gop ~ cpe + s(x, y),
                random = list(state_name = ~1 + cpe),
                weights = total_votes, family = quasibinomial, data = combined2)

# AIC is 200 less so worth having the random intercepts
AIC(model7$lme, model6c$lme)
summary(model7$lme)

…which gives us these results. Note that the Akaike Information Criterion has decreased by 200 from adding the random slopes, suggesting the model is overall worth the increased complexity. The cpe (depression) coefficient has decreased in size but is still quite large and definitely significant.

> AIC(model7$lme, model6c$lme)
            df      AIC
model7$lme   9 8605.692
model6c$lme  7 8806.902
> summary(model7$lme)
Linear mixed-effects model fit by maximum likelihood
  Data: data 
       AIC      BIC    logLik
  8605.692 8660.064 -4293.846

Random effects:
 Formula: ~Xr - 1 | g
 Structure: pdIdnot
             Xr1      Xr2      Xr3      Xr4      Xr5      Xr6      Xr7
StdDev: 4.440986 4.440986 4.440986 4.440986 4.440986 4.440986 4.440986
             Xr8      Xr9     Xr10     Xr11     Xr12     Xr13     Xr14
StdDev: 4.440986 4.440986 4.440986 4.440986 4.440986 4.440986 4.440986
            Xr15     Xr16     Xr17     Xr18     Xr19     Xr20     Xr21
StdDev: 4.440986 4.440986 4.440986 4.440986 4.440986 4.440986 4.440986
            Xr22     Xr23     Xr24     Xr25     Xr26     Xr27
StdDev: 4.440986 4.440986 4.440986 4.440986 4.440986 4.440986

 Formula: ~1 + cpe | state_name %in% g
 Structure: General positive-definite, Log-Cholesky parametrization
            StdDev    Corr  
(Intercept)  3.294624 (Intr)
cpe         15.833056 -0.985
Residual    50.963709       

Variance function:
 Structure: fixed weights
 Formula: ~invwt 
Fixed effects:  list(fixed) 
                 Value Std.Error   DF   t-value p-value
X(Intercept) -2.099504 0.5204783 3054 -4.033796  0.0001
Xcpe         10.482779 2.5041023 3054  4.186243  0.0000
Xs(x,y)Fx1    0.221830 0.1339667 3054  1.655859  0.0979
Xs(x,y)Fx2    0.134028 0.1735957 3054  0.772070  0.4401
 Correlation: 
           X(Int) Xcpe   X(,)F1
Xcpe       -0.985              
Xs(x,y)Fx1 -0.083  0.084       
Xs(x,y)Fx2 -0.121  0.126  0.568

Standardized Within-Group Residuals:
        Min          Q1         Med          Q3         Max 
-9.65917765  0.05092882  0.32541129  0.62277480  6.27537901 

Number of Observations: 3107
Number of Groups: 
                g state_name %in% g 
                1                50 

We can use this model to make a new version of the final chart from the previous blog post:

preds7 <- predict(model7$gam, se.fit = TRUE, type = "response")

combined2 |>
  mutate(fit = preds7$fit,
         se = preds7$se.fit,
         lower = fit - 1.96 * se,
         upper = fit + 1.96 * se) |>
  ggplot(aes(x = cpe, group = state_name)) +
  geom_point(aes(y = per_gop, colour = state_name), alpha = 0.5) +
  geom_ribbon(aes(ymin = lower, ymax = upper), fill = "black", alpha = 0.5) +
  theme(legend.position = "none") +
  scale_y_continuous(limits = c(0, 1), label = percent) +
  scale_x_continuous(label  = percent)  +
  labs(x = "Crude prevalence estimate of depression",
       y = "Percentage vote for Trump in 2024 election",
       subtitle = "Grey ribbons are 95% confidence intervals from quasibinomial generalized additive model with spatial effect and state-level random intercept effect",
       title = "Counties with more depression voted more for Trump",
       caption = the_caption) +
  facet_wrap(~state_name)

It doesn’t look much different. So my hunch on this aspect was right; there’s enough data to justify letting the depression-vote slope vary in each state (ie we get a better model) but it doesn’t change the substantive conclusion.

Better spatial autocorrelation

Now I’m ready to improve the way I’m handling spatial autocorrelation. As I mentioned last time, my approach to this is to model the centroid of each county with an s(x, y) two-dimensional spline, a sort of rubber mat overlaid over the USA to soak up the things counties have in common with their neighbouring counties. I still think this is much better than nothing, but admit there is still a problem that even after doing this the residuals of counties that are close together will still be correlated. Which means that each observation is not worth as much as if it were truly independent, which means that my inferences are over-confident in their precision.

First I wanted to check out if this hunch was correct. The standard way to look for spatial autocorrelation is via a variogram. I couldn’t get the Variogram() function in nlme to return sensible results from the models that had been fit with gamm() so I had to make one more explicitly with the variogram() function in gstat:

library(sp)
library(gstat)

sp_data <- combined2 |>
  mutate(res = residuals(model7$lme))
coordinates(sp_data) <- ~x+y

plot(variogram(res ~ 1, data = sp_data), 
       main = "Variogram for pairs of counties' residuals",
       xlab = "Distance between pairs of counties")

… which gets me this:

A variogram works by calculating the distances between each pair of observations, binning these and then assessing how related the pairs of observations (in this case, residuals after the model fitting) are in each bin. It doesn’t do this by a correlation coefficient but by another measure that I haven’t got my head around but is described as ‘half the variance of the differences between all possible points spaced a constant distance apart.’. The important thing is that a score of zero means perfect correlation, and the higher the numbers are the more independent the pairs of observations contained in the bin are. So to interpret the chart above we can say that the counties that are about 20 degrees (of latitude/longitude, ignoring curvature of the earth) apart or closer have some degree of correlation with eachother; once they get to that far apart this measure more or less stablises.

I don’t know why spatial statistics doesn’t just use a good-ole correlation coefficient for this job, but presume there are interesting historical reasons. To check that I wasn’t mangling things, I decided to “roll my own” spatial correlation measure, with:

# let's roll our own on a similar concept to see what's happening
counties <- select(combined2, county_fips, x, y)

# find the counties' distance from each other country
county_pairs <- expand_grid(from = counties$county_fips,
                            to = counties$county_fips) |>
  filter(from > to) |>
  left_join(counties, by =c("from" = "county_fips")) |>
  rename(fx = x, fy = y) |>
  left_join(counties, by =c("to" = "county_fips")) |>
  rename(tx = x, ty = y) |>
  mutate(distance = sqrt((fx - tx) ^ 2 + (fy - ty) ^ 2))  

res7 <- combined2 |>
  mutate(res = residuals(model7$lme, type = "response")) |>
  select(county_fips, res) 

county_pairs |>
  left_join(rename(res7, from_res = res), by = c("from" = "county_fips")) |>
  left_join(rename(res7, to_res = res), by = c("to" = "county_fips")) |>
  mutate(distance = cut(distance, breaks = c(0, 0.5, 1,2,3, 4,6, 8,12,24,48,108)))  |>
  group_by(distance)  |>
  summarise(correlation = cor(from_res, to_res),
            n = n()) |>
  ungroup() |> 
  ggplot(aes(x = distance, y = correlation, size = n)) +
  geom_point(colour = "red") +
  scale_size_area(label = comma) +
  labs(title = "Correlation between pairs of counties' residuals",
       x = "Distance between two counties",
       y = "Correlation in residuals from model7",
       size = "Number of county-pairs")

That gives me this chart, which is more like the sort of thing we use in time series analysis.

I’m pleased it tells a similar story to the variogram (which is more of a black box to me) - the correlation between pairs of counties’ residuals is above zero for counties that are within 6 degrees/units of eachother, and stablises at a small negative number from about 20 units apart onwards.

My conclusion from this is that yes, there is still some spatial autocorrelation that should be taken into account. Particularly for those counties very close together (around 2 degree/units apart or less).

Now, there is a tricky problem with the shape of spatial autocorrelation. Even if you’re prepared to assume (as I am in this case) that it is symmetrical north-south and east-west (not always the case when it depends on eg wind), there are differing shapes of decay in the level of correlation, as a function of distance. Experts can apparently/allegedly judge which method is best by looking at the shape of the variogram, but I feel safest by trying all five correlation structures available to me and choosing the one with the lowest AIC.

# o help us decide the shape of the spatial autocorrelation
model8 <- list()

model8[[1]] <- gamm(per_gop ~ cpe + s(x, y),
               random = list(state_name = ~1 + cpe),
               correlation = corExp(form = ~x +y),
               weights = total_votes, family = quasibinomial, data = combined2)

model8[[2]] <- gamm(per_gop ~ cpe + s(x, y),
                random = list(state_name = ~1 + cpe),
                correlation = corGaus(form = ~x +y),
                weights = total_votes, family = quasibinomial, data = combined2)

model8[[3]] <- gamm(per_gop ~ cpe + s(x, y),
                random = list(state_name = ~1 + cpe),
                correlation = corLin(form = ~x +y),
                weights = total_votes, family = quasibinomial, data = combined2)

model8[[4]] <- gamm(per_gop ~ cpe + s(x, y),
                random = list(state_name = ~1 + cpe),
                correlation = corRatio(form = ~x +y),
                weights = total_votes, family = quasibinomial, data = combined2)

model8[[5]] <- gamm(per_gop ~ cpe + s(x, y),
                random = list(state_name = ~1 + cpe),
                correlation = corSpher(form = ~x +y),
                weights = total_votes, family = quasibinomial, data = combined2)

sapply(model8, \(m){round(AIC(m$lme), -1)})
AIC(model7$lme)

That gets me these results:

> sapply(model8, \(m){round(AIC(m$lme), -1)})
[1] 8230 8600 8590 8330 8580
> AIC(model7$lme)
[1] 8605.692

We see that all the models that adjust for spatial correlation are better than model7 (which doesn’t), but the models using corExp and corRatio are much better than the other three. The corExp model (model8[[1]]) is our new best model so far.

If we look at the t statistics for the coefficients of these five models, we see that for the first time our best model has a “non-significant” slope for cpe (ie depression), 1.695. Now, I’m not going to remove a variable from a complex mixed effects model like this because of a non-significant t statistic, but it is definitely note-worthy that the depression effect has become less important once we let it vary by state and do the best correction for spatial autocorrelation:

> sapply(model8, \(m){summary(m$gam)$p.t})
                 [,1]      [,2]      [,3]      [,4]      [,5]
(Intercept) -2.031622 -4.036009 -3.987122 -3.494438 -3.952872
cpe          1.695218  4.206850  4.193251  3.746362  4.167077

Also noteworthy though is that the other spatial correlation shapes still leave depression with significant t statistic.

Here’s the full report from the lme part of the best fit so far:

> summary(model8[[1]]$lme)
Linear mixed-effects model fit by maximum likelihood
  Data: data 
       AIC      BIC    logLik
  8231.487 8291.901 -4105.744

Random effects:
 Formula: ~Xr - 1 | g
 Structure: pdIdnot
            Xr1     Xr2     Xr3     Xr4     Xr5     Xr6     Xr7     Xr8
StdDev: 6.68243 6.68243 6.68243 6.68243 6.68243 6.68243 6.68243 6.68243
            Xr9    Xr10    Xr11    Xr12    Xr13    Xr14    Xr15    Xr16
StdDev: 6.68243 6.68243 6.68243 6.68243 6.68243 6.68243 6.68243 6.68243
           Xr17    Xr18    Xr19    Xr20    Xr21    Xr22    Xr23    Xr24
StdDev: 6.68243 6.68243 6.68243 6.68243 6.68243 6.68243 6.68243 6.68243
           Xr25    Xr26    Xr27
StdDev: 6.68243 6.68243 6.68243

 Formula: ~1 + cpe | state_name %in% g
 Structure: General positive-definite, Log-Cholesky parametrization
            StdDev    Corr  
(Intercept)  2.702224 (Intr)
cpe         12.092468 -0.99 
Residual    66.289425       

Correlation Structure: Exponential spatial correlation
 Formula: ~x + y | g/state_name 
 Parameter estimate(s):
    range 
0.7289188 
Variance function:
 Structure: fixed weights
 Formula: ~invwt 
Fixed effects:  list(fixed) 
                 Value Std.Error   DF    t-value p-value
X(Intercept) -0.889436 0.4390918 3054 -2.0256268  0.0429
Xcpe          3.370112 1.9919865 3054  1.6918346  0.0908
Xs(x,y)Fx1    0.165268 0.1447102 3054  1.1420604  0.2535
Xs(x,y)Fx2    0.000906 0.1909150 3054  0.0047431  0.9962
 Correlation: 
           X(Int) Xcpe   X(,)F1
Xcpe       -0.988              
Xs(x,y)Fx1 -0.065  0.066       
Xs(x,y)Fx2 -0.131  0.141  0.510

Standardized Within-Group Residuals:
       Min         Q1        Med         Q3        Max 
-5.2880800  0.2592565  0.5963373  1.0331947  4.6052452 

Number of Observations: 3107
Number of Groups: 
                g state_name %in% g 
                1                50 

I believe that Formula: ~Xr - 1 | g bit refers to the s(x,y) term from the GAM when brought into the linearised LME fit. The range of 0.73 refers to the exponential spatial correlation.

Race

Finally I’m ready to look at the question that’s of most substantive interest - does incuding race in the model make the “depression effect” go away, suggesting that depression is acting as a proxy for whiteness. I sourced data on county characteristics of various sorts from the UN Census Bureau. I considered four candidate variables:

white_alone which is the proportion of the county that describe themselves as white and not other race
white_all which is the proportion of the county that describe themselves as white, whether or not they also are of another race as well
hispanic proportion of county that describe themselves as hispanic
hispanic_multi proportion of county that describe themselves as hispanic and more races in addition

I deliberately left out the proportion of African-Americans in each county, assuming it would be very collinear with some combination of the others. If were seriously interested in how race worked in this election I would probably have included it anyway.

Looking at these four variables (without peeking at their relationship to vote) gives me this pairs plot:

which convinces me I should save a degree of freedom and drop the white_all variable from my modelling as containing virtually no extra information from the white_alone variable. Here’s the code to import that data and draw the pairs plot:

---------------------data on 'race'-------------

# county characteristis from US Census Bureau
# see https://www2.census.gov/programs-surveys/popest/technical-documentation/file-layouts/2020-2023/CC-EST2023-ALLDATA.pdf
# for metadata

df <- "cc-est2023-alldata.csv"
if(!file.exists(df)){
  download.file("https://www2.census.gov/programs-surveys/popest/datasets/2020-2023/counties/asrh/cc-est2023-alldata.csv",
                destfile = df)
}

# key columns:
# TOT_POP total population
# WA_MALE "White alone" male
# WAC_MALE "white alone or in combination" male
# H_MALE Hispanic male
# HTOM_MALE Hispanic AND more races male

race <- read_csv(df) |>
  # just 2023 and TOTAL age group:
  filter(YEAR == 5 & AGEGRP == 0) |>
  mutate(white_alone = (WA_MALE + WA_FEMALE) / TOT_POP,
         white_all = (WAC_MALE + WAC_FEMALE) / TOT_POP,
         hispanic = (H_MALE + H_FEMALE) / TOT_POP,
         hispanic_multi = (HTOM_MALE + HTOM_FEMALE) / TOT_POP) |>
  mutate(county_fips = paste0(STATE, COUNTY)) |>
  select(white_alone:county_fips, CTYNAME)

 race |>
  select(-county_fips, -CTYNAME) |>
  ggpairs()

Next I wanted to look at the relationship of these race variables to the logit of vote for Trump, to check if a linearity assumption was going to be reasonable. That got me this chart, which seemed linear enough for me (for my purposes):

…produced with this code:

combined4 <- combined2 |>
  left_join(race, by = "county_fips")

# visual check that the counties joined correctly:
select(combined4, county_name, CTYNAME)

# check for linearity of relationships to the response variable

logit <- function(p){
  log(p / (1 -p))
}

combined4 |>
  mutate(logit_gop = logit(per_gop)) |>
  select(logit_gop, total_votes, `
         Depression incidence` = cpe, 
         `Proportion only white` = white_alone, 
         `Proportion only hispanic` = hispanic, 
         `Proportion hispanic plus another` = hispanic_multi) |>
  gather(variable, value, -logit_gop, -total_votes) |>
  ggplot(aes(y = logit_gop, x =value)) +
  geom_point(aes(size = total_votes), alpha = 0.5) +
  geom_smooth(method = "lm", aes(weight = total_votes)) +
  facet_wrap(~variable, scales = "free_x") +
  scale_x_continuous(label = percent) +
  labs(x = "Explanatory variable value",
       y = "logit of vote for Trump, 2024")

OK so now I fit a whole bunch of different models to be confident that individual decisions from me weren’t going to be leading to my final conclusions. I fit models with many of the combinations of using a rubber mat to deal with spatial autocorrelation, explicitly modelling the spatial autocorrelation, random or fixed slopes (ie varying by state) for cpe (depression), random or fixed slopes (ie varying by state) for hte race variables. Then I calculated the AIC of all these models and the t statistics for cpe (depression), which gets me this table, sorted with the best models (lowest AIC) at the top:

model	AIC	rubber_mat	random_cpe_slope	race	random_race_slope	SAC_fix	cpe_t_stat
15	5676.498	1	1	1	1	1	-1.612598
13	5904.910	1	1	1	0	1	-1.900880
10	6063.537	0	1	1	0	1	-1.581859
14	6241.311	1	0	1	0	1	-4.612077
11	6416.801	0	0	1	0	1	-2.996010
12	7136.013	0	0	1	0	0	5.347793
8	8231.487	1	1	0	0	1	1.695218
9	8246.613	0	1	0	0	1	2.162823

Lots of interesting things here, but most important:

The best models are definitely those that include race
The best model of all is also the most complex - random slopes for both race and depression, rubber mat, plus addressing the spatial autocorrelation (SAC) explicitly with an exponential correlation structure
The best models all have a negative sign for cpe, indicating that after controlling for race, it is the counties with less depression that were more likely to vote for Trump - a compelling change in narrative from looking at the data without controlling for race. But in the best model of all, depression isn’t very important.
One model that I describe in the code below as “definitely illegitimate because it makes no attempt at all to correct for spatial autocorrelation” is the one that gives the biggest positive effect (as in t statistics) for the depression impact on voting for Trump

I regard this as good evidence that in fact incidence of depression by county wasn’t important in driving the vote for Trump, but that race composition of counties probably was (or at least ‘more likely’ was). Of course, how that mechanism works, and whether ‘whiteness’ here is in fact standing in for something else, is beyond the scope of this blog post (already far too long) to explain.

Here’s the code that fit all those candidate models described above and generates the table:

# no rubber mat, no race, but does have spatial autocorrelation
model9 <- gamm(per_gop ~ cpe,
                    random = list(state_name = ~1 + cpe),
                    correlation = corExp(form = ~x +y),
                    weights = total_votes, family = quasibinomial, data = combined2)

# no rubber mat, but race:
model10 <- gamm(per_gop ~ cpe  + white_alone + hispanic + hispanic_multi,
               random = list(state_name = ~1 + cpe),
               correlation = corExp(form = ~x +y),
               weights = total_votes, family = quasibinomial, data = combined4)

# no random slopes or rubber mat, but race:
model11 <- gamm(per_gop ~ cpe  + white_alone + hispanic + hispanic_multi,
                random = list(state_name = ~1),
                correlation = corExp(form = ~x +y),
                weights = total_votes, family = quasibinomial, data = combined4)

# no random slope, rubber mat or spatial autocorrelation fix at all. this model
# is definitely illegitimate in that it makes no effort to fix for spatial issues.
model12 <- gamm(per_gop ~ cpe  + white_alone + hispanic + hispanic_multi,
                random = list(state_name = ~1),
                weights = total_votes, family = quasibinomial, data = combined4)


# rubber mat, race, random slope for cpe
model13 <- gamm(per_gop ~ cpe  + s(x, y) + white_alone + hispanic + hispanic_multi,
                random = list(state_name = ~1 + cpe),
                correlation = corExp(form = ~x +y),
                weights = total_votes, family = quasibinomial, data = combined4)

# rubber mat, race, only random intercept
model14 <- gamm(per_gop ~ cpe  + s(x, y) + white_alone + hispanic + hispanic_multi,
                random = list(state_name = ~1),
                correlation = corExp(form = ~x +y),
                weights = total_votes, family = quasibinomial, data = combined4)

# fullest model so far PLUS giving random slopes to the nuisance variables
# rubber mat, race, random slope for cpe & random slope for the two main
# race variables:
model15 <- gamm(per_gop ~ cpe  + s(x, y) + white_alone + hispanic + hispanic_multi,
                random = list(state_name = ~1 + cpe + white_alone + hispanic),
                correlation = corExp(form = ~x +y),
                weights = total_votes, family = quasibinomial, data = combined4)


# compare the no-rubber mat models (expect #15 to be best, with random slopes
# for CPE and race - the most flexibility)
tibble(model = 8:15,
       AIC = c(AIC(model8[[1]]$lme),
               AIC(model9$lme), 
               AIC(model10$lme), 
               AIC(model11$lme), 
               AIC(model12$lme), 
               AIC(model13$lme), 
               AIC(model14$lme), 
               AIC(model15$lme)),
       rubber_mat = c(1,0,0,0,0,1,1,1),
       random_cpe_slope = c(1,1,1,0,0,1,0,1),
       race = c(0,0, 1,1,1,1,1,1),
       random_race_slope = c(0,0,0,0,0,0,0,1),
       SAC_fix = c(1,1,1,1,0,1,1,1),
       cpe_p_value = c(
         summary(model8[[1]]$gam)$p.pv['cpe'],
         summary(model9$gam)$p.pv['cpe'],
         summary(model10$gam)$p.pv['cpe'],
         summary(model11$gam)$p.pv['cpe'],
         summary(model12$gam)$p.pv['cpe'],
         summary(model13$gam)$p.pv['cpe'],
         summary(model14$gam)$p.pv['cpe'],
         summary(model15$gam)$p.pv['cpe']),
       cpe_t_stat = c(
         summary(model8[[1]]$gam)$p.t['cpe'],
         summary(model9$gam)$p.t['cpe'],
         summary(model10$gam)$p.t['cpe'],
         summary(model11$gam)$p.t['cpe'],
         summary(model12$gam)$p.t['cpe'],
         summary(model13$gam)$p.t['cpe'],
         summary(model14$gam)$p.t['cpe'],
         summary(model15$gam)$p.t['cpe'])
       )  |>
  mutate(cpe_p_value = round(cpe_p_value, 4)) |>
  arrange(AIC)  |>
  # for space decided not to show this column, can just use t stat
  select(-cpe_p_value) |> 
  knitr::kable(format = "html") |> 
  kable_styling()|>
  writeClipboard()

Here’s the key results from the overall winning model:

> summary(model15$lme)
Linear mixed-effects model fit by maximum likelihood
  Data: data 
       AIC      BIC    logLik
  5676.498 5797.326 -2818.249

Random effects:
 Formula: ~Xr - 1 | g
 Structure: pdIdnot
             Xr1      Xr2      Xr3      Xr4      Xr5      Xr6      Xr7
StdDev: 4.947787 4.947787 4.947787 4.947787 4.947787 4.947787 4.947787
             Xr8      Xr9     Xr10     Xr11     Xr12     Xr13     Xr14
StdDev: 4.947787 4.947787 4.947787 4.947787 4.947787 4.947787 4.947787
            Xr15     Xr16     Xr17     Xr18     Xr19     Xr20     Xr21
StdDev: 4.947787 4.947787 4.947787 4.947787 4.947787 4.947787 4.947787
            Xr22     Xr23     Xr24     Xr25     Xr26     Xr27
StdDev: 4.947787 4.947787 4.947787 4.947787 4.947787 4.947787

 Formula: ~1 + cpe + white_alone + hispanic | state_name %in% g
 Structure: General positive-definite, Log-Cholesky parametrization
            StdDev    Corr                
(Intercept)  1.503864 (Intr) cpe    wht_ln
cpe          6.544726 -0.761              
white_alone  1.055536 -0.437 -0.188       
hispanic     1.719388 -0.134 -0.152  0.180
Residual    43.651012                     

Correlation Structure: Exponential spatial correlation
 Formula: ~x + y | g/state_name 
 Parameter estimate(s):
    range 
0.8315562 
Variance function:
 Structure: fixed weights
 Formula: ~invwt 
Fixed effects:  list(fixed) 
                    Value Std.Error   DF   t-value p-value
X(Intercept)    -2.395289  0.271082 3051 -8.836033  0.0000
Xcpe            -1.862160  1.155263 3051 -1.611892  0.1071
Xwhite_alone     3.678207  0.191161 3051 19.241375  0.0000
Xhispanic       -0.544371  0.327396 3051 -1.662729  0.0965
Xhispanic_multi  1.751862  4.307237 3051  0.406725  0.6842
Xs(x,y)Fx1       0.202288  0.093536 3051  2.162664  0.0306
Xs(x,y)Fx2       0.429865  0.131173 3051  3.277087  0.0011
 Correlation: 
                X(Int) Xcpe   Xwht_l Xhspnc Xhspn_ X(,)F1
Xcpe            -0.761                                   
Xwhite_alone    -0.461 -0.164                            
Xhispanic       -0.178 -0.053  0.161                     
Xhispanic_multi -0.002 -0.114  0.101 -0.278              
Xs(x,y)Fx1      -0.051  0.030  0.028 -0.032  0.157       
Xs(x,y)Fx2      -0.121  0.101  0.083  0.000 -0.081  0.350

Standardized Within-Group Residuals:
       Min         Q1        Med         Q3        Max 
-6.1786290  0.2312745  0.4830700  0.8228585  5.5847101 

Number of Observations: 3107
Number of Groups: 
                g state_name %in% g 
                1                50 

Looking at that table of fixed effects, we see the white_alone variable is definitely significant, as is the rubber mat s(x,y) spatial correlation absorber. So we could tentatively conclude that this suggests that whiteness strongly contributed to vote for Trump; that depression incidence didn’t contribute anywhere near as much; and there was a strong spatial correlation not explained by either whiteness or depression.

Looking for a way to summarise all this I came up with this chart of the partial relationship of whiteness and of depression incidence to vote for Trump; after controlling for the other things in the final model:

That was produced with this code, which involved fitting a new model to produce the residuals after controlling for race but not controlling for depression (once combination that hadn’t yet been done in the frenzy of model-fitting above):

#--------------------partial charts-------------------
# model 8[[1]] is the full model except for race. we also need a full model except
# for depression (cpe). then we will use the residuals from each for some charts

model17 <- gamm(per_gop ~ s(x, y) + white_alone + hispanic + hispanic_multi,
                random = list(state_name = ~1 + white_alone + hispanic),
                correlation = corExp(form = ~x +y),
                weights = total_votes, family = quasibinomial, data = combined4)

# Difference between this and model 15 is no depression variable
c(AIC(model17$lme), AIC(model15$lme))

p5 <- combined4 |>
  mutate(after_cpe = residuals(model8[[1]]$gam, type = "response"))  |>
  ggplot(aes(x = white_alone, y = after_cpe)) +
  geom_point(alpha = 0.5, aes(size = total_votes)) +
  geom_smooth(aes(weight = total_votes), method = "lm") +
  scale_x_continuous(label = percent) +
  scale_y_continuous(label = percent, limits = c(-0.4, 0.6)) +
  scale_size_area(label = comma_format(suffix = "m", scale = 1e-6)) +
  labs(x = "Percentage of county that is 'white' as its only race",
       y = "Residual vote for Trump",
       subtitle = "After controlling for counties' depression incidence",
       title = "Partial relationship of 'whiteness' and Trump vote",
       size = "Total votes, 2024:")
  
  
p6 <- combined4 |>
  mutate(after_race = residuals(model17$gam, type = "response"))  |>
  ggplot(aes(x = cpe, y = after_race)) +
  geom_point(alpha = 0.5, aes(size = total_votes)) +
  geom_smooth(aes(weight = total_votes), method = "lm") +
  scale_x_continuous(label = percent) +
  scale_y_continuous(label = percent, limits = c(-0.4, 0.6)) +
  scale_size_area(label = comma_format(suffix = "m", scale = 1e-6)) +
  labs(x = "Incidence of diagnosed depression in each country",
       y = "Residual vote for Trump",
       subtitle = "After controlling for counties' racial composition",
       title = "Partial relationship of depression incidence and Trump vote",
       size = "Total votes, 2024:",
       caption = "Source: voting from tonmcg, depression from CDC, race from US Census Bureau; analysis by freerangestats.info")

print(p5 + p6)

That’s it for today. I hope that’s of interest for someone, either as a messy but realistic modelling strategy case study, or for those interested in the specific issue of depression and the 2024 election.

Revisiting depression incidence by county and vote for Trump

At a glance:

Move from gam to gamm

Random slopes

Better spatial autocorrelation

Race

Move from `gam` to `gamm`