Item response theory (IRT) models estimate a latent quantity for individual units (legislators, judges, students) based on a series of binary indicator variables for each unit (votes, decisions, test answers). The basic two-parameter IRT model is given by:
\[ \text{Pr}(y_{ij} = 1) = \text{logit}^{-1}(\beta_j\theta_i - \alpha_j) \]
where \(\beta_j\) is the discrimination parameter for indicator \(j\), which determines how much a one or zero on that indicator tells us about the latent construct \(\theta_i\) of subject \(i\) (a country, legislator, respondent, etc.), and \(\alpha_j\) is a baseline difficulty parameter for indicator \(j\) that \(\beta_j\theta_i\) must exceed for a one to be more likely than a zero.
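To make the roles of the two item parameters concrete, here is a minimal sketch that evaluates the response probability for a few made-up values. The helper name irt_prob and all parameter values are illustrative assumptions, not part of the analysis that follows; plogis() is base R's inverse logit.

```r
# Two-parameter IRT response probability:
# Pr(y_ij = 1) = logit^-1(beta_j * theta_i - alpha_j)
# irt_prob is a hypothetical helper name used only for this illustration
irt_prob <- function(theta, beta, alpha) {
  plogis(beta * theta - alpha)
}

theta <- c(-2, 0, 2)  # hypothetical latent positions for three units

# a highly discriminating indicator separates the units sharply
p_high <- irt_prob(theta, beta = 3, alpha = 0)

# a weakly discriminating indicator barely separates them
p_low <- irt_prob(theta, beta = 0.2, alpha = 0)

# a higher difficulty lowers the probability of a one for every unit
p_hard <- irt_prob(theta, beta = 3, alpha = 2)

print(round(rbind(p_high, p_low, p_hard), 3))
```

Comparing the rows shows that discrimination controls how quickly the probability changes with \(\theta_i\), while difficulty shifts the whole curve. The MCMCpack documentation describes MCMCirt1d() as fitting a probit version of this model, so swapping plogis() for pnorm() in the sketch gives the closer analogue.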
MCMCpack
We’re going to fit a one-dimensional IRT model to the UN voting data from Bailey, Strezhnev, and Voeten (2017), using countries as the unit of analysis. Following their analysis, the latent construct will be alignment with the US-led liberal world order. Our indicators are UN General Assembly votes, and we’re going to limit our analysis to 2015 so we don’t have to deal with changes in alignment over time.
library(MCMCpack) # load first so that dplyr::select() masks MASS::select()
library(tidyverse)
library(countrycode)
# read in UN voting data
UN <- read.delim('UNVotesPublished.tab', sep = '\t', quote = '')
# subset just to voting data
UN <- UN %>% filter(year == 2015) %>% select(ccode, vote)
# reshape data and assign country names
votes <- plyr::ddply(UN, 'ccode', function(x) t(x$vote)) %>%
  mutate(ccode = countrycode(ccode, origin = 'cown',
                             destination = 'country.name')) %>%
  replace_na(list(ccode = 'Vietnam')) # Vietnam is the only failure to match
We’re going to treat abstentions as no votes and absences as missing data. These data illustrate the limitations of canned code: a more accurate model could use a categorical logit instead of a binary one, because states often abstain for reasons separate from those that drive yea or nay votes.
# recode vote values and convert to matrix
data_mat <- votes %>%
  select(-ccode) %>%
  # codes of 8 and above become missing; 2 and 3 become 0; 1 is left as is
  apply(2, function(x) ifelse(x >= 8, NA, ifelse(x %in% 2:3, 0, x)))
# add rownames to votes matrix for presentation
rownames(data_mat) <- votes$ccode
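As a quick check of the recoding rule, the sketch below applies the same logic to a small made-up vector of vote codes. It assumes the coding implied by the code above (1 for yes, 2 and 3 for abstain and no, 8 and above for absences or non-membership), and the helper name recode_vote is hypothetical.

```r
# hypothetical helper that mirrors the recoding applied to each column above
recode_vote <- function(x) {
  # codes of 8 and above become NA; 2 and 3 become 0; everything else is kept
  ifelse(x >= 8, NA, ifelse(x %in% 2:3, 0, x))
}

codes <- c(1, 2, 3, 8, 9)  # one made-up example of each assumed vote code
recoded <- recode_vote(codes)
print(recoded)
```

Only the yes code survives as a one, which is what the binary model needs, and the missing values are left for the sampler to handle.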
The last step is to run our model. To identify the model, we need to impose some restriction on its parameters. We can hard-code the alignment values of certain countries, constrain two or more countries’ alignments to opposite signs, or specify starting values for the MCMC sampler that place two countries on different sides of zero. Let’s constrain the signs of two countries we know to be opposed, since that is less restrictive than the first option and simpler than the third.
# fit IRT model, fixing Cuba and US to be opposite signs
irt <- MCMCirt1d(datamatrix = data_mat,
                 theta.constraints = list('Cuba' = '-', 'United States' = '+'))
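The sign constraint is needed because the response probabilities are unchanged when every \(\theta_i\) and every \(\beta_j\) is multiplied by -1. A minimal sketch with made-up parameter values (not estimates from the UN data) illustrates this; prob_matrix is a hypothetical helper, and the logit link from the equation above is assumed.

```r
theta <- c(-1.5, 0.3, 2.0)  # hypothetical latent positions
beta  <- c(0.8, -1.2)       # hypothetical discrimination parameters
alpha <- c(0.5, -0.4)       # hypothetical difficulty parameters

# matrix of Pr(y_ij = 1): rows are units i, columns are indicators j
prob_matrix <- function(theta, beta, alpha) {
  # outer() builds beta_j * theta_i; sweep() subtracts alpha_j from column j
  plogis(sweep(outer(theta, beta), 2, alpha))
}

p_original <- prob_matrix(theta, beta, alpha)
p_flipped  <- prob_matrix(-theta, -beta, alpha)  # mirror image of the scale

# the two parameter sets imply the same probabilities for every vote
print(all.equal(p_original, p_flipped))
```

Because the data cannot distinguish the two solutions, fixing the sign of one or two countries rules out the mirrored solution and keeps the sampler from switching between them.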
The ggmcmc package contains several useful functions for extracting data from mcmc objects and plotting results. The ggs() function collects information on MCMC samples into dataframes that can be used by functions like ggs_caterpillar() to produce summary plots.
library(ggmcmc)
# get estimates for plotting
irt_dat <- ggs(irt)
# coefficient plot
ggs_caterpillar(irt_dat, family = 'theta')