Example: Binary data imputation
Schizophrenia data from NIMH Study
In this example, we use the binary responses in Schizophrenia data from the NIMH study, i.e. schizob, which is included in the package. As described in Wang and Liu (2022) (arxiv 2203.02771), the original response variable was in numerical scale. The binary response was created using a cut-point of 3.5. The data is in a wide format. Please note that all binary variables which is set to be imputed should be converted into factor variables.
data(schizob)
head(schizob) %>% kbl(align = "c") %>% 
  kable_classic(full_width = F, html_font = "Cambria") %>%
  column_spec(1, width = "2cm") %>%
  add_header_above(c(" " = 1, 
                     "Responses at the baseline, week 1, week 3, and week 6" = 4))
| 
 
Responses at the baseline, week 1, week 3, and week 6
 
 | 
||||
|---|---|---|---|---|
| tx | y0 | y1 | y3 | y6 | 
| 1 | 1 | 0 | 0 | 1 | 
| 1 | 1 | 0 | 0 | 0 | 
| 1 | 1 | 0 | 0 | NA | 
| 1 | 0 | 0 | 0 | 0 | 
| 0 | 1 | 1 | 1 | 1 | 
| 1 | 1 | 1 | 1 | 1 | 
Missing pattern is displayed in the following plot:
Figure 1. Missing pattern of Schizophrenia data
To impute missing values with logit model, we can set up family argument, say, family = binomial(link = "logit"):
test = remiod(formula = y6 ~ tx + y0 + y1 + y3, data=schizob, 
              family = binomial(link = "logit"),
              trtvar = 'tx', algorithm = "jags", method = "MAR", 
              n.iter = 0, warn = FALSE, mess = FALSE) 
print(test$mc.mar$models)
##                   y6                   y3                   y1 
## "glm_binomial_logit" "glm_binomial_logit" "glm_binomial_logit" 
##                   y0 
## "glm_binomial_logit"
However, if probit models are the choice, argument models must be set to accompany with family argument, like the following:
test.probit = remiod(formula = y6 ~ tx + y0 + y1 + y3, data=schizob, 
                    family = binomial(link = "probit"),
                     models = c(y0="glm_binomial_probit",y1="glm_binomial_probit",
                                y3="glm_binomial_probit"),
                     trtvar = 'tx', algorithm = "jags", method = "MAR", 
                     n.iter = 0, warn = FALSE, mess = FALSE) 
print(test.probit$mc.mar$models)
##                    y6                    y3                    y1 
## "glm_binomial_probit" "glm_binomial_probit" "glm_binomial_probit" 
##                    y0 
## "glm_binomial_probit"
Let’s run the Probit model with an adaptation of 10000 and 2000 iterations for 4 chains. Chains run in parallel, which is set through doFuture package:
registerDoFuture()
plan(multisession(workers = 4))
bp.test = remiod(formula=y6 ~ tx + y0 + y1 + y3, data=schizob, 
                 family = binomial(link="probit"),
                 models = c(y0="glm_binomial_probit",y1="glm_binomial_probit",
                            y3="glm_binomial_probit"),
                 n.iter = 2000,  n.chains = 4, n.adapt = 10000, thin=1, mess=TRUE, 
                 warn=FALSE, algorithm = "jags", trtvar = 'tx', method="MAR")
plan(sequential)
The following plot show the estimated intervals as shaded areas under the posterior density curves for the parameters of treatment variable tx in imputation models:
- 
beta[2]is the coefficient oftxin imputation modely6 ~ tx + y0 + y1 + y3; - 
alpha[2]is the coefficient oftxin imputation modely3 ~ tx + y0 + y1; - 
alpha[6]is the coefficient oftxin imputation modely1 ~ tx + y0; - 
alpha[9]is the coefficient oftxin imputation modely0 ~ tx. 
The specified set of parameters can be submitted through argument subset with keyword selected_parms (alternatively, keyword selected_vars, which will be available in the new release, can also be used):
pms = c("beta[2]","alpha[2]","alpha[6]","alpha[9]")
mcsub = remiod:::get_subset(object = bp.test$mc.mar, 
                            subset=c(selected_parms = list(pms)))
color_scheme_set("purple")
mcmc_areas(
  mcsub, 
  pars = pms,
  prob = 0.95, # 95% intervals
  prob_outer = 0.99, # 99%
  point_est = "mean"
)
Figure 2. Intervals under the posterior density curves for coefficients of tx in imputation models