Example: Binary data imputation
Schizophrenia data from NIMH Study
In this example, we use the binary responses in Schizophrenia
data from the NIMH study, i.e. schizob
, which is included in the package. As described in Wang and Liu (2022) (arxiv 2203.02771), the original response variable was in numerical scale. The binary response was created using a cut-point of 3.5. The data is in a wide format. Please note that all binary variables which is set to be imputed should be converted into factor
variables.
data(schizob)
head(schizob) %>% kbl(align = "c") %>%
kable_classic(full_width = F, html_font = "Cambria") %>%
column_spec(1, width = "2cm") %>%
add_header_above(c(" " = 1,
"Responses at the baseline, week 1, week 3, and week 6" = 4))
Responses at the baseline, week 1, week 3, and week 6
|
||||
---|---|---|---|---|
tx | y0 | y1 | y3 | y6 |
1 | 1 | 0 | 0 | 1 |
1 | 1 | 0 | 0 | 0 |
1 | 1 | 0 | 0 | NA |
1 | 0 | 0 | 0 | 0 |
0 | 1 | 1 | 1 | 1 |
1 | 1 | 1 | 1 | 1 |
Missing pattern is displayed in the following plot:
To impute missing values with logit model, we can set up family
argument, say, family = binomial(link = "logit")
:
test = remiod(formula = y6 ~ tx + y0 + y1 + y3, data=schizob,
family = binomial(link = "logit"),
trtvar = 'tx', algorithm = "jags", method = "MAR",
n.iter = 0, warn = FALSE, mess = FALSE)
print(test$mc.mar$models)
## y6 y3 y1
## "glm_binomial_logit" "glm_binomial_logit" "glm_binomial_logit"
## y0
## "glm_binomial_logit"
However, if probit models are the choice, argument models
must be set to accompany with family
argument, like the following:
test.probit = remiod(formula = y6 ~ tx + y0 + y1 + y3, data=schizob,
family = binomial(link = "probit"),
models = c(y0="glm_binomial_probit",y1="glm_binomial_probit",
y3="glm_binomial_probit"),
trtvar = 'tx', algorithm = "jags", method = "MAR",
n.iter = 0, warn = FALSE, mess = FALSE)
print(test.probit$mc.mar$models)
## y6 y3 y1
## "glm_binomial_probit" "glm_binomial_probit" "glm_binomial_probit"
## y0
## "glm_binomial_probit"
Let’s run the Probit model with an adaptation of 10000 and 2000 iterations for 4 chains. Chains run in parallel, which is set through doFuture
package:
registerDoFuture()
plan(multisession(workers = 4))
bp.test = remiod(formula=y6 ~ tx + y0 + y1 + y3, data=schizob,
family = binomial(link="probit"),
models = c(y0="glm_binomial_probit",y1="glm_binomial_probit",
y3="glm_binomial_probit"),
n.iter = 2000, n.chains = 4, n.adapt = 10000, thin=1, mess=TRUE,
warn=FALSE, algorithm = "jags", trtvar = 'tx', method="MAR")
plan(sequential)
The following plot show the estimated intervals as shaded areas under the posterior density curves for the parameters of treatment variable tx
in imputation models:
-
beta[2]
is the coefficient oftx
in imputation modely6 ~ tx + y0 + y1 + y3
; -
alpha[2]
is the coefficient oftx
in imputation modely3 ~ tx + y0 + y1
; -
alpha[6]
is the coefficient oftx
in imputation modely1 ~ tx + y0
; -
alpha[9]
is the coefficient oftx
in imputation modely0 ~ tx
.
The specified set of parameters can be submitted through argument subset
with keyword selected_parms
(alternatively, keyword selected_vars
, which will be available in the new release, can also be used):
pms = c("beta[2]","alpha[2]","alpha[6]","alpha[9]")
mcsub = remiod:::get_subset(object = bp.test$mc.mar,
subset=c(selected_parms = list(pms)))
color_scheme_set("purple")
mcmc_areas(
mcsub,
pars = pms,
prob = 0.95, # 95% intervals
prob_outer = 0.99, # 99%
point_est = "mean"
)