Example: Continuous data imputation through GLM
Data from an antidepressant clinical trial
The example data can be found in DIA Missing Data webpage. Original data was from an antidepressant clinical trial with four treatments; two doses of an experimental medication, a positive control, and placebo. Hamilton 17-item rating scale for depression (HAMD17) was observed at baseline and its change scores at weeks 1, 2, 4, 6, and 8 ( Goldstein et al. 2004). To mask the real data Week-8 observations were removed. The example data is a sub-sample of the original data: two arms were created; the original placebo arm and a “drug arm” created by randomly selecting patients from the three non-placebo arms.
data(antidep)
head(antidep) %>% kbl(align = "c") %>%
kable_classic(full_width = F, html_font = "Cambria") %>%
column_spec(1:2, width = "2cm") %>%
add_header_above(c(" " = 1, " "=1, "Responses at the baseline, week 1, 2, 4, and 6" = 5))
Responses at the baseline, week 1, 2, 4, and 6
|
||||||
---|---|---|---|---|---|---|
PID | tx | y0 | y1 | y2 | y4 | y6 |
1503 | 1 | 32 | -11 | -12 | -13 | -15 |
1507 | 0 | 14 | -3 | 0 | -5 | -9 |
1509 | 1 | 21 | -1 | -3 | -5 | -8 |
1511 | 0 | 21 | -5 | -3 | -3 | -9 |
1513 | 1 | 19 | 5 | NA | NA | NA |
1514 | 0 | 21 | 2 | NA | NA | NA |
Missing pattern is displayed in the following plot:
The planned statistical method to analyze this endpoint was mixed-effects model with last-observation-carry-forward (LOCF) as the imputation method. In this example, missing values are imputed with GLM models. This is implemented through family
argument, say, family = gaussian()
(its default link is identity
). The same imputation setting is applied for imputing y2
and y4
, i.e. argument models
is set to be glm_gaussian_identity
. We run the GLM imputation model with an adaptation of 10000 and 2000 iterations for 4 chains. Chains run in parallel, which is set through doFuture
package:
registerDoFuture()
plan(multisession(workers = 4))
an.test = remiod(formula=y6 ~ tx + y0 + y1 + y2 + y4, data=antidep, family = gaussian(),
models = c(y2="glm_gaussian_identity",y4="glm_gaussian_identity"),
n.iter = 100000, n.chains = 4, n.adapt = 10000, thin=100,
algorithm = "jags", trtvar = 'tx', method="MAR", mess=TRUE, warn=FALSE)
plan(sequential)
The following plots show trace plots and the estimated intervals as shaded areas under the posterior density curves for the parameters of treatment variable tx
in imputation models:
-
beta[2]
is the coefficient oftx
in imputation modely6 ~ tx + y0 + y1 + y2 + y4
; -
alpha[2]
is the coefficient oftx
in imputation modely4 ~ tx + y0 + y1 + y2
; -
alpha[7]
is the coefficient oftx
in imputation modely2 ~ tx + y0 + y1
;
The specified set of parameters can be submitted through argument subset
with keyword selected_vars
(alternatively, keyword selected_parms
can be used):
mcsub = get_subset(object = an.test$mc.mar, subset=c(selected_vars = list("tx")))
color_scheme_set("purple")
mcmc_trace(mcsub, facet_args = list(ncol = 1, strip.position = "left"))
mcmc_areas(
mcsub,
prob = 0.95, # 95% intervals
prob_outer = 0.99,
point_est = "mean"
)
To obtain jump-to-reference analysis, we extract MI data with method="J2R"
, and pool analysis results with miAnalyze
:
j2r = extract_MIdata(object=an.test, method="J2R", M=1000, minspace=4)
res.j2r = miAnalyze(formula = y6 ~ y0 + tx, data = j2r, family = gaussian())
data.frame(res.j2r$Est.pool) %>% select(-6) %>%
mutate_if(is.numeric, format, digits=4,nsmall = 0) %>%
kbl(align = "c") %>%
kable_classic(full_width = F, html_font = "Cambria")
Estimate | SE | CI.low | CI.up | t | |
---|---|---|---|---|---|
(Intercept) | 1.0433 | 1.82704 | -2.5376 | 4.6243 | 0.5711 |
y0 | -0.3292 | 0.09697 | -0.5192 | -0.1391 | -3.3945 |
tx | -2.3558 | 1.04973 | -4.4133 | -0.2984 | -2.2442 |