Bootstrap Simulation for model prediction
Source:vignettes/Bootstrap_Prediction.Rmd
Bootstrap_Prediction.Rmd
You can make a bootstrap simulation for model prediction. When writing bootPredict function and this vignette, I was inspired from package finalfit by Ewen Harrison. For example, you can predict survival after diagnosis of breast cancer.
library(autoReg)
library(dplyr) # for use `%>%`
: 'dplyr'
Attaching package'package:stats':
The following objects are masked from
filter, lag'package:base':
The following objects are masked from
intersect, setdiff, setequal, uniondata(GBSG2,package="TH.data")
head(GBSG2)
horTh age menostat tsize tgrade pnodes progrec estrec time cens1 no 70 Post 21 II 3 48 66 1814 1
2 yes 56 Post 12 II 7 61 77 2018 1
3 yes 58 Post 35 II 9 52 271 712 1
4 yes 59 Post 17 II 4 60 29 1807 1
5 no 73 Post 35 II 1 26 65 772 1
6 no 32 Pre 57 III 24 0 13 448 1
Data GBGS2
in TH.data package is a data frame containing
the observations from the German Breast Cancer Study Group 2. In this
data, the survival status of patients is coded as 0 or 1 in the variable
cens
. Whether the patient receive the hormonal therapy or
not is recorded as ‘no’ or ‘yes’ in variable horTh
. The
number of positive lymph nodes are recoded in pnodes. You can make a
logistic regression model with the following R code.
$cens.factor=factor(GBSG2$cens,labels=c("Alive","Died"))
GBSG2=glm(cens.factor~horTh+pnodes+menostat,data=GBSG2,family="binomial")
fitsummary(fit)
:
Callglm(formula = cens.factor ~ horTh + pnodes + menostat, family = "binomial",
data = GBSG2)
:
CoefficientsPr(>|z|)
Estimate Std. Error z value -0.79237 0.15183 -5.219 1.80e-07 ***
(Intercept) -0.47782 0.17578 -2.718 0.00656 **
horThyes 0.10853 0.01818 5.970 2.38e-09 ***
pnodes 0.29375 0.16905 1.738 0.08228 .
menostatPost ---
: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Signif. codes
for binomial family taken to be 1)
(Dispersion parameter
: 939.68 on 685 degrees of freedom
Null deviance: 887.36 on 682 degrees of freedom
Residual deviance: 895.36
AIC
: 4 Number of Fisher Scoring iterations
You can make a publication-ready table with the following R command.
Dependent: cens.factor |
|
Alive (N=387) |
Died (N=299) |
OR (multivariable) |
---|---|---|---|---|
horTh |
no |
235 (60.7%) |
205 (68.6%) |
|
yes |
152 (39.3%) |
94 (31.4%) |
0.62 (0.44-0.88, p=.007) |
|
pnodes |
Mean ± SD |
3.8 ± 4.6 |
6.5 ± 6.1 |
1.11 (1.08-1.16, p<.001) |
menostat |
Pre |
171 (44.2%) |
119 (39.8%) |
|
Post |
216 (55.8%) |
180 (60.2%) |
1.34 (0.96-1.87, p=.082) |
|
You can draw a plot summarizing the model.
modelPlot(fit)
For bootstrapping simulation, you can make new data with the following R code.
newdata=expand.grid(horTh=factor(c(1,2),labels=c("no","yes")),
pnodes=1:51,
menostat=factor(c(1,2),labels=c("Pre","Post")))
You can make a bootstrapping simulation with bootPredict() function. You can set the number of simulation by adjusting R argument.
=bootPredict(fit,newdata,R=500)
dfhead(df)
horTh pnodes menostat estimate lower upper1 no 1 Pre 0.3354033 0.2742587 0.4036184
2 yes 1 Pre 0.2383652 0.1664526 0.3265529
3 no 2 Pre 0.3600105 0.3029007 0.4240014
4 yes 2 Pre 0.2586235 0.1838117 0.3429236
5 no 3 Pre 0.3853762 0.3288990 0.4493319
6 yes 3 Pre 0.2799707 0.2025350 0.3646537
With this result, you can draw a plot showing bootstrapping prediction of breast cancer.