Friday, December 21, 2012

Accelerated Failure Time Models

I continued my experiments with survival analysis, this time with Accelerated Failure Time (AFT) models. The Cox-Snell residuals, as explained in this post, indicated that the Cox model was a pretty good match. But the Cox model rests on the proportional hazards assumption, which holds only approximately for our data, and I wanted to see whether we could do without it. An AFT model instead assumes that the event times (in our case, the times at which kids actually exit from care) come from a theoretical distribution such as the Weibull, log-logistic or exponential.

The graphical test of whether the exit times follow a Weibull distribution (apart from drawing a histogram of exit times with equal bucket widths, which I did, too) is to plot log(-log(S(t))) against log(t), where S(t) is the survival function estimated by Kaplan-Meier; under the Weibull assumption this gives a straight line (an explanation is here). What we got was roughly a straight line. The equivalent check for the log-logistic distribution is to plot log((1-S(t))/S(t)) against log(t), which should also be a straight line if the assumption holds. The curve I got was almost quadratic, so I pursued the Weibull.

I fitted the model with the survreg function from the survival library. For prediction with the Weibull AFT model, I used R's generic predict function to compute quantiles of the exit time and then plotted them in reverse, that is, with the quantile times on the horizontal axis and one minus the quantile level on the vertical axis, which traces out the survival curve, using ideas as in this lecture.
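The two straight-line checks described above can be sketched in a few lines. This is only an illustration in plain Python, not the R code used for the analysis: the data are simulated (Weibull exit times with everyone still in care at day 600 treated as censored), and the helper names `kaplan_meier`, `r_squared` and `linearity_checks` are made up for this sketch. Instead of eyeballing a plot, it summarises how straight each transformed curve is with the squared correlation against log(t).

```python
import math
import random


def kaplan_meier(times, events):
    """Kaplan-Meier estimate as a list of (t, S(t)) at each distinct event time.

    events[i] is 1 if the exit was observed at times[i], 0 if censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        exits = 0    # observed exits at time t
        leaving = 0  # everyone (exit or censored) leaving the risk set at t
        while i < len(data) and data[i][0] == t:
            exits += data[i][1]
            leaving += 1
            i += 1
        if exits > 0:
            surv *= 1.0 - exits / at_risk
            curve.append((t, surv))
        at_risk -= leaving
    return curve


def r_squared(xs, ys):
    """Squared Pearson correlation: 1.0 means the points lie on a straight line."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy * sxy / (sxx * syy)


def linearity_checks(curve):
    """Return (Weibull R^2, log-logistic R^2) for a Kaplan-Meier curve."""
    # Both transforms are undefined at S(t) = 0 or 1, so drop those points.
    pts = [(t, s) for t, s in curve if t > 0 and 0.0 < s < 1.0]
    log_t = [math.log(t) for t, _ in pts]
    weibull_y = [math.log(-math.log(s)) for _, s in pts]
    loglogistic_y = [math.log((1.0 - s) / s) for _, s in pts]
    return r_squared(log_t, weibull_y), r_squared(log_t, loglogistic_y)


# Simulated data (an assumption of this sketch, not the real exit times).
random.seed(0)
CENSOR_DAY = 600.0
raw = [random.weibullvariate(300.0, 1.5) for _ in range(2000)]  # scale, shape
times = [min(t, CENSOR_DAY) for t in raw]
events = [1 if t < CENSOR_DAY else 0 for t in raw]

weibull_r2, loglogistic_r2 = linearity_checks(kaplan_meier(times, events))
print("Weibull check R^2:     ", round(weibull_r2, 4))
print("Log-logistic check R^2:", round(loglogistic_r2, 4))
```

On data that really are Weibull, the first number should sit closer to 1 than the second. A single R^2 hides where the curve bends, so it complements the plot rather than replacing it.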

However, when I repeated the graphical check with Cox-Snell residuals and their cumulative hazards, as I did for the Cox model, the Cox model seemed to be the better fit. The reason was that the estimated survival probabilities of the children who exited, evaluated at their actual exit times, did not follow a uniform distribution very well, and the Cox-Snell check rests on the assumption that they do.
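To make the Cox-Snell check concrete: the residual for each kid is r = -log S(t | x), the fitted survival function evaluated at that kid's own exit (or censoring) time. If the fitted survival probabilities are uniform on (0, 1), the residuals behave like a censored sample from a unit exponential, so their Nelson-Aalen cumulative hazard plotted against the residuals themselves should hug the 45-degree line. The sketch below is plain Python on simulated data, with a Weibull model whose parameters are simply assumed rather than fitted; `cox_snell_weibull`, `nelson_aalen` and `max_gap_from_diagonal` are hypothetical helper names, and the "wrong shape" run is there only to show what a poor fit looks like.

```python
import math
import random


def cox_snell_weibull(times, scales, shape):
    """Cox-Snell residuals r = -log S(t | x) for a Weibull model with
    S(t | x) = exp(-(t / scale_x) ** shape)."""
    return [(t / s) ** shape for t, s in zip(times, scales)]


def nelson_aalen(values, events):
    """Nelson-Aalen cumulative hazard as (value, H(value)) at each event value."""
    data = sorted(zip(values, events))
    at_risk = len(data)
    cum_hazard = 0.0
    out = []
    i = 0
    while i < len(data):
        v = data[i][0]
        exits = 0
        leaving = 0
        while i < len(data) and data[i][0] == v:
            exits += data[i][1]
            leaving += 1
            i += 1
        if exits > 0:
            cum_hazard += exits / at_risk
            out.append((v, cum_hazard))
        at_risk -= leaving
    return out


def max_gap_from_diagonal(points):
    """Largest vertical distance between the (r, H(r)) points and the line H = r."""
    return max(abs(h - r) for r, h in points)


# Simulated data (assumption): two groups of kids with different Weibull scales,
# a common shape, and censoring for anyone still in care at day 600.
random.seed(0)
SHAPE, CENSOR_DAY = 1.5, 600.0
scales = [300.0 if k % 2 == 0 else 500.0 for k in range(2000)]
raw = [random.weibullvariate(s, SHAPE) for s in scales]
times = [min(t, CENSOR_DAY) for t in raw]
events = [1 if t < CENSOR_DAY else 0 for t in raw]

for label, shape in (("true shape", SHAPE), ("wrong shape", 0.7)):
    residuals = cox_snell_weibull(times, scales, shape)
    gap = max_gap_from_diagonal(nelson_aalen(residuals, events))
    print(label, "-> max gap from 45-degree line:", round(gap, 3))
```

The gap is largest in the right tail, where few kids remain at risk and the Nelson-Aalen estimate is noisy, so in practice the plot itself is more informative than this single number.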

Still, the picture is encouraging when we plot, one plot per entry cohort year (2008, 2009, 2010, 2011 or 2012), the three estimated survival functions given by Kaplan-Meier, Cox regression and AFT regression. For the last two, we apply the model to an "average kid", i.e., a fictitious kid with all covariate values set to their respective means. The Cox and AFT curves come pretty close to each other; both stay pretty close to the Kaplan-Meier curve for the older entry cohorts and lie farther from it for the more recent entry cohorts. This is actually the way we want it to be. For the older cohorts, most (93-95%) of the kids have exited care, but for the more recent cohorts only very few have, so what Kaplan-Meier gives there is essentially a biased estimate, based mostly on the kids who have already left. For the 2012 entry cohort, for instance, the length of stay of those kids can't, as of today, be more than 350 days, and similarly for the other recent cohorts. The pattern seen in the older cohorts establishes our trust in the model, and we expect the Kaplan-Meier estimate and the Cox and AFT estimates to come closer to each other as more kids from the recent cohorts exit from care and the model gets updated.
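For the AFT curve of the "average kid", the quantile trick can be written out explicitly. In the Weibull parametrisation that survreg reports, log(T) = lp + sigma * W, where lp is the linear predictor, sigma is the reported scale and W has a standard minimum extreme value distribution. The p-th quantile of the exit time is therefore exp(lp + sigma * log(-log(1 - p))), and plotting 1 - p against that quantile traces the survival curve. The sketch below is plain Python rather than R, and every number in it (intercept, coefficients, covariate means, sigma) is invented for illustration and is not a fitted value from our data.

```python
import math


def average_kid_linear_predictor(intercept, coefs, covariate_means):
    """Linear predictor for a fictitious kid with all covariates at their means."""
    return intercept + sum(b * m for b, m in zip(coefs, covariate_means))


def weibull_aft_quantile(p, linear_predictor, sigma):
    """Time by which a fraction p of such kids is predicted to have exited."""
    return math.exp(linear_predictor + sigma * math.log(-math.log(1.0 - p)))


def survival_curve_from_quantiles(linear_predictor, sigma, n_points=99):
    """(t, S(t)) pairs obtained by pairing each quantile time with 1 - p."""
    levels = [k / (n_points + 1) for k in range(1, n_points + 1)]
    return [(weibull_aft_quantile(p, linear_predictor, sigma), 1.0 - p)
            for p in levels]


def weibull_aft_survival(t, linear_predictor, sigma):
    """Closed-form S(t) for the same model, used here only as a cross-check."""
    return math.exp(-((t / math.exp(linear_predictor)) ** (1.0 / sigma)))


# Hypothetical fitted values (assumptions for this sketch only).
lp = average_kid_linear_predictor(5.5, [0.3, -0.02], [0.4, 7.0])
sigma = 0.7

curve = survival_curve_from_quantiles(lp, sigma)
for t, s in curve[9::10]:  # every tenth quantile level: p = 0.1, 0.2, ...
    print("day", round(t, 1), "-> S(t) =", round(s, 2),
          "(closed form:", round(weibull_aft_survival(t, lp, sigma), 2), ")")
```

Setting every covariate to its mean is a convenience for drawing one representative curve per cohort; it is not the same as averaging the individual kids' predicted curves.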
