We are close to integrating our first set of survival analysis models with our ETL process. The latter is a set of Ruby scripts, using the Sequel gem, that takes data from our operational PostgreSQL database, transforms it, and loads the results into a set of fact and dimension tables in the data warehouse, which is also in PostgreSQL. I spent some time exploring ways to deploy predictive models within applications. Opinions seemed to be broadly divided into two schools of thought:
- Some people suggest using R only for building and experimenting with the model offline, and then re-implementing the predictive algorithms in C++, Java, or whatever language the main application is written in.
- The other school favors keeping the model in R and integrating it with the application through some sort of "bridge". The most popular "bridges" seemed to be rApache, Rserve, PMML and RSRuby.
After some discussion among ourselves, we opted for RSRuby. It lets you call standard R functions from Ruby once you create a "wrapper" around R, which is an instance of the class RSRuby. Here are some blog entries to get started on RSRuby, which I found pretty useful.
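As a minimal sketch of the "wrapper" idea, the snippet below obtains the RSRuby instance and calls R's `mean()` through it. It assumes the `rsruby` gem and an R installation are present. The helper name `r_mean` is made up for illustration, and it simply returns `nil` when the gem cannot be loaded.

```ruby
# Hypothetical helper: call R's mean() from Ruby through RSRuby.
# Returns nil when the rsruby gem is not installed.
def r_mean(values)
  require 'rsruby'        # raises LoadError if the gem is missing
  r = RSRuby.instance     # the single wrapper object around the R interpreter
  r.mean(values)          # method call is forwarded to the R function mean()
rescue LoadError
  nil
end

result = r_mean([1.0, 2.0, 3.0])
puts(result.nil? ? 'rsruby not available' : "mean from R: #{result}")
```

Any other standard R function can be reached through the same wrapper object in the same way.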
However, when we develop a Cox model, the coxph() function demands that the response variable be a survival object, as returned by the Surv() function. This entry shows how to use lm() in RSRuby; however, lm() needs only a simple vector, or a column of a data frame, as the response variable, which makes things easier. Although RSRuby has a built-in class for data frames, it has none for survival objects, and the documentation is scarce.
We therefore chose to create our own application-level package in R (using the concepts of generic functions and method dispatch) that takes the data as simple vectors or data frames, and returns the fitted survival function for a given child or a cohort group as a data frame. I found Friedrich Leisch's tutorial on building R packages an excellent guide. This approach works fine for our volume of data. Until RSRuby matures more, this seems like a nice workaround.
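To illustrate what "simple vectors" can mean on the Ruby side, here is a rough sketch that pivots row-oriented records (an array of hashes, similar to what a database query yields) into one plain array per column. The field names `child_id`, `duration_days` and `exited` and the helper `to_columns` are hypothetical, not taken from our actual schema or package.

```ruby
# Hypothetical rows, shaped like the hashes an ETL query might return.
rows = [
  { child_id: 1, duration_days: 120, exited: 1 },
  { child_id: 2, duration_days: 300, exited: 0 },
  { child_id: 3, duration_days: 45,  exited: 1 }
]

# Pivot row-oriented records into column vectors: one plain array per field.
# Raises KeyError if a row is missing a field that the first row has.
def to_columns(rows)
  return {} if rows.empty?

  rows.first.keys.each_with_object({}) do |key, columns|
    columns[key] = rows.map { |row| row.fetch(key) }
  end
end

columns = to_columns(rows)
puts columns[:duration_days].inspect  # time vector
puts columns[:exited].inspect         # event indicator vector
```

With the data in this shape, only plain numeric vectors cross the bridge, and the Surv() object can be built inside the R package from the time and event vectors, so Ruby never has to represent a survival object itself.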