To add a bit more detail, perhaps something like this might work:

    // Must live in this package: fromOld and toOld are private[ml].
    package org.apache.spark.ml

    import org.apache.spark.ml.classification.RandomForestClassificationModel
    import org.apache.spark.ml.classification.RandomForestClassifier
    import org.apache.spark.mllib.tree.model.{ RandomForestModel => OldRandomForestModel }

    object RandomForestModelConverter {

      // Wrap an old RDD-based model as a DataFrame-based one.
      def fromOld(oldModel: OldRandomForestModel, parent: RandomForestClassifier = null,
          categoricalFeatures: Map[Int, Int], numClasses: Int, numFeatures: Int = -1): RandomForestClassificationModel = {
        RandomForestClassificationModel.fromOld(oldModel, parent, categoricalFeatures, numClasses, numFeatures)
      }

      // Convert a DataFrame-based model to the old RDD-based one, which can be saved.
      def toOld(newModel: RandomForestClassificationModel): OldRandomForestModel = {
        newModel.toOld
      }
    }

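For what it's worth, here is a rough and equally untested sketch of how the converter above might be used to save and reload a trained model on Spark 1.6. The object name RandomForestPersistence and its two helper methods are made up for illustration. The save and load calls are the ones on the old mllib RandomForestModel. You would have to supply categoricalFeatures, numClasses and numFeatures yourself when loading, since the old model format does not record them.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.ml.RandomForestModelConverter
import org.apache.spark.ml.classification.RandomForestClassificationModel
import org.apache.spark.mllib.tree.model.RandomForestModel

// Hypothetical helper, not part of Spark: persists a DataFrame-based model
// by round-tripping it through the old RDD-based model.
object RandomForestPersistence {

  // Convert to the old model and use its built-in save.
  def save(sc: SparkContext, model: RandomForestClassificationModel, path: String): Unit = {
    RandomForestModelConverter.toOld(model).save(sc, path)
  }

  // Load the old model and wrap it as a DataFrame-based model again.
  // The caller must pass the same metadata that was used at training time.
  def load(sc: SparkContext, path: String, categoricalFeatures: Map[Int, Int],
      numClasses: Int, numFeatures: Int): RandomForestClassificationModel = {
    val oldModel = RandomForestModel.load(sc, path)
    RandomForestModelConverter.fromOld(oldModel,
      categoricalFeatures = categoricalFeatures,
      numClasses = numClasses,
      numFeatures = numFeatures)
  }
}
```

Note that this only covers the classifier itself, not the whole pipeline, and any non-default column settings on the model would presumably need to be set again after loading.
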
Regards,

James

On 11 April 2016 at 10:36, James Hammerton <ja...@gluru.co> wrote:

> There are methods for converting the DataFrame-based random forest models
> to the old RDD-based models and vice versa. Perhaps using these will help,
> given that the old models can be saved and loaded?
>
> In order to use them, however, you will need to write code in the
> org.apache.spark.ml package.
>
> I've not actually tried doing this myself, but it looks as if it might work.
>
> Regards,
>
> James
>
> On 11 April 2016 at 10:29, Ashic Mahtab <as...@live.com> wrote:
>
>> Hello,
>> I'm trying to save a pipeline with a random forest classifier. If I try
>> to save the pipeline, it complains that the classifier is not Writable, and
>> indeed the classifier itself doesn't have a write function. There's a pull
>> request that's been merged that enables this for Spark 2.0 (any dates
>> around when that'll release?). I am, however, using the Spark Cassandra
>> Connector, which doesn't seem to be able to create a CqlContext with Spark
>> 2.0 snapshot builds. Seeing that MLlib's random forest classifier supports
>> storing and loading models, is there a way to create a Spark ML pipeline in
>> Spark 1.6 with a random forest classifier that'll allow me to store and
>> load the model? The model takes a significant amount of time to train, and I
>> really don't want to have to train it every time my application launches.
>>
>> Thanks,
>> Ashic.