Re: ML Random Forest Classifier

James Hammerton Mon, 11 Apr 2016 02:42:14 -0700

To add a bit more detail, perhaps something like this might work:

> package org.apache.spark.ml
>
> import org.apache.spark.ml.classification.RandomForestClassificationModel
> import org.apache.spark.ml.classification.RandomForestClassifier
> import org.apache.spark.mllib.tree.model.{ RandomForestModel => OldRandomForestModel }
>
> // Lives in org.apache.spark.ml so it can call the package-private conversion methods.
> object RandomForestModelConverter {
>
>   // Wrap an old RDD-based model as a DataFrame-based classification model.
>   def fromOld(oldModel: OldRandomForestModel, parent: RandomForestClassifier = null,
>     categoricalFeatures: Map[Int, Int], numClasses: Int, numFeatures: Int = -1): RandomForestClassificationModel = {
>     RandomForestClassificationModel.fromOld(oldModel, parent, categoricalFeatures, numClasses, numFeatures)
>   }
>
>   // Convert a DataFrame-based model to the old RDD-based model, which can be saved and loaded.
>   def toOld(newModel: RandomForestClassificationModel): OldRandomForestModel = {
>     newModel.toOld
>   }
> }
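
As a rough, untested sketch of how the converter might then be used to persist a trained model in Spark 1.6: convert to the old RDD-based model, save that, and convert back after loading. The object name RandomForestPersistence and its save/load helpers are hypothetical names of mine. The values for numClasses, numFeatures and categoricalFeatures have to be supplied by the caller, and I would assume that settings such as the features and prediction column names are not stored with the old model and need to be set again on the result.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.ml.RandomForestModelConverter
import org.apache.spark.ml.classification.RandomForestClassificationModel
import org.apache.spark.mllib.tree.model.RandomForestModel

// Hypothetical helper, not part of Spark: persists a spark.ml random forest
// model by round-tripping through the RDD-based mllib model.
object RandomForestPersistence {

  // Convert the DataFrame-based model to the old model and save it to `path`.
  def save(sc: SparkContext, model: RandomForestClassificationModel, path: String): Unit = {
    RandomForestModelConverter.toOld(model).save(sc, path)
  }

  // Load the old model from `path` and wrap it as a DataFrame-based model.
  // The caller must supply the metadata the conversion needs.
  def load(sc: SparkContext, path: String, numClasses: Int, numFeatures: Int,
    categoricalFeatures: Map[Int, Int] = Map.empty[Int, Int]): RandomForestClassificationModel = {
    val oldModel = RandomForestModel.load(sc, path)
    // `parent` is left at its default (null), so the remaining arguments are named.
    RandomForestModelConverter.fromOld(oldModel,
      categoricalFeatures = categoricalFeatures,
      numClasses = numClasses,
      numFeatures = numFeatures)
  }
}
```

Note that this only covers the random forest model itself; the other stages of the pipeline would have to be rebuilt or persisted separately.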


Regards,

James


On 11 April 2016 at 10:36, James Hammerton <ja...@gluru.co> wrote:

> There are methods for converting the DataFrame-based random forest models
> to the old RDD-based models and vice versa. Perhaps using these will help,
> given that the old models can be saved and loaded?
>
> In order to use them, however, you will need to write code in the
> org.apache.spark.ml package.
>
> I've not actually tried doing this myself, but it looks as if it might work.
>
> Regards,
>
> James
>
> On 11 April 2016 at 10:29, Ashic Mahtab <as...@live.com> wrote:
>
>> Hello,
>> I'm trying to save a pipeline with a random forest classifier. If I try
>> to save the pipeline, it complains that the classifier is not Writable, and
>> indeed the classifier itself doesn't have a write function. There's a pull
>> request that's been merged that enables this for Spark 2.0 (any dates
>> around when that'll release?). I am, however, using the Spark Cassandra
>> Connector, which doesn't seem to be able to create a CqlContext with Spark
>> 2.0 snapshot builds. Seeing that MLlib's random forest classifier supports
>> storing and loading models, is there a way to create a Spark ML pipeline in
>> Spark 1.6 with a random forest classifier that'll allow me to store and
>> load the model? The model takes a significant amount of time to train, and I
>> really don't want to have to train it every time my application launches.
>>
>> Thanks,
>> Ashic.
>>
>
>
