One of our data scientists wants to use Spark to speed up some random forest
binary classification tasks. However, with the parameters MLlib exposes, she
isn't getting results from its random forest implementation that are as good
as those from R's randomForest package. She suggested that if she could tune
the vote rate of the forest (the number of trees that must vote "true" for a
record to be classified as positive), she might be able to hit the project's
false positive and true positive targets.

Is there any way to set the vote rate for a random forest in Spark 1.5.2? I
don't see any such option in the trainClassifier API
(https://spark.apache.org/docs/1.5.2/api/scala/index.html#org.apache.spark.mllib.tree.RandomForest$).
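To make the request concrete, here is a minimal sketch of the thresholded vote we have in mind, written in plain Scala with no Spark dependency. Each tree is modeled as a function from a feature array to a 0.0/1.0 label. The names `VoteRate`, `positiveVoteFraction`, and `predictWithVoteRate` are our own, not part of MLlib. We suspect something similar could be done by calling `predict` on each element of `RandomForestModel.trees` ourselves, but we have not verified that.

```scala
// Hypothetical sketch (not an MLlib API): thresholded voting over an ensemble
// of binary classifiers. Each "tree" is modeled as a plain function returning
// 0.0 or 1.0, standing in for a per-tree binary prediction.
object VoteRate {
  type Features = Array[Double]
  type Tree = Features => Double

  /** Fraction of trees that vote for the positive class (label 1.0). */
  def positiveVoteFraction(trees: Seq[Tree], x: Features): Double = {
    require(trees.nonEmpty, "ensemble must contain at least one tree")
    val positives = trees.count(t => t(x) == 1.0)
    positives.toDouble / trees.size
  }

  /** Predict 1.0 only if at least `threshold` of the trees vote positive. */
  def predictWithVoteRate(trees: Seq[Tree], x: Features, threshold: Double): Double =
    if (positiveVoteFraction(trees, x) >= threshold) 1.0 else 0.0
}
```

Raising `threshold` should reduce false positives at the cost of some true positives, and lowering it should do the opposite. That is the trade-off she wants to tune.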

Thanks,

-- Matthew
