
Re: GradientBoostedTrees.trainRegressor with categoricalFeaturesInfo

2015-05-20 Thread Don Drake

JIRA created: https://issues.apache.org/jira/browse/SPARK-7781

Joseph, I agree. I'm debating removing this feature altogether, but I'm putting the model through its paces.

Thanks.
-Don

On Wed, May 20, 2015 at 7:52 PM, Joseph Bradley wrote:
> One more comment: That's a lot of categories for a

Re: GradientBoostedTrees.trainRegressor with categoricalFeaturesInfo

2015-05-20 Thread Joseph Bradley

One more comment: That's a lot of categories for a feature. If it makes sense for your data, it will run faster if you can group the categories or split the 1895 categories into a few features which have fewer categories.

On Wed, May 20, 2015 at 3:17 PM, Burak Yavuz wrote:
> Could you please op
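The grouping suggested above could be sketched roughly as follows. This is a minimal illustration in plain Python, not part of Spark: the helper name `group_rare_categories`, the sample data, and the choice to bucket by frequency are all assumptions made for the example. It keeps the most frequent categories and folds the rest into a single "other" index, so the feature ends up with far fewer categories.

```python
from collections import Counter


def group_rare_categories(values, max_categories):
    """Re-index category values into at most `max_categories` indices.

    The (max_categories - 1) most frequent categories keep an index of
    their own; every other category shares the last index ("other").
    Hypothetical helper, not a Spark API.
    """
    if max_categories < 2:
        raise ValueError("max_categories must be at least 2")
    counts = Counter(values)
    # Most frequent first; ties broken by category value for determinism.
    ranked = sorted(counts, key=lambda c: (-counts[c], c))
    kept = ranked[:max_categories - 1]
    mapping = {cat: i for i, cat in enumerate(kept)}
    other = len(kept)  # shared index for all remaining categories
    return [mapping.get(v, other) for v in values], mapping, other


# Hypothetical raw category indices for one feature.
raw = [3, 3, 3, 7, 7, 1894, 12, 3, 7, 40]
grouped, mapping, other = group_rare_categories(raw, max_categories=3)
print(grouped)  # re-indexed values, all in range(3)
```

After re-indexing, the matching `categoricalFeaturesInfo` entry for that feature would be `other + 1` rather than the original category count. Whether frequency is a sensible grouping criterion depends on the data; a domain-based grouping may work better.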

Re: GradientBoostedTrees.trainRegressor with categoricalFeaturesInfo

2015-05-20 Thread Burak Yavuz

Could you please open a JIRA for it? The maxBins input is missing from the Python API. Would it be possible for you to use the current master? In the current master, you should be able to use trees with the Pipeline API and DataFrames.

Best,
Burak

On Wed, May 20, 2015 at 2:44 PM, Don Drake wrote:
> I
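For context, MLlib's tree training requires maxBins to be at least the largest number of categories of any categorical feature. The traceback in this thread is truncated, so treating that constraint as the cause here is an inference from the replies. A minimal pre-flight check in plain Python might look like the sketch below; the helper names, the `cat_features` contents, and the default of 32 bins are assumptions for illustration, not Spark APIs.

```python
DEFAULT_MAX_BINS = 32  # assumed default maxBins for MLlib tree training


def required_max_bins(categorical_features_info, default=DEFAULT_MAX_BINS):
    """Smallest maxBins that accommodates every categorical feature."""
    largest = max(categorical_features_info.values(), default=0)
    return max(default, largest)


def oversized_features(categorical_features_info, max_bins=DEFAULT_MAX_BINS):
    """Features whose category count exceeds the given maxBins."""
    return {i: n for i, n in categorical_features_info.items() if n > max_bins}


# Hypothetical map of feature index -> number of categories.
cat_features = {0: 12, 3: 1895, 5: 7}

print(required_max_bins(cat_features))
print(oversized_features(cat_features))
```

With a Python API that lacks a maxBins argument, the only way to pass such a check is to shrink the oversized features, which is where grouping or splitting the categories comes in.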

GradientBoostedTrees.trainRegressor with categoricalFeaturesInfo

2015-05-20 Thread Don Drake

I'm running Spark v1.3.1 and when I run the following against my dataset:

    model = GradientBoostedTrees.trainRegressor(trainingData, categoricalFeaturesInfo=catFeatures, maxDepth=6, numIterations=3)

The job will fail with the following message:

    Traceback (most recent call last):
      File "/Users/dr