Can you send the error messages again? I'm not seeing them.

On Mon, Jul 13, 2015 at 2:45 AM, shivamverma <shivam13ve...@gmail.com> wrote:
> Hi
>
> I am running Spark 1.4 in Standalone mode on top of Hadoop 2.3 on a CentOS
> node. I am trying to run a grid search on an RF classifier to classify a
> small dataset using the pyspark.ml.tuning module, specifically the
> ParamGridBuilder and CrossValidator classes. I get the following error when
> I try passing a DataFrame of Features-Labels to CrossValidator:
>
>
>
> I tried the following code, using the dataset given in Spark's
> documentation for CrossValidator
> <https://spark.apache.org/docs/latest/api/python/pyspark.ml.html#pyspark.ml.tuning.CrossValidator>.
> I also pass the DF through a StringIndexer transformation for the RF:
>
>
>
> Note that the above dataset works with logistic regression. I have also
> tried a larger dataset with sparse vectors as features (which I was
> originally trying to fit) but received the same error with RF.
>
> My guess is that there is an issue with how
> BinaryClassificationEvaluator(self, rawPredictionCol="rawPrediction",
> labelCol="label", metricName="areaUnderROC") interprets the 'rawPrediction'
> column: with LR, the rawPredictionCol is a list/vector, whereas with RF,
> the prediction column is a double.
>
> Is it an issue with the evaluator, or is there something else that I'm
> missing?
>
> Thanks!
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Spark-issue-with-running-CrossValidator-with-RandomForestClassifier-on-dataset-tp23791.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
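
To make the guess in the quoted message concrete, here is a minimal sketch in plain Python of why an area-under-ROC computation needs a per-class score vector rather than a single predicted label. This is not Spark's implementation and says nothing definite about what BinaryClassificationEvaluator does internally; the function name `area_under_roc` and the `(raw_prediction, label)` row layout are hypothetical, chosen only for illustration.

```python
def area_under_roc(rows):
    """Toy area under ROC for binary labels (illustrative only, not Spark code).

    rows: iterable of (raw_prediction, label) pairs, where raw_prediction is
    assumed to be a 2-element sequence of per-class scores and label is
    0.0 or 1.0.
    """
    scored = []
    for raw, label in rows:
        # A bare number (e.g. a predicted class such as 1.0) carries no
        # per-class scores, so there is nothing to rank by.
        if isinstance(raw, (int, float)):
            raise TypeError(
                "expected a vector of per-class scores, got a scalar: %r" % (raw,)
            )
        scored.append((raw[1], label))  # keep the positive-class score

    pos = [s for s, y in scored if y == 1.0]
    neg = [s for s, y in scored if y == 0.0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")

    # AUC equals the probability that a randomly chosen positive row scores
    # higher than a randomly chosen negative row, with ties counted as half.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Vector-valued raw predictions, as described for LR in the quoted message.
lr_like = [([-1.2, 1.2], 1.0), ([0.8, -0.8], 0.0),
           ([-0.3, 0.3], 1.0), ([0.1, -0.1], 0.0)]

# Scalar predictions, as described for RF in the quoted message.
rf_like = [(1.0, 1.0), (0.0, 0.0)]
```

Calling `area_under_roc(lr_like)` returns a value, while `area_under_roc(rf_like)` raises a `TypeError`. If the evaluator is being handed a double column where it expects a vector, the failure would look similar in spirit, but the actual error messages are needed to confirm that.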