Can you send the error messages again? I'm not seeing them.

On Mon, Jul 13, 2015 at 2:45 AM, shivamverma <shivam13ve...@gmail.com>
wrote:

> Hi
>
> I am running Spark 1.4 in Standalone mode on top of Hadoop 2.3 on a CentOS
> node. I am trying to run grid search on an RF classifier to classify a small
> dataset using the pyspark.ml.tuning module, specifically the
> ParamGridBuilder and CrossValidator classes. I get the following error when
> I try passing a DataFrame of Features-Labels to CrossValidator:
>
>
>
> I tried the following code, using the dataset given in Spark's documentation
> for CrossValidator
> <https://spark.apache.org/docs/latest/api/python/pyspark.ml.html#pyspark.ml.tuning.CrossValidator>.
> I also pass the DF through a StringIndexer transformation for the RF:
>
>
>
> Note that the above dataset works with logistic regression. I have also tried
> a larger dataset with sparse vectors as features (the one I was originally
> trying to fit) but received the same error with RF.
> My guess is that there is an issue with how
> BinaryClassificationEvaluator(self, rawPredictionCol="rawPrediction",
> labelCol="label", metricName="areaUnderROC") interprets the 'rawPrediction'
> column: with LR, the rawPredictionCol is a list/vector, whereas with RF,
> the prediction column is a double.
> Is it an issue with the evaluator, or is there something else that I'm
> missing?
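
The guess above can be illustrated without Spark. The sketch below is plain Python and is not Spark's implementation: positive_score and area_under_roc are hypothetical helpers, and the rows are made-up toy data. It assumes an area-under-ROC evaluator that reads the class-1 score out of a vector-valued raw prediction. Under that assumption, LR-style rows (a two-element vector per row) evaluate fine, while RF-style rows holding a plain double fail, as the message suspects. Whether this matches the actual error cannot be confirmed until the error text is resent.

```python
# Hypothetical stand-in for an area-under-ROC evaluator; NOT Spark's code.


def positive_score(raw_prediction):
    """Return the class-1 score from a vector-like raw prediction."""
    if isinstance(raw_prediction, (int, float)):
        # A scalar carries no per-class scores to index into.
        raise TypeError(
            "expected a vector of per-class scores, got a scalar: %r"
            % (raw_prediction,)
        )
    return float(raw_prediction[1])


def area_under_roc(rows):
    """AUC via pairwise comparison of positives and negatives; ties count 0.5."""
    scored = [(positive_score(raw), label) for raw, label in rows]
    pos = [s for s, y in scored if y == 1.0]
    neg = [s for s, y in scored if y == 0.0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# LR-style rows (toy data): the raw prediction is a two-element vector.
lr_rows = [
    ([-2.0, 2.0], 1.0),
    ([1.5, -1.5], 0.0),
    ([-0.3, 0.3], 1.0),
    ([0.2, -0.2], 0.0),
]
lr_auc = area_under_roc(lr_rows)

# RF-style rows as described in the message: the column holds a plain double.
rf_rows = [(1.0, 1.0), (0.0, 0.0)]
try:
    area_under_roc(rf_rows)
    rf_error = None
except TypeError as exc:
    rf_error = str(exc)
```

If this is what is happening, the hard 0/1 prediction is the wrong input for areaUnderROC in any case, since the metric ranks rows by a continuous score.
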
>
> Thanks!
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Spark-issue-with-running-CrossValidator-with-RandomForestClassifier-on-dataset-tp23791.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
