[ 
https://issues.apache.org/jira/browse/SPARK-24712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16529746#comment-16529746
 ] 

Marco Gaido commented on SPARK-24712:
-------------------------------------

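The problem is that you have not set the label column on the evaluator you are 
passing to {{TrainValidationSplit}}. {{RegressionEvaluator}} has its own 
{{labelCol}} param, which defaults to "label" regardless of what the estimator 
uses. Please set it there and it will work. I am closing this; feel free to 
reopen if you still face a problem.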

> TrainValidationSplit ignores label column name and forces to be "label"
> -----------------------------------------------------------------------
>
>                 Key: SPARK-24712
>                 URL: https://issues.apache.org/jira/browse/SPARK-24712
>             Project: Spark
>          Issue Type: Bug
>          Components: ML
>    Affects Versions: 2.2.0
>            Reporter: Pablo J. Villacorta
>            Priority: Major
>
> When a TrainValidationSplit is fit on a Pipeline containing an ML model, the 
> labelCol property of the model is ignored, and the call to fit() fails 
> unless labelCol equals "label". As an example, the following pyspark code 
> only works when the variable labelColumn is set to "label":
> {code:java}
> from pyspark.ml import Pipeline
> from pyspark.ml.evaluation import RegressionEvaluator
> from pyspark.ml.feature import VectorAssembler
> from pyspark.ml.regression import LinearRegression
> from pyspark.ml.tuning import ParamGridBuilder, TrainValidationSplit
> from pyspark.sql.functions import rand, randn
>
> # Assumes `spark` is an existing SparkSession (e.g. from the pyspark shell).
> labelColumn = "target"  # CHANGE THIS TO "label" AND THE CODE WORKS
> df = spark.range(0, 10).select(
>     rand(seed=10).alias("uniform"),
>     randn(seed=27).alias(labelColumn))
> vectorAssembler = VectorAssembler().setInputCols(["uniform"]).setOutputCol("features")
> lr = LinearRegression().setFeaturesCol("features").setLabelCol(labelColumn)
> mypipeline = Pipeline(stages=[vectorAssembler, lr])
> paramGrid = ParamGridBuilder()\
>     .addGrid(lr.regParam, [0.01, 0.1])\
>     .build()
> trainValidationSplit = TrainValidationSplit()\
>     .setEstimator(mypipeline)\
>     .setEvaluator(RegressionEvaluator())\
>     .setEstimatorParamMaps(paramGrid)\
>     .setTrainRatio(0.8)
> trainValidationSplit.fit(df)  # FAIL UNLESS labelColumn IS SET TO "label"
> {code}


