[ https://issues.apache.org/jira/browse/SPARK-13291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Joseph K. Bradley closed SPARK-13291.
-------------------------------------
    Resolution: Duplicate

> Numerical models should preserve label attributes
> -------------------------------------------------
>
>                 Key: SPARK-13291
>                 URL: https://issues.apache.org/jira/browse/SPARK-13291
>             Project: Spark
>          Issue Type: Improvement
>          Components: ML
>    Affects Versions: 1.6.0
>            Reporter: Piotr Smolinski
>            Priority: Minor
>
> I tried building a simple pipeline for Random Forest classification. The
> predictors are some doubles, some ints and some strings. The response is a
> string. RFormula seems to be a perfect candidate. RFormulaModel nicely
> produces a *labelCol* column with StringIndexer-derived metadata, and
> RandomForestClassificationModel converts the *featuresCol* to
> *predictionCol*. The problem is that there is no way to convert the
> *predictionCol* (which is a factor index) back to the label. The metadata
> created by StringIndexer is lost.
> The numerical models should create the *predictionCol* column with the
> metadata seen on the *labelCol* column during model fitting.
> Preserving the metadata would make it possible, for example, to pipeline
> RFormula, RandomForestClassifier and IndexToString.
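
The round trip described in the report can be illustrated with a minimal plain-Python sketch. This is not Spark code: the helper names fit_label_index, index_labels and index_to_string are hypothetical stand-ins for what StringIndexer and IndexToString do conceptually. Ordering labels by descending frequency follows StringIndexer's documented default; the alphabetical tie-break is an assumption made here for determinism.

```python
from collections import Counter


def fit_label_index(labels):
    """Return the distinct labels, most frequent first (ties broken alphabetically)."""
    counts = Counter(labels)
    return sorted(counts, key=lambda lab: (-counts[lab], lab))


def index_labels(labels, vocabulary):
    """Map each string label to its position in the vocabulary, as a double."""
    position = {lab: float(i) for i, lab in enumerate(vocabulary)}
    return [position[lab] for lab in labels]


def index_to_string(predictions, vocabulary):
    """Map predicted indices back to string labels using the same vocabulary."""
    return [vocabulary[int(p)] for p in predictions]


# Hypothetical training labels, for illustration only.
training_labels = ["spam", "ham", "ham", "spam", "ham", "other"]

# The vocabulary plays the role of the metadata attached to the label column.
vocabulary = fit_label_index(training_labels)
indexed = index_labels(training_labels, vocabulary)

# A classifier emits indices; without the vocabulary they cannot be decoded.
predictions = [0.0, 2.0, 1.0]
predicted_labels = index_to_string(predictions, vocabulary)
```

The point of the sketch is that index_to_string needs the vocabulary produced at fitting time. If the prediction column does not carry it, as the report describes, the caller has to obtain the label list some other way.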