OBones wrote:
So, I tried to rewrite my sample code using the ml package and it is much easier to use; there is no need for the LabeledPoint transformation. Here is the code I came up with:

    import org.apache.spark.ml.regression.DecisionTreeRegressor

    val dt = new DecisionTreeRegressor()
      .setPredictionCol("Y")
      .setImpurity("variance")
      .setMaxDepth(30)
      .setMaxBins(32)

    val model = dt.fit(data)

    println(model.toDebugString)
    println(model.featureImportances.toString)

However, I cannot find a way to specify which columns are features, which ones are categorical, and how many categories they have, like I used to do with the mllib package.
Well, further research led me to add the following code to indicate which columns are categorical:

    import org.apache.spark.ml.attribute.NominalAttribute

    val X1Attribute = NominalAttribute.defaultAttr.withName("X1").withValues("0", "1").toMetadata
    val X2Attribute = NominalAttribute.defaultAttr.withName("X2").withValues("0", "1", "2").toMetadata

    val dataWithAttributes = data
      .withColumn("X1", $"X1".as("X1", X1Attribute))
      .withColumn("X2", $"X2".as("X2", X2Attribute))

but when I run this:

    val model = dt.fit(dataWithAttributes)

I get the following error:
    java.lang.IllegalArgumentException: Field "features" does not exist.

That makes sense, because I have yet to find a way to specify which columns are the features. I also have to figure out what the label column is and how it differs from the prediction column, as only the latter was used with the mllib package. Below is a rough sketch of what I plan to try next.
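
From what I can tell from the ml package docs, the estimator reads its features from a single vector-valued column (named "features" by default), which is usually built with VectorAssembler, while the label column is the target it learns from and the prediction column is the one the fitted model writes out. The sketch below is only what I intend to try next, not something I have confirmed works: the column names "X1", "X2" and "Y" come from my own data, Y is assumed to be the target, and the input column list would of course have to contain all of the feature columns, not just the two categorical ones.

    import org.apache.spark.ml.feature.VectorAssembler
    import org.apache.spark.ml.regression.DecisionTreeRegressor

    // Assemble the individual feature columns into the single vector column
    // that the ml estimators expect.
    // NOTE: Array("X1", "X2") is just a placeholder; the real list would
    // contain every feature column in my data set.
    val assembler = new VectorAssembler()
      .setInputCols(Array("X1", "X2"))
      .setOutputCol("features")

    val assembled = assembler.transform(dataWithAttributes)

    // Train on the assembled vector, with Y as the label. The prediction
    // column is left at its default ("prediction") since it is the column
    // the fitted model writes, not the one it learns from.
    val dt = new DecisionTreeRegressor()
      .setFeaturesCol("features")
      .setLabelCol("Y")
      .setImpurity("variance")
      .setMaxDepth(30)
      .setMaxBins(32)

    val model = dt.fit(assembled)

If I understand correctly, VectorAssembler carries the nominal attribute metadata over into the vector column's metadata, which is presumably how the tree picks up which features are categorical and how many categories they have. I would appreciate confirmation that this is the right approach.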

