OBones wrote:
So, I tried to rewrite my sample code using the ml package and it is much easier to use; there is no need for the LabeledPoint transformation. Here is the code I came up with:

    import org.apache.spark.ml.regression.DecisionTreeRegressor

    val dt = new DecisionTreeRegressor()
      .setPredictionCol("Y")
      .setImpurity("variance")
      .setMaxDepth(30)
      .setMaxBins(32)

    val model = dt.fit(data)

    println(model.toDebugString)
    println(model.featureImportances.toString)

However, I cannot find a way to specify which columns are features, which ones are categorical, and how many categories they have, like I used to do with the mllib package.
Well, further research led me to add the following code to indicate which columns are categorical:

    import org.apache.spark.ml.attribute.NominalAttribute

    val X1Attribute = NominalAttribute.defaultAttr.withName("X1").withValues("0", "1").toMetadata
    val X2Attribute = NominalAttribute.defaultAttr.withName("X2").withValues("0", "1", "2").toMetadata

    val dataWithAttributes = data
      .withColumn("X1", $"X1".as("X1", X1Attribute))
      .withColumn("X2", $"X2".as("X2", X2Attribute))

but when I run this:

    val model = dt.fit(dataWithAttributes)

I get the following error:
    java.lang.IllegalArgumentException: Field "features" does not exist.

That makes sense, because I have yet to find a way to specify which columns are the features. I also have to figure out what the label column is and how it differs from the prediction column, as only the latter was used with the mllib package. Below is a rough sketch of what I plan to try next.
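
From what I can tell from the ml package docs, the estimator reads its features from a single vector-valued column (named "features" by default), which is usually built with VectorAssembler, while the label column is the target it learns from and the prediction column is the one the fitted model writes out. The sketch below is only what I intend to try next, not something I have confirmed works: the column names "X1", "X2" and "Y" come from my own data, Y is assumed to be the target, and the input column list would of course have to contain all of the feature columns, not just the two categorical ones.

    import org.apache.spark.ml.feature.VectorAssembler
    import org.apache.spark.ml.regression.DecisionTreeRegressor

    // Assemble the individual feature columns into the single vector column
    // that the ml estimators expect.
    // NOTE: Array("X1", "X2") is just a placeholder; the real list would
    // contain every feature column in my data set.
    val assembler = new VectorAssembler()
      .setInputCols(Array("X1", "X2"))
      .setOutputCol("features")

    val assembled = assembler.transform(dataWithAttributes)

    // Train on the assembled vector, with Y as the label. The prediction
    // column is left at its default ("prediction") since it is the column
    // the fitted model writes, not the one it learns from.
    val dt = new DecisionTreeRegressor()
      .setFeaturesCol("features")
      .setLabelCol("Y")
      .setImpurity("variance")
      .setMaxDepth(30)
      .setMaxBins(32)

    val model = dt.fit(assembled)

If I understand correctly, VectorAssembler carries the nominal attribute metadata over into the vector column's metadata, which is presumably how the tree picks up which features are categorical and how many categories they have. I would appreciate confirmation that this is the right approach.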

