OBones wrote:
So, I tried to rewrite my sample code using the ml package, and it is
much easier to use: there is no need for the LabeledPoint transformation.
Here is the code I came up with:
val dt = new DecisionTreeRegressor()
  .setPredictionCol("Y")
  .setImpurity("variance")
  .setMaxDepth(30)
  .setMaxBins(32)
val model = dt.fit(data)
println(model.toDebugString)
println(model.featureImportances.toString)
However, I cannot find a way to specify which columns are features,
which ones are categorical and how many categories they have, like I
used to do with the mllib package.
Well, further research led me to adding the following code to indicate
which columns are categorical:
val X1Attribute =
  NominalAttribute.defaultAttr.withName("X1").withValues("0", "1").toMetadata
val X2Attribute =
  NominalAttribute.defaultAttr.withName("X2").withValues("0", "1", "2").toMetadata
val dataWithAttributes = data
  .withColumn("X1", $"X1".as("X1", X1Attribute))
  .withColumn("X2", $"X2".as("X2", X2Attribute))
but when I run this:
val model = dt.fit(dataWithAttributes)
I get the following error:
java.lang.IllegalArgumentException: Field "features" does not exist.
It makes sense because I have yet to find a way to specify which columns
are features.
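For what it is worth, the ml estimators read their inputs from a single
vector column, named "features" by default, and one common way to build
it is VectorAssembler. The following is only a sketch, assuming Spark 2.x
or later with a local SparkSession; the object name FeatureAssembly, the
helper assemble and the toy rows are made up for illustration. As far as
I understand, VectorAssembler carries the nominal metadata of its input
columns over into the assembled vector, but that is worth checking on
your own version.

```scala
import org.apache.spark.ml.attribute.NominalAttribute
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

// Hypothetical helper object, not part of Spark.
object FeatureAssembly {

  // Tag X1 and X2 as categorical, then pack them into one "features" vector column.
  def assemble(data: DataFrame): DataFrame = {
    val x1Meta = NominalAttribute.defaultAttr
      .withName("X1").withValues("0", "1").toMetadata
    val x2Meta = NominalAttribute.defaultAttr
      .withName("X2").withValues("0", "1", "2").toMetadata

    val withAttributes = data
      .withColumn("X1", col("X1").as("X1", x1Meta))
      .withColumn("X2", col("X2").as("X2", x2Meta))

    new VectorAssembler()
      .setInputCols(Array("X1", "X2")) // the columns to use as features
      .setOutputCol("features")        // the default featuresCol of ml estimators
      .transform(withAttributes)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]").appName("feature-assembly").getOrCreate()
    import spark.implicits._

    // Made-up rows: two categorical inputs (as doubles) and a numeric target Y.
    val data = Seq((0.0, 1.0, 3.5), (1.0, 2.0, 1.2), (1.0, 0.0, 4.8))
      .toDF("X1", "X2", "Y")

    assemble(data).printSchema()
    spark.stop()
  }
}
```

The assembled DataFrame keeps the original columns and adds "features",
so it can be passed to dt.fit once the label column is also set.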
I also have to figure out what the label column is and how it differs
from the prediction column, since only the latter was used with the
mllib package.
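On the label versus prediction question, my reading of the ml API is
that labelCol (default "label") names the input column holding the
observed target that fit() learns from, while predictionCol (default
"prediction") names the output column that the fitted model's
transform() appends. A minimal sketch under that reading, assuming Spark
2.x or later; the object name LabelVsPrediction, the helper
fitAndPredict, the column name "Y_predicted" and the toy rows are
hypothetical:

```scala
import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.ml.regression.DecisionTreeRegressor
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical helper object, not part of Spark.
object LabelVsPrediction {

  // Fit on the observed target "Y", then append the model output as "Y_predicted".
  def fitAndPredict(train: DataFrame): DataFrame = {
    val dt = new DecisionTreeRegressor()
      .setFeaturesCol("features")      // input: vector of features
      .setLabelCol("Y")                // input: observed target used by fit()
      .setPredictionCol("Y_predicted") // output: column added by transform()
      .setImpurity("variance")
      .setMaxDepth(5)

    val model = dt.fit(train)
    model.transform(train)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]").appName("label-vs-prediction").getOrCreate()
    import spark.implicits._

    // Made-up rows: a ready-made feature vector and a numeric target Y.
    val train = Seq(
      (Vectors.dense(0.0, 1.0), 3.5),
      (Vectors.dense(1.0, 2.0), 1.2),
      (Vectors.dense(1.0, 0.0), 4.8)
    ).toDF("features", "Y")

    fitAndPredict(train).show()
    spark.stop()
  }
}
```

Under this reading, the prediction column should not already exist in
the input data, which is why it gets a name different from "Y" here.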