Hi all,
I am trying to run a sample decision tree, following the examples here (for
MLlib):

https://spark.apache.org/docs/latest/ml-classification-regression.html#decision-tree-classifier

The example uses a VectorIndexer, but I am missing something.
How does the featureIndexer know which columns are features?
Isn't something missing? Or is the featureIndexer able to figure out by
itself which columns of the DataFrame are features?

import org.apache.spark.ml.feature.{StringIndexer, VectorIndexer}

val labelIndexer = new StringIndexer()
  .setInputCol("label")
  .setOutputCol("indexedLabel")
  .fit(data)

// Automatically identify categorical features, and index them.
val featureIndexer = new VectorIndexer()
  .setInputCol("features")
  .setOutputCol("indexedFeatures")
  .setMaxCategories(4) // features with > 4 distinct values are treated as continuous.
  .fit(data)

Using this code, I get the following exception:

Exception in thread "main" java.lang.IllegalArgumentException: Field "features" does not exist.
        at org.apache.spark.sql.types.StructType$$anonfun$apply$1.apply(StructType.scala:266)
        at org.apache.spark.sql.types.StructType$$anonfun$apply$1.apply(StructType.scala:266)
        at scala.collection.MapLike$class.getOrElse(MapLike.scala:128)
        at scala.collection.AbstractMap.getOrElse(Map.scala:59)
        at org.apache.spark.sql.types.StructType.apply(StructType.scala:265)
        at org.apache.spark.ml.util.SchemaUtils$.checkColumnType(SchemaUtils.scala:40)
        at org.apache.spark.ml.feature.VectorIndexer.transformSchema(VectorIndexer.scala:141)
        at org.apache.spark.ml.PipelineStage.transformSchema(Pipeline.scala:74)
        at org.apache.spark.ml.feature.VectorIndexer.fit(VectorIndexer.scala:118)

What am I missing?

With kindest regards,

 Marco
