Hi, Marco,

val data = spark.read.format("libsvm")
  .load("data/mllib/sample_libsvm_data.txt")

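The DataFrame loaded this way already contains a vector column named
"features" (the libsvm reader produces a "label" and a "features" column).
The indexer does not guess which columns are features; you tell it which
column to index: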

val featureIndexer = new VectorIndexer()
  .setInputCol("features")   // <-- here you specify the "features" column to index
  .setOutputCol("indexedFeatures")
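
If your own DataFrame comes from somewhere other than the libsvm reader (a CSV
file, for example), it may simply have no "features" vector column, which would
explain the exception you quote. Below is a minimal sketch of building that
column with VectorAssembler before indexing. The column names f1 and f2, the
sample rows, the object name and the local SparkSession are all made up for
illustration; adapt them to your data.

```scala
import org.apache.spark.ml.feature.{VectorAssembler, VectorIndexer}
import org.apache.spark.sql.SparkSession

object FeaturesColumnCheck {
  def main(args: Array[String]): Unit = {
    // Local session for experimentation only (assumption, not from the thread).
    val spark = SparkSession.builder()
      .appName("FeaturesColumnCheck")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical stand-in for data that was NOT loaded via format("libsvm"):
    // plain numeric columns, no vector column yet.
    val raw = Seq(
      (0.0, 1.0, 10.0),
      (1.0, 2.0, 20.0),
      (0.0, 1.0, 30.0)
    ).toDF("label", "f1", "f2")

    // Inspect the schema first: there is no "features" field at this point.
    raw.printSchema()

    // Combine the numeric columns into a single vector column named "features".
    val assembled = new VectorAssembler()
      .setInputCols(Array("f1", "f2"))
      .setOutputCol("features")
      .transform(raw)

    // Now the input column exists, so fit() can resolve it.
    val featureIndexer = new VectorIndexer()
      .setInputCol("features")
      .setOutputCol("indexedFeatures")
      .setMaxCategories(4)
      .fit(assembled)

    featureIndexer.transform(assembled).show()
    spark.stop()
  }
}
```

Calling data.printSchema() on your real DataFrame right before the fit() call is
a quick way to confirm whether the "features" field is there.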


Thanks.


On Sat, Dec 16, 2017 at 6:26 AM, Marco Mistroni <mmistr...@gmail.com> wrote:

> Hi all,
>  I am trying to run a sample decision tree, following the examples here
> (for MLlib):
>
> https://spark.apache.org/docs/latest/ml-classification-regression.html#decision-tree-classifier
>
> The example seems to use a VectorIndexer, but I am missing something.
> How does the featureIndexer know which columns are features?
> Isn't there something missing? Or is the featureIndexer able to figure
> out by itself which columns of the DataFrame are features?
>
> val labelIndexer = new StringIndexer()
>   .setInputCol("label")
>   .setOutputCol("indexedLabel")
>   .fit(data)
>
> // Automatically identify categorical features, and index them.
> val featureIndexer = new VectorIndexer()
>   .setInputCol("features")
>   .setOutputCol("indexedFeatures")
>   .setMaxCategories(4) // features with > 4 distinct values are treated as continuous.
>   .fit(data)
>
> Using this code I get the following exception:
>
> Exception in thread "main" java.lang.IllegalArgumentException: Field "features" does not exist.
>         at org.apache.spark.sql.types.StructType$$anonfun$apply$1.apply(StructType.scala:266)
>         at org.apache.spark.sql.types.StructType$$anonfun$apply$1.apply(StructType.scala:266)
>         at scala.collection.MapLike$class.getOrElse(MapLike.scala:128)
>         at scala.collection.AbstractMap.getOrElse(Map.scala:59)
>         at org.apache.spark.sql.types.StructType.apply(StructType.scala:265)
>         at org.apache.spark.ml.util.SchemaUtils$.checkColumnType(SchemaUtils.scala:40)
>         at org.apache.spark.ml.feature.VectorIndexer.transformSchema(VectorIndexer.scala:141)
>         at org.apache.spark.ml.PipelineStage.transformSchema(Pipeline.scala:74)
>         at org.apache.spark.ml.feature.VectorIndexer.fit(VectorIndexer.scala:118)
>
> What am I missing?
>
> With kindest regards,
>
>  Marco
>
>
