HI All,

I am trying to use the VectorIndexer (FeatureExtraction) technique
available from the Spark ML Pipelines.

I ran the example in the documentation .

val featureIndexer = new VectorIndexer()
  .setInputCol("features")
  .setOutputCol("indexedFeatures")
  .setMaxCategories(4)
  .fit(data)


And then I wanted to see what output it generates.

After performing transform on the data set , the output looks like below.

scala> predictions.select("indexedFeatures").take(1).foreach(println)

[(692,[124,125,126,127,151,152,153,154,155,179,180,181,182,183,208,209,210,211,235,236,237,238,239,263,264,265,266,267,268,292,293,294,295,296,321,322,323,324,349,350,351,352,377,378,379,380,405,406,407,408,433,434,435,436,461,462,463,464,489,490,491,492,493,517,518,519,520,521,545,546,547,548,549,574,575,576,577,578,602,603,604,605,606,630,631,632,633,634,658,659,660,661,662],[145.0,255.0,211.0,31.0,32.0,237.0,253.0,252.0,71.0,11.0,175.0,253.0,252.0,71.0,144.0,253.0,252.0,71.0,16.0,191.0,253.0,252.0,71.0,26.0,221.0,253.0,252.0,124.0,31.0,125.0,253.0,252.0,252.0,108.0,253.0,252.0,252.0,108.0,255.0,253.0,253.0,108.0,253.0,252.0,252.0,108.0,253.0,252.0,252.0,108.0,253.0,252.0,252.0,108.0,255.0,253.0,253.0,170.0,253.0,252.0,252.0,252.0,42.0,149.0,252.0,252.0,252.0,144.0,109.0,252.0,252.0,252.0,144.0,218.0,253.0,253.0,255.0,35.0,175.0,252.0,252.0,253.0,35.0,73.0,252.0,252.0,253.0,35.0,31.0,211.0,252.0,253.0,35.0])]


scala> predictions.select("features").take(1).foreach(println)

[(692,[124,125,126,127,151,152,153,154,155,179,180,181,182,183,208,209,210,211,235,236,237,238,239,263,264,265,266,267,268,292,293,294,295,296,321,322,323,324,349,350,351,352,377,378,379,380,405,406,407,408,433,434,435,436,461,462,463,464,489,490,491,492,493,517,518,519,520,521,545,546,547,548,549,574,575,576,577,578,602,603,604,605,606,630,631,632,633,634,658,659,660,661,662],[145.0,255.0,211.0,31.0,32.0,237.0,253.0,252.0,71.0,11.0,175.0,253.0,252.0,71.0,144.0,253.0,252.0,71.0,16.0,191.0,253.0,252.0,71.0,26.0,221.0,253.0,252.0,124.0,31.0,125.0,253.0,252.0,252.0,108.0,253.0,252.0,252.0,108.0,255.0,253.0,253.0,108.0,253.0,252.0,252.0,108.0,253.0,252.0,252.0,108.0,253.0,252.0,252.0,108.0,255.0,253.0,253.0,170.0,253.0,252.0,252.0,252.0,42.0,149.0,252.0,252.0,252.0,144.0,109.0,252.0,252.0,252.0,144.0,218.0,253.0,253.0,255.0,35.0,175.0,252.0,252.0,253.0,35.0,73.0,252.0,252.0,253.0,35.0,31.0,211.0,252.0,253.0,35.0])]

I can,t understand what is happening. I tried with simple data sets also ,
but similar result.

Please help.

Thanks,

Vishnu

Reply via email to