Hi All,

I have a general question about using StringIndexer. StringIndexer assigns an index to each label in a column, starting from 0 (0 for the most frequent label).
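To make the ordering rule concrete, here is a minimal plain-Python sketch. It is not Spark's API. It only mimics StringIndexer's default behaviour of ordering labels by descending frequency, so the most frequent label gets 0.0. The helper name `fit_frequency_index` and the alphabetical tie-break are assumptions made for this illustration. Fitting the same rule on two datasets whose frequencies are reversed produces two different label-to-index mappings.

```python
from collections import Counter


def fit_frequency_index(labels):
    """Hypothetical helper: map each distinct label to a float index,
    most frequent label first (0.0).

    Ties are broken alphabetically; this tie-break is an assumption of
    the sketch, not a claim about every Spark version.
    """
    counts = Counter(labels)
    ordered = sorted(counts, key=lambda label: (-counts[label], label))
    return {label: float(i) for i, label in enumerate(ordered)}


# A is most frequent in the training data, C is most frequent in the new data.
train = ["A", "A", "A", "B", "B", "C"]
new_data = ["C", "C", "C", "B", "B", "A"]

train_index = fit_frequency_index(train)
refit_index = fit_frequency_index(new_data)

print(train_index)  # {'A': 0.0, 'B': 1.0, 'C': 2.0}
print(refit_index)  # {'C': 0.0, 'B': 1.0, 'A': 2.0}
```

Because the index depends only on frequency within the data being fitted, the same label ("A") ends up with 0.0 in one mapping and 2.0 in the other.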
Suppose I am building a model and I use StringIndexer to transform one of my columns. For example, suppose A is the most frequent word, followed by B and then C. StringIndexer will then generate:

A 0.0
B 1.0
C 2.0

After building the model, I want to make predictions with it, so I apply the same transformation to the new data I need to predict on. Now suppose the new dataset has C as the most frequent word, followed by B and then A. StringIndexer will then assign:

C 0.0
B 1.0
A 2.0

These indexes differ from the ones used when training the model. So won't this give me wrong predictions if I use StringIndexer?

--
Thanks and Regards,
Vishnu Viswanath
www.vishnuviswanath.com