You can currently get 1.6.0-RC1 from http://people.apache.org/~pwendell/spark-releases/spark-v1.6.0-rc1-bin/, but it is a release candidate, not the final 1.6.0 release.

2015-12-02 23:57 GMT+08:00 Vishnu Viswanath <vishnu.viswanat...@gmail.com>:

> Thank you Yanbo,
>
> It looks like this is available in version 1.6 only.
> Can you tell me how/when I can download version 1.6?
>
> Thanks and Regards,
> Vishnu Viswanath,
>
> On Wed, Dec 2, 2015 at 4:37 AM, Yanbo Liang <yblia...@gmail.com> wrote:
>
>> You can set "handleInvalid" to "skip", which helps you skip the labels
>> which do not exist in the training dataset.
>>
>> 2015-12-02 14:31 GMT+08:00 Vishnu Viswanath <vishnu.viswanat...@gmail.com>:
>>
>>> Hi Jeff,
>>>
>>> I went through the link you provided and I could understand how
>>> fit() and transform() work.
>>> I tried to use the pipeline in my code and I am getting the exception
>>> Caused by: org.apache.spark.SparkException: Unseen label:
>>>
>>> The reason for this error, as per my understanding, is:
>>> For the column on which I am doing StringIndexing, the test data
>>> has values which were not there in the train data.
>>> Since fit() is done only on the train data, the indexing is failing.
>>>
>>> Can you suggest what can be done in this situation?
>>>
>>> Thanks,
>>>
>>> On Mon, Nov 30, 2015 at 12:32 AM, Vishnu Viswanath <
>>> vishnu.viswanat...@gmail.com> wrote:
>>>
>>>> Thank you Jeff.
>>>>
>>>> On Sun, Nov 29, 2015 at 7:36 PM, Jeff Zhang <zjf...@gmail.com> wrote:
>>>>
>>>>> StringIndexer is an estimator which trains a model to be used
>>>>> both in training & prediction. So it is consistent between training &
>>>>> prediction.
>>>>>
>>>>> You may want to read this section of the Spark ML doc
>>>>> http://spark.apache.org/docs/latest/ml-guide.html#how-it-works
>>>>>
>>>>> On Mon, Nov 30, 2015 at 12:52 AM, Vishnu Viswanath <
>>>>> vishnu.viswanat...@gmail.com> wrote:
>>>>>
>>>>>> Thanks for the reply Yanbo.
>>>>>>
>>>>>> I understand that the model will be trained using the indexer map
>>>>>> created during the training stage.
>>>>>>
>>>>>> But since I am getting a new set of data during prediction, I
>>>>>> have to do StringIndexing on the new data also.
>>>>>> Right now I am using a new StringIndexer for this purpose. Is
>>>>>> there any way that I can reuse the indexer used in the training stage?
>>>>>>
>>>>>> Note: I have a pipeline with StringIndexer in it, and I am
>>>>>> fitting my train data in it and building the model. Then later when I get
>>>>>> the new data for prediction, I am using the same pipeline to fit the data
>>>>>> again and do the prediction.
>>>>>>
>>>>>> Thanks and Regards,
>>>>>> Vishnu Viswanath
>>>>>>
>>>>>> On Sun, Nov 29, 2015 at 8:14 AM, Yanbo Liang <yblia...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi Vishnu,
>>>>>>>
>>>>>>> The string-to-index map is generated at the model training step and
>>>>>>> used at the model prediction step.
>>>>>>> It means that the string-to-index map will not change during
>>>>>>> prediction. You will use the original trained model when you do
>>>>>>> prediction.
>>>>>>>
>>>>>>> 2015-11-29 4:33 GMT+08:00 Vishnu Viswanath <
>>>>>>> vishnu.viswanat...@gmail.com>:
>>>>>>> > Hi All,
>>>>>>> >
>>>>>>> > I have a general question on using StringIndexer.
>>>>>>> > StringIndexer gives an index to each label in the feature starting
>>>>>>> > from 0 (0 for the most frequent word).
>>>>>>> >
>>>>>>> > Suppose I am building a model, and I use StringIndexer for
>>>>>>> > transforming one of my columns.
>>>>>>> > e.g., suppose A was the most frequent word, followed by B and C.
>>>>>>> >
>>>>>>> > So the StringIndexer will generate
>>>>>>> >
>>>>>>> > A 0.0
>>>>>>> > B 1.0
>>>>>>> > C 2.0
>>>>>>> >
>>>>>>> > After building the model, I am going to do some prediction using
>>>>>>> > this model, so I do the same transformation on my new data which
>>>>>>> > I need to predict. And suppose the new dataset has C as the most
>>>>>>> > frequent word, followed by B and A.
>>>>>>> > So the StringIndexer will assign index as
>>>>>>> >
>>>>>>> > C 0.0
>>>>>>> > B 1.0
>>>>>>> > A 2.0
>>>>>>> >
>>>>>>> > These indexes are different from what we used for modeling. So
>>>>>>> > won’t this give me a wrong prediction if I use StringIndexer?
>>>>>>> >
>>>>>>>
>>>>> --
>>>>> Best Regards
>>>>>
>>>>> Jeff Zhang
>>>>>
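
To make the point of the thread concrete, here is a minimal sketch in plain Python. It is not Spark code and not how StringIndexer is implemented; the helper names `fit_indexer` and `transform`, the `handle_invalid` argument, and the alphabetical tie-break are assumptions made only for illustration. It shows why the map should be fitted once on the training data and then reused, and what "error" versus "skip" handling of unseen labels would look like.

```python
from collections import Counter


def fit_indexer(labels):
    """Build a label -> index map; the most frequent label gets 0.0."""
    counts = Counter(labels)
    # Descending frequency; ties broken alphabetically (an assumption,
    # chosen here only to keep the toy example deterministic).
    ordered = sorted(counts, key=lambda label: (-counts[label], label))
    return {label: float(i) for i, label in enumerate(ordered)}


def transform(mapping, labels, handle_invalid="error"):
    """Apply an already fitted map to new labels without refitting."""
    indexed = []
    for label in labels:
        if label in mapping:
            indexed.append((label, mapping[label]))
        elif handle_invalid == "skip":
            continue  # drop rows whose label was never seen during fit
        else:
            raise ValueError(f"Unseen label: {label}")
    return indexed


train = ["A", "A", "A", "B", "B", "C"]
new_data = ["C", "C", "C", "B", "B", "A", "D"]  # "D" is unseen in train

mapping = fit_indexer(train)    # fit once, on the training data only
refit = fit_indexer(new_data)   # the mistake: fitting again on new data

# Reusing the fitted map keeps A/B/C on their training indices.
indexed = transform(mapping, new_data, handle_invalid="skip")
```

In Spark terms this corresponds to calling fit() on the pipeline once with the training data and then calling transform() on the resulting model for the new data, instead of fitting the pipeline again.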