Re: Running glm in sparkR (data pre-processing step)

2016-05-30 Thread Yanbo Liang
Yes, you are right.

2016-05-30 2:34 GMT-07:00 Abhishek Anand:
> Thanks Yanbo.
>
> So, you mean that if I have a variable which is of type double but I want
> to treat it like String in my model I just have to cast those columns into
> string and simply run the glm model. String columns will be

Re: Running glm in sparkR (data pre-processing step)

2016-05-30 Thread Abhishek Anand
Thanks Yanbo.

So, you mean that if I have a variable of type double that I want to treat as a string in my model, I just have to cast those columns to string and then run the glm model? The string columns will be one-hot encoded directly by the glm provided by SparkR? Just wanted to cl

Re: Running glm in sparkR (data pre-processing step)

2016-05-30 Thread Yanbo Liang
Hi Abhi,

In SparkR glm, categorical features (columns of type string) are one-hot encoded automatically, so pre-processing such as `as.factor` is not necessary; you can feed your data directly to model training.

Thanks,
Yanbo

2016-05-30 2:06 GMT-07:00 Abhishek Anand:
> Hi,
>
> I want to run
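
For readers unfamiliar with the term, the following is a minimal Python sketch of what "cast to string, then one-hot encode" means for a numeric column. It is an illustration of the concept only, not SparkR's implementation: the function name `one_hot` and the sample values are hypothetical, and SparkR may order the categories differently or drop one level as a reference category.

```python
def one_hot(values):
    """Cast each value to a string and return (categories, indicator_rows).

    Hypothetical helper for illustration; not part of SparkR or Spark.
    """
    labels = [str(v) for v in values]           # e.g. 2.0 becomes "2.0"
    categories = sorted(set(labels))            # one output column per distinct label
    index = {label: i for i, label in enumerate(categories)}

    rows = []
    for label in labels:
        row = [0] * len(categories)
        row[index[label]] = 1                   # mark the column for this row's category
        rows.append(row)
    return categories, rows


# A double-valued column that should be treated as categorical.
categories, rows = one_hot([1.0, 2.0, 1.0, 3.0])
```

Each distinct value becomes its own 0/1 column, so the model fits one coefficient per category rather than a single slope for the numeric value. That is the effect the cast to string is meant to achieve.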

Running glm in sparkR (data pre-processing step)

2016-05-30 Thread Abhishek Anand
Hi,

I want to run the SparkR variant of glm on data stored in a CSV file. I see that the glm function in SparkR takes a Spark DataFrame as input. When I read a CSV file and create a Spark DataFrame, how can I take care of the factor variables/columns in my data? Do I need t