Hi Abhi,

In SparkR glm, categorical features (columns of type string) are one-hot
encoded automatically.
So pre-processing such as `as.factor` is not necessary; you can feed the
Spark DataFrame you read from the CSV file directly to model training.
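To make "one-hot encoded automatically" a bit more concrete, here is a small
language-agnostic sketch in Python (not SparkR code) of what dummy encoding
does to a single string column. Assumptions: as far as I know, SparkR's glm
is backed by Spark's RFormula, which by default orders the levels of a string
column by descending frequency and drops the last one as the reference level.
The alphabetical tie-break and the helper name `one_hot_encode` are
hypothetical choices made only for this illustration, and the exact ordering
may differ between Spark versions.

```python
from collections import Counter


def one_hot_encode(values):
    """Dummy-encode a list of category strings (illustrative sketch only).

    Hypothetical helper: levels are ordered by descending frequency,
    ties broken alphabetically, and the last level is dropped as the
    reference category (assumed to mirror Spark's RFormula default).
    """
    counts = Counter(values)
    # Most frequent level first; alphabetical order breaks ties.
    levels = sorted(counts, key=lambda c: (-counts[c], c))
    # Drop the last (least frequent) level; it becomes the all-zeros row.
    kept = levels[:-1]
    rows = [[1 if v == level else 0 for level in kept] for v in values]
    return kept, rows


if __name__ == "__main__":
    colors = ["red", "blue", "red", "green", "red", "blue"]
    columns, encoded = one_hot_encode(colors)
    print(columns)
    for row in encoded:
        print(row)
```

Here "red" (3 rows) and "blue" (2 rows) each get an indicator column, while
"green" is the reference level and is encoded as all zeros. Dropping one
level keeps the indicator columns from being collinear with the intercept,
which is also why R's own glm does the same with factors.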

Thanks
Yanbo

2016-05-30 2:06 GMT-07:00 Abhishek Anand <abhis.anan...@gmail.com>:

> Hi,
>
> I want to run the SparkR variant of glm on my data, which is in a CSV
> file.
>
> I see that the glm function in SparkR takes a Spark DataFrame as input.
>
> Now, when I read a CSV file and create a Spark DataFrame, how should I
> handle the factor variables/columns in my data?
>
> Do I need to convert it to an R data.frame, convert the columns to factors
> using as.factor, create a Spark DataFrame again, and run glm over it?
>
> But running as.factor over a big dataset is not possible.
>
> Please suggest the best way to achieve this.
>
> What pre-processing should be done, and what is the best way to do it?
>
>
> Thanks,
> Abhi
>
