Re: How to deal with string column data for spark mlib?

2016-12-20 Thread big data
I want to use a decision tree to evaluate whether the event will happen. The data look like this: userid sex country age attr1 attr2 ... event 1 male USA 23 xxx 0 2 male UK 25 xxx 1 3
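A minimal sketch of how data like that might feed a Spark ML decision tree. The column names (userid, sex, country, age, event) follow the sample above; the CSV path, header/inferSchema options, and the choice to index the label with StringIndexer are illustrative assumptions, not anything stated in the thread:

import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.classification.DecisionTreeClassifier
import org.apache.spark.ml.feature.{StringIndexer, VectorAssembler}

// Assumed input location and format
val events = spark.read.option("header", "true").option("inferSchema", "true")
  .csv("/path/to/events.csv")

// Turn the string categories into numeric indices
val sexIdx     = new StringIndexer().setInputCol("sex").setOutputCol("sexIdx")
val countryIdx = new StringIndexer().setInputCol("country").setOutputCol("countryIdx")

// Index the 0/1 event column as the label so it carries class metadata
val labelIdx = new StringIndexer().setInputCol("event").setOutputCol("label")

// Assemble the numeric columns into the feature vector Spark ML expects
val assembler = new VectorAssembler()
  .setInputCols(Array("sexIdx", "countryIdx", "age"))
  .setOutputCol("features")

val dt = new DecisionTreeClassifier().setLabelCol("label").setFeaturesCol("features")

val model = new Pipeline()
  .setStages(Array(sexIdx, countryIdx, labelIdx, assembler, dt))
  .fit(events)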

RE: How to deal with string column data for spark mlib?

2016-12-20 Thread theodondre
Give a snippet of the data. Sent from my T-Mobile 4G LTE Device Original message From: big data Date: 12/20/16 4:35 AM (GMT-05:00) To: user@spark.apache.org Subject: How to deal with string column data for spark mlib? our source data are

Recall: How to deal with string column data for spark mlib?

2016-12-20 Thread Triones,Deng(vip.com)
Deng Gang [Technology Center] would like to recall the message "How to deal with string column data for spark mlib?". This email may be confidential. If you are not the intended recipient, please notify the sender immediately. Please do not use, save, copy, print, or distribute this email or its contents, or use it for any other purpose or disclose it to anyone. Thank you for your cooperation! This communication is intended only for the addressee(s) and may contain information that is privileged and confidential. You are

Reply: How to deal with string column data for spark mlib?

2016-12-20 Thread Triones,Deng(vip.com)
Hi spark dev, I am using Spark 2 to write ORC files to HDFS. I have one question about the save mode. My use case is this: when I write data into HDFS and one task fails, I hope the file that task created is deleted so that the retried task can write all the data, that is to
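For reference, a minimal sketch of writing a DataFrame as ORC with an explicit save mode; the path is a placeholder and df is assumed to exist. The save mode governs what happens when the target path already exists, while cleanup of output from failed task attempts is handled by Spark's output committer rather than by the mode itself:

import org.apache.spark.sql.SaveMode

// df is an existing DataFrame; the output path is illustrative
df.write.mode(SaveMode.Overwrite).orc("hdfs:///path/to/output")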

Re: How to deal with string column data for spark mlib?

2016-12-20 Thread Rohit Verma
@Deepak, this conversion is not suitable for categorical data. But again, as I mentioned, it all depends on the nature of the data and what the OP intends. Consider you want to convert race into numbers (races such as black, white and asian). So you want numerical variables, and you could just assign a
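A small sketch of that "assign a number per category" idea using Spark ML's StringIndexer; the column name and values just follow the race example above, and a SparkSession named spark is assumed:

import org.apache.spark.ml.feature.StringIndexer
import spark.implicits._   // for toDF, assumes a SparkSession named spark

val races = Seq("black", "white", "asian", "white").toDF("race")

// Each distinct value gets a double index; the most frequent value gets 0.0
val indexed = new StringIndexer()
  .setInputCol("race")
  .setOutputCol("raceIndex")
  .fit(races)
  .transform(races)

indexed.show()

Note that such indices impose an arbitrary ordering on the categories, which is exactly why a plain toDouble-style conversion is questionable for categorical data.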

Re: How to deal with string column data for spark mlib?

2016-12-20 Thread Deepak Sharma
You can read the source into a DataFrame, then iterate over all rows with map and use something like below: df.map(x => x(0).toString().toDouble) Thanks Deepak On Tue, Dec 20, 2016 at 3:05 PM, big data wrote: > our source data are string-based data, like this: > col1
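A self-contained version of that snippet, assuming the string values really are numeric text (otherwise .toDouble throws NumberFormatException); the sample rows and column names are made up:

import spark.implicits._   // assumes a SparkSession named spark

val df = Seq(("1.0", "2.5"), ("3.0", "4.5")).toDF("col1", "col2")

// Convert the string columns to doubles row by row, as in the snippet above
val doubles = df.map(row => (row.getString(0).toDouble, row.getString(1).toDouble))

doubles.show()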

Re: How to deal with string column data for spark mlib?

2016-12-20 Thread Rohit Verma
There are various techniques, but the actual answer will depend on what you are trying to do, the kind of input data, and the nature of the algorithm. You can browse through https://www.analyticsvidhya.com/blog/2015/11/easy-methods-deal-categorical-variables-predictive-modeling/ ; this should give you a starting
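One technique from that family, sketched with Spark ML: indexing a categorical column and then one-hot encoding it. The column name and values are illustrative, and OneHotEncoder is used here as the plain transformer it was in Spark 2.0/2.1:

import org.apache.spark.ml.feature.{OneHotEncoder, StringIndexer}
import spark.implicits._   // assumes a SparkSession named spark

val countries = Seq("USA", "UK", "USA").toDF("country")

// First map each country to a numeric index...
val indexed = new StringIndexer()
  .setInputCol("country").setOutputCol("countryIdx")
  .fit(countries).transform(countries)

// ...then expand the index into a sparse 0/1 vector
val encoded = new OneHotEncoder()
  .setInputCol("countryIdx").setOutputCol("countryVec")
  .transform(indexed)

encoded.show()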

How to deal with string column data for spark mlib?

2016-12-20 Thread big data
our source data are string-based data, like this: col1 col2 col3 ... aaa bbb ccc aa2 bb2 cc2 aa3 bb3 cc3 ... ... ... How can we convert all of these data to double to apply MLlib's algorithms? thanks.
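Since the sample values look categorical rather than numeric, one possible sketch of the conversion is to index every string column with StringIndexer, producing a double-valued index column per input column. Column names follow the sample; the foldLeft loop is just one way to apply the indexer to each column:

import org.apache.spark.ml.feature.StringIndexer
import spark.implicits._   // assumes a SparkSession named spark

val src = Seq(("aaa", "bbb", "ccc"), ("aa2", "bb2", "cc2"), ("aa3", "bb3", "cc3"))
  .toDF("col1", "col2", "col3")

// Index each string column in turn, adding a numeric <col>_idx column for each
val indexed = src.columns.foldLeft(src) { (cur, c) =>
  new StringIndexer().setInputCol(c).setOutputCol(s"${c}_idx").fit(cur).transform(cur)
}

indexed.show()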