I want to use a decision tree to predict whether an event will happen. The data looks like this:

userid  sex     country  age  attr1  attr2  ...  event
1       male    USA      23   xxx    xxxx   ...  0
2       male    UK       25   xxx    xxxx   ...  1
3       female  JPN      35   xxx    xxxx   ...  1
...

I want to use sex, country, age, attr1, attr2, ... as the input features and the event column as the label for the decision tree.

As I understand it, Spark MLlib requires all column values to be doubles before they can be used, but I do not know how to convert the values of the sex, country, attr1, attr2 columns to doubles directly within a Spark job.

Thanks.

On 16/12/20 at 9:37 PM, theodondre wrote:

Please share a snippet of the data.

-------- Original message --------
From: big data <bigdatab...@outlook.com>
Date: 12/20/16 4:35 AM (GMT-05:00)
To: user@spark.apache.org
Subject: How to deal with string column data for spark mlib?

Our source data is string-based, like this:

col1  col2  col3  ...
aaa   bbb   ccc
aa2   bb2   cc2
aa3   bb3   cc3
...   ...   ...

How can we convert all of this data to doubles so that it can be used with MLlib's algorithms?

Thanks.
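
For string columns like sex and country above, one common approach is to assign each distinct string a numeric index and store that index as a double. Spark ML provides `StringIndexer` (in `pyspark.ml.feature` / `org.apache.spark.ml.feature`) for this kind of mapping. The plain-Python sketch below only illustrates the idea and is not Spark code: the helper names `build_index` and `encode_rows` are hypothetical, the sample rows are adapted from the snippet in this thread, and the ordering rule (most frequent value first, ties broken alphabetically) is an assumption chosen for illustration.

```python
from collections import Counter


def build_index(values):
    """Map each distinct string to a double index.

    Assumed ordering for this sketch: most frequent value gets 0.0,
    ties are broken alphabetically.
    """
    counts = Counter(values)
    ordered = sorted(counts, key=lambda v: (-counts[v], v))
    return {value: float(i) for i, value in enumerate(ordered)}


def encode_rows(rows, categorical_cols):
    """Replace string values in categorical_cols with their double index.

    All other columns are assumed to be numeric already and are cast to float.
    Returns the encoded rows and the per-column index maps.
    """
    indexes = {
        col: build_index([row[col] for row in rows]) for col in categorical_cols
    }
    encoded = []
    for row in rows:
        encoded.append(
            {
                col: indexes[col][val] if col in indexes else float(val)
                for col, val in row.items()
            }
        )
    return encoded, indexes


# Sample rows adapted from the data snippet in this thread (attr columns omitted).
rows = [
    {"sex": "male", "country": "USA", "age": 23, "event": 0},
    {"sex": "male", "country": "UK", "age": 25, "event": 1},
    {"sex": "female", "country": "JPN", "age": 35, "event": 1},
]

encoded, indexes = encode_rows(rows, ["sex", "country"])
for row in encoded:
    print(row)
```

Note that the resulting numbers are category codes, not quantities, so the tree should be told which features are categorical. In the RDD-based MLlib API this is what the `categoricalFeaturesInfo` argument of `DecisionTree.trainClassifier` is for.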