Re: How to deal with string column data for spark mlib?

big data Tue, 20 Dec 2016 06:54:43 -0800

I want to use a decision tree to predict whether the event will happen. The 
data looks like this:


userid  sex     country  age  attr1  attr2  ...  event
1       male    USA      23   xxx    xxxx   ...  0
2       male    UK       25   xxx    xxxx   ...  1
3       female  JPN      35   xxx    xxxx   ...  1
...

I want to use sex, country, age, attr1, attr2, ... as the input features, and 
the event column as the label for the decision tree.

As I understand it, Spark MLlib requires all column values to be doubles.

However, I do not know how to convert the values of the sex, country, attr1, 
and attr2 columns to doubles directly within a Spark job.
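
One common approach is to replace each distinct string with a numeric index. 
Spark ML's StringIndexer does this for DataFrame columns, and by default gives 
the most frequent value the index 0.0. Below is a minimal sketch of the same 
idea in plain Python (not Spark code). The helper name fit_index and the 
alphabetical tie-break are assumptions made for this illustration, and attr1 
and attr2 are left out because their values are elided in the sample above.

```python
from collections import Counter

def fit_index(values):
    """Map each distinct string to a double; the most frequent value gets 0.0."""
    counts = Counter(values)
    # Sort by descending frequency, then alphabetically so ties are deterministic
    # (the tie-break rule is an assumption of this sketch).
    ordered = sorted(counts, key=lambda v: (-counts[v], v))
    return {value: float(i) for i, value in enumerate(ordered)}

# Sample rows from the message: (userid, sex, country, age, event)
rows = [
    (1, "male", "USA", 23, 0),
    (2, "male", "UK", 25, 1),
    (3, "female", "JPN", 35, 1),
]

sex_index = fit_index([r[1] for r in rows])
country_index = fit_index([r[2] for r in rows])

# Feature vector = [sex, country, age], all doubles; label = event.
features = [[sex_index[r[1]], country_index[r[2]], float(r[3])] for r in rows]
labels = [float(r[4]) for r in rows]

print(sex_index)
print(country_index)
print(features)
print(labels)
```

Note that the resulting numbers are arbitrary category codes, not magnitudes, 
so the tree should ideally be told which features are categorical.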


thanks.

On 16/12/20 at 9:37 PM, theodondre wrote:
Give a snippet of the data.





-------- Original message --------
From: big data <bigdatab...@outlook.com><mailto:bigdatab...@outlook.com>
Date: 12/20/16 4:35 AM (GMT-05:00)
To: user@spark.apache.org<mailto:user@spark.apache.org>
Subject: How to deal with string column data for spark mlib?

Our source data is string-based, like this:
col1   col2   col3 ...
aaa   bbb    ccc
aa2   bb2    cc2
aa3   bb3    cc3
...     ...       ...

How can we convert all of this data to doubles so that it can be used with MLlib's algorithms?
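
To make the question concrete, here is a minimal plain-Python sketch (not 
Spark code) of converting a whole table of strings to doubles, keeping one 
mapping per column so the values can be decoded later. The function name 
encode_table, the first-seen index order, and the fourth row are all 
hypothetical and only serve the illustration.

```python
def encode_table(rows):
    """Convert a table of strings to doubles, with one mapping per column."""
    n_cols = len(rows[0])
    mappings = [{} for _ in range(n_cols)]
    encoded = []
    for row in rows:
        out = []
        for col, value in enumerate(row):
            mapping = mappings[col]
            # Assign the next unused index the first time a value is seen.
            if value not in mapping:
                mapping[value] = float(len(mapping))
            out.append(mapping[value])
        encoded.append(out)
    return encoded, mappings

table = [
    ["aaa", "bbb", "ccc"],
    ["aa2", "bb2", "cc2"],
    ["aa3", "bb3", "cc3"],
    ["aaa", "bb2", "cc3"],  # hypothetical extra row to show repeated values
]

encoded, mappings = encode_table(table)
for row in encoded:
    print(row)
```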

thanks.
