It's very straightforward; please refer to the documentation here: http://spark.apache.org/docs/latest/ml-features.html#bucketizer
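To answer the "what algorithm is behind it" part directly: Bucketizer does no learning at all. You hand it an array of strictly increasing split points, and each value is mapped (by binary search) to the index of the half-open interval [splits[i], splits[i+1]) it falls into; the last upper bound is inclusive. Equal-frequency buckets are just the special case where the splits are quantiles of the data. A minimal plain-Python sketch of both ideas (this is an illustration, not the actual Spark source; the function names here are my own):

```python
import bisect

def quantile_splits(values, num_buckets):
    """Compute approximate equal-frequency split points (quantiles).
    Returns a strictly increasing list of splits bounded by +/- infinity,
    the shape Bucketizer expects."""
    ordered = sorted(values)
    splits = [float("-inf")]
    for i in range(1, num_buckets):
        q = ordered[i * len(ordered) // num_buckets]
        if q > splits[-1]:          # keep splits strictly increasing
            splits.append(q)
    splits.append(float("inf"))
    return splits

def bucketize(value, splits):
    """Map value to the index of the half-open interval
    [splits[i], splits[i+1]) it falls into -- this is essentially all
    Bucketizer does with the user-supplied splits."""
    if value < splits[0] or value > splits[-1]:
        raise ValueError("value outside bucket range")
    if value == splits[-1]:         # the last upper bound is inclusive
        return len(splits) - 2
    return bisect.bisect_right(splits, value) - 1

data = [1, 2, 2, 3, 5, 8, 13, 21, 34, 55]
splits = quantile_splits(data, 2)   # [-inf, 8, inf], i.e. a median split
print([bucketize(v, splits) for v in data])  # [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
```

In Spark itself you would compute the quantile splits over the DataFrame column and feed them to Bucketizer's `splits` parameter; newer Spark releases also ship a QuantileDiscretizer transformer that computes such splits for you.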
On Mon, Jan 25, 2016 at 10:09 PM, Eli Super <eli.su...@gmail.com> wrote:

> Thanks Joshua,
>
> I can't understand what algorithm is behind Bucketizer. How is the
> discretization done?
>
> Best Regards
>
>
> On Mon, Jan 25, 2016 at 3:40 PM, Joshua TAYLOR <joshuaaa...@gmail.com>
> wrote:
>
>> It sounds like you may want the Bucketizer in Spark ML. The overview docs
>> [1] include, "Bucketizer transforms a column of continuous features to a
>> column of feature buckets, where the buckets are specified by users."
>>
>> [1]: http://spark.apache.org/docs/latest/ml-features.html#bucketizer
>>
>> On Mon, Jan 25, 2016 at 5:34 AM, Eli Super <eli.su...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> What is the best way to discretize a continuous variable within Spark
>>> DataFrames?
>>>
>>> I want to discretize some variable 1) by equal frequency, 2) by k-means.
>>>
>>> I usually use R for these purposes:
>>>
>>> http://www.inside-r.org/packages/cran/arules/docs/discretize
>>>
>>> R code, for example:
>>>
>>> ### equal frequency
>>> table(discretize(data$some_column, "frequency", categories=10))
>>>
>>> ### k-means
>>> table(discretize(data$some_column, "cluster", categories=10))
>>>
>>> Thanks a lot!
>>
>>
>> --
>> Joshua Taylor, http://www.cs.rpi.edu/~tayloj/


--
Best Regards

Jeff Zhang