It's very straightforward; please refer to the documentation here: http://spark.apache.org/docs/latest/ml-features.html#bucketizer
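To answer the "what algorithm is behind it" part directly: Bucketizer does no learning at all. You hand it an array of strictly increasing split points, and each value is mapped (by binary search) to the index of the half-open interval [splits[i], splits[i+1]) it falls into; the last upper bound is inclusive. Equal-frequency buckets are just the special case where the splits are quantiles of the data. A minimal plain-Python sketch of both ideas (this is an illustration, not the actual Spark source; the function names here are my own):

```python
import bisect

def quantile_splits(values, num_buckets):
    """Compute approximate equal-frequency split points (quantiles).
    Returns a strictly increasing list of splits bounded by +/- infinity,
    the shape Bucketizer expects."""
    ordered = sorted(values)
    splits = [float("-inf")]
    for i in range(1, num_buckets):
        q = ordered[i * len(ordered) // num_buckets]
        if q > splits[-1]:          # keep splits strictly increasing
            splits.append(q)
    splits.append(float("inf"))
    return splits

def bucketize(value, splits):
    """Map value to the index of the half-open interval
    [splits[i], splits[i+1]) it falls into -- this is essentially all
    Bucketizer does with the user-supplied splits."""
    if value < splits[0] or value > splits[-1]:
        raise ValueError("value outside bucket range")
    if value == splits[-1]:         # the last upper bound is inclusive
        return len(splits) - 2
    return bisect.bisect_right(splits, value) - 1

data = [1, 2, 2, 3, 5, 8, 13, 21, 34, 55]
splits = quantile_splits(data, 2)   # [-inf, 8, inf], i.e. a median split
print([bucketize(v, splits) for v in data])  # [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
```

In Spark itself you would compute the quantile splits over the DataFrame column and feed them to Bucketizer's `splits` parameter; newer Spark releases also ship a QuantileDiscretizer transformer that computes such splits for you.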
On Mon, Jan 25, 2016 at 10:09 PM, Eli Super <eli.su...@gmail.com> wrote:

> Thanks Joshua,
>
> I can't understand what algorithm is behind Bucketizer. How is the
> discretization done?
>
> Best Regards
>
>
> On Mon, Jan 25, 2016 at 3:40 PM, Joshua TAYLOR <joshuaaa...@gmail.com>
> wrote:
>
>> It sounds like you may want the Bucketizer in Spark ML. The overview docs
>> [1] include, "Bucketizer transforms a column of continuous features to a
>> column of feature buckets, where the buckets are specified by users."
>>
>> [1]: http://spark.apache.org/docs/latest/ml-features.html#bucketizer
>>
>> On Mon, Jan 25, 2016 at 5:34 AM, Eli Super <eli.su...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> What is the best way to discretize a continuous variable within Spark
>>> DataFrames?
>>>
>>> I want to discretize some variable 1) by equal frequency, 2) by k-means.
>>>
>>> I usually use R for these purposes:
>>>
>>> http://www.inside-r.org/packages/cran/arules/docs/discretize
>>>
>>> R code, for example:
>>>
>>> ### equal frequency
>>> table(discretize(data$some_column, "frequency", categories=10))
>>>
>>> ### k-means
>>> table(discretize(data$some_column, "cluster", categories=10))
>>>
>>> Thanks a lot!
>>
>>
>> --
>> Joshua Taylor, http://www.cs.rpi.edu/~tayloj/


--
Best Regards

Jeff Zhang