Re: QuantileDiscretizer not working properly with big dataframes

2016-07-16 Thread Yanbo Liang
Could you tell us which Spark version you used? We have fixed this bug in Spark 1.6.2 and Spark 2.0; please upgrade to one of these versions and retry. If the issue still exists, please let us know. Thanks, Yanbo
2016-07-12 11:03 GMT-07:00 Pasquinell Urbani <pasquinell.urb...@exalitica.com>: > In the

Re: QuantileDiscretizer not working properly with big dataframes

2016-07-12 Thread Pasquinell Urbani
In the forum mentioned above, the following solution is suggested: the problem is in lines 113 and 114 of QuantileDiscretizer.scala and can be fixed by changing line 113 as follows:
before: val requiredSamples = math.max(numBins * numBins, 1)
after: val requiredSamples = math.max(numBins * numBins,
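To see why a sample size that depends only on `numBins` can break down on large inputs, here is a minimal plain-Scala sketch of the arithmetic. It assumes, as the quoted line suggests, that the discretizer samples about `requiredSamples` rows by drawing the fraction `requiredSamples / totalRows`. The `minSamples` floor and its value of 50000 are hypothetical, chosen only for illustration; they are not the value from the truncated patch.

```scala
object SampleFractionSketch {
  // Fraction of rows to sample so that roughly `requiredSamples` rows are kept,
  // capped at 1.0 when the dataset is smaller than the requirement.
  def sampleFraction(requiredSamples: Long, totalRows: Long): Double =
    math.min(requiredSamples.toDouble / totalRows, 1.0)

  // Rule from the quoted "before" line: numBins^2 samples, at least 1.
  def requiredSamplesOriginal(numBins: Int): Long =
    math.max(numBins.toLong * numBins, 1L)

  // Hypothetical variant with a fixed floor `minSamples` (an assumption for
  // illustration, not the value used in the actual fix).
  def requiredSamplesWithFloor(numBins: Int, minSamples: Long): Long =
    math.max(numBins.toLong * numBins, minSamples)

  def main(args: Array[String]): Unit = {
    val totalRows = 2500000L // dataframe size described in the thread
    val numBins = 10         // hypothetical bin count
    val orig = requiredSamplesOriginal(numBins)
    val floored = requiredSamplesWithFloor(numBins, 50000L)
    println(s"original rule: $orig samples, fraction ${sampleFraction(orig, totalRows)}")
    println(s"with floor:    $floored samples, fraction ${sampleFraction(floored, totalRows)}")
  }
}
```

With 10 bins, the original rule keeps only about 100 of 2.5 million rows (a fraction of 4e-5), so the split points are estimated from very few values, which is consistent with the poor splits reported in this thread. A fixed floor grows the sample regardless of the bin count.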

QuantileDiscretizer not working properly with big dataframes

2016-07-11 Thread Pasquinell Urbani
Hi all, We have a dataframe with 2.5 million records and 13 features. We want to fit a logistic regression on this data, but first we need to discretize each column using QuantileDiscretizer. This will improve the performance of the model by reducing the influence of outliers. For small
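For readers unfamiliar with what quantile discretization does, here is a minimal in-memory sketch of quantile binning in plain Scala. It is not Spark's implementation: it sorts the whole column instead of a sample, uses a simple nearest-rank quantile estimate, and the object and function names are hypothetical.

```scala
object QuantileBinningSketch {
  // Compute up to numBins - 1 interior split points at evenly spaced ranks
  // of the sorted values (nearest-rank quantile estimate).
  def splitPoints(values: Seq[Double], numBins: Int): Vector[Double] = {
    require(values.nonEmpty, "need at least one value")
    require(numBins >= 1, "need at least one bin")
    val sorted = values.sorted.toVector
    val n = sorted.length
    (1 until numBins).map { i =>
      val rank = math.min((i.toLong * n / numBins).toInt, n - 1)
      sorted(rank)
    }.distinct.toVector // duplicate splits collapse, so heavy ties yield fewer bins
  }

  // Bucket index = number of split points that are <= the value.
  def bucket(value: Double, splits: Vector[Double]): Int =
    splits.count(_ <= value)

  def main(args: Array[String]): Unit = {
    val feature = (1 to 100).map(_.toDouble) :+ 1.0e6 // one extreme outlier
    val splits = splitPoints(feature, 4)
    println(s"splits: $splits")
    println(s"bucket of outlier: ${bucket(1.0e6, splits)}")
  }
}
```

Because each bucket holds roughly the same number of rows, the extreme value 1.0e6 simply lands in the top bucket instead of stretching a value range, which is the outlier-dampening effect the post is after. The problem discussed in this thread concerns how the split points are estimated from a sample on large inputs, which this in-memory sketch does not model.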