Re: [Scikit-learn-general] weighted kernel density estimation

2016-04-10 Thread Joel Nothman
I think you should submit these changes as a pull request. Thanks, Jared. On 8 April 2016 at 21:17, Jared Gabor wrote: > I recently modified the kernel density estimation routines in > sklearn/neighbors to include optional weighting of the training samples (to > make analogs to weighted histogra
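To illustrate what weighting the training samples means for a kernel density estimate, here is a minimal NumPy sketch of a one-dimensional weighted Gaussian KDE. It is not Jared's patch and not a scikit-learn API; `weighted_gaussian_kde` is a hypothetical helper. Each training point's kernel is scaled by its normalized weight, which is the KDE analogue of a weighted histogram. (Later scikit-learn releases added a `sample_weight` argument to `KernelDensity.fit`.)

```python
import numpy as np


def weighted_gaussian_kde(train, query, bandwidth, weights=None):
    """Evaluate a 1-D Gaussian KDE at `query`, optionally weighting training samples.

    Hypothetical helper for illustration only; not part of scikit-learn.
    """
    train = np.asarray(train, dtype=float)
    query = np.asarray(query, dtype=float)
    if weights is None:
        weights = np.ones_like(train)
    weights = np.asarray(weights, dtype=float)
    # Normalize so the estimate still integrates to 1 whatever the weight scale.
    weights = weights / weights.sum()
    # Scaled pairwise distances: rows are query points, columns are training points.
    u = (query[:, None] - train[None, :]) / bandwidth
    kernel = np.exp(-0.5 * u ** 2) / (np.sqrt(2 * np.pi) * bandwidth)
    # Weighted sum of per-sample kernels, like a weighted histogram.
    return kernel @ weights


# Usage: 200 samples with random positive weights, evaluated on a fine grid.
rng = np.random.default_rng(0)
samples = rng.normal(size=200)
w = rng.uniform(0.5, 2.0, size=200)
grid = np.linspace(-6, 6, 1201)
density = weighted_gaussian_kde(samples, grid, bandwidth=0.3, weights=w)
```

With all weights equal, the result reduces to the ordinary unweighted KDE, since only the normalized weights matter.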

Re: [Scikit-learn-general] [scikit-learn-general] Why sklearn RandomForest model take a lot of disk space after save?

2016-04-10 Thread Piotr Płoński
Thanks for the comments! I put more details of my problem here http://stackoverflow.com/questions/36523989/why-sklearn-randomforest-model-take-a-lot-of-disk-space-after-save Indeed, saving with joblib takes less space, but it still uses a lot of disk space. Best, Piotr 2016-04-10 15:24

Re: [Scikit-learn-general] [scikit-learn-general] Why sklearn RandomForest model take a lot of disk space after save?

2016-04-10 Thread Mathieu Blondel
You may also want to save your model using joblib (possibly with compression enabled) instead of cPickle. Mathieu On Sun, Apr 10, 2016 at 9:13 AM, Piotr Płoński wrote: > Hi All, > > I am saving RandomForestClassifier model from sklearn library with code > below > > with open('/tmp/rf.model', 'w
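A minimal sketch of the comparison Mathieu suggests: pickle a fitted forest, dump it again with joblib compression, and compare file sizes. In 2016 joblib shipped as `sklearn.externals.joblib`; this sketch assumes the standalone `joblib` package used by current scikit-learn. The dataset, forest size, and compression level are arbitrary choices for illustration.

```python
import os
import pickle
import tempfile

import joblib
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

tmpdir = tempfile.mkdtemp()
pickle_path = os.path.join(tmpdir, "rf.pkl")
joblib_path = os.path.join(tmpdir, "rf.joblib")

# Plain pickle; the file must be opened in binary mode ("wb") on Python 3.
with open(pickle_path, "wb") as f:
    pickle.dump(model, f, protocol=pickle.HIGHEST_PROTOCOL)

# joblib with an integer compress level uses zlib (0 = none, 9 = maximum).
joblib.dump(model, joblib_path, compress=3)

pickle_size = os.path.getsize(pickle_path)
joblib_size = os.path.getsize(joblib_path)
print(f"pickle: {pickle_size} bytes, joblib compress=3: {joblib_size} bytes")

# Reload and check that the round trip preserves predictions.
restored = joblib.load(joblib_path)
assert (restored.predict(X) == model.predict(X)).all()
```

Compression shrinks the file but not the number of tree nodes, so a forest of fully grown trees stays large on disk.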

Re: [Scikit-learn-general] Stochastic dual coordinate ascent solver for linear models

2016-04-10 Thread Mathieu Blondel
And also in LinearSVC with dual=True. The only difference is that the choice of dual variable is cyclic (with prior permutation) instead of random. See this 2008 paper: http://www.csie.ntu.edu.tw/~cjlin/papers/cddual.pdf Mathieu On Sun, Apr 10, 2016 at 9:53 PM, Alexandre Gramfort < alexandre.gra
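To make the cyclic-versus-random distinction concrete, here is a minimal NumPy sketch of dual coordinate descent for the L1-loss (hinge) linear SVM without a bias term, following the update from the 2008 paper linked above. It is not liblinear's code: it omits shrinking and stopping tolerances, and `dual_cd_linear_svm` is a hypothetical name. Each epoch visits every dual variable once in a freshly permuted order; SDCA would instead draw each coordinate independently at random.

```python
import numpy as np


def dual_cd_linear_svm(X, y, C=1.0, n_epochs=20, seed=0):
    """Dual coordinate descent for the hinge-loss linear SVM (no bias).

    Hypothetical sketch after Hsieh et al. (2008); labels y must be in {-1, +1}.
    """
    rng = np.random.default_rng(seed)
    n_samples, n_features = X.shape
    alpha = np.zeros(n_samples)
    w = np.zeros(n_features)
    q_diag = np.einsum("ij,ij->i", X, X)  # Q_ii = x_i . x_i
    for _ in range(n_epochs):
        # Cyclic pass over a random permutation (reshuffled every epoch).
        for i in rng.permutation(n_samples):
            if q_diag[i] == 0:
                continue
            grad = y[i] * (w @ X[i]) - 1.0
            old = alpha[i]
            # Exact minimization along alpha_i, clipped to the box 0 <= alpha_i <= C.
            alpha[i] = min(max(old - grad / q_diag[i], 0.0), C)
            # Keep w = sum_i alpha_i y_i x_i up to date incrementally.
            w += (alpha[i] - old) * y[i] * X[i]
    return w, alpha


# Usage: two well-separated Gaussian blobs on either side of the origin.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(2.0, 0.5, size=(50, 2)), rng.normal(-2.0, 0.5, size=(50, 2))])
y = np.array([1] * 50 + [-1] * 50)
w, alpha = dual_cd_linear_svm(X, y, C=1.0)
accuracy = np.mean(np.sign(X @ w) == y)
```

Because the dual objective is quadratic in each alpha_i, a single clipped Newton step minimizes it exactly along that coordinate, which is why no step size appears.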

Re: [Scikit-learn-general] Stochastic dual coordinate ascent solver for linear models

2016-04-10 Thread Alexandre Gramfort
Hi, SDCA is used internally in liblinear, so it is already offered through our LogisticRegression estimator. Otherwise, it's implemented in lightning: https://github.com/scikit-learn-contrib/lightning A On Sat, Apr 9, 2016 at 8:14 PM, Ahmed SaadAliden wrote: > Hi, > > I am thinking about adding SDCA "Stochastic dual coordin
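The two scikit-learn entry points mentioned in this thread can be sketched as follows: `LogisticRegression` with `solver="liblinear"` and `dual=True`, and `LinearSVC` with `dual=True`. Both hand the problem to liblinear's dual coordinate-descent solvers; which exact liblinear solver type is selected is decided internally. The dataset and `max_iter` values are arbitrary assumptions for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# L2-regularized logistic regression, solved in the dual by liblinear.
logreg = LogisticRegression(solver="liblinear", dual=True, max_iter=1000)
logreg.fit(X, y)

# Linear SVM (squared hinge loss by default), also solved in the dual.
# random_state controls the shuffling of the dual coordinates.
svm = LinearSVC(dual=True, max_iter=10000, random_state=0)
svm.fit(X, y)

print("logreg:", logreg.score(X, y), "linear SVM:", svm.score(X, y))
```

The dual formulation is usually preferred when there are more features than samples; with many samples and few features the primal solvers tend to be faster.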

Re: [Scikit-learn-general] [scikit-learn-general] Why sklearn RandomForest model take a lot of disk space after save?

2016-04-10 Thread Joel Nothman
If you're running a random forest with default parameters (max_depth=None, min_samples_split=2, min_samples_leaf=1, min_weight_fraction_leaf=0, max_leaf_nodes=None), the size of the tree will tend towards the size of the dataset. Change some of these parameters to reduce overfitting and model size.
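A minimal sketch of the effect Joel describes: count the nodes in every tree of a forest grown with the defaults and of one with growth limits. Saved model size scales roughly with this node count. The particular values (`min_samples_leaf=20`, `max_depth=10`) and the noisy synthetic dataset are illustrative assumptions; suitable settings should be chosen, for example by cross-validation, on real data.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Label noise (flip_y) makes fully grown trees split down to very small leaves.
X, y = make_classification(n_samples=5000, n_features=20, flip_y=0.1, random_state=0)


def total_nodes(forest):
    # Each fitted tree exposes its node count through the low-level tree_ object.
    return sum(est.tree_.node_count for est in forest.estimators_)


default_rf = RandomForestClassifier(n_estimators=20, random_state=0).fit(X, y)

# Illustrative limits only; tune them for your own data.
constrained_rf = RandomForestClassifier(
    n_estimators=20, min_samples_leaf=20, max_depth=10, random_state=0
).fit(X, y)

print("default nodes:", total_nodes(default_rf))
print("constrained nodes:", total_nodes(constrained_rf))
```

With `min_samples_leaf=20`, no tree can have more leaves than one twentieth of its training samples, whereas the defaults let leaves shrink to a single sample.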