I think you should submit these changes as a pull request. Thanks, Jared.
On 8 April 2016 at 21:17, Jared Gabor wrote:
> I recently modified the kernel density estimation routines in
> sklearn/neighbors to include optional weighting of the training samples (to
> make analogs to weighted histograms).
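A minimal sketch of how such weighting might look from the user side, assuming the weights are exposed as a sample_weight argument to KernelDensity.fit (that keyword is an assumption here, not part of the released API being discussed):

import numpy as np
from sklearn.neighbors import KernelDensity

rng = np.random.RandomState(0)
X = rng.normal(size=(100, 1))           # training samples
w = rng.uniform(0.1, 1.0, size=100)     # hypothetical per-sample weights

kde = KernelDensity(kernel='gaussian', bandwidth=0.5)
# sample_weight is the assumed keyword for the proposed weighting
kde.fit(X, sample_weight=w)
log_density = kde.score_samples(np.linspace(-3, 3, 50)[:, np.newaxis])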
Thanks for the comments! I put more details about my problem here:
http://stackoverflow.com/questions/36523989/why-sklearn-randomforest-model-take-a-lot-of-disk-space-after-save
Indeed, saving with joblib takes less space, but the model still uses a lot
of disk space.
Best,
Piotr
On 2016-04-10 at 15:24, Mathieu wrote:
You may also want to save your model using joblib (possibly with
compression enabled) instead of cPickle.
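Something along these lines should work (a minimal sketch; the filename and compress level are just placeholders):

import joblib  # bundled as sklearn.externals.joblib in 2016-era scikit-learn
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# compress=3 trades a little CPU time for a much smaller file on disk
joblib.dump(clf, '/tmp/rf.joblib', compress=3)
clf = joblib.load('/tmp/rf.joblib')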
Mathieu
On Sun, Apr 10, 2016 at 9:13 AM, Piotr Płoński wrote:
> Hi All,
>
> I am saving RandomForestClassifier model from sklearn library with code
> below
>
> with open('/tmp/rf.model', 'wb') as f:
>     cPickle.dump(model, f)
It is also used in LinearSVC with dual=True. The only difference is that the
choice of dual variable is cyclic (with a prior random permutation) instead
of random.
See this 2008 paper:
http://www.csie.ntu.edu.tw/~cjlin/papers/cddual.pdf
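For reference, a minimal sketch of fitting the dual formulation in scikit-learn (the C value is just a placeholder):

from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
# dual=True solves the dual problem with liblinear's coordinate descent
clf = LinearSVC(dual=True, C=1.0).fit(X, y)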
Mathieu
On Sun, Apr 10, 2016 at 9:53 PM, Alexandre Gramfort wrote:
Hi,

SDCA is used internally in liblinear, so it is offered by our logistic
regression estimator. Otherwise it is implemented in lightning:
https://github.com/scikit-learn-contrib/lightning
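A rough sketch of what using it might look like (the SDCAClassifier name and its parameters are assumptions based on lightning's documented API, so double-check against the repo):

from sklearn.datasets import make_classification
from lightning.classification import SDCAClassifier  # assumed import path

X, y = make_classification(n_samples=1000, n_features=50, random_state=0)
# alpha is the regularization strength; loss='hinge' gives an SVM-style objective
clf = SDCAClassifier(alpha=1.0, loss='hinge', max_iter=50)
clf.fit(X, y)
print(clf.score(X, y))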
A
On Sat, Apr 9, 2016 at 8:14 PM, Ahmed SaadAliden wrote:
> Hi,
>
> I am thinking about adding SDCA ("stochastic dual coordinate ascent") to scikit-learn.
If you're running a random forest with default parameters (max_depth=None,
min_samples_split=2, min_samples_leaf=1, min_weight_fraction_leaf=0,
max_leaf_nodes=None), the trees are grown until the leaves are pure, so the
size of each tree will tend towards the size of the dataset. Change some of
these parameters to reduce overfitting and model size.
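For instance, something along these lines (the exact values are illustrative, not recommendations):

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=10000, n_features=20, random_state=0)

# Limiting depth and leaf size bounds the number of nodes per tree,
# which keeps the pickled / joblib-saved model much smaller on disk.
clf = RandomForestClassifier(
    n_estimators=100,
    max_depth=10,
    min_samples_leaf=5,
    random_state=0,
).fit(X, y)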