Re: [Scikit-learn-general] Saving Huge Models

2014-02-26 Thread Andy
On 02/26/2014 05:55 PM, Peter Prettenhofer wrote:
> please make sure to pickle with the highest protocol - otherwise
> pickle uses a textual serialization format which is quite inefficient:
>
> pickle.dump(clf, f, protocol=pickle.HIGHEST_PROTOCOL)

Or simply protocol=-1. This usually makes a hu
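The protocol advice in this message can be checked with a small sketch. It uses only the standard library, and `model_like` is a hypothetical stand-in for a fitted estimator (a large list of floats), not a real scikit-learn model. On Python 3 the default protocol is already binary, so the text format (protocol 0) is requested explicitly here for comparison.

```python
import pickle
import random

random.seed(0)
# Hypothetical stand-in for a fitted model: many float "thresholds".
model_like = [random.random() for _ in range(100_000)]

# Protocol 0 is the textual format the message warns about.
text_bytes = pickle.dumps(model_like, protocol=0)
# Both spellings below select the highest binary protocol available.
binary_bytes = pickle.dumps(model_like, protocol=pickle.HIGHEST_PROTOCOL)
shorthand_bytes = pickle.dumps(model_like, protocol=-1)

print("protocol 0 bytes:      ", len(text_bytes))
print("highest protocol bytes:", len(binary_bytes))

# Either format restores the same object.
restored = pickle.loads(binary_bytes)
```

The same `protocol` argument applies to `pickle.dump(clf, f, ...)` when writing to a file opened in binary mode.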

Re: [Scikit-learn-general] Saving Huge Models

2014-02-26 Thread Olivier Grisel
You can control the size of your random forest by adjusting the parameters n_estimators, min_samples_split and even max_depth (read the documentation for more details). It's up to you to find parameter values that match your constraints in terms of accuracy vs model size in RAM and prediction spee
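As a rough illustration of why n_estimators and max_depth bound the model size: a binary tree of depth d has at most 2^(d+1) - 1 nodes, so the forest has at most n_estimators times that many. The helper name `max_forest_nodes` below is hypothetical, and real trees are usually much smaller than this upper bound because splitting stops early.

```python
def max_forest_nodes(n_estimators, max_depth):
    """Upper bound on the total node count of a forest of binary trees."""
    # A full binary tree of depth d (root at depth 0) has 2**(d + 1) - 1 nodes.
    nodes_per_tree = 2 ** (max_depth + 1) - 1
    return n_estimators * nodes_per_tree


# Compare how the bound grows with depth for a fixed number of trees.
for depth in (5, 10, 20):
    print(depth, max_forest_nodes(100, depth))
```

The bound doubles with each extra level of depth but grows only linearly with the number of trees, which is why capping max_depth can be an effective lever on size.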

Re: [Scikit-learn-general] Saving Huge Models

2014-02-26 Thread Peter Prettenhofer
Hi Lorenzo,

please make sure to pickle with the highest protocol - otherwise pickle uses a textual serialization format which is quite inefficient:

    pickle.dump(clf, f, protocol=pickle.HIGHEST_PROTOCOL)

For large datasets limit the number of tree nodes by specifying ``min_samples_leaf`` -- sett
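A minimal sketch of the effect of ``min_samples_leaf`` on node count and pickle size, assuming scikit-learn is installed. The synthetic dataset, the helper name `fit_and_measure`, and the values 1 and 50 are illustrative assumptions, not settings recommended in the thread.

```python
import pickle

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic, noisy data (an assumption for illustration only).
X, y = make_classification(n_samples=1000, n_features=20, flip_y=0.2,
                           random_state=0)


def fit_and_measure(min_samples_leaf):
    """Fit a small forest; return (total node count, pickled size in bytes)."""
    clf = RandomForestClassifier(n_estimators=10,
                                 min_samples_leaf=min_samples_leaf,
                                 random_state=0)
    clf.fit(X, y)
    # Each fitted tree exposes its node count via tree_.node_count.
    n_nodes = sum(est.tree_.node_count for est in clf.estimators_)
    n_bytes = len(pickle.dumps(clf, protocol=pickle.HIGHEST_PROTOCOL))
    return n_nodes, n_bytes


nodes_small_leaf, bytes_small_leaf = fit_and_measure(1)
nodes_large_leaf, bytes_large_leaf = fit_and_measure(50)

print("min_samples_leaf=1: ", nodes_small_leaf, "nodes,", bytes_small_leaf, "bytes")
print("min_samples_leaf=50:", nodes_large_leaf, "nodes,", bytes_large_leaf, "bytes")
```

Whether the smaller model is accurate enough is a separate question that has to be checked on held-out data.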