On 02/26/2014 05:55 PM, Peter Prettenhofer wrote:
>
> please make sure to pickle with the highest protocol - otherwise
> pickle uses a textual serialization format which is quite inefficient:
>
> pickle.dump(clf, f, protocol=pickle.HIGHEST_PROTOCOL)
Or simply protocol=-1. This usually makes a huge difference.
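
For what it's worth, here is a minimal sketch on synthetic data (the dataset,
sizes and parameter values are placeholders, not Lorenzo's) showing the size
gap between the text-based protocol 0 and the binary protocol -1:

    import pickle
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier

    # Synthetic stand-in for the real training data.
    X, y = make_classification(n_samples=10000, n_features=20, random_state=0)
    clf = RandomForestClassifier(n_estimators=20, random_state=0).fit(X, y)

    # protocol=0 is the old text-based format; protocol=-1 selects
    # pickle.HIGHEST_PROTOCOL, a compact binary format.
    print("protocol  0: %.1f MB" % (len(pickle.dumps(clf, protocol=0)) / 1e6))
    print("protocol -1: %.1f MB" % (len(pickle.dumps(clf, protocol=-1)) / 1e6))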
You can control the size of your random forest by adjusting the
parameters n_estimators, min_samples_split and even max_depth (read
the documentation for more details).
It's up to you to find parameter values that match your constraints in
terms of accuracy vs model size in RAM and prediction speed.
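
As an illustration only, a rough sketch on synthetic data comparing pickle
sizes for two arbitrary parameter settings (the values are examples, not
recommendations):

    import pickle
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier

    # Synthetic data; tune the real parameters against your own
    # accuracy / memory / speed constraints.
    X, y = make_classification(n_samples=50000, n_features=20, random_state=0)

    for params in [dict(n_estimators=100),
                   dict(n_estimators=50, min_samples_split=20, max_depth=15)]:
        clf = RandomForestClassifier(random_state=0, **params).fit(X, y)
        size_mb = len(pickle.dumps(clf, protocol=-1)) / 1e6
        print(params, "-> %.1f MB" % size_mb)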
Hi Lorenzo,
please make sure to pickle with the highest protocol - otherwise pickle
uses a textual serialization format which is quite inefficient:
pickle.dump(clf, f, protocol=pickle.HIGHEST_PROTOCOL)
For large datasets, limit the number of tree nodes by specifying
``min_samples_leaf`` -- setting it to a value larger than the default of 1
limits how finely the trees split the data and can reduce the model size
considerably.
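
For example, a small sketch on synthetic data showing how raising
``min_samples_leaf`` shrinks the pickled model (values chosen arbitrarily):

    import pickle
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier

    X, y = make_classification(n_samples=50000, n_features=20, random_state=0)

    # Larger min_samples_leaf -> fewer leaf nodes per tree -> smaller pickles.
    for leaf in [1, 5, 20]:
        clf = RandomForestClassifier(n_estimators=50, min_samples_leaf=leaf,
                                     random_state=0).fit(X, y)
        print("min_samples_leaf=%d -> %.1f MB"
              % (leaf, len(pickle.dumps(clf, protocol=-1)) / 1e6))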
Dear All,
I am using RandomForest on a data set which has less than 20 features, but
about 40 lines.
The point is that, even if I work on a subset of about 3 lines to
train my model, when I save it using pickle I get a large file on the
order of several hundred MB (see