Re: [scikit-learn] caching transformers during hyper parameter optimization

2017-08-16 Thread Joel Nothman
Now this isn't the best example, because joblib.Memory isn't going to be very fast at dumping a list of strings, but I hope you can get the idea from https://gist.github.com/jnothman/019d594d197c98a3d6192fa0cb19c850
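
A rough illustration of the idea (not the linked gist's code, which isn't reproduced in the archive): joblib.Memory can memoize an expensive cleaning step so repeated calls with the same input are read back from disk. The function name clean_corpus and the cache path are hypothetical.

    from joblib import Memory

    # "cache_dir" is an illustrative path; cached results are pickled there.
    memory = Memory("cache_dir", verbose=0)

    @memory.cache
    def clean_corpus(docs):
        # Stand-in for a slow cleaning / enrichment step; the returned list of
        # strings is dumped to disk on the first call and reloaded afterwards
        # (which, as noted above, is not especially fast for lists of strings).
        return [d.strip().lower() for d in docs]

    docs = ["Some RAW text ", "More TEXT "]
    cleaned = clean_corpus(docs)  # computed and written to the cache
    cleaned = clean_corpus(docs)  # second call is served from the cache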

Re: [scikit-learn] caching transformers during hyper parameter optimization

2017-08-16 Thread Georg Heiler
Data cleaning & enrichment. Could you link an example for a mixin? Currently this is a bit of a mess with custom pickle persistence in a big for loop and custom transformers. Thanks, Georg

Re: [scikit-learn] caching transformers during hyper parameter optimization

2017-08-16 Thread Joel Nothman
We certainly considered this over the many years that Pipeline caching has been in the pipeline. Storing the fitted model means we can do both a fit_transform and a transform on new data, and in many cases takes away the pain point of CV over pipelines where downstream steps are varied. What trans…
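
A hedged sketch of that point (not code from the thread): with the memory option, the fitted upstream transformer is cached, so a search that only varies downstream parameters does not refit it for every candidate. The step names and parameter grid below are illustrative.

    from tempfile import mkdtemp
    from shutil import rmtree

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import SGDClassifier
    from sklearn.model_selection import GridSearchCV
    from sklearn.pipeline import Pipeline

    cache_dir = mkdtemp()
    pipe = Pipeline(
        [("tfidf", TfidfVectorizer()), ("clf", SGDClassifier())],
        memory=cache_dir,  # caches the *fitted* tfidf step, not its transformed output
    )

    # Only the classifier's parameters vary, so within each CV split the fitted
    # tfidf step can be reloaded from the cache instead of being refit.
    search = GridSearchCV(pipe, {"clf__alpha": [1e-4, 1e-3, 1e-2]}, cv=3)
    # search.fit(texts, labels)

    rmtree(cache_dir)  # remove the cache directory when done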

[scikit-learn] caching transformers during hyper parameter optimization

2017-08-16 Thread Georg Heiler
There is a new option in the pipeline: http://scikit-learn.org/stable/modules/pipeline.html#pipeline-cache How can I use this to also store the transformed data, as I only want to compute the last step, i.e. the estimator, during hyper parameter tuning and not the transform methods of the cleaning steps? Is…
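
One hedged workaround for exactly this (compute the transforms once, tune only the final estimator) is to run fit_transform outside the search and grid-search the estimator on the precomputed matrix. This assumes the cleaning steps have no hyperparameters of their own and that fitting them on the full training set, rather than per CV fold, is acceptable; the data and parameter values below are illustrative.

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import GridSearchCV

    texts = ["spam spam spam", "ham and eggs", "more spam", "just ham"]
    labels = [1, 0, 1, 0]

    # The cleaning/transform step runs exactly once, outside the search.
    X = TfidfVectorizer().fit_transform(texts)

    # Only the final estimator is tuned on the precomputed features.
    search = GridSearchCV(LogisticRegression(), {"C": [0.1, 1.0, 10.0]}, cv=2)
    search.fit(X, labels)
    print(search.best_params_)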