Re: [Scikit-learn-general] Problems with Python's garbage collection in GridSearch

2014-12-16 Thread Sebastian Raschka
I really have no idea, but thanks for the suggestion. Unfortunately, gc.isenabled() evaluates to True. > On Dec 16, 2014, at 2:05 PM, Lars Buitinck wrote: > > 2014-12-16 17:57 GMT+01:00 Sebastian Raschka : >> Maybe it's something in scipy according to Manoj's linked discussion ... in >> any c
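
For readers following the thread, a minimal sketch of what gc.isenabled() and a manual gc.collect() actually report, using only the standard library. The Node class and make_cycle helper are hypothetical names invented for this illustration; nothing here comes from scikit-learn or from the messages above.

```python
import gc
import weakref


class Node:
    """Toy object that can point at another Node (hypothetical, for illustration)."""

    def __init__(self):
        self.other = None


def make_cycle():
    # Build two objects that reference each other, then drop all outside names.
    a, b = Node(), Node()
    a.other, b.other = b, a
    return weakref.ref(a)  # a weak reference does not keep `a` alive


was_enabled = gc.isenabled()
gc.disable()  # pause automatic collection so the effect is easy to observe
try:
    probe = make_cycle()
    alive_before = probe() is not None  # the cycle keeps both objects alive
    unreachable = gc.collect()          # run the cycle collector by hand
    alive_after = probe() is not None   # the cycle has been reclaimed
finally:
    if was_enabled:
        gc.enable()

print(alive_before, alive_after, unreachable)
```

Reference counting alone cannot free objects that point at each other, so an enabled collector (gc.isenabled() returning True) only means such cycles are reclaimed eventually, not immediately.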

Re: [Scikit-learn-general] Problems with Python's garbage collection in GridSearch

2014-12-16 Thread Lars Buitinck
2014-12-16 17:57 GMT+01:00 Sebastian Raschka : > Maybe it's something in scipy according to Manoj's linked discussion ... in > any case, maybe a workaround for this issue and future issues would be to > have a > "force_clear_gc" (default=False) parameter to force the garbage collector to > be em

Re: [Scikit-learn-general] Problems with Python's garbage collection in GridSearch

2014-12-16 Thread Gael Varoquaux
On Tue, Dec 16, 2014 at 11:57:47AM -0500, Sebastian Raschka wrote: > Maybe it's something in scipy according to Manoj's linked discussion ... in > any case, maybe a workaround for this issue and future issues would be to > have a > "force_clear_gc" (default=False) parameter to force the garbage

Re: [Scikit-learn-general] Problems with Python's garbage collection in GridSearch

2014-12-16 Thread Sebastian Raschka
Hi, Andy, the models that I am using are Random Forests and naive Bayes classifiers. Maybe it's something in scipy according to Manoj's linked discussion ... in any case, maybe a workaround for this issue and future issues would be to have a "force_clear_gc" (default=False) parameter to force t
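
Until something like the proposed parameter exists, one user-side workaround would be to run the parameter loop by hand and call gc.collect() after each candidate. The following is only a sketch under stated assumptions: it uses ParameterGrid and cross_val_score from sklearn.model_selection (the module layout of current scikit-learn releases, not the one from 2014), a synthetic dataset, and an arbitrary small grid. It is not a confirmed fix for the memory growth described in this thread.

```python
import gc

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import ParameterGrid, cross_val_score

# Synthetic data and grid values are assumptions chosen to keep the sketch fast.
X, y = make_classification(n_samples=150, n_features=8, random_state=0)
grid = ParameterGrid({"n_estimators": [5, 10], "max_depth": [2, None]})

results = []
for params in grid:
    model = RandomForestClassifier(random_state=0, **params)
    scores = cross_val_score(model, X, y, cv=3)
    results.append((scores.mean(), params))  # keep only the score and settings
    del model      # drop the last reference to the estimator ...
    gc.collect()   # ... and force a collection pass before the next candidate

best_score, best_params = max(results, key=lambda item: item[0])
print(best_score, best_params)
```

Only scores and parameter dictionaries are retained, so no fitted estimator outlives its loop iteration; the best model can be refitted afterwards from best_params.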

Re: [Scikit-learn-general] Problems with Python's garbage collection in GridSearch

2014-12-16 Thread Andy
Hi. Which models are you using and which version of scikit-learn? Cheers, Andy On 12/16/2014 11:19 AM, Sebastian Raschka wrote: > Hi all, > > I am wondering if someone noticed that GridSearch is eating more and more > memory over time? I read related discussion on the issue list on GitHub and

Re: [Scikit-learn-general] Problems with Python's garbage collection in GridSearch

2014-12-16 Thread Manoj Kumar
Hi, Is there a slight possibility that this is related to this issue (https://github.com/scikit-learn/scikit-learn/issues/3762)? This was an issue with scikit-learn solvers that are wrappers around SciPy's solvers (e.g., sparse-cg). They are unable to garbage-collect memory due to reference

[Scikit-learn-general] Problems with Python's garbage collection in GridSearch

2014-12-16 Thread Sebastian Raschka
Hi all, I am wondering if anyone has noticed that GridSearch is eating more and more memory over time? I read the related discussion on the issue list on GitHub and it sounds like it has been solved (estimators are not kept anymore, and the best estimator can optionally be refitted at the end of

Re: [Scikit-learn-general] [JOB] Software Engineer positions to work on scikit-learn at Inria

2014-12-16 Thread Olivier Grisel
And I forgot to include the link to the online ad with info on the team members, location and such: https://team.inria.fr/parietal/job-offers/ -- Olivier

Re: [Scikit-learn-general] [JOB] Software Engineer positions to work on scikit-learn at Inria

2014-12-16 Thread Olivier Grisel
Sorry, apparently the email addresses in CC got cut by the mailing list software. For both of us firstname.lastn...@inria.fr will do the work. -- Olivier

[Scikit-learn-general] [JOB] Software Engineer positions to work on scikit-learn at Inria

2014-12-16 Thread Olivier Grisel
Hi all, We have a few internship and fixed-term employment contract opportunities for software engineer positions to work on the development of scikit-learn and related open source projects. The position involves contributing to scikit-learn in collaboration with the project community via the usu

Re: [Scikit-learn-general] Samples per estimator on Random Forests

2014-12-16 Thread Gilles Louppe
Hi Miquel, These options are not available within RandomForestClassifier/Regressor. By default, len(X) samples are drawn with replacement. However, you can achieve what you are looking for using BaggingClassifier(base_estimator=DecisionTreeClassifier(...), max_samples=..., max_features=...), where max_samples an
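
A minimal sketch of the bagging approach suggested above, on synthetic data. The dataset, n_estimators=25, max_samples=0.5 and max_features=0.8 are assumptions picked for illustration. The tree is passed positionally because the keyword shown in the message (base_estimator) was renamed to estimator in later scikit-learn releases.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic data; the sizes are arbitrary assumptions for the sketch.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

bagged_trees = BaggingClassifier(
    DecisionTreeClassifier(random_state=0),  # first positional argument: the base model
    n_estimators=25,
    max_samples=0.5,   # each tree is fitted on 50% of the rows (drawn with replacement)
    max_features=0.8,  # each tree sees 80% of the columns, drawn once per tree
    random_state=0,
)
bagged_trees.fit(X, y)

features_per_tree = [len(cols) for cols in bagged_trees.estimators_features_]
print(len(bagged_trees.estimators_), features_per_tree[0])
```

Note that max_features here selects one feature subset per estimator, whereas max_features on the trees or on a random forest limits the features considered at each split. Setting it on DecisionTreeClassifier instead gives behaviour closer to a random forest.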

Re: [Scikit-learn-general] Samples per estimator on Random Forests

2014-12-16 Thread Arnaud Joly
Hi, 1) It’s not possible to perform subsampling in RandomForestRegressor and RandomForestClassifier. However, you can use the BaggingClassifier and the BaggingRegressor to achieve that. 2) The number of features drawn at each split during tree growth is controlled by the max_features parameter. If

[Scikit-learn-general] Samples per estimator on Random Forests

2014-12-16 Thread Miquel Camprodon
Hi all, I am a newbie in the use of scikit-learn. Congratulations on the project! (1) I am using the ensemble methods RandomForestRegressor and RandomForestClassifier. I would like to adjust the number of subsamples each estimator will use. How can I achieve this? My background is in R, and in t

Re: [Scikit-learn-general] onehotencoder and data load

2014-12-16 Thread Daniel Sullivan
Also, I meant read_csv, not load_csv: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.io.parsers.read_csv.html On Tue, Dec 16, 2014 at 9:55 AM, Daniel Sullivan wrote: > > Hi Roberto, > > One thing you might try to get an integer instead of one-hot encoded > values is a LabelEncoder:

Re: [Scikit-learn-general] onehotencoder and data load

2014-12-16 Thread Daniel Sullivan
Hi Roberto, One thing you might try to get an integer instead of one-hot encoded values is a LabelEncoder: http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.LabelEncoder.html It's really useful if you process the complete dataset in memory. If you can't hold your complete da
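
A minimal sketch of the in-memory LabelEncoder usage mentioned above. The colors list is made-up data for illustration only.

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical categorical column held fully in memory.
colors = ["red", "green", "blue", "green", "red"]

encoder = LabelEncoder()
codes = encoder.fit_transform(colors)       # one integer per distinct label
decoded = encoder.inverse_transform(codes)  # map the integers back to labels

print(list(encoder.classes_))  # labels in sorted order; the index is the code
print(list(codes))
```

The integer codes follow the sorted order of the labels, so they carry no meaningful ranking. Models that treat inputs as numeric may still need a one-hot encoding of these codes.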