Re: [Scikit-learn-general] Scalability of Gradient Boosting Classifier

2015-10-01 Thread Gael Varoquaux
On Thu, Oct 01, 2015 at 11:10:51AM +0200, Maryam Tavakol wrote:
> My problem however is the size of data in terms of number of samples.
> The features are engineered and are only 80. I wanted to try training
> on bigger set of data for improvement.

I would use the BIRCH clustering method in an onl
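The reply above is cut off in the archive, so the exact recipe it had in mind is not recoverable. As a loosely related sketch only: scikit-learn's `Birch` estimator does expose `partial_fit`, so it can consume data one chunk at a time and keep a compact set of subcluster centers. The synthetic blobs, chunk sizes, and `threshold` value below are assumptions chosen for illustration, and how class labels would be carried through such a compression step is not stated in the thread.

```python
import numpy as np
from sklearn.cluster import Birch

rng = np.random.RandomState(0)
n_features = 80  # matches the feature count mentioned in the thread

# Hypothetical data: 10 well-separated blobs with small within-blob noise.
blob_centers = rng.randn(10, n_features) * 10.0

# n_clusters=None skips the final global clustering step and keeps
# the raw subclusters built by the CF-tree.
birch = Birch(threshold=1.0, n_clusters=None)

n_chunks, chunk_size = 4, 250
for _ in range(n_chunks):
    # Stand-in for reading one chunk from disk.
    idx = rng.randint(0, len(blob_centers), size=chunk_size)
    X_chunk = blob_centers[idx] + 0.05 * rng.randn(chunk_size, n_features)
    birch.partial_fit(X_chunk)  # only this chunk is held in memory

# A compressed summary of all rows seen so far.
centers = birch.subcluster_centers_
print(centers.shape)
```

The point of the sketch is only that the summary in `centers` has far fewer rows than the total number of samples streamed through `partial_fit`.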

Re: [Scikit-learn-general] Scalability of Gradient Boosting Classifier

2015-10-01 Thread Maryam Tavakol
> Message: 3
> Date: Wed, 30 Sep 2015 12:11:00 -0700
> From: Jacob Schreiber
> Subject: Re: [Scikit-learn-general] Scalability of Gradient Boosting
> Classifier
> To: scikit-learn-general@lists.sourceforge.net
> Message-ID:
> <
> ca+ad8etyev331pfafp60jc

Re: [Scikit-learn-general] Scalability of Gradient Boosting Classifier

2015-09-30 Thread Jacob Schreiber
Hi Maryam,

Currently, no tree-based methods have a partial_fit method. We are working on expanding the tree module; you can see our checklist here: https://github.com/scikit-learn/scikit-learn/issues/5212

There are many methods to reduce the dimensionality of data, if you are using high
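To make the point about `partial_fit` concrete, here is a minimal sketch. It checks that `GradientBoostingClassifier` has no `partial_fit` method, then shows chunk-by-chunk training with `SGDClassifier`, which does support it. Note that `SGDClassifier` is a linear model swapped in purely for illustration; it is not a streaming version of gradient boosting and is not something the thread recommends. The `iter_chunks` helper and its synthetic data are hypothetical stand-ins for reading a large file in pieces.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import SGDClassifier

# Tree-based ensembles expose no partial_fit; SGD-based linear models do.
gbc_has_partial_fit = hasattr(GradientBoostingClassifier(), "partial_fit")
sgd_has_partial_fit = hasattr(SGDClassifier(), "partial_fit")
print(gbc_has_partial_fit, sgd_has_partial_fit)


def iter_chunks(n_chunks=5, chunk_size=200, n_features=80, seed=0):
    """Hypothetical chunk reader: yields (X, y) blocks of synthetic data."""
    rng = np.random.RandomState(seed)
    w = rng.randn(n_features)  # fixed linear rule used to label samples
    for _ in range(n_chunks):
        X = rng.randn(chunk_size, n_features)
        y = (X @ w > 0).astype(int)
        yield X, y


clf = SGDClassifier(random_state=0)
classes = np.array([0, 1])  # partial_fit needs all classes on the first call
for X_chunk, y_chunk in iter_chunks():
    # Only one chunk is in memory at a time.
    clf.partial_fit(X_chunk, y_chunk, classes=classes)
```

Passing `classes` up front matters because a single chunk is not guaranteed to contain every label.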

[Scikit-learn-general] Scalability of Gradient Boosting Classifier

2015-09-30 Thread Maryam Tavakol
Dear all,

I am using the Gradient Boosting Classifier from scikit-learn on a huge data set. Unfortunately, the method loads the whole data set into memory (around 45 GB!). As it is not very easy to modify the code to stream data, is there any other way to make it scalable?

Best Regards, Maryam Tavak