On Thu, Oct 01, 2015 at 11:10:51AM +0200, Maryam Tavakol wrote:
> My problem, however, is the size of the data in terms of the number of
> samples. There are only 80 features, and they are engineered. I wanted to
> try training on a bigger set of data for improvement.
I would use the BIRCH clustering method in an onl
> Message: 3
> Date: Wed, 30 Sep 2015 12:11:00 -0700
> From: Jacob Schreiber
> Subject: Re: [Scikit-learn-general] Scalability of Gradient Boosting
> Classifier
> To: scikit-learn-general@lists.sourceforge.net
Hi Maryam,
Currently, no tree-based methods have a partial_fit method. We are
working on expanding the tree module; you can see our checklist
here: https://github.com/scikit-learn/scikit-learn/issues/5212
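For reference, this is roughly what incremental training looks like with an estimator that does have partial_fit. A minimal sketch only: SGDClassifier is a linear model, swapped in because the tree-based estimators lack the method, so it is not a drop-in replacement for gradient boosting. The chunk generator, TRUE_W, and all sizes are made up for illustration; in practice the chunks would be read from disk.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

N_FEATURES = 80  # matches the feature count mentioned in the thread

# Hypothetical ground-truth weights, used only to label the synthetic data.
TRUE_W = np.random.RandomState(42).randn(N_FEATURES)


def iter_chunks(n_chunks, chunk_size=500, seed=0):
    """Yield (X, y) chunks; a stand-in for reading pieces of a large file."""
    rng = np.random.RandomState(seed)
    for _ in range(n_chunks):
        X = rng.randn(chunk_size, N_FEATURES)
        y = (X @ TRUE_W > 0).astype(int)
        yield X, y


clf = SGDClassifier(random_state=0)

# partial_fit needs the full set of class labels on the first call,
# because a single chunk is not guaranteed to contain every class.
classes = np.array([0, 1])

for X_chunk, y_chunk in iter_chunks(n_chunks=20, seed=0):
    # Only one chunk is held in memory at a time.
    clf.partial_fit(X_chunk, y_chunk, classes=classes)

# Evaluate on a chunk drawn with a different seed than the training data.
X_test, y_test = next(iter_chunks(n_chunks=1, seed=1))
accuracy = clf.score(X_test, y_test)
print("held-out accuracy:", accuracy)
```

Each call to partial_fit makes one pass over the chunk it is given, so memory use is bounded by the chunk size rather than by the size of the full data set.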
There are many methods to reduce the dimensionality of data, if you are
using high
Dear all,
I am using the Gradient Boosting Classifier from scikit-learn on a huge
data set. Unfortunately, the method loads all of the data into memory
(around 45 GB!). As it is not very easy to modify the code to stream data,
is there any other way to make it scalable?
Best Regards,
Maryam Tavak