Okay, this makes sense. Upon reflection, the splitters all take only
pointers to the datasets, so that shouldn't have been a problem.
On Thu, Oct 8, 2015 at 5:40 PM, Peter Rickwood wrote:
>
> Found the issue
>
> It is because I am using warm start. I was using warm start and gradually
> adding models to the GBM, and this causes a memory blowout. If I change
> this and just run the same number of iterations in one go rather than
> incrementally, I get no memory issue.
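Peter's point about the splitters holding pointers rather than copies is the crux: if each tree's splitter kept its own copy of the data, memory would scale with the number of trees. A minimal stand-alone sketch of that difference (pure Python, with a bytearray standing in for the training set; all names here are illustrative, not scikit-learn internals):

```python
import sys

# Stand-in for the training set (~10 MB).
data = bytearray(10_000_000)

# One splitter per tree, each holding its own copy: memory grows with n_trees.
copies = [bytes(data) for _ in range(3)]

# One shared splitter: every tree keeps only a reference to the same buffer.
refs = [data for _ in range(3)]

total_copied = sum(sys.getsizeof(c) for c in copies)  # ~30 MB for 3 trees
total_shared = sys.getsizeof(data)                    # ~10 MB regardless of n_trees
```

With 100+ trees and a 4 GB dataset, the copied variant blows past physical memory, which matches the symptom described in the thread.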
Found the issue
It is because I am using warm start. I was using warm start and gradually
adding models to the GBM, and this causes a memory blowout. If I change
this and just run the same number of iterations in one go rather than
incrementally, I get no memory issue.
Peter
-
Jacob: Great, thanks for confirming. Glad I'm not going crazy or doing
something silly.
What sklearn version would I need to downgrade to to get back to the old
setup (one splitter for all trees)?
Andreas: yes, it completes just fine if I set the number of iterations low
enough (i.e. ~80)
Thanks
On 10/08/2015 06:25 PM, Jacob Schreiber wrote:
> Hi
>
> I think your hypothesis is correct. We recently switched from having
> one splitter for all trees, to having one splitter per tree. I can
> submit a hotfix tonight to prevent the data from being held multiple times
>
Hm, haven't paid attention ...
What I meant was if you set the number of max iterations to like 80,
does it run through?
On 10/08/2015 06:29 PM, Peter Rickwood wrote:
Yes, I can get up to 80-100 trees/iterations and everything works
normally (but slow due to thrashing) before the OS kills it.
I'll try and look into it with the profiler you suggest and if I find
anything will get back to the list.
Yes, I can get up to 80-100 trees/iterations and everything works normally
(but slow due to thrashing) before the OS kills it.
I'll try and look into it with the profiler you suggest and if I find
anything will get back to the list.
It is of course possible I'm doing something else on the side wh
Hi
I think your hypothesis is correct. We recently switched from having one
splitter for all trees, to having one splitter per tree. I can submit a
hotfix tonight to prevent the data from being held multiple times
Jacob
On Thu, Oct 8, 2015 at 3:16 PM, Andreas Mueller wrote:
> Hm, that does sound a bit odd.
Hm, that does sound a bit odd.
Maybe the memory_profiler will shed light on it?
https://pypi.python.org/pypi/memory_profiler
So if you use less than 100 trees it runs through?
Andy
On 10/08/2015 06:12 PM, Peter Rickwood wrote:
Hello all,
I'm puzzled by the memory use of sklearn's GBM implementation ...
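Besides the memory_profiler package Andreas links, Python's stdlib tracemalloc module gives a quick peak-memory reading with no extra install; a minimal sketch (the allocation loop is a stand-in for a training run, not Peter's code):

```python
import tracemalloc

def build_models(n_copies, size=1_000_000):
    # Stand-in for a training loop that accumulates per-model state.
    return [bytearray(size) for _ in range(n_copies)]

tracemalloc.start()
models = build_models(5)
current, peak = tracemalloc.get_traced_memory()  # bytes: (now, high-water mark)
tracemalloc.stop()
```

If peak grows roughly linearly with the number of boosting iterations, that points at per-tree copies of the data rather than a fixed-size leak.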
Hello all,
I'm puzzled by the memory use of sklearn's GBM implementation. It takes up
all available memory and is forced to terminate by the OS, and I can't think
of why it is using as much memory as it does.
Here is the situation:
I have a modest data set of size ~4 GB (1800 columns, 55 rows,
On 10/07/2015 03:29 AM, Joel Nothman wrote:
> RFECV will select features based on scores on a number of validation
> sets, as selected by its cv parameter. As opposed to that
> StackOverflow query, RFECV should now support RandomForest and its
> feature_importances_ attribute.
>
RFECV is not t
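Joel's point that RFECV now supports RandomForest through its feature_importances_ attribute can be exercised like this (a minimal sketch on synthetic data; all sizes and parameters are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFECV

X, y = make_classification(n_samples=100, n_features=8, n_informative=3,
                           random_state=0)

# cv controls the validation sets the feature scores are computed on.
selector = RFECV(RandomForestClassifier(n_estimators=10, random_state=0), cv=3)
selector.fit(X, y)

# selector.support_ is a boolean mask over the columns;
# selector.n_features_ is the number of features kept.
```

RFECV needs an estimator exposing either coef_ or feature_importances_ after fitting, which RandomForest does.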