Ahh... of course. I should have understood that from the multiplication you did since 104 = 105-1.
On Tue, Jan 3, 2012 at 7:58 PM, Grant Ingersoll <gsing...@apache.org> wrote:

>
> On Jan 3, 2012, at 5:59 PM, Ted Dunning wrote:
>
> > Your math is correct.
> >
> > When you say you have 105 features, what do you mean?
>
> Sorry, that should have been 105 categories/labels. I'm trying to do the
> ASF email equivalent of 20 news groups, but in this case it's 105 ASF
> projects. The basic task is to try and predict what project an email
> belongs to based on its content.
>
> > Are these textual
> > features? Or what?
> >
> > On Tue, Jan 3, 2012 at 2:53 PM, Grant Ingersoll <gsing...@apache.org> wrote:
> >
> >> I'm trying to run the full ASF email SGD classifier problem and am facing
> >> heap size issues. My current setup has 105 features and I am using a
> >> cardinality of 100K. I'm using the AdaptiveLogisticRegression. I'm
> >> getting heap errors and they occur when trying to construct the ALR class
> >> (i.e. not later during training).
> >>
> >> Just trying to check my math on memory:
> >> ALR comes with 20 CrossFoldLearners (CFL) and each of those comes with 5
> >> OnlineLogisticRegression instances, which each have a DenseMatrix of
> >> (numFeatures - 1) x cardinality, plus some other vectors.
> >>
> >> This means, in my case, I have:
> >> 20 x 5 x (104 x 100,000 x sizeof(double)) = 332,800,000,000 bits = ~39 GB
> >>
> >> Am I understanding the major parts of memory for ALR correctly? In other
> >> words, I need to tone down the number of CFLs in the TrainASFEmail.java
> >> file so as to not use 20 CFLs, right?
> >
>
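
The memory estimate in the quoted message can be checked with a few lines of arithmetic. The sketch below is plain Java and does not call Mahout. The class and method names (AlrMemoryEstimate, denseMatrixBytes) are hypothetical, it assumes 8 bytes per double, and it counts only the dense weight matrices, ignoring the "other vectors" and any JVM object overhead. Evaluated exactly as the formula is written, 20 x 5 x 104 x 100,000 doubles comes to about 8.3 GB; the quoted bit count is five times that product, and the thread does not say where the extra factor comes from. Either figure is large enough to explain heap errors at construction time, so the conclusion about reducing the number of CFLs is unchanged.

```java
// Back-of-the-envelope estimate only; names are hypothetical and no Mahout API is used.
class AlrMemoryEstimate {
    // Assumption: a double occupies 8 bytes (Double.BYTES, available since Java 8).
    static final long BYTES_PER_DOUBLE = Double.BYTES;

    // Bytes held by the dense weight matrices alone:
    // learners x models per learner x (numCategories - 1) rows x cardinality columns.
    static long denseMatrixBytes(int crossFoldLearners, int modelsPerLearner,
                                 int numCategories, int cardinality) {
        long rows = numCategories - 1L;
        // Widen to long before multiplying so the product cannot overflow int.
        long doubles = (long) crossFoldLearners * modelsPerLearner * rows * cardinality;
        return doubles * BYTES_PER_DOUBLE;
    }

    public static void main(String[] args) {
        // Figures from the thread: 20 CFLs, 5 models each, 105 categories, 100K cardinality.
        long full = denseMatrixBytes(20, 5, 105, 100_000);
        // Hypothetical smaller pool, to show that the cost scales linearly with CFL count.
        long reduced = denseMatrixBytes(2, 5, 105, 100_000);
        System.out.printf("20 CFLs: %.2f GB%n", full / 1e9);
        System.out.printf(" 2 CFLs: %.2f GB%n", reduced / 1e9);
    }
}
```

Because the matrix cost is linear in the number of learners, cutting the pool by a factor of ten in this sketch cuts the estimate by the same factor. How the pool size is actually configured in TrainASFEmail.java is not something the sketch assumes.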