Your math is correct. When you say you have 105 features, what do you mean? Are these textual features, or something else?
On Tue, Jan 3, 2012 at 2:53 PM, Grant Ingersoll <gsing...@apache.org> wrote:

> I'm trying to run the full ASF email SGD classifier problem and am facing
> heap size issues. My current setup has 105 features and I am using a
> cardinality of 100K. I'm using the AdaptiveLogisticRegression. I'm
> getting heap errors and they occur when trying to construct the ALR class
> (i.e. not later during training).
>
> Just trying to check my math on memory:
> ALR comes with 20 CrossFoldLearners (CFL) and each of those comes with 5
> OnlineLogisticRegression instances, which each have a DenseMatrix of
> (numFeatures - 1) X cardinality, plus some other vectors.
>
> This means, in my case, I have:
> 20 x 5 x (104 x 100,000 x sizeof(double)) = 332,800,000,000 bits = ~39 GB
>
> Am I understanding the major parts of memory for ALR correctly? In other
> words, I need to tone down the number of CFLs in the TrainASFEmail.java
> file so as to not use 20 CFLs, right?
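
As a rough cross-check of the estimate quoted above, here is a minimal sketch that evaluates the same expression in plain Java. It does not call Mahout at all; the class name `AlrMemoryEstimate` and the method `denseMatrixBytes` are hypothetical names made up for this illustration, and the counts (20 learners, 5 models each, a (numFeatures - 1) x cardinality matrix of doubles) are taken from the quoted message rather than verified against the Mahout source. It also ignores the "other vectors" and any JVM object overhead.

```java
// Hypothetical helper for illustration only; not part of Mahout.
final class AlrMemoryEstimate {

    /** Size of one Java double in bytes (8). */
    static final long BYTES_PER_DOUBLE = Double.BYTES;

    /**
     * Bytes needed for the dense matrices alone, assuming every model holds
     * one (numFeatures - 1) x cardinality matrix of doubles, as described
     * in the quoted message.
     */
    public static long denseMatrixBytes(int learners, int modelsPerLearner,
                                        int numFeatures, int cardinality) {
        long rows = numFeatures - 1L; // 104 rows for 105 "features"
        // Use long arithmetic throughout to avoid int overflow.
        return (long) learners * modelsPerLearner * rows * cardinality * BYTES_PER_DOUBLE;
    }

    public static void main(String[] args) {
        // Figures from the quoted message: 20 CFLs, 5 models each,
        // 105 features, cardinality of 100K.
        long full = denseMatrixBytes(20, 5, 105, 100_000);
        // Same setup with a smaller pool of learners (4 is an arbitrary example).
        long reduced = denseMatrixBytes(4, 5, 105, 100_000);

        System.out.println("20 learners: " + full + " bytes");
        System.out.println(" 4 learners: " + reduced + " bytes");
    }
}
```

Evaluated literally, the expression comes to 8,320,000,000 bytes for the matrices alone. The ~39 GB figure in the quoted message is about five times that, so one of the factors may be counted differently there than it is written. Either way the total is well beyond a default JVM heap, and since it scales linearly with the number of learners, reducing the pool size is the most direct way to bring it down.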