I'm trying to run the full ASF email SGD classifier problem and am running out 
of heap space.  My current setup has 105 features and I am using a cardinality 
of 100K.  I'm using AdaptiveLogisticRegression (ALR).  The heap errors occur 
while constructing the ALR instance (i.e., not later during training).

Just trying to check my math on memory:
ALR comes with 20 CrossFoldLearners (CFL), and each of those comes with 5 
OnlineLogisticRegression instances, each of which has a DenseMatrix of 
(numFeatures - 1) x cardinality, plus some other vectors.

This means, in my case, I have:
20 x 5 x (104 x 100,000 x sizeof(double)) = 8,320,000,000 bytes = ~8.3 GB (~7.75 GiB)
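
To double-check that expression, here is a minimal sketch that evaluates it 
directly.  It assumes 8-byte doubles and counts only the dense matrices (not the 
"other vectors" or any JVM object overhead).  The class and method names are 
made up for illustration and are not part of Mahout.

```java
public class AlrMemoryEstimate {
    // Size of one Java double in bytes (8).
    static final long BYTES_PER_DOUBLE = Double.BYTES;

    // Bytes held by the dense matrices alone:
    // learners x modelsPerLearner x rows x cols x sizeof(double).
    // The cast to long avoids int overflow in the product.
    static long denseMatrixBytes(int learners, int modelsPerLearner, long rows, long cols) {
        return (long) learners * modelsPerLearner * rows * cols * BYTES_PER_DOUBLE;
    }

    static double toGiB(long bytes) {
        return bytes / (double) (1L << 30);
    }

    public static void main(String[] args) {
        // The setup described above: 20 CFLs, 5 OLRs each, 104 x 100,000 matrix.
        long full = denseMatrixBytes(20, 5, 104, 100_000);
        // Hypothetical smaller pool of 4 CFLs, chosen only for comparison.
        long reduced = denseMatrixBytes(4, 5, 104, 100_000);

        System.out.printf("20 CFLs: %,d bytes (%.2f GiB)%n", full, toGiB(full));
        System.out.printf(" 4 CFLs: %,d bytes (%.2f GiB)%n", reduced, toGiB(reduced));
    }
}
```

Since the estimate is linear in the number of CFLs, cutting the pool size cuts 
the matrix memory by the same factor.  The pool of 4 in the second call is an 
arbitrary value for comparison, not a recommendation.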

Am I understanding the major parts of memory for ALR correctly?  In other 
words, I need to tone down the number of CFLs in the TrainASFEmail.java file so 
as to not use 20 CFLs, right?
