I'm trying to run the full ASF email SGD classifier problem and am running into heap size issues. My current setup has 105 features and a cardinality of 100K, and I'm using AdaptiveLogisticRegression (ALR). The heap errors occur while constructing the ALR object, i.e. before any training starts.
Just trying to check my math on memory: ALR comes with 20 CrossFoldLearners (CFL), and each of those comes with 5 OnlineLogisticRegression instances, each of which has a DenseMatrix of (numFeatures - 1) x cardinality, plus some other vectors. This means, in my case, I have:

20 x 5 x (104 x 100,000 x sizeof(double)) = 8,320,000,000 bytes = 66,560,000,000 bits = ~8.3 GB (~7.7 GiB)

Am I understanding the major parts of memory for ALR correctly? In other words, I need to reduce the number of CFLs in TrainASFEmail.java so that it does not use 20 of them, right?
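For anyone who wants to redo the arithmetic, here is a minimal sketch of the estimate in plain Java. It does not call Mahout at all: the class and method names are hypothetical, the counts (20 learners, 5 folds each) are simply the ones quoted above, and it only counts the dense coefficient matrices, ignoring the extra vectors and any JVM object overhead.

```java
// Hypothetical helper for a back-of-the-envelope heap estimate.
// It models only the dense coefficient matrices described above.
public class AlrMemoryEstimate {

    // Bytes needed for one dense rows x cols matrix of doubles.
    static long denseMatrixBytes(long rows, long cols) {
        return rows * cols * Double.BYTES; // Double.BYTES == 8
    }

    // Total bytes across all learners and all folds per learner.
    static long estimateBytes(int learners, int foldsPerLearner, long rows, long cols) {
        return (long) learners * foldsPerLearner * denseMatrixBytes(rows, cols);
    }

    public static void main(String[] args) {
        int learners = 20;         // CrossFoldLearners in the pool (as assumed above)
        int foldsPerLearner = 5;   // OnlineLogisticRegression instances per learner
        long rows = 105 - 1;       // numFeatures - 1
        long cols = 100_000;       // cardinality

        long bytes = estimateBytes(learners, foldsPerLearner, rows, cols);
        System.out.println("bytes: " + bytes);
        System.out.println("GB   : " + bytes / 1e9);
        System.out.println("GiB  : " + bytes / (1024.0 * 1024.0 * 1024.0));
    }
}
```

Changing `learners` in the sketch shows how the estimate scales linearly with the number of CFLs, which is the knob I'm asking about.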