Ahh... of course.  I should have understood that from the multiplication
you did since 104 = 105-1.

On Tue, Jan 3, 2012 at 7:58 PM, Grant Ingersoll <gsing...@apache.org> wrote:

>
> On Jan 3, 2012, at 5:59 PM, Ted Dunning wrote:
>
> > Your math is correct.
> >
> > When you say you have 105 features, what do you mean?
>
> Sorry, that should have been 105 categories/labels.  I'm trying to do the
> ASF email equivalent of 20 Newsgroups, but in this case it's 105 ASF
> projects.  The basic task is to try and predict what project an email
> belongs to based on its content.
>
> >  Are these textual
> > features?  Or what?
> >
> > On Tue, Jan 3, 2012 at 2:53 PM, Grant Ingersoll <gsing...@apache.org> wrote:
> >
> >> I'm trying to run the full ASF email SGD classifier problem and am facing
> >> heap size issues.  My current setup has 105 features and I am using a
> >> cardinality of 100K.  I'm using the AdaptiveLogisticRegression.  I'm
> >> getting heap errors and they occur when trying to construct the ALR class
> >> (i.e. not later during training).
> >>
> >> Just trying to check my math on memory:
> >> ALR comes with 20 CrossFoldLearners (CFL) and each of those comes with 5
> >> OnlineLogisticRegression instances, which each have a DenseMatrix of
> >> (numFeatures -1) X cardinality, plus some other vectors.
> >>
> >> This means, in my case, I have:
> >> 20 x 5 x (104 x 100,000 x sizeof(double)) = 8,320,000,000 bytes = ~8.3 GB
> >>
> >> Am I understanding the major parts of memory for ALR correctly?  In other
> >> words, I need to tone down the number of CFLs in the TrainASFEmail.java
> >> file so as to not use 20 CFLs, right?
>
>
>
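
Multiplying out the factors from the quoted estimate is easy to get wrong by hand, so here is a small calculator for it. This is a sketch under stated assumptions: it counts only the dense (categories - 1) x cardinality coefficient matrix of each OnlineLogisticRegression, takes the pool of 20 CrossFoldLearners with 5 folds each as described in the message, and ignores the "other vectors" and JVM object overhead. The helper names (alr_beta_bytes, to_gib) are hypothetical and are not part of the Mahout API.

```python
# Back-of-envelope estimate of the coefficient storage held by an
# AdaptiveLogisticRegression (ALR) pool, following the breakdown in the
# quoted message. Assumption: only the dense beta matrices are counted;
# auxiliary vectors and JVM object overhead are ignored.

BYTES_PER_DOUBLE = 8  # a Java double is 64 bits


def alr_beta_bytes(num_categories: int, cardinality: int,
                   pool_size: int = 20, folds: int = 5) -> int:
    """Bytes for pool_size CrossFoldLearners x folds models, each holding
    a (num_categories - 1) x cardinality matrix of doubles."""
    per_model = (num_categories - 1) * cardinality * BYTES_PER_DOUBLE
    return pool_size * folds * per_model


def to_gib(n_bytes: int) -> float:
    """Convert bytes to binary gigabytes (GiB)."""
    return n_bytes / 2**30


full = alr_beta_bytes(105, 100_000)
print(f"{full:,} bytes = {full / 1e9:.2f} GB = {to_gib(full):.2f} GiB")
# 8,320,000,000 bytes = 8.32 GB = 7.75 GiB

# The estimate is linear in the pool size, so shrinking the number of
# CrossFoldLearners shrinks the footprint proportionally.
for pool in (20, 10, 5):
    est = alr_beta_bytes(105, 100_000, pool_size=pool)
    print(f"pool_size={pool:2d}: {est / 1e9:.2f} GB")
```

Because every factor enters as a plain product, halving either the number of CrossFoldLearners or the hashed-feature cardinality halves this part of the heap.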
