I'm trying to run the full ASF email SGD classifier problem and am facing heap
size issues. My current setup has 105 features and I am using a cardinality of
100K. I'm using the AdaptiveLogisticRegression. I'm getting heap errors and
they occur when trying to construct the ALR class (i.e. not
Your math is correct.
When you say you have 105 features, what do you mean? Are these textual
features? Or what?
On Tue, Jan 3, 2012 at 2:53 PM, Grant Ingersoll wrote:
> I'm trying to run the full ASF email SGD classifier problem and am facing
> heap size issues. My current setup has 105 feat
Do these algorithms have good locality? For doing giant online
computations it might be worth storing these in memory-mapped files.
Or, give up and get the M/R SGD code in.
On Tue, Jan 3, 2012 at 2:59 PM, Ted Dunning wrote:
> Your math is correct.
>
> When you say you have 105 features, what do
No. They don't have particularly good locality. They would have moderate
hotspots, but these would be scattered all over. The hotspots might allow the
L2 cache to help, but would not allow disk-based data to work.
The major opportunity for improvement here is to incorporate some of the
advances that V
On Jan 3, 2012, at 5:59 PM, Ted Dunning wrote:
> Your math is correct.
>
> When you say you have 105 features, what do you mean?
Sorry, that should have been 105 categories/labels. I'm trying to do the ASF
email equivalent of 20 news groups, but in this case it's 105 ASF projects.
The basic
Ahh... of course. I should have understood that from the multiplication
you did since 104 = 105-1.
On Tue, Jan 3, 2012 at 7:58 PM, Grant Ingersoll wrote:
>
> On Jan 3, 2012, at 5:59 PM, Ted Dunning wrote:
>
> > Your math is correct.
> >
> > When you say you have 105 features, what do you mean?
>
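
For reference, the multiplication discussed in the thread (104 = 105-1 rows
times the feature cardinality) can be sketched as a rough heap estimate. This
is only a back-of-the-envelope sketch: it assumes each model keeps one dense
matrix of 8-byte doubles with (categories - 1) rows and one column per hashed
feature, and it assumes the adaptive learner holds a pool of 20 learners with
5 cross-validation folds each. The pool size and fold count are assumptions
for illustration, not values stated in the thread, and the function names are
hypothetical.

```python
BYTES_PER_DOUBLE = 8  # size of one double-precision coefficient


def dense_model_bytes(num_categories, num_features):
    """Bytes for one dense coefficient matrix.

    Assumes (num_categories - 1) rows and num_features columns of doubles,
    which matches the "104 = 105-1" remark in the thread.
    """
    return (num_categories - 1) * num_features * BYTES_PER_DOUBLE


def pool_bytes(num_categories, num_features, pool_size, folds):
    """Bytes for a pool of learners, each holding `folds` dense models.

    pool_size and folds are assumed values, not taken from the thread.
    """
    return pool_size * folds * dense_model_bytes(num_categories, num_features)


if __name__ == "__main__":
    one_model = dense_model_bytes(105, 100_000)
    print(f"one model: {one_model / 1e6:.1f} MB")

    # Assumed pool of 20 learners with 5 folds each (100 dense models).
    total = pool_bytes(105, 100_000, pool_size=20, folds=5)
    print(f"whole pool: {total / 1e9:.2f} GB")
```

Under these assumptions a single model is modest, but the pool multiplies it by
the number of models held at once, which would explain running out of heap
during construction. Reducing the feature cardinality or the pool size shrinks
the estimate proportionally.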