trainclassifier is the old implementation. I assume you are using a
version older than 0.7. You could either try 0.7 or get the latest source
from svn.

The new implementation (trainnb) works with both seq2sparse (which
generates tf-idf vectors) and the new seq2encoded (which generates vectors
using the hashing trick).

See examples/bin/classify-20newsgroups.sh
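
As a rough sketch of what that script does, adapted to your own data: the
steps are vectorize, split, train, test. The flags below follow the example
script as far as I recall it, so check each command's --help for your
version. The work directory layout, the function name run_nb_pipeline, and
the minDF value of 5 are placeholders made up for illustration. The sketch
also assumes your documents are already in sequence files whose keys carry
the label, which is what -el relies on.

```shell
#!/bin/sh
# Sketch of the trainnb workflow. Paths and the minDF value are placeholders.
# By default the commands are only printed; set MAHOUT=/path/to/bin/mahout
# to actually run them.
MAHOUT=${MAHOUT:-echo mahout}

run_nb_pipeline() {
  work_dir=$1   # assumed to already hold sequence files under $work_dir/seq
  min_df=$2     # minimum document frequency for a term to be kept

  # Build tf-idf vectors; --minDF is the setting asked about in this thread.
  $MAHOUT seq2sparse -i "$work_dir/seq" -o "$work_dir/vectors" \
    -lnorm -nv -wt tfidf --minDF "$min_df"

  # Hold out part of the vectors for testing.
  $MAHOUT split -i "$work_dir/vectors/tfidf-vectors" \
    --trainingOutput "$work_dir/train-vectors" \
    --testOutput "$work_dir/test-vectors" \
    --randomSelectionPct 40 --overwrite --sequenceFiles -xm sequential

  # Train the model; -el extracts the labels from the input keys.
  $MAHOUT trainnb -i "$work_dir/train-vectors" -el \
    -o "$work_dir/model" -li "$work_dir/labelindex" -ow

  # Evaluate on the held-out vectors.
  $MAHOUT testnb -i "$work_dir/test-vectors" \
    -m "$work_dir/model" -l "$work_dir/labelindex" \
    -ow -o "$work_dir/testing"
}

run_nb_pipeline "${WORK_DIR:-/tmp/mahout-work}" "${MIN_DF:-5}"
```

Run as-is, this only prints the four commands so you can review them.
Raising MIN_DF shrinks the dictionary, which is the first thing to try if
the vocabulary is what is blowing up at 10M documents.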

Robin Anil | Software Engineer | +1 312 869 2602 | Google Inc.


On Thu, Apr 18, 2013 at 9:03 PM, Ryan Compton <compton.r...@gmail.com> wrote:

> When I use "trainclassifier" I am able to run the 20 newsgroups example
> just fine. I'm also able to train on my own data up to around 10M
> training documents.
>
> Once I have enough training data, I find that "trainclassifier"
> succeeds and "testclassifier" fails. I have no idea whether it is a
> training or a testing problem. The errors reported by "testclassifier"
> are at http://pastebin.com/YKqbjAQH . I suspect that I am training on
> too much data and need to increase the minDf, but I don't see a way
> to do it with "trainclassifier".
>
> While looking around for a fix, I read that "trainclassifier" is the
> old way, and that "trainnb" fixed some unusual back-end errors (which
> I suspect is what I'm getting). What is the difference? Is there any
> reason for me to start figuring out how to use "trainnb"?
>
