With some help from Ted (which I plan to turn into a checked-in tool
if he doesn't get there first), I'm running LR on my initial small
example.

I adapted Ted's rcv1 sample to digest a directory whose subdirectories
contain the exemplars.

Ted's delightfully small program pushes all of the data into the model
n times (n is 10 in my current variation). It displays the best
learner's accuracy at each iteration.
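To make the loop concrete, here is a minimal toy stand-in for that
workflow: an online logistic learner sees the same labelled examples for
n epochs, and we score accuracy after each pass. This is plain SGD
logistic regression written from scratch for illustration, not Mahout's
OnlineLogisticRegression API; all names and the synthetic data are my
own assumptions.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_epoch(w, data, lr=0.1):
    # One online pass: update weights after every single example.
    for x, y in data:
        p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
        g = p - y  # gradient of log-loss w.r.t. the linear score
        for i, xi in enumerate(x):
            w[i] -= lr * g * xi
    return w

def accuracy(w, data):
    correct = sum(
        1 for x, y in data
        if (sigmoid(sum(wi * xi for wi, xi in zip(w, x))) > 0.5) == (y == 1)
    )
    return correct / len(data)

random.seed(0)
# Two noisy 1-D clusters (plus a bias feature) as stand-in "documents".
data = [([1.0, random.gauss(c, 0.5)], c) for c in (0, 1) for _ in range(50)]

w = [0.0, 0.0]
for epoch in range(1, 11):  # n = 10 passes, as in the run described above
    w = train_epoch(w, data)
    print(epoch, round(accuracy(w, data), 3))
```

With a fixed learning rate this toy converges and stays converged, which
is part of why the decaying-accuracy pattern described below surprised me.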

The example is 1000 docs in 10 categories.

With 20k features, I note that the accuracy scores get worse on each
iteration of pushing the data into the model.

After the first pass, the model hasn't trained yet. After the second,
accuracy is 95.6%, and then it drifts gracefully downward with each
additional iteration, landing at 83%.

I'm puzzled; I'm accustomed to overfitting inflating scores on the
training data, but a steady decline like this is not intuitive to me.
