OnlineLogisticRegression: Are my settings sensible

2013-11-07 Thread Andreas Bauer
Hi, I’m trying to use OnlineLogisticRegression for a two-class classification problem, but as my classification results are not very good, I wanted to ask for support to find out if my settings are correct and if I’m using Mahout correctly. Because if I’m doing it correctly then probably my f

Re: OnlineLogisticRegression: Are my settings sensible

2013-11-07 Thread Ted Dunning
Why is FEATURE_NUMBER != 13? With 12 features that are already lovely and continuous, just stick them in elements 1..12 of a 13 long vector and put a constant value at the beginning of it. Hashed encoding is good for sparse stuff, but confusing for your case. Also, it looks like you only pass th
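
The dense layout suggested here (a constant in the first element, followed by the 12 continuous features) could look roughly like the sketch below. It uses plain Java arrays so that it runs without Mahout on the classpath. The class name `BiasVectorSketch`, the method `withBias`, and the sample values are hypothetical and not taken from the original poster's code.

```java
import java.util.Arrays;

public class BiasVectorSketch {
    // 12 continuous input features, as described in the thread.
    static final int NUM_RAW_FEATURES = 12;
    // One extra slot for the constant term, giving a 13-element vector.
    static final int FEATURE_NUMBER = NUM_RAW_FEATURES + 1;

    /** Copies the raw features into elements 1..12 and puts a constant 1.0 in element 0. */
    public static double[] withBias(double[] raw) {
        if (raw.length != NUM_RAW_FEATURES) {
            throw new IllegalArgumentException(
                "expected " + NUM_RAW_FEATURES + " features, got " + raw.length);
        }
        double[] values = new double[FEATURE_NUMBER];
        values[0] = 1.0; // constant (intercept) term
        System.arraycopy(raw, 0, values, 1, raw.length);
        return values;
    }

    public static void main(String[] args) {
        // Made-up sample values, for illustration only.
        double[] raw = new double[NUM_RAW_FEATURES];
        for (int i = 0; i < raw.length; i++) {
            raw[i] = 0.1 * (i + 1);
        }
        System.out.println(Arrays.toString(withBias(raw)));
    }
}
```

With Mahout available, the resulting array could presumably be wrapped as `new DenseVector(values)` and passed to `OnlineLogisticRegression.train(int actual, Vector instance)`, with the learner constructed for 13 features instead of 100. No hashed encoder is involved in this layout.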

Re: OnlineLogisticRegression: Are my settings sensible

2013-11-07 Thread Andreas Bauer
Hi, Thanks for your comments. I modified the examples from the Mahout in Action book, therefore I used the hashed approach and that's why I used 100 features. I'll adjust the number. You say that I'm using the same CVE for all features, so you mean I should create 12 separate CVE for adding

Re: OnlineLogisticRegression: Are my settings sensible

2013-11-07 Thread Ted Dunning
On Thu, Nov 7, 2013 at 9:45 PM, Andreas Bauer wrote:

> Hi,
>
> Thanks for your comments.
>
> I modified the examples from the Mahout in Action book, therefore I used
> the hashed approach and that's why I used 100 features. I'll adjust the
> number.

Makes sense. But the book was doing sparse

Re: OnlineLogisticRegression: Are my settings sensible

2013-11-08 Thread Andreas Bauer
Ok, I'll have a look. Thanks! I know Mahout is intended for large-scale machine learning, but I guess it shouldn't have problems with such small data either.

Ted Dunning wrote:

> On Thu, Nov 7, 2013 at 9:45 PM, Andreas Bauer wrote:
>
>> Hi,
>>
>> Thanks for your comments.
>>
>> I modified

Re: OnlineLogisticRegression: Are my settings sensible

2013-11-08 Thread Ted Dunning
You are correct that it should work with smaller data as well, but the trade-offs are going to be very different. In particular, some algorithms are completely infeasible at large scale, but are very effective at small scale. Some, like those used in glmnet, inherently require multiple passes throu