Hello Ted,

Are we to assume that SGD is still a work in progress and that the implementations (Cross Fold, Online, Adaptive) are too flawed to be used realistically? The evolutionary algorithm seems to be the core of OnlineLogisticRegression, which in turn builds up to Adaptive/Cross Fold.
>> b) for truly on-line learning where no repeated passes through the data...

What would it take to get to an implementation? How can anyone help?

Regards,

On Wed, Nov 27, 2013 at 2:26 AM, Ted Dunning <ted.dunn...@gmail.com> wrote:

> Well, first off, let me say that I am much less of a fan now of the magical
> cross validation approach and adaptation based on that than I was when I
> wrote the ALR code. There are definitely legs in the ideas, but my
> implementation has a number of flaws.
>
> For example:
>
> a) the way that I provide for handling multiple passes through the data is
> very easy to screw up. I think that simply separating the data entirely
> might be a better approach.
>
> b) for truly on-line learning, where no repeated passes through the data
> will ever occur, cross validation is not the best choice. Much better
> in those cases to use what Google researchers described in [1].
>
> c) it is clear from several reports that the evolutionary algorithm
> prematurely shuts down the learning rate. I think that Adagrad-like
> learning rates are more reliable. See [1] again for one of the more
> readable descriptions of this. See also [2] for another view on adaptive
> learning rates.
>
> d) item (c) is also related to the way that learning rates are adapted in
> the underlying OnlineLogisticRegression. That needs to be fixed.
>
> e) asynchronous parallel stochastic gradient descent with mini-batch
> learning is where we should be headed. I do not have time to write it,
> however.
>
> All this aside, I am happy to help in any way that I can, given my recent
> time limits.
>
> [1] http://research.google.com/pubs/pub41159.html
>
> [2] http://www.cs.jhu.edu/~mdredze/publications/cw_nips_08.pdf
>
> On Tue, Nov 26, 2013 at 12:54 PM, optimusfan <optimus...@yahoo.com> wrote:
>
> > Hi,
> >
> > We're currently working on a binary classifier using
> > Mahout's AdaptiveLogisticRegression class. We're trying to determine
> > whether or not the models are suffering from high bias or variance and were
> > wondering how to do this using Mahout's APIs. I can easily calculate the
> > cross validation error, and I think I could detect high bias or variance if
> > I could compare that number to my training error, but I'm not sure how to
> > do this. Or, any other ideas would be appreciated!
> >
> > Thanks,
> > Ian
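To make points (a) and (c) and the original bias/variance question concrete, here is a minimal Java sketch under stated assumptions. It is not Mahout's API: the class `AdaGradLogisticSketch` and its methods `train`, `predict`, `rate`, and `errorRate` are hypothetical names. It uses a plain AdaGrad-style per-coordinate step size, alpha / (beta + sqrt(sum of squared gradients)), in the spirit of the per-coordinate learning rates discussed in [1], rather than the full FTRL-Proximal update from that paper. The synthetic data, alpha = 0.5, beta = 1.0, and five passes are illustrative choices only. The held-out split is kept entirely separate from training, and the sketch reports training error next to held-out error.

```java
import java.util.Arrays;
import java.util.Random;

/**
 * Hypothetical sketch, not Mahout's API: binary logistic regression trained by
 * online SGD with AdaGrad-style per-coordinate learning rates
 *   eta_i = alpha / (beta + sqrt(sum of squared gradients for coordinate i)).
 */
public class AdaGradLogisticSketch {
    private final double[] w;      // weights; w[0] is the bias (intercept) term
    private final double[] gradSq; // running sum of squared gradients per coordinate
    private final double alpha;    // base step size (illustrative, chosen by caller)
    private final double beta;     // keeps the first steps finite while gradSq is 0

    public AdaGradLogisticSketch(int numFeatures, double alpha, double beta) {
        this.w = new double[numFeatures + 1];
        this.gradSq = new double[numFeatures + 1];
        this.alpha = alpha;
        this.beta = beta;
    }

    /** Predicted probability that the label is 1. */
    public double predict(double[] x) {
        double z = w[0];
        for (int i = 0; i < x.length; i++) {
            z += w[i + 1] * x[i];
        }
        return 1.0 / (1.0 + Math.exp(-z));
    }

    /** Current step size for coordinate i; it only shrinks as gradients accumulate. */
    public double rate(int i) {
        return alpha / (beta + Math.sqrt(gradSq[i]));
    }

    /** One online update on a single example with label y in {0, 1}. */
    public void train(double[] x, int y) {
        double err = predict(x) - y; // derivative of log loss w.r.t. the linear score
        for (int i = 0; i <= x.length; i++) {
            double xi = (i == 0) ? 1.0 : x[i - 1]; // coordinate 0 is the bias input
            double g = err * xi;
            gradSq[i] += g * g;                    // accumulate, then take this step
            w[i] -= rate(i) * g;
        }
    }

    /** Fraction of examples misclassified at a 0.5 probability threshold. */
    public double errorRate(double[][] xs, int[] ys) {
        int wrong = 0;
        for (int n = 0; n < xs.length; n++) {
            int yHat = predict(xs[n]) >= 0.5 ? 1 : 0;
            if (yHat != ys[n]) {
                wrong++;
            }
        }
        return (double) wrong / xs.length;
    }

    public static void main(String[] args) {
        // Synthetic, illustrative data: label is 1 when 2*x1 - x2 > 0.
        Random rnd = new Random(42);
        int n = 1000;
        double[][] xs = new double[n][2];
        int[] ys = new int[n];
        for (int k = 0; k < n; k++) {
            xs[k][0] = rnd.nextGaussian();
            xs[k][1] = rnd.nextGaussian();
            ys[k] = (2 * xs[k][0] - xs[k][1] > 0) ? 1 : 0;
        }
        // Hold out the last 20% and never train on it (cf. point (a)).
        int split = 800;
        double[][] trainX = Arrays.copyOfRange(xs, 0, split);
        int[] trainY = Arrays.copyOfRange(ys, 0, split);
        double[][] testX = Arrays.copyOfRange(xs, split, n);
        int[] testY = Arrays.copyOfRange(ys, split, n);

        AdaGradLogisticSketch model = new AdaGradLogisticSketch(2, 0.5, 1.0);
        for (int pass = 0; pass < 5; pass++) {
            for (int k = 0; k < trainX.length; k++) {
                model.train(trainX[k], trainY[k]);
            }
        }
        System.out.printf("training error = %.3f, held-out error = %.3f%n",
                model.errorRate(trainX, trainY), model.errorRate(testX, testY));
    }
}
```

The two printed numbers can be read with the usual heuristic: if training and held-out error are both high and close together, the model is more likely bias-limited; if training error is low but held-out error is noticeably higher, variance is the more likely problem.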