I do see that regularize() has the prior (L1 and L2) depend on *perTermLearningRate(j)* ...
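For concreteness, here is a minimal sketch of the two places the per-term rate enters the OLR-style update. This is paraphrased from memory, not the actual source; treat prior.age(), getLambda(), currentLearningRate(), and missingUpdates as my assumptions about the surrounding Mahout code:

    double learningRate = currentLearningRate();

    // 1. In regularize(): the prior (L1/L2) is aged at a rate that
    //    already folds in the per-term factor.
    double rate = getLambda() * learningRate * perTermLearningRate(j);
    beta.setQuick(i, j, prior.age(beta.getQuick(i, j), missingUpdates, rate));

    // 2. In the gradient step: the same per-term factor scales the update.
    double newValue = beta.getQuick(i, j)
        + gradientBase * learningRate * perTermLearningRate(j) * instance.get(j);
    beta.setQuick(i, j, newValue);

So an Adagrad-style replacement has to decide whether the new per-term scaling applies to the prior aging as well, not just to the gradient step.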
On Thu, Feb 20, 2014 at 11:49 AM, Vishal Santoshi <vishal.santo...@gmail.com> wrote:

> Hey Ted,
>
> >> I presume that you would like an Adagrad-like solution to replace
> >> the above?
>
> Things that I could glean out:
>
> * Maintain a simple d-dimensional vector to store a running total of
> the squares of the gradients, where d is the number of terms. Say
> *gradients*.
>
> * Based on
>
> "Since the learning rate for each feature is quickly adapted, the
> value for η is far less important than it is with SGD. I have used
> η = 1.0 for a very large number of different problems. The primary role
> of η is to determine how much a feature changes the very first time it
> is encountered, so in problems with large numbers of extremely rare
> features, some additional care may be warranted."
>
> *How important or even necessary is perTermLearningRate(j)?*
>
> * double newValue = beta.getQuick(i, j) + gradientBase * learningRate *
> perTermLearningRate(j) * instance.get(j);
>
> becomes
>
> double newGradient = beta.getQuick(i, j) + (learningRate /
> Math.sqrt(gradients(i))) * instance.get(j);
>
> gradients(i) = gradients(i) + newGradient^2;
>
> Does this make sense? The only thing is that the abstract class
> changes.
>
> Regards.
>
> On Sun, Dec 29, 2013 at 8:45 PM, Ted Dunning <ted.dunn...@gmail.com> wrote:
>
>> :-)
>>
>> Many leaks are *very* subtle.
>>
>> One leak that had me going for weeks was in a news wire corpus. I
>> couldn't figure out why the cross validation was so good and running
>> the classifier on new data was soooo much worse.
>>
>> The answer was that the training corpus had near-duplicate articles.
>> This means that there was leakage between the training and test
>> corpora. This wasn't quite a target leak, but it was a leak.
>>
>> For target leaks, it is very common to have partial target leaks due
>> to the fact that you learn more about positive cases after the moment
>> that you had to select which case to investigate. Suppose, for
>> instance, you are targeting potential customers based on very limited
>> information. If you make an enticing offer to the people you target,
>> then those who accept the offer will buy something from you. You will
>> also learn some particulars, such as name and address, from those who
>> buy from you.
>>
>> Looking retrospectively, it looks like you can target good customers
>> who have names or addresses that are not null. Without a good snapshot
>> of each customer record at exactly the time that the targeting was
>> done, you cannot know that *all* customers have a null name and
>> address before you target them. This sort of time machine leak can be
>> enormously more subtle than this example.
>>
>> On Mon, Dec 2, 2013 at 1:50 PM, Gokhan Capan <gkhn...@gmail.com> wrote:
>>
>> > Gokhan
>> >
>> > On Thu, Nov 28, 2013 at 3:18 AM, Ted Dunning <ted.dunn...@gmail.com>
>> > wrote:
>> >
>> > > On Wed, Nov 27, 2013 at 7:07 AM, Vishal Santoshi <
>> > > vishal.santo...@gmail.com> wrote:
>> > >
>> > > > Are we to assume that SGD is still a work in progress and
>> > > > implementations (Cross Fold, Online, Adaptive) are too flawed to
>> > > > be realistically used?
>> > >
>> > > They are too raw to be accepted uncritically, for sure. They have
>> > > been used successfully in production.
>> > >
>> > > > The evolutionary algorithm seems to be the core of
>> > > > OnlineLogisticRegression, which in turn builds up to
>> > > > Adaptive/Cross Fold.
>> > > >
>> > > > >> b) for truly on-line learning where no repeated passes through
>> > > > >> the data ..
>> > > >
>> > > > What would it take to get to an implementation? How can anyone
>> > > > help?
>> > >
>> > > Would you like to help on this? The amount of work required to get
>> > > a distributed asynchronous learner up is moderate, but definitely
>> > > not huge.
>> >
>> > Ted, are you describing a generic distributed learner for all kinds
>> > of online algorithms? Possibly zookeeper-coordinated and with
>> > #predict and #getFeedbackAndUpdateTheModel methods?
>> >
>> > > I think that OnlineLogisticRegression is basically sound, but
>> > > should get a better learning rate update equation. That would
>> > > largely make the Adaptive* stuff unnecessary, especially if OLR
>> > > could be used in the distributed asynchronous learner.
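To make the Adagrad proposal above concrete, here is a minimal, self-contained sketch for a single-row (binary) model, outside the Mahout class hierarchy. The names (AdagradUpdater, sumSqGrad, EPS) are hypothetical. It departs from the quoted snippet in two ways: the accumulator is indexed by term j (not category i) and stores the squared gradient rather than the squared coefficient, and a small epsilon guards the very first update:

    /**
     * Sketch of an Adagrad-style per-term learning rate for binary
     * logistic regression trained by SGD on sparse examples.
     */
    public class AdagradUpdater {
      private static final double EPS = 1e-8; // guards the first update
      private final double eta;               // base rate; eta = 1.0 per the quote above
      private final double[] beta;            // one coefficient per term
      private final double[] sumSqGrad;       // running total of squared gradients per term

      public AdagradUpdater(int numTerms, double eta) {
        this.eta = eta;
        this.beta = new double[numTerms];
        this.sumSqGrad = new double[numTerms];
      }

      /** One step for a sparse example: term indices[k] has value values[k]. */
      public void train(int[] indices, double[] values, int actual) {
        // p = sigmoid(beta . x)
        double dot = 0;
        for (int k = 0; k < indices.length; k++) {
          dot += beta[indices[k]] * values[k];
        }
        double p = 1.0 / (1.0 + Math.exp(-dot));
        double gradientBase = actual - p; // ascent direction for log-likelihood

        for (int k = 0; k < indices.length; k++) {
          int j = indices[k];
          double g = gradientBase * values[k]; // gradient for term j
          sumSqGrad[j] += g * g;               // accumulate before stepping
          beta[j] += eta / Math.sqrt(sumSqGrad[j] + EPS) * g;
        }
      }
    }

Generalizing this to the multi-category beta matrix, and to the prior aging in regularize(), is exactly where the abstract class would have to change.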