I have been swamped. Generally, AdaGrad is a great idea. The code looks fine at first glance. Certainly some sort of AdaGrad would be preferable to the hack that I put in.
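Roughly, the per-term AdaGrad rate being discussed would look like the sketch below. This is only an illustration, not code that is in Mahout: the field and method names follow Vishal's description further down, while eta0 and the epsilon guard are assumptions on my part.

    import org.apache.mahout.math.DenseVector;
    import org.apache.mahout.math.Vector;

    /**
     * Illustrative per-term AdaGrad rate (Duchi et al.; see the adagrad.pdf
     * link below).  A sketch only, not the attached patch.
     */
    public class AdaGradRateSketch {
      private static final double EPSILON = 1e-8;   // guards against dividing by zero
      private final double eta0;                    // base learning rate (assumed)
      // running per-feature sum of squared gradients
      private final Vector perTermSumOfSquaresOfGradients;

      public AdaGradRateSketch(double eta0, int numFeatures) {
        this.eta0 = eta0;
        this.perTermSumOfSquaresOfGradients = new DenseVector(numFeatures);
      }

      /** Called from learn(...) with the gradient seen for feature j. */
      public void accumulate(int j, double gradient) {
        perTermSumOfSquaresOfGradients.set(j,
            perTermSumOfSquaresOfGradients.get(j) + gradient * gradient);
      }

      /** AdaGrad analogue of OnlineLogisticRegression.perTermLearningRate(int j). */
      public double perTermLearningRate(int j) {
        return eta0 / (EPSILON + Math.sqrt(perTermSumOfSquaresOfGradients.get(j)));
      }
    }

The point is just that each feature's effective rate shrinks with the history of its own gradients, so rare features keep large steps while frequent ones anneal.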
Sent from my iPhone

> On Feb 26, 2014, at 18:30, Vishal Santoshi <vishal.santo...@gmail.com> wrote:
>
> Ted, any feedback?
>
> On Mon, Feb 24, 2014 at 2:58 PM, Vishal Santoshi <vishal.santo...@gmail.com> wrote:
>
>> Hello Ted,
>>
>> This is regarding the AdaGrad update per feature. I have attached a file
>> which reflects http://www.ark.cs.cmu.edu/cdyer/adagrad.pdf (2).
>>
>> It differs from OnlineLogisticRegression in the way it implements
>>
>>     public double perTermLearningRate(int j);
>>
>> This class maintains 2 dense vectors:
>>
>>     /** ADA per-term sum of squares of learning gradients */
>>     protected Vector perTermLSumOfSquaresOfGradients;
>>
>>     /** ADA per-term learning gradient */
>>     protected Vector perTermGradients;
>>
>> and it overrides the learn(...) method to update these two vectors.
>>
>> Please tell me if I am totally off here.
>>
>> Thank you for your help, and regards,
>>
>> Vishal Santoshi
>>
>> PS: I had wrongly interpreted the code in my last 2 emails; please ignore
>> them.
>>
>> On Sun, Dec 29, 2013 at 8:45 PM, Ted Dunning <ted.dunn...@gmail.com> wrote:
>>
>>> :-)
>>>
>>> Many leaks are *very* subtle.
>>>
>>> One leak that had me going for weeks was in a newswire corpus. I couldn't
>>> figure out why the cross validation was so good when running the
>>> classifier on new data was so much worse.
>>>
>>> The answer was that the training corpus had near-duplicate articles,
>>> which means that there was leakage between the training and test
>>> corpora. It wasn't quite a target leak, but it was a leak.
>>>
>>> For target leaks, it is very common to have partial leaks due to the
>>> fact that you learn more about positive cases after the moment at which
>>> you had to select which case to investigate. Suppose, for instance, you
>>> are targeting potential customers based on very limited information. If
>>> you make an enticing offer to the people you target, then those who
>>> accept the offer will buy something from you, and you will also learn
>>> some particulars, such as name and address, from those who buy.
>>>
>>> Looking retrospectively, it appears that you can target good customers
>>> by picking the ones whose names or addresses are not null. Without a
>>> snapshot of each customer record from exactly the time the targeting was
>>> done, you cannot know that *all* customers had a null name and address
>>> before you targeted them. This sort of time-machine leak can be
>>> enormously more subtle than this example.
>>>
>>>> On Mon, Dec 2, 2013 at 1:50 PM, Gokhan Capan <gkhn...@gmail.com> wrote:
>>>>
>>>> Gokhan
>>>>
>>>> On Thu, Nov 28, 2013 at 3:18 AM, Ted Dunning <ted.dunn...@gmail.com> wrote:
>>>>
>>>>> On Wed, Nov 27, 2013 at 7:07 AM, Vishal Santoshi
>>>>> <vishal.santo...@gmail.com> wrote:
>>>>>>
>>>>>> Are we to assume that SGD is still a work in progress and that the
>>>>>> implementations (Cross Fold, Online, Adaptive) are too flawed to be
>>>>>> used realistically?
>>>>>
>>>>> They are too raw to be accepted uncritically, for sure. They have been
>>>>> used successfully in production.
>>>>>
>>>>>> The evolutionary algorithm seems to be the core of
>>>>>> OnlineLogisticRegression, which in turn builds up to Adaptive/Cross
>>>>>> Fold.
>>>>>>
>>>>>>>> b) for truly on-line learning, where no repeated passes through the
>>>>>>>> data are made...
>>>>>>
>>>>>> What would it take to get to an implementation? How can anyone help?
>>>>>
>>>>> Would you like to help on this? The amount of work required to get a
>>>>> distributed asynchronous learner up is moderate, but definitely not
>>>>> huge.
>>>>
>>>> Ted, are you describing a generic distributed learner for all kinds of
>>>> online algorithms? Possibly zookeeper-coordinated and with #predict and
>>>> #getFeedbackAndUpdateTheModel methods?
>>>>
>>>>> I think that OnlineLogisticRegression is basically sound, but it
>>>>> should get a better learning rate update equation. That would largely
>>>>> make the Adaptive* stuff unnecessary, especially if OLR could be used
>>>>> in the distributed asynchronous learner.
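For the archive: the learner contract Gokhan sketches above might look something like the following. The two method names come from his message; everything else (the Watcher hook for model refreshes, the parameter shapes) is a guess at a possible shape, not an agreed design.

    import org.apache.mahout.math.Vector;
    import org.apache.zookeeper.WatchedEvent;
    import org.apache.zookeeper.Watcher;

    /**
     * Hypothetical contract for an online learner that a zookeeper-coordinated
     * asynchronous driver could run.  Nothing like this exists in Mahout yet.
     */
    public interface DistributedOnlineLearner extends Watcher {
      /** Score an instance with the current local copy of the model. */
      double predict(Vector instance);

      /** Fold the observed outcome back into the model (the SGD step). */
      void getFeedbackAndUpdateTheModel(Vector instance, double actual);

      /** Zookeeper callback, e.g. to pull a freshly merged model when a peer publishes one. */
      @Override
      void process(WatchedEvent event);
    }

An OLR with a better per-term rate (AdaGrad, say) would then just be one implementation of this contract that the distributed driver runs on each node.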