I have been swamped. Generally, AdaGrad is a great idea. The code looks fine at first glance. Certainly some sort of AdaGrad would be preferable to the hack that I put in.
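Roughly, the per-term AdaGrad rate being discussed would look like the sketch below. This is only an illustration, not code that is in Mahout: the field and method names follow Vishal's description further down, while eta0 and the epsilon guard are assumptions on my part.

    import org.apache.mahout.math.DenseVector;
    import org.apache.mahout.math.Vector;

    /**
     * Illustrative per-term AdaGrad rate (Duchi et al.; see the adagrad.pdf
     * link below).  A sketch only, not the attached patch.
     */
    public class AdaGradRateSketch {
      private static final double EPSILON = 1e-8;   // guards against dividing by zero
      private final double eta0;                    // base learning rate (assumed)
      // running per-feature sum of squared gradients
      private final Vector perTermSumOfSquaresOfGradients;

      public AdaGradRateSketch(double eta0, int numFeatures) {
        this.eta0 = eta0;
        this.perTermSumOfSquaresOfGradients = new DenseVector(numFeatures);
      }

      /** Called from learn(...) with the gradient seen for feature j. */
      public void accumulate(int j, double gradient) {
        perTermSumOfSquaresOfGradients.set(j,
            perTermSumOfSquaresOfGradients.get(j) + gradient * gradient);
      }

      /** AdaGrad analogue of OnlineLogisticRegression.perTermLearningRate(int j). */
      public double perTermLearningRate(int j) {
        return eta0 / (EPSILON + Math.sqrt(perTermSumOfSquaresOfGradients.get(j)));
      }
    }

The point is just that each feature's effective rate shrinks with the history of its own gradients, so rare features keep large steps while frequent ones anneal.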
Sent from my iPhone

> On Feb 26, 2014, at 18:30, Vishal Santoshi <vishal.santo...@gmail.com> wrote:
>
> Ted, any feedback?
>
> On Mon, Feb 24, 2014 at 2:58 PM, Vishal Santoshi <vishal.santo...@gmail.com> wrote:
>
>> Hello Ted,
>>
>> This is regarding the AdaGrad update per feature. I have attached a file
>> which reflects http://www.ark.cs.cmu.edu/cdyer/adagrad.pdf (2).
>>
>> It differs from OnlineLogisticRegression in the way it implements
>>
>>     public double perTermLearningRate(int j);
>>
>> This class maintains 2 dense vectors:
>>
>>     /** ADA per-term sum of squares of learning gradients */
>>     protected Vector perTermLSumOfSquaresOfGradients;
>>
>>     /** ADA per-term learning gradient */
>>     protected Vector perTermGradients;
>>
>> and it overrides the learn(...) method to update these two vectors.
>>
>> Please tell me if I am totally off here.
>>
>> Thank you for your help, and regards,
>>
>> Vishal Santoshi
>>
>> PS: I had wrongly interpreted the code in my last 2 emails; please ignore
>> them.
>>
>> On Sun, Dec 29, 2013 at 8:45 PM, Ted Dunning <ted.dunn...@gmail.com> wrote:
>>
>>> :-)
>>>
>>> Many leaks are *very* subtle.
>>>
>>> One leak that had me going for weeks was in a newswire corpus. I couldn't
>>> figure out why the cross validation was so good when running the
>>> classifier on new data was so much worse.
>>>
>>> The answer was that the training corpus had near-duplicate articles,
>>> which means that there was leakage between the training and test
>>> corpora. It wasn't quite a target leak, but it was a leak.
>>>
>>> For target leaks, it is very common to have partial leaks due to the
>>> fact that you learn more about positive cases after the moment at which
>>> you had to select which case to investigate. Suppose, for instance, you
>>> are targeting potential customers based on very limited information. If
>>> you make an enticing offer to the people you target, then those who
>>> accept the offer will buy something from you, and you will also learn
>>> some particulars, such as name and address, from those who buy.
>>>
>>> Looking retrospectively, it appears that you can target good customers
>>> by picking the ones whose names or addresses are not null. Without a
>>> snapshot of each customer record from exactly the time the targeting was
>>> done, you cannot know that *all* customers had a null name and address
>>> before you targeted them. This sort of time-machine leak can be
>>> enormously more subtle than this example.
>>>
>>>> On Mon, Dec 2, 2013 at 1:50 PM, Gokhan Capan <gkhn...@gmail.com> wrote:
>>>>
>>>> Gokhan
>>>>
>>>> On Thu, Nov 28, 2013 at 3:18 AM, Ted Dunning <ted.dunn...@gmail.com> wrote:
>>>>
>>>>> On Wed, Nov 27, 2013 at 7:07 AM, Vishal Santoshi
>>>>> <vishal.santo...@gmail.com> wrote:
>>>>>>
>>>>>> Are we to assume that SGD is still a work in progress and that the
>>>>>> implementations (Cross Fold, Online, Adaptive) are too flawed to be
>>>>>> used realistically?
>>>>>
>>>>> They are too raw to be accepted uncritically, for sure. They have been
>>>>> used successfully in production.
>>>>>
>>>>>> The evolutionary algorithm seems to be the core of
>>>>>> OnlineLogisticRegression, which in turn builds up to Adaptive/Cross
>>>>>> Fold.
>>>>>>
>>>>>>>> b) for truly on-line learning, where no repeated passes through the
>>>>>>>> data are made...
>>>>>>
>>>>>> What would it take to get to an implementation? How can anyone help?
>>>>>
>>>>> Would you like to help on this? The amount of work required to get a
>>>>> distributed asynchronous learner up is moderate, but definitely not
>>>>> huge.
>>>>
>>>> Ted, are you describing a generic distributed learner for all kinds of
>>>> online algorithms? Possibly zookeeper-coordinated and with #predict and
>>>> #getFeedbackAndUpdateTheModel methods?
>>>>
>>>>> I think that OnlineLogisticRegression is basically sound, but it
>>>>> should get a better learning rate update equation. That would largely
>>>>> make the Adaptive* stuff unnecessary, especially if OLR could be used
>>>>> in the distributed asynchronous learner.
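For the archive: the learner contract Gokhan sketches above might look something like the following. The two method names come from his message; everything else (the Watcher hook for model refreshes, the parameter shapes) is a guess at a possible shape, not an agreed design.

    import org.apache.mahout.math.Vector;
    import org.apache.zookeeper.WatchedEvent;
    import org.apache.zookeeper.Watcher;

    /**
     * Hypothetical contract for an online learner that a zookeeper-coordinated
     * asynchronous driver could run.  Nothing like this exists in Mahout yet.
     */
    public interface DistributedOnlineLearner extends Watcher {
      /** Score an instance with the current local copy of the model. */
      double predict(Vector instance);

      /** Fold the observed outcome back into the model (the SGD step). */
      void getFeedbackAndUpdateTheModel(Vector instance, double actual);

      /** Zookeeper callback, e.g. to pull a freshly merged model when a peer publishes one. */
      @Override
      void process(WatchedEvent event);
    }

An OLR with a better per-term rate (AdaGrad, say) would then just be one implementation of this contract that the distributed driver runs on each node.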