I do see that regularize() has the prior (L1 and L2) depend on
perTermLearningRate(j) as well.
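
As a tiny illustration of that dependence (generic Java, not the Mahout source;
the class and method below are made up here): an L2 prior "ages" a coefficient
by shrinking it with the same effective per-term rate that the gradient step
uses, so if the gradient step switches to learningRate / sqrt(gradients[j]),
the prior/shrink step has to switch with it.

    // Illustrative only; not Mahout's regularize().  The point: the prior
    // (plain L2 shrinkage here) uses the same per-term effective rate as the
    // gradient step, so an Adagrad-style rate has to be threaded through
    // here as well.
    final class L2AgingSketch {
      // effectiveRate is learningRate * perTermLearningRate(j) today, or
      // learningRate / Math.sqrt(gradients[j]) with the Adagrad change.
      static double age(double oldWeight, double missedSteps,
                        double lambda, double effectiveRate) {
        return oldWeight * Math.pow(1.0 - lambda * effectiveRate, missedSteps);
      }
    }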


On Thu, Feb 20, 2014 at 11:49 AM, Vishal Santoshi <vishal.santo...@gmail.com> wrote:

> Hey Ted,
>
> >> I presume that you would like an Adagrad-like solution to replace the
> >> above?
>
> Things that I could glean:
>
>
>
>
>  *  Maintain a simple d-dimensional vector to store a running total of the
> squares of the gradients, where d is the number of terms.  Call it
> *gradients*.
>
>
>
>
> *  Based on
>
>      "Since the learning rate for each feature is quickly adapted, the
> value for is far less important than it is with SGD. I have used = 1:0 for
> a very large number of different problems. The primary role of
>      is to determine how much a feature changes the very first time it is
> encountered, so in problems with large numbers of extremely rare features,
> some additional care may be warranted."
>
>      *How important or even necessary is perTermLearningRate(j)?*
>
>
>
>
> *  double newValue = beta.getQuick(i, j) + gradientBase * learningRate *
>    perTermLearningRate(j) * instance.get(j);
>
>    becomes
>
>     double gradient = gradientBase * instance.get(j);
>     gradients.set(j, gradients.get(j) + gradient * gradient);
>     double newValue = beta.getQuick(i, j)
>         + (learningRate / Math.sqrt(gradients.get(j))) * gradient;
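>
>    To make this concrete, here is a minimal, self-contained sketch in plain
>    Java (the class and field names are made up for illustration, this is not
>    the Mahout code itself, and the accumulator is indexed by term j):
>
>    // Illustrative Adagrad-style per-term update; not Mahout's API.
>    final class AdagradSketch {
>      private final double[] beta;          // one weight per term
>      private final double[] gradients;     // running sum of squared gradients per term
>      private final double learningRate;    // base rate, e.g. 1.0 as in the quote above
>      private static final double EPS = 1e-8;  // guards the very first update against division by zero
>
>      AdagradSketch(int numTerms, double learningRate) {
>        this.beta = new double[numTerms];
>        this.gradients = new double[numTerms];
>        this.learningRate = learningRate;
>      }
>
>      // gradientBase is the scalar part of the gradient (e.g. actual - predicted),
>      // xj is the value of feature j in the instance.
>      void update(int j, double gradientBase, double xj) {
>        double g = gradientBase * xj;
>        gradients[j] += g * g;               // accumulate the squared gradient first
>        beta[j] += learningRate / Math.sqrt(gradients[j] + EPS) * g;
>      }
>    }
>
>    With this, perTermLearningRate(j) would effectively be replaced by
>    1 / Math.sqrt(gradients[j]).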
>
>
>
>
>
> Does this make sense? The only catch is that the abstract class would have to change.
>
>
> Regards.
>
>
>
>
> On Sun, Dec 29, 2013 at 8:45 PM, Ted Dunning <ted.dunn...@gmail.com> wrote:
>
>> :-)
>>
>> Many leaks are *very* subtle.
>>
>> One leak that had me going for weeks was in a news wire corpus.  I
>> couldn't
>> figure out why the cross validation was so good and running the classifier
>> on new data was soooo much worse.
>>
>> The answer was that the training corpus had near-duplicate articles.  This
>> means that there was leakage between the training and test corpora.  This
>> wasn't quite a target leak, but it was a leak.
>>
>> For target leaks, it is very common to have partial target leaks due to
>> the
>> fact that you learn more about positive cases after the moment that you
>> had
>> to select which case to investigate.  Suppose, for instance, you are
>> targeting potential customers based on very limited information.  If you
>> make an enticing offer to the people you target, then those who accept the
>> offer will buy something from you.  You will also learn some particulars
>> such as name and address from those who buy from you.
>>
>> Looking retrospectively, it looks like you can target good customers who
>> have names or addresses that are not null.  Without a good snapshot of
>> each
>> customer record at exactly the time that the targeting was done, you
>> cannot
>> know that *all* customers have a null name and address before you target
>> them.  This sort of time-machine leak can be enormously more subtle than
>> this example.
>>
>>
>>
>> On Mon, Dec 2, 2013 at 1:50 PM, Gokhan Capan <gkhn...@gmail.com> wrote:
>>
>> > Gokhan
>> >
>> >
>> > On Thu, Nov 28, 2013 at 3:18 AM, Ted Dunning <ted.dunn...@gmail.com>
>> > wrote:
>> >
>> > > On Wed, Nov 27, 2013 at 7:07 AM, Vishal Santoshi <
>> > > vishal.santo...@gmail.com> wrote:
>> > >
>> > > >
>> > > >
>> > > > Are we to assume that SGD is still a work in progress and the
>> > > > implementations (Cross Fold, Online, Adaptive) are too flawed to be
>> > > > realistically used?
>> > > >
>> > >
>> > > They are too raw to be accepted uncritically, for sure.  They have
>> > > been used successfully in production.
>> > >
>> > >
>> > > > The evolutionary algorithm seems to be the core of
>> > > > OnlineLogisticRegression,
>> > > > which in turn builds up to Adaptive/Cross Fold.
>> > > >
>> > > > >>b) for truly on-line learning where no repeated passes through the
>> > > > data...
>> > > >
>> > > > What would it take to get to an implementation? How can anyone
>> > > > help?
>> > > >
>> > >
>> > > Would you like to help on this?  The amount of work required to get a
>> > > distributed asynchronous learner up is moderate, but definitely not
>> > > huge.
>> > >
>> >
>> > Ted, are you describing a generic distributed learner for all kinds of
>> > online algorithms? Possibly ZooKeeper-coordinated, and with #predict and
>> > #getFeedbackAndUpdateTheModel methods?
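>> >
>> > Something along these lines, perhaps (just a sketch of the shape; the
>> > interface and method names below are made up, not an existing Mahout API):
>> >
>> >     import org.apache.mahout.math.Vector;
>> >
>> >     // Hypothetical contract for a learner driven by a distributed,
>> >     // asynchronous coordinator (e.g. ZooKeeper-based).
>> >     public interface DistributedOnlineLearner {
>> >       // Score an instance against the current (possibly stale) model state.
>> >       double predict(Vector instance);
>> >
>> >       // Apply the observed target for an instance and update the local
>> >       // model; the coordinator would periodically merge or publish states.
>> >       void getFeedbackAndUpdateTheModel(Vector instance, double actual);
>> >     }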
>> >
>> > >
>> > > I think that OnlineLogisticRegression is basically sound, but should
>> > > get a better learning rate update equation.  That would largely make
>> > > the Adaptive* stuff unnecessary, especially if OLR could be used in
>> > > the distributed asynchronous learner.
>> > >
>> >
>>
>
>
