Got it. Thanks so much, Ted.

One more question: I am also testing the MixedGradient, and it looks like
the RankingGradient takes much more time than the DefaultGradient.

If I set alpha to 0.5, training takes about 50 times as long as with the
DefaultGradient. I expected that, since the RankingGradient does many
ranking comparisons. I have also heard that the algorithm is not very
sensitive to alpha, so what alpha would you suggest I choose? I haven't
found much material or advice on that.
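
For reference, here is roughly how I am constructing the two gradients in
my test (just a sketch; I am assuming the constructor is
MixedGradient(alpha, window), and 100 is only the window value I happened
to pick):

import org.apache.mahout.classifier.sgd.DefaultGradient;
import org.apache.mahout.classifier.sgd.Gradient;
import org.apache.mahout.classifier.sgd.MixedGradient;

// Baseline gradient: plain logistic update on every training example.
Gradient plain = new DefaultGradient();

// Mixed gradient: as I understand it, each update applies the ranking
// gradient with probability alpha and the plain gradient otherwise.
// (Assuming MixedGradient(alpha, window); 100 is just the ranking
// window size I picked for testing.)
Gradient mixed = new MixedGradient(0.5, 100);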

Best wishes,
Stanley Xu

On Fri, Apr 22, 2011 at 6:04 AM, Ted Dunning <ted.dunn...@gmail.com> wrote:

> It is definitely a reasonable idea to convert data to hashed feature
> vectors using map-reduce.
>
> And yes, you can pick a vector length that is long enough so that you don't
> have to worry about
> collisions.  You need to examine your data to decide how large that needs
> to be, but it isn't hard
> to do.  The encoding framework handles the placement of features in the
> vector for you.  You don't have to worry about that.
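>
> For example, with the hashed encoders it looks roughly like this (an
> untested sketch; the class names come from
> org.apache.mahout.vectorizer.encoders and org.apache.mahout.math, and the
> 2^20 size is just an illustration):
>
> import org.apache.mahout.math.RandomAccessSparseVector;
> import org.apache.mahout.math.Vector;
> import org.apache.mahout.vectorizer.encoders.FeatureVectorEncoder;
> import org.apache.mahout.vectorizer.encoders.StaticWordValueEncoder;
>
> // Choose a dimension big enough that collisions stay rare for your data.
> Vector v = new RandomAccessSparseVector(1 << 20);
>
> // The encoder hashes each feature into its slot; you never assign slots.
> FeatureVectorEncoder enc = new StaticWordValueEncoder("category");
> enc.addToVector("sports", v);
> enc.addToVector("politics", v);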
>
>
> On Wed, Apr 20, 2011 at 8:03 PM, Stanley Xu <wenhao...@gmail.com> wrote:
>
>> Thanks, Ted. Since SGD is a sequential method, the vector created for each
>> line can be very large without consuming too much memory. Could I assume
>> that if we have a limited number of features, or use map-reduce to
>> pre-process the data and find out how many distinct values each category
>> can have, we could just create a long vector and put different feature
>> values in different slots to avoid feature collisions?
>>
>> Thanks,
>> Stanley
>>
>>
>>
>> On Thu, Apr 21, 2011 at 12:24 AM, Ted Dunning <ted.dunn...@gmail.com>
>> wrote:
>>
>> > Stanley,
>> >
>> > Yes.  What you say is correct.  Feature hashing can cause degradation.
>> >
>> > With multiple hashing, however, you do have a fairly strong guarantee
>> that
>> > the feature hashing is very close to information preserving.  This is
>> > related to the fact that the feature hashing operation is a random
>> linear
>> > transformation.  Since we are hashing to something that is still quite a
>> > high dimensional space, the information loss is likely to be minimal.
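>> >
>> > With the Mahout encoders, multiple probing is just a setting (a quick
>> > sketch; setProbes lives on FeatureVectorEncoder as I recall):
>> >
>> > Vector v = new RandomAccessSparseVector(1 << 20);
>> > FeatureVectorEncoder enc = new StaticWordValueEncoder("word");
>> > // With 2 probes each feature is written at 2 independently hashed
>> > // slots, so a collision at one slot can be averaged out at the other.
>> > enc.setProbes(2);
>> > enc.addToVector("some-feature", v);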
>> >
>> > On Wed, Apr 20, 2011 at 6:06 AM, Stanley Xu <wenhao...@gmail.com>
>> wrote:
>> >
>> > > Dear all,
>> > >
>> > > Per my understanding, feature hashing in SGD compresses the feature
>> > > dimensions into a fixed-length vector. Won't that make the training
>> > > result incorrect when a hash collision happens? Won't two features
>> > > hashed to the same slot be treated as the same feature? Even if we
>> > > have multiple probes to reduce collisions, like a Bloom filter,
>> > > won't a slot with a collision look like a combination feature?
>> > >
>> > > Thanks.
>> > >
>> > > Best wishes,
>> > > Stanley Xu
>> > >
>> >
>>
>
>
