Re: Interest in Self Organizing Maps?

2013-03-30 Thread Hector Yee
Here's the ICML pre-print: J. Weston, A. Makadia, H. Yee. Label Partitioning for Sublinear Ranking<http://www.thespermwhale.com/jaseweston/papers/label_partitioner.pdf>, *ICML 2013*. On Sat, Mar 30, 2013 at 10:56 AM, Hector Yee wrote: > If you're going the embedding rou

Re: Interest in Self Organizing Maps?

2013-03-30 Thread Hector Yee
Label Partitioning For Sublinear Ranking * Jason Weston, Ameesh Makadia, Hector Yee. I was going to modify https://issues.apache.org/jira/browse/MAHOUT-703 to do this when I was at a startup, as essentially Wsabie is very similar to a 2-layer NN without the sigmoid and with the WARP update rule (in the w

Re: BallKMeans clustering issues detailed

2013-03-22 Thread Hector Yee
You may have better results if you used a learnt embedding space instead of a random one. On Mar 22, 2013 3:43 AM, "Dan Filimon" wrote: > Hi everyone, > > Ted and me noticed some issues with the BallKMeans implementation. > When splitting the 20 newsgroups dataset into training and testing as > per the "

Re: Fwd: Neural Network and Restricted Boltzman Machine in Mahout

2013-03-14 Thread Hector Yee
How do you do "asynchronous SGD" with Hadoop? (I guess referring to the Spark discussion in a different thread.) In any case there are ways to get similar performance without having to use that many cores. See our paper "Affinity Weighted Embedding", http://openreview.net/iclr2013, Table 2, compar

Re: Turn on code inspections, please

2012-06-10 Thread Hector Yee
I agree re: bags of bits of code. E.g. there is no single binary that can try all permutations of various techniques on a single data set and pick the best model; instead there seem to be N different binaries for every (classifier) technique, at least circa 0.5. Is there no presubmit checkin hook fo

Re: What is content based recommendation, to you

2012-06-08 Thread Hector Yee
I think Sean's post is to use item attributes, but nothing prevents you from using user attributes too. On Jun 7, 2012 11:43 PM, "yswa...@gmail.com" wrote:

Re: Commercializing Mahout: the Myrrix recommender platform

2012-04-23 Thread Hector Yee
Yes, sorry, the Enron email corpus. On Apr 21, 2012 4:32 PM, "Lance Norskog" wrote: > Hector, perhaps you meant my former employer Enron? > > On Sat, Apr 21, 2012 at 4:20 AM, Grant Ingersoll > wrote: > > > > On Apr 20, 2012, at 12:05 PM, Hector Yee wrote: > &

Re: Quartiles computation with M/R or Pig (combine function states)

2012-04-20 Thread Hector Yee
How about this: http://en.wikipedia.org/wiki/Reservoir_sampling On Fri, Apr 20, 2012 at 10:44 AM, Dmitriy Lyubimov wrote: > Hello, > > There should be some way to compile quartiles in a map/reduce fashion > (i.e. with api similar to Pig's Arithmetic custom function) without > keeping enormous cou
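The linked page describes Algorithm R. As a hedged single-machine sketch (not the Pig/MapReduce integration Dmitriy asked about), reservoir sampling keeps a uniform fixed-size sample in one pass over the stream, and approximate quartiles can then be read off the sorted sample; the function names and the choice of k here are illustrative assumptions:

```python
import random

def reservoir_sample(stream, k, seed=42):
    """Algorithm R: a uniform random sample of size k from a single pass."""
    rng = random.Random(seed)
    reservoir = []
    for i, x in enumerate(stream):
        if i < k:
            reservoir.append(x)
        else:
            j = rng.randint(0, i)  # uniform over [0, i], bounds inclusive
            if j < k:
                reservoir[j] = x  # keep x with probability k / (i + 1)
    return reservoir

def approx_quartiles(sample):
    """Read approximate quartiles off the sorted sample."""
    s = sorted(sample)
    n = len(s)
    return s[n // 4], s[n // 2], s[(3 * n) // 4]
```

A larger k tightens the quartile estimates at the cost of memory; merging per-mapper reservoirs (with weights) would be the natural map/reduce extension.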

Re: Commercializing Mahout: the Myrrix recommender platform

2012-04-20 Thread Hector Yee
On a related note, I wish I could share the data I have to see how these algorithms stack up to the ones we use for large-scale learning. Are there other examples of large data sets people use? I know there's the Exxon one and possibly the one used in the Netflix prize. There's also ImageNet but

[jira] [Commented] (MAHOUT-716) Implement Boosting

2012-03-20 Thread Hector Yee (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13234013#comment-13234013 ] Hector Yee commented on MAHOUT-716: --- Thanks for the review Isabel. The git used t

Re: CF: parallel SGD for matrix factorization by Gemulla et al.

2012-03-15 Thread Hector Yee
Have you read the lock-free Hogwild paper? Just SGD with multiple threads, and don't be afraid of memory stomps. It works faster than batch. On Mar 15, 2012 2:32 PM, "Dmitriy Lyubimov" wrote: > We already discussed the paper before. In fact, i had exactly same > idea for partitioning the factoriza
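The "memory stomps" idea above can be sketched as follows. This is a toy single-machine illustration of the lock-free update pattern from the Hogwild paper, not the paper's implementation (and Python threads serialize under the GIL, so it shows the pattern, not the speedup); the sparse-example format and hyperparameters are assumptions:

```python
import threading
import random

def hogwild_sgd(data, dim, n_threads=4, epochs=20, lr=0.05):
    """Lock-free SGD: threads update a shared weight vector with no locks.

    Each example is (sparse_x, y) where sparse_x is a list of (index, value)
    pairs. Concurrent 'stomping' writes to w are simply tolerated."""
    w = [0.0] * dim  # shared parameters, deliberately unguarded

    def worker(shard):
        for _ in range(epochs):
            for x, y in shard:
                pred = sum(w[i] * xi for i, xi in x)  # sparse dot product
                err = pred - y
                for i, xi in x:
                    w[i] -= lr * err * xi  # racy update, no lock taken

    shards = [data[t::n_threads] for t in range(n_threads)]
    threads = [threading.Thread(target=worker, args=(s,)) for s in shards]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return w
```

The key design choice is that sparse gradients rarely collide on the same coordinates, so occasional lost updates do not prevent convergence.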

Re: [jira] [Updated] (MAHOUT-716) Implement Boosting

2012-03-15 Thread Hector Yee
HOUT-716 > > > Project: Mahout > > > Issue Type: New Feature > > > Components: Classification > > >Affects Versions: 0.5 > > >Reporter: Hector Yee > > >Assignee: Ted Dunning > > >Priorit

[jira] [Commented] (MAHOUT-975) Bug in Gradient Machine - Computation of the gradient

2012-02-27 Thread Hector Yee (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13217609#comment-13217609 ] Hector Yee commented on MAHOUT-975: --- If MLP is a more general implementation why d

Re: Neural network implementation

2012-01-26 Thread Hector Yee
Depending on what you mean by neural network, a two-layer one is already checked in: https://issues.apache.org/jira/browse/MAHOUT-703 Feel free to re-write it or improve upon it. On Thu, Jan 26, 2012 at 9:00 AM, Saikat Kanjilal wrote: > > Hi All,I'd be interested in helping out in building a ne

Re: Welcome Dmitriy Lyubimov to Mahout PMC

2011-12-28 Thread Hector Yee
Congrats! On Thu, Dec 29, 2011 at 11:24 AM, Dmitriy Lyubimov wrote: > Thank you, Grant. I am happy to continue working on Mahout. > > On Wed, Dec 28, 2011 at 1:47 PM, Grant Ingersoll > wrote: > > I'm pleased to announce the Mahout PMC has elected to add Dmitriy to the > PMC. Dmitriy has been a

Re: Gradient descent for linear regression

2011-10-25 Thread Hector Yee
It's checked in as passive aggressive, or a one-line change. On Oct 25, 2011 12:07 AM, "Tommaso Teofili" wrote: > Hi all, > recently I've been working with Octave [1] to implement the gradient > descent > algorithm for linear regression (with uni/multi features) and I wonder if > such an implementat

Re: MAHOUT-232 status?

2011-10-10 Thread Hector Yee
Yeah, whoever's volunteering to review, let me know if you have any questions; the papers that describe the algorithms should be in the patches themselves. On Sat, Oct 8, 2011 at 12:03 PM, Ted Dunning wrote: > It does. > > Hector's series of patches need help as well. > > On Sat, Oct 8, 2011 at 11

Re: Kernel Ridge Regression

2011-09-20 Thread Hector Yee
; the size of the dataset it could handle, how much data can (MAHOUT-702) > handle and in which PC (or cluster) configuration ? > > On Wed, Sep 21, 2011 at 3:44 AM, deneche abdelhakim >wrote: > > > cool, thanks :) > > > > > > On Tue, Sep 2

Re: Kernel Ridge Regression

2011-09-20 Thread Hector Yee
Yeah, it's a two-line change to PassiveAggressive.java (MAHOUT-702). Change the loss to: loss = hinge(|score - actual| - epsilon), where hinge(x) = 0 if x < 0, x otherwise. Epsilon is a new param that controls how much error we tolerate; tau remains the same. delta = sign(actual - score) * tau * ins
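The epsilon-insensitive change described above can be sketched in Python (not the actual PassiveAggressive.java patch; the `aggressiveness` cap on tau and the dense-vector form are assumptions for illustration):

```python
def hinge(x):
    """hinge(x) = 0 if x < 0, x otherwise."""
    return x if x > 0 else 0.0

def pa_regression_update(w, x, actual, epsilon, aggressiveness=1.0):
    """One epsilon-insensitive passive-aggressive regression step.

    loss = hinge(|score - actual| - epsilon); if the prediction is inside
    the epsilon tube the update is passive (no change), otherwise the
    weights move just far enough, scaled by tau, toward the target."""
    score = sum(wi * xi for wi, xi in zip(w, x))
    loss = hinge(abs(score - actual) - epsilon)
    if loss == 0.0:
        return list(w)  # passive: error within tolerance
    norm_sq = sum(xi * xi for xi in x)
    tau = min(aggressiveness, loss / norm_sq)  # capped as in PA-I
    sign = 1.0 if actual - score > 0 else -1.0
    return [wi + sign * tau * xi for wi, xi in zip(w, x)]
```

With epsilon = 0 this reduces to the ordinary passive-aggressive regression update.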

Goodbye

2011-08-10 Thread Hector Yee
I'll be rejoining Google next week so I probably won't be able to contribute patches anymore. It's been fun working with Mahout. Later folks. Keep in touch on LinkedIn (I already nuked my twitter account). -- Yee Yang Li Hector Professional Profil

Re: Question on entropy calculation

2011-07-15 Thread Hector Yee
Yeah, it's -p log p. If you use the normalized version, it gives you the optimal number of bits to encode each symbol in the message; if it's not normalized, it's in some other units. On Fri, Jul 15, 2011 at 9:38 AM, Sean Owen wrote: > I stumped myself looking at the implementation of > LogLikelihood
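The normalized version mentioned above is Shannon entropy in bits; a minimal sketch from raw counts (the function name is an assumption, and this is illustrative rather than Mahout's LogLikelihood code):

```python
import math

def entropy_bits(counts):
    """Shannon entropy -sum(p * log2(p)) of a discrete distribution.

    Normalizing raw counts to probabilities first makes the result the
    optimal average number of bits per symbol; zero counts contribute 0."""
    total = sum(counts)
    h = 0.0
    for c in counts:
        if c > 0:
            p = c / total
            h -= p * math.log2(p)
    return h
```

Using natural log instead of log2 gives the same quantity in nats rather than bits.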

Re: Hadoop serialization compression and precision loss

2011-07-14 Thread Hector Yee
It's not lossy; that would be a disaster if it was. You specify the compressor, so you can use whatever codecs are supported, e.g. LZO. On Thu, Jul 14, 2011 at 7:40 AM, Dhruv Kumar wrote: > On Thu, Jul 14, 2011 at 10:29 AM, Sean Owen wrote: > > > Serialization itself has no effect on accuracy; double

Re: L2 seems does not work

2011-06-29 Thread Hector Yee
The NaNs in logistic regression usually occur at the Math.exp. Try adding a breakpoint or an assert-not-NaN there to see what the input is. If it's an overflow you can fix it by clamping: the argument to exp maxes out around 50 for floats before NaN-ing. On Wed, Jun 29, 2011 at 9:38 AM, Xiaobo Gu w
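The clamping fix suggested above can be sketched as a numerically safe logistic function (a hedged Python illustration, not the Mahout Java code; the clamp value of 50 follows the advice in the message):

```python
import math

def stable_sigmoid(z, clamp=50.0):
    """Logistic function with the exp argument clamped to avoid overflow.

    50 is deep enough into the saturated region that the clamped output is
    numerically indistinguishable from 0 or 1; the two branches also avoid
    computing exp of a large positive argument."""
    z = max(-clamp, min(clamp, z))
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)
```

An assert-not-NaN at the call site, as suggested, is still worth keeping while debugging, since clamping only masks the symptom if the inputs themselves are diverging.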

Re: Iterative jobs

2011-06-19 Thread Hector Yee
So rather than adding new learning algorithms, we should make the existing ones map-reduceable? Do you have a JIRA ticket for those that need tidying up? On Thu, Jun 16, 2011 at 10:54 AM, Sean Owen wrote: > > I personally don't think that this project needs more algorithms, and > am personally dir

[jira] [Commented] (MAHOUT-710) Implementing K-Trusses

2011-06-19 Thread Hector Yee (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13051731#comment-13051731 ] Hector Yee commented on MAHOUT-710: --- I'd love to beta test it soon. How will

Re: Iterative jobs

2011-06-18 Thread Hector Yee
implementation, in theory. > > I think we'll see something like this happen, and then you could get > your low latency small data chunks from hbase or from the dcache, etc. > > JP > > On Thu, Jun 16, 2011 at 1:34 PM, Hector Yee wrote: >> What do people think of using Sp

Re: kmeans generates ovelapping clusters

2011-06-17 Thread Hector Yee
One vector can be a member of only one cluster, but there's no requirement for no overlaps: you get equal radii, but the cluster centers could be close enough for them to overlap. On Fri, Jun 17, 2011 at 10:15 AM, djellel eddine Difallah < difal...@gmail.com> wrote: > Hello everyone, > > I tried kmean

Re: Iterative jobs

2011-06-16 Thread Hector Yee
ying up > Hadoop before moving on? That's just me. > > My gut says it would be cool to implement the SVD on something like > this to see how it goes. I don't yet see this is anything to move to. > > > On Thu, Jun 16, 2011 at 6:34 PM, Hector Yee wrote: >

Iterative jobs

2011-06-16 Thread Hector Yee
What do people think of using Spark for iterative jobs: http://www.spark-project.org/ Or is there a new version of hadoop that supports this kind of computation? -- Yee Yang Li Hector http://hectorgon.blogspot.com/ (tech + travel) http://hectorgon.com (book reviews)

[jira] [Issue Comment Edited] (MAHOUT-732) Implement ranking autoencoder on top of gradient machine

2011-06-15 Thread Hector Yee (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049954#comment-13049954 ] Hector Yee edited comment on MAHOUT-732 at 6/15/11 6:5

[jira] [Updated] (MAHOUT-732) Implement ranking autoencoder on top of gradient machine

2011-06-15 Thread Hector Yee (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hector Yee updated MAHOUT-732: -- Status: Patch Available (was: Open) Example command lines: Generate a ranking autoencoder trained

[jira] [Updated] (MAHOUT-732) Implement ranking autoencoder on top of gradient machine

2011-06-15 Thread Hector Yee (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hector Yee updated MAHOUT-732: -- Attachment: MAHOUT-732.gitpatch Working autoencoder > Implement ranking autoencoder on top

[jira] [Commented] (MAHOUT-716) Implement Boosting

2011-06-13 Thread Hector Yee (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13048887#comment-13048887 ] Hector Yee commented on MAHOUT-716: --- Nope, waiting on Ted's feedback. &g

[jira] [Created] (MAHOUT-732) Implement ranking autoencoder on top of gradient machine

2011-06-13 Thread Hector Yee (JIRA)
Components: Clustering Affects Versions: 0.6 Reporter: Hector Yee Priority: Minor Fix For: 0.6 Implement a ranking autoencoder clusterer based on top of gradient machine. See https://docs.google.com/present/edit?id=0AQC247eq7Jp5ZGZ6NXpyOWhfMjlmM2pzdjRkZw&aut

[jira] [Commented] (MAHOUT-716) Implement Boosting

2011-06-13 Thread Hector Yee (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13048754#comment-13048754 ] Hector Yee commented on MAHOUT-716: --- Any news on this patch? > Implement B

Re: Goodbye collections?

2011-06-10 Thread Hector Yee
Wow cool, love the closure support! On Thu, Jun 9, 2011 at 11:07 PM, Dawid Weiss wrote: > Hey, thanks guys! I'm shooting myself in the foot a little bit as an > author of a somewhat competing library (HPPC), but I do think fastutil > is a great choice for Mahout. HPPC is suitable for very specifi

[jira] [Commented] (MAHOUT-726) IntWritable / VectorWritable cast problem in classifier/clustering implementations

2011-06-09 Thread Hector Yee (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046684#comment-13046684 ] Hector Yee commented on MAHOUT-726: --- Turns out to be user error, I was using

Re: How to get the predicted target label using CrossFolderLearner?

2011-06-07 Thread Hector Yee
I've used systems before that kept the original mapping to the classifier specific mapping. It can be nice because you can add new features and an old model may still work because the new features would be out of range of the old mappings. It can also provide a place to store score statistics (such

[jira] [Commented] (MAHOUT-716) Implement Boosting

2011-06-06 Thread Hector Yee (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13044919#comment-13044919 ] Hector Yee commented on MAHOUT-716: --- Yeah I forked a git repo on git hub, it shoul

[jira] [Commented] (MAHOUT-716) Implement Boosting

2011-06-04 Thread Hector Yee (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13044396#comment-13044396 ] Hector Yee commented on MAHOUT-716: --- Thanks for cleaning it up! I'm just start

[jira] [Issue Comment Edited] (MAHOUT-703) Implement Gradient machine

2011-06-04 Thread Hector Yee (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13044391#comment-13044391 ] Hector Yee edited comment on MAHOUT-703 at 6/4/11 9:0

Re: [jira] [Assigned] (MAHOUT-716) Implement Boosting

2011-06-04 Thread Hector Yee
URL: https://issues.apache.org/jira/browse/MAHOUT-716 > > Project: Mahout > > Issue Type: New Feature > > Components: Classification > >Affects Versions: 0.6 > >Reporter: Hector Yee > >Assignee: Ted

[jira] [Commented] (MAHOUT-703) Implement Gradient machine

2011-06-04 Thread Hector Yee (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13044391#comment-13044391 ] Hector Yee commented on MAHOUT-703: --- Thanks! I'll fix this and submit a

Re: How to get the predicted target label using CrossFolderLearner?

2011-06-04 Thread Hector Yee
You can use classifyFull and then the vector's maxIndex. Sent from my iPad On Jun 4, 2011, at 8:31 AM, "XiaoboGu" wrote: > Hi, >When dealing with multinomial logistic regression with CrossFolderLearner, > the Vector classfy(Vector) method returns a vector of scores for all the > target values, M
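The classifyFull-then-maxIndex recipe is just an argmax over the per-category score vector; a minimal Python sketch (the function name is an assumption, and ties resolve to the lowest index, mirroring the usual maxIndex convention):

```python
def predicted_label(scores):
    """Index of the maximum score in a per-category score vector.

    Equivalent to calling maxIndex() on the output of classifyFull:
    the predicted label is the category with the highest score."""
    best, best_i = scores[0], 0
    for i, s in enumerate(scores):
        if s > best:
            best, best_i = s, i
    return best_i
```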

[jira] [Commented] (MAHOUT-716) Implement Boosting

2011-06-03 Thread Hector Yee (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13043986#comment-13043986 ] Hector Yee commented on MAHOUT-716: --- Any feedback on this? I am planning to build

Re: 0.6: off to the races!

2011-06-03 Thread Hector Yee
These are all complete with unit tests and are ready to go: Implement Boosting Implement Gradient machine Implement Online Passive Aggressive learner< https://issues.apache.org/jira/browse/MAHOUT-7

[jira] [Commented] (MAHOUT-702) Implement Online Passive Aggressive learner

2011-06-02 Thread Hector Yee (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13042850#comment-13042850 ] Hector Yee commented on MAHOUT-702: --- Is this good to go? > Implement Online

[jira] [Updated] (MAHOUT-716) Implement Boosting

2011-06-01 Thread Hector Yee (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hector Yee updated MAHOUT-716: -- Fix Version/s: 0.6 > Implement Boosting > -- > > Key

[jira] [Commented] (MAHOUT-703) Implement Gradient machine

2011-06-01 Thread Hector Yee (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13042463#comment-13042463 ] Hector Yee commented on MAHOUT-703: --- Any news on this patch? I need it to implemen

Re: AdaBoost

2011-06-01 Thread Hector Yee
Patch uploaded https://issues.apache.org/jira/browse/MAHOUT-716 On Tue, May 24, 2011 at 1:57 PM, Wojciech Indyk wrote: > Hi! > I want implement AdaBoost in Mahout. Could it be useful in Mahout? I > think so, because it's strong algorithm and very powerful, but Mahout > is specific, so who knows

[jira] [Updated] (MAHOUT-716) Implement Boosting

2011-06-01 Thread Hector Yee (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hector Yee updated MAHOUT-716: -- Attachment: MAHOUT-716.patch > Implement Boosting > -- > >

[jira] [Updated] (MAHOUT-716) Implement Boosting

2011-06-01 Thread Hector Yee (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hector Yee updated MAHOUT-716: -- Status: Patch Available (was: Open) Implement boosting with decision stumps trained by Gradboost. For

Re: AdaBoost

2011-05-31 Thread Hector Yee
Wojciech, I've opened a ticket you can watch: https://issues.apache.org/jira/browse/MAHOUT-716 I should have the in-core code ready in ~3 days. The gradient portion is easily parallelizable if you want to implement it as mapreduce. On Tue, May 24, 2011 at 1:57 PM, Wojciech Indyk wrote: > Hi! >

[jira] [Created] (MAHOUT-716) Implement Boosting

2011-05-31 Thread Hector Yee (JIRA)
Implement Boosting -- Key: MAHOUT-716 URL: https://issues.apache.org/jira/browse/MAHOUT-716 Project: Mahout Issue Type: New Feature Components: Classification Affects Versions: 0.6 Reporter: Hector

Re: AdaBoost

2011-05-26 Thread Hector Yee
It is if you use the grad boost variant. I'll work on it next week on vacation... Sent from my iPad On May 24, 2011, at 4:48 PM, Ted Dunning wrote: > Is AdaBoost a scalable algorithm? > > It seems to me that it is inherently very sequential. > > On Tue, May 24, 2011 at 1:57 PM, Wojciech Indy

Re: Possible contributions

2011-05-23 Thread Hector Yee
t; you plan to write more than a few hundred lines of code, it would be good > to > file an individual contributor license. That can be found here: > > http://www.apache.org/licenses/icla.txt > > On Tue, May 17, 2011 at 10:17 PM, Hector Yee wrote: > > > I'll

Re: [jira] [Commented] (MAHOUT-703) Implement Gradient machine

2011-05-23 Thread Hector Yee
ut I'm not a lawyer. > > http://www.csie.ntu.edu.tw/~cjlin/libsvm/COPYRIGHT > > On Mon, May 23, 2011 at 5:38 PM, Hector Yee wrote: >> Libsvm has a few I can check in >> On May 22, 2011 4:36 PM, "Lance Norskog" wrote: >>> >>> What

Re: [jira] [Commented] (MAHOUT-703) Implement Gradient machine

2011-05-23 Thread Hector Yee
om a small dataset. > > On 5/21/11, Ted Dunning wrote: > > On Sat, May 21, 2011 at 4:25 PM, Hector Yee wrote: > > > >> Sure, or I can wait till you submit patches before working on the next > >> one? > >> > > > > I think that submit == com

Re: [jira] [Commented] (MAHOUT-703) Implement Gradient machine

2011-05-21 Thread Hector Yee
> On Sat, May 21, 2011 at 1:54 AM, Hector Yee (JIRA) > wrote: > > > > >[ > > > https://issues.apache.org/jira/browse/MAHOUT-703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13037

[jira] [Commented] (MAHOUT-703) Implement Gradient machine

2011-05-21 Thread Hector Yee (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13037289#comment-13037289 ] Hector Yee commented on MAHOUT-703: --- Note: This patch requires 702 for

[jira] [Updated] (MAHOUT-703) Implement Gradient machine

2011-05-21 Thread Hector Yee (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hector Yee updated MAHOUT-703: -- Status: Patch Available (was: Open) Working ranking neural net with one hidden sigmoid layer

[jira] [Updated] (MAHOUT-703) Implement Gradient machine

2011-05-21 Thread Hector Yee (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hector Yee updated MAHOUT-703: -- Attachment: MAHOUT-703.patch Working ranking neural net, less the sparsity enforcing part. Would

Re: Classifier Interface

2011-05-20 Thread Hector Yee
score, only the result. > So there is at least one algorithm where whole sets of the classify > interface makes absolutely no sense. > > Daniel. > > On Fri, May 20, 2011 at 10:09 PM, Ted Dunning > wrote: > > On Fri, May 20, 2011 at 6:49 PM, Hector Yee > wrote: >

Re: Classifier Interface

2011-05-20 Thread Hector Yee
classifyScalarNoLink doesn't exist yet; it was a proposed addition to mirror classifyNoLink (the vector form), so it won't hurt any existing classifiers. 'score' was a proposed name for it. I was concerned about classifyScalar because it enforces the contract that the scores be in the 0..1 range.

Classifier Interface

2011-05-20 Thread Hector Yee
Hi, I noticed that classifier has three functions to call to get the score. classify - returns probabilities classifyNoLink - returns the raw score (optional) classifyScalar - returns the binary probability I'm working on a few classifiers for which it doesn't make sense to return probability.

Re: SF Informal meetup on May 23?

2011-05-19 Thread Hector Yee
I'm returning from a trip so a maybe depending on my arrival time. On Fri, May 20, 2011 at 4:58 AM, Ted Dunning wrote: > Great. As we get a count, I will make reservation. > > I am one, Grant two, Dawid three. Who else can make it that night? > > On Thu, May 19, 2011 at 1:13 PM, Grant Ingerso

[jira] [Commented] (MAHOUT-703) Implement Gradient machine

2011-05-19 Thread Hector Yee (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13036636#comment-13036636 ] Hector Yee commented on MAHOUT-703: --- Sure its here: http://www.stanford.edu/c

Re: SF Informal meetup on May 23?

2011-05-18 Thread Hector Yee
know that's a bit north of Ted and Dmitry, I think. What about > others? > > I can scope out when I get there on Sunday. > > > > -Grant > > > > On May 18, 2011, at 6:37 PM, Ted Dunning wrote: > > > > > THursday is much better for me. > > >

[jira] [Commented] (MAHOUT-703) Implement Gradient machine

2011-05-18 Thread Hector Yee (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13036022#comment-13036022 ] Hector Yee commented on MAHOUT-703: --- Yeah was planning to do L2 regularization f

[jira] [Created] (MAHOUT-703) Implement Gradient machine

2011-05-18 Thread Hector Yee (JIRA)
Reporter: Hector Yee Priority: Minor Implement a gradient machine (aka 'neural network') that can be used for classification or auto-encoding. It will just have an input layer, an identity, sigmoid or tanh hidden layer, and an output layer. Training done by stochastic gradient de

[jira] [Updated] (MAHOUT-702) Implement Online Passive Aggressive learner

2011-05-18 Thread Hector Yee (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hector Yee updated MAHOUT-702: -- Attachment: MAHOUT-702.patch - fixed bug in stomping instance - factored out online test case - added

[jira] [Commented] (MAHOUT-702) Implement Online Passive Aggressive learner

2011-05-18 Thread Hector Yee (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13035905#comment-13035905 ] Hector Yee commented on MAHOUT-702: --- For #1 I thought I made a copy but I guess

Re: SF Informal meetup on May 23?

2011-05-18 Thread Hector Yee
I'm local too; Monday or Thursday works. On Wed, May 18, 2011 at 3:38 AM, Dmitriy Lyubimov wrote: > I'd be interested to come. Being local, any day pretty much works for me. > > On Sun, May 15, 2011 at 2:42 AM, Dawid Weiss > wrote: > > Thursday would work for me too. > > > > Dawid > > > > On Sun

Re: Possible contributions

2011-05-18 Thread Hector Yee
ntributor license. That can be found here: > > http://www.apache.org/licenses/icla.txt > > On Tue, May 17, 2011 at 10:17 PM, Hector Yee wrote: > > > I'll probably just implement an in-core variant first. > > > > re: online kernelized ranker - this is pretty easy to do

[jira] [Updated] (MAHOUT-702) Implement Online Passive Aggressive learner

2011-05-18 Thread Hector Yee (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hector Yee updated MAHOUT-702: -- Status: Patch Available (was: Open) > Implement Online Passive Aggressive lear

[jira] [Updated] (MAHOUT-702) Implement Online Passive Aggressive learner

2011-05-18 Thread Hector Yee (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hector Yee updated MAHOUT-702: -- Attachment: MAHOUT-702.patch Implementation and unit test for passive aggressive. > Implement Onl

[jira] [Created] (MAHOUT-702) Implement Online Passive Aggressive learner

2011-05-18 Thread Hector Yee (JIRA)
Affects Versions: 0.6 Reporter: Hector Yee Priority: Minor Implements online passive aggressive learner that minimizes label ranking loss. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira

Re: Possible contributions

2011-05-17 Thread Hector Yee
Re: boosting scalability: I've implemented it on thousands of machines, but not with MapReduce, rather with direct RPC calls. The gradient computation tends to be iterative, so one way to do it is to have each iteration run per MapReduce: compute gradients in the mapper, gather them in the reducer,
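The one-iteration-per-MapReduce pattern described above can be sketched without a cluster: the mapper computes a partial gradient over its shard, the reducer sums the partials, and the driver applies the summed gradient. A hedged toy sketch using squared loss on dense vectors (the function names, loss, and learning rate are illustrative assumptions):

```python
def map_gradients(shard, w):
    """Mapper: partial gradient of squared loss over one data shard."""
    g = [0.0] * len(w)
    for x, y in shard:
        err = sum(wi * xi for wi, xi in zip(w, x)) - y
        for i, xi in enumerate(x):
            g[i] += err * xi
    return g

def reduce_gradients(partials):
    """Reducer: element-wise sum of the mappers' partial gradients."""
    total = [0.0] * len(partials[0])
    for g in partials:
        for i, gi in enumerate(g):
            total[i] += gi
    return total

def one_iteration(shards, w, lr=0.1):
    """Driver: one 'iteration per MapReduce' gradient step."""
    g = reduce_gradients([map_gradients(s, w) for s in shards])
    return [wi - lr * gi for wi, gi in zip(w, g)]
```

Because the gradient of a sum of per-example losses is the sum of per-example gradients, the sharded computation is exactly equivalent to the full-batch gradient, which is what makes each boosting iteration embarrassingly parallel.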

Possible contributions

2011-05-17 Thread Hector Yee
Hello, some background on myself: I was at Google the last 5 years working on the self-driving car, image search and YouTube in machine learning ( http://www.linkedin.com/in/yeehector). I have some proposed contributions and I wonder if they will be useful in Mahout (otherwise I will just com