RE: Grouping Values for Reducer Input

2009-04-13 Thread jeremy.huylebroeck
I'm not familiar with setOutputValueGroupingComparator. What about adding the doc# to the key and using your own hashing/Partitioner? So, doing something like cat_doc5 -> 1, cat_doc1 -> 1, cat_doc5 -> 3, the hashing method would take everything before "_" as the hash. The shuffling would still put t
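
A minimal sketch of the prefix-hashing idea described above, written as plain Java with no Hadoop dependency. The class and method names (PrefixPartitionSketch, groupPrefix, partitionFor) are hypothetical and only for illustration. The modulo arithmetic mirrors what Hadoop's default HashPartitioner does with the whole key; in a real job the same logic would presumably live in the getPartition method of a custom Partitioner.

```java
class PrefixPartitionSketch {

    // Returns the part of the composite key before the first "_".
    // If there is no "_", the whole key is used (an assumption for this sketch).
    static String groupPrefix(String compositeKey) {
        int sep = compositeKey.indexOf('_');
        return sep < 0 ? compositeKey : compositeKey.substring(0, sep);
    }

    // Picks a partition from the prefix only, so "cat_doc5" and "cat_doc1"
    // always land in the same partition. Masking with Integer.MAX_VALUE
    // clears the sign bit so the result of the modulo is never negative.
    static int partitionFor(String compositeKey, int numPartitions) {
        int hash = groupPrefix(compositeKey).hashCode();
        return (hash & Integer.MAX_VALUE) % numPartitions;
    }

    public static void main(String[] args) {
        String[] keys = {"cat_doc5", "cat_doc1", "dog_doc2"};
        for (String key : keys) {
            System.out.println(key + " -> partition " + partitionFor(key, 4));
        }
    }
}
```

With this, every key sharing the "cat" prefix goes to the same reducer, while the doc# suffix remains part of the key and so still takes part in the sort.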

Orange Labs is hosting an event about recommendation engines - March 3rd

2009-02-25 Thread jeremy.huylebroeck
Hadoop fellows, Orange Labs is hosting a forum about Recommendation Engines (not limited to video) in our South San Francisco lab. We are now looking for more people interested in bringing their own experience and perspective to the discussion. I am sure there are interesting things to learn f

Videos and slides of the HUG meetings?

2008-10-16 Thread jeremy.huylebroeck
Apparently Yahoo has been taking video/audio of all the presentations in the past HUG meetings. Are they available somewhere?

Anybody used AppNexus for hosting Hadoop app?

2008-07-24 Thread jeremy.huylebroeck
I discovered AppNexus yesterday. They offer hosting similar to Amazon EC2, with apparently more dedicated hardware and a better notion of where things are in the datacenter. Their web site says they are optimized for Hadoop applications. Has anybody tried them and could give some feedback? J.

RE: Hadoop Distributed Virtualisation

2008-06-06 Thread jeremy.huylebroeck
I see the VM approach as great for isolation, for customized Hadoop or tools required by the jobs, and for ease of IT management. There is a performance hit on CPU and IO, but I never looked at the numbers. Has anybody? Basically for now, on EC2 for instance, if you need to go faster, just buy 50 more machines