I'm not familiar with setOutputValueGroupingComparator.
What about adding the doc# to the key and writing your own
hashing/Partitioner?
So you would emit something like:
cat_doc5-> 1
cat_doc1-> 1
cat_doc5-> 3
The hashing method would hash only the part of the key before the "_".
The shuffling would still put t
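To make the idea concrete, here is a minimal sketch of the prefix-hashing logic in plain Java, without any Hadoop dependency. The class and method names (PrefixPartitionSketch, prefixOf, partitionFor) are made up for illustration. In a real job, I'd assume the same computation would go into the getPartition method of a custom Partitioner subclass, with the key converted to a String first.

```java
class PrefixPartitionSketch {
    // Hypothetical helper: returns the part of the key before the first "_",
    // or the whole key if it contains no "_".
    static String prefixOf(String key) {
        int idx = key.indexOf('_');
        return idx < 0 ? key : key.substring(0, idx);
    }

    // Hypothetical helper with the same contract as a partitioner:
    // returns a value in [0, numPartitions) that depends only on the prefix.
    static int partitionFor(String key, int numPartitions) {
        // Mask the sign bit so a negative hashCode cannot yield a negative partition.
        return (prefixOf(key).hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    public static void main(String[] args) {
        // Keys sharing the "cat" prefix land in the same partition.
        for (String key : new String[] {"cat_doc5", "cat_doc1", "dog_doc2"}) {
            System.out.println(key + " -> partition " + partitionFor(key, 4));
        }
    }
}
```

With this, cat_doc5 and cat_doc1 always go to the same reducer, while the doc# suffix stays in the key and can still take part in sorting.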
Hadoop fellows,
Orange Labs is hosting a forum about Recommendation Engines (not limited to
video) in our South San Francisco lab.
We are now looking for more people interested in bringing their own experience
and perspective to the discussion.
I am sure there are interesting things to learn f
Apparently Yahoo has been recording video/audio of all the presentations at
past HUG meetings.
Are the recordings available somewhere?
I discovered AppNexus yesterday.
They offer hosting similar to Amazon EC2, with apparently more dedicated
hardware and a better notion of where things are in the datacenter.
Their web site says they are optimized for Hadoop applications.
Has anybody tried them and could give some feedback?
J.
I see the VM approach as great for isolation, for customized Hadoop or tools
required by the jobs, and for ease of IT management.
There is a performance hit on CPU and IO, but I never looked at the
numbers.
Has anybody measured it?
Basically for now, on EC2 for instance, if you need to go faster, just
buy 50 more machines