On 02/13/2012 06:02 PM, jem85 wrote:
I was wondering if anyone has had any experience with porting CUDA code to
Hadoop Pipes. Any assistance would be greatly appreciated.
Thanks,
Probably not exactly what you're looking for, but we've ported other C
code to Hadoop. We used Pydoop (http://py
We have just updated the slides on these improvements:
http://www.slideshare.net/hanborq/hanborq-optimizations-on-hadoop-mapreduce-20120216a
Updates:
(1) Revised some descriptions to make things clearer and more accurate.
(2) Added some benchmarks to put the changes in context.
On Sat, Feb 18, 2012 at 11:12 PM, Anty wrote:
Hi,
I used to do
job.setNumReduceTasks(1);
but I realized that this is bad and commented out this line
//job.setNumReduceTasks(1);
I still see the number of reduce tasks as 1 when my mappers number 4. Why
could this be?
Thank you,
Mark
The default value for "mapred.reduce.tasks" is indeed "1".
For your cluster, you should tune your client configuration to
carry a suitable value for that property, in mapred-site.xml
(http://wiki.apache.org/hadoop/HowManyMapsAndReduces might help you
decide how many), or pass it along as a "-
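
The fallback described in this reply can be sketched in plain Java. This is only a toy model, not the Hadoop API: the class ReduceTaskConfigSketch and its methods are hypothetical, and java.util.Properties stands in for Hadoop's layered configuration. It assumes three layers (framework default, site file, job code), where a value set in a later layer wins.

```java
import java.util.Properties;

// Hypothetical stand-in for Hadoop's layered configuration; not the Hadoop API.
class ReduceTaskConfigSketch {
    static final String KEY = "mapred.reduce.tasks";

    // Framework default layer: the reply above states the default is 1.
    static Properties frameworkDefaults() {
        Properties p = new Properties();
        p.setProperty(KEY, "1");
        return p;
    }

    // Resolve the effective reducer count.
    // siteValue models mapred-site.xml (null = not set there).
    // jobValue models a job.setNumReduceTasks(n) call (null = commented out).
    static int effectiveReducers(String siteValue, Integer jobValue) {
        Properties site = new Properties(frameworkDefaults());
        if (siteValue != null) {
            site.setProperty(KEY, siteValue);
        }
        Properties job = new Properties(site);
        if (jobValue != null) {
            job.setProperty(KEY, jobValue.toString());
        }
        // getProperty falls back through the chain of defaults.
        return Integer.parseInt(job.getProperty(KEY));
    }

    public static void main(String[] args) {
        // Nothing set anywhere: falls back to the default.
        System.out.println(effectiveReducers(null, null)); // 1
        // Site file sets 4, job code sets nothing.
        System.out.println(effectiveReducers("4", null));  // 4
        // Job code overrides the site value.
        System.out.println(effectiveReducers("4", 2));     // 2
    }
}
```

In this model, commenting out the job-level call simply removes the top layer, so the value drops back to whatever the site file or the default provides.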
Indeed, worked like a charm.
One strange thing I see is that the more reducers I run, the more
data I get in the output. My suspicion is that since I use some
global counters in my reducers, it could be that when it is called the
second time, it overwrites the first results. Oh, wel
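
The reducer code is not shown in this thread, so the following is only a hypothetical illustration of how state kept in a counter inside a reducer can make the output size depend on the number of reducers. It assumes a made-up reducer (CappedReducer) that stops emitting once its own counter reaches a limit. Each simulated reduce task gets its own counter, and keys are assigned with the same formula as Hadoop's default HashPartitioner.

```java
import java.util.ArrayList;
import java.util.List;

class PerReducerCounterSketch {
    // Same formula as Hadoop's default HashPartitioner.
    static int partition(Object key, int numReducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    // Hypothetical reducer: emits a key only while its own counter is below the limit.
    static class CappedReducer {
        private int emitted = 0;
        private final int limit;

        CappedReducer(int limit) {
            this.limit = limit;
        }

        boolean reduce(Integer key) {
            if (emitted >= limit) {
                return false; // counter exhausted, key is dropped
            }
            emitted++;
            return true;
        }
    }

    // Count how many keys are emitted in total across all simulated reduce tasks.
    static int totalOutput(int numKeys, int numReducers, int limit) {
        List<CappedReducer> reducers = new ArrayList<>();
        for (int i = 0; i < numReducers; i++) {
            reducers.add(new CappedReducer(limit)); // one counter per task
        }
        int total = 0;
        for (int key = 0; key < numKeys; key++) {
            if (reducers.get(partition(key, numReducers)).reduce(key)) {
                total++;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(totalOutput(100, 1, 10)); // 10
        System.out.println(totalOutput(100, 4, 10)); // 40
    }
}
```

With one reducer the single counter caps the whole output. With four reducers each task has its own counter, so the same input yields four times as many records.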