Re: Streaming jar creates only 1 reducer

2011-10-21 Thread Nick Jones
FWIW, I usually specify the number of reducers in both streaming and against the Java API. The "default" is what's read from your config files on the submitting node. Nick Jones On Oct 21, 2011, at 5:00 PM, Mapred Learn wrote: > Hi, > Does streaming jar create 1 reducer
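
A minimal sketch of setting the reducer count explicitly for a streaming job, rather than relying on the submitting node's config. It only builds the command line and does not run it. The helper name `streaming_command` and every path in the usage note are hypothetical; `mapred.reduce.tasks` is the property name used by Hadoop releases of that era.

```python
def streaming_command(streaming_jar, input_path, output_path,
                      mapper, reducer, num_reducers):
    """Build the argv list for a Hadoop streaming job with an explicit reducer count.

    All arguments are supplied by the caller; nothing here is read from
    the cluster configuration.
    """
    if num_reducers < 0:
        raise ValueError("num_reducers must be zero or greater")
    return [
        "hadoop", "jar", streaming_jar,
        # Generic -D options have to come before the streaming-specific options.
        "-D", f"mapred.reduce.tasks={num_reducers}",
        "-input", input_path,
        "-output", output_path,
        "-mapper", mapper,
        "-reducer", reducer,
    ]
```

For example, `streaming_command("hadoop-streaming.jar", "in", "out", "cat", "wc", 8)` yields a command that asks for eight reduce tasks; the resulting list could be passed to `subprocess.run` on a machine with Hadoop installed.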

Re: easiest way to install hadoop

2011-02-22 Thread Nick Jones
I found Cloudera's distribution easy to use, but it's the only thing I tried. Nick On Tue, Feb 22, 2011 at 9:42 PM, real great.. wrote: > Hi, > Very trivial question. > Which is the easiest way to install hadoop? > i mean which distribution should i go for?? apache or cloudera? > n which is th

Re: Help on streaming jobs

2010-08-28 Thread Nick Jones
The number of reducers is normally the number of output files desired as well. Forcing a large job to output to one or a few files can make a job take a very long time. Nick Jones Sent by radiation. On Aug 27, 2010, at 11:52 PM, Xin Feng wrote: > Did you mean that i should include: >
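
A rough illustration of why the reducer count and the output file count go together: each reduce task writes its own part file, and keys are spread across reducers by hash. This is a plain-Python simulation, not Hadoop code. The function names are hypothetical, and the Java-`String`-style hash is an assumption made for illustration (it is only faithful for ASCII keys and is not necessarily the hash the real key type uses).

```python
from collections import defaultdict


def java_string_hash(s):
    """Mimic Java's String.hashCode() as a signed 32-bit integer (ASCII keys)."""
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    return h - 0x100000000 if h >= 0x80000000 else h


def partition(key, num_reducers):
    """Hash-partition a key: clear the sign bit, then take the remainder."""
    return (java_string_hash(key) & 0x7FFFFFFF) % num_reducers


def output_files(keys, num_reducers):
    """Return a mapping of part-file name to the sorted keys that land in it."""
    parts = defaultdict(list)
    for key in keys:
        parts[partition(key, num_reducers)].append(key)
    # One file per reduce task, including tasks that received no keys.
    return {f"part-{i:05d}": sorted(parts[i]) for i in range(num_reducers)}
```

With `num_reducers=1` every key ends up in the single `part-00000` file, which is the case where one reduce task has to process the whole job's output.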

Re: Hadoop with Eclipse on Windows 7

2010-07-21 Thread Nick Jones
Hi, It's true that Linux is a better-supported platform, but Windows with cygwin does work. Nick Jones Sent by radiation. On Jul 21, 2010, at 5:28 PM, Khaled BEN BAHRI wrote: Hi :) Windows is not well tested yet as a production platform, GNU/Linux is better than windows for using hado

Re: Problem with DBOutputFormat

2010-06-08 Thread Nick Jones
Hi Giridhar, Can you share your code somewhere? Nick Jones Sent by radiation. On Jun 8, 2010, at 7:21 AM, "Giridhar Addepalli" wrote: Hi Sonal, I am using Hadoop 0.20.2. Is this okay? Thanks for the suggestion, will look at the hiho framework. Thanks, Giridhar. -Ori

Re: Memory intensive jobs and JVM reuse

2010-04-29 Thread Nick Jones
e this: http://hadoop.apache.org/common/docs/current/mapred-default.html HTH, DR Couldn't the DistributedCache idea still work with a chained set of jobs? Map the first set into files on the DFS and add them to the DC for the next time through? Nick Jones

Re: Hadoop over the internet

2010-04-17 Thread Nick Jones
I think the biggest issue would be upstream bandwidth and latency. If the thought was to use a SETI-type approach, most users wouldn't have the necessary upstream bandwidth to support the DFS. It would be likely that a few local desktop machines would significantly outpace a much larger DSL/cabl