Re: Can I number output results with a Counter?

2011-05-20 Thread Joey Echeverria
To make sure I understand you correctly, you need a globally unique one up counter for each output record? If you had an upper bound on the number of records a single reducer could output and you can afford to have gaps, you could just use the task id and multiply that by the max number of
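The snippet above is cut off, but the scheme it describes (task id multiplied by a per-task upper bound) can be sketched as follows. This is only an illustration under assumptions: the class name `RecordIds`, the method `globalId`, and the bound `maxRecordsPerTask` are hypothetical names not taken from the thread, and adding a per-task local index to the product is an assumed completion of the truncated sentence. The resulting IDs are unique across tasks but have gaps whenever a task emits fewer records than the bound.

```java
// Hypothetical helper: derives a globally unique record ID from a task ID
// and a per-task record index, with no coordination between tasks.
final class RecordIds {
    private RecordIds() {}

    // taskId: zero-based ID of the reducer task (assumed to be unique per task).
    // localIndex: zero-based position of the record within that task's output.
    // maxRecordsPerTask: assumed upper bound on records a single task can emit.
    static long globalId(int taskId, long localIndex, long maxRecordsPerTask) {
        if (taskId < 0 || localIndex < 0 || localIndex >= maxRecordsPerTask) {
            throw new IllegalArgumentException("index outside the task's ID range");
        }
        // Each task owns the half-open range
        // [taskId * max, (taskId + 1) * max); the exact-arithmetic calls
        // throw ArithmeticException on overflow instead of wrapping silently.
        return Math.addExact(
                Math.multiplyExact((long) taskId, maxRecordsPerTask), localIndex);
    }

    public static void main(String[] args) {
        long max = 1_000_000L;
        System.out.println(globalId(0, 0, max));   // prints 0
        System.out.println(globalId(3, 42, max));  // prints 3000042
    }
}
```

Because each task's range is disjoint from every other task's, no shared counter is needed; the cost is that the ID space is sparse.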

Re: Can I number output results with a Counter?

2011-05-20 Thread Joey Echeverria
it. But what are counters for? They seem to be exactly that. Mark On Fri, May 20, 2011 at 12:01 PM, Joey Echeverria j...@cloudera.com wrote: To make sure I understand you correctly, you need a globally unique one up counter for each output record? If you had an upper bound on the number

Re: What's the easiest way to count the number of Key, Value pairs in a directory?

2011-05-20 Thread Joey Echeverria
Are you storing the data in sequence files? -Joey On Fri, May 20, 2011 at 10:33 AM, W.P. McNeill bill...@gmail.com wrote: The keys are Text and the values are large custom data structures serialized with Avro. I also have counters for the job that generates these files that gives me this

Re: REPOST: How to adapt bin/hdfs for executing from outside $HADOOP_HOME/bin?

2011-05-19 Thread Joey Echeverria
Why do you need to move the script from $HADOOP_HOME/bin? Can't you just symlink it or write a script which runs the original? -Joey On May 19, 2011, at 4:15, Gabriele Kahlout gabri...@mysimpatico.com wrote: I'm still having the following problem, any suggestions? I'm trying to modify the

Re: REPOST: How to adapt bin/hdfs for executing from outside $HADOOP_HOME/bin?

2011-05-19 Thread Joey Echeverria
, May 19, 2011 at 3:33 PM, Joey Echeverria j...@cloudera.com wrote: Why do you need to move the script from $HADOOP_HOME/bin? Can't you just symlink it or write a script which runs the original? -Joey On May 19, 2011, at 4:15, Gabriele Kahlout gabri...@mysimpatico.com wrote

Re: REPOST: How to adapt bin/hdfs for executing from outside $HADOOP_HOME/bin?

2011-05-19 Thread Joey Echeverria
1041718 Compiled by hammer on Mon Dec 6 17:38:16 CET 2010 On Thu, May 19, 2011 at 4:55 PM, Joey Echeverria j...@cloudera.com wrote: What version of hadoop is installed? -Joey On May 19, 2011 7:49 AM, Gabriele Kahlout gabri...@mysimpatico.com wrote: I said i don't have write access

Re: REPOST: How to adapt bin/hdfs for executing from outside $HADOOP_HOME/bin?

2011-05-19 Thread Joey Echeverria
hdfs. On Thu, May 19, 2011 at 5:02 PM, Joey Echeverria j...@cloudera.com wrote: Why do you need the hdfs script? Typically 0.20.x is used with just the hadoop script. -Joey On May 19, 2011 8:00 AM, Gabriele Kahlout gabri...@mysimpatico.com wrote: $ hadoop version Hadoop 0.20.3-SNAPSHOT

Re: REPOST: How to adapt bin/hdfs for executing from outside $HADOOP_HOME/bin?

2011-05-19 Thread Joey Echeverria
? No changes to hdfs-config.sh? What about all the other stuff in the hdfs? For example the script calls hdfs dfs , like that won't it crash? elif [ $COMMAND = dfs ] ; then  CLASS=org.apache.hadoop.fs.FsShell On Thu, May 19, 2011 at 5:26 PM, Joey Echeverria j...@cloudera.com wrote: I would just

Re: Can I use InputSampler.RandomSampler on data with non-Text keys?

2011-05-19 Thread Joey Echeverria
Filing a bug is a great idea. InputSampler is in the MapReduce hadoop sub-project which has its own Jira project: https://issues.apache.org/jira/browse/MAPREDUCE -Joey On Thu, May 19, 2011 at 9:28 AM, W.P. McNeill bill...@gmail.com wrote: Should I file a bug then?  Do I do that

Re: Reducer granularity and starvation

2011-05-18 Thread Joey Echeverria
The one advantage you would get with a large number of reducers is that the scheduler will be able to give open reduce slots to other jobs without having to be preemptive. This will reduce the risk of you losing a reducer 3 hours into a 4 hour run. -Joey On Wed, May 18, 2011 at 3:08 PM, James

Re: Can I use InputSampler.RandomSampler on data with non-Text keys?

2011-05-18 Thread Joey Echeverria
That sounds like a bug to me. I think the easiest way would be to modify InputSampler to handle non-Text keys. -Joey On Wed, May 18, 2011 at 4:24 PM, W.P. McNeill bill...@gmail.com wrote: I want to do a total sort on some data whose key type is Writable but not Text.  I wrote an

Re: Are hadoop fs commands serial or parallel

2011-05-17 Thread Joey Echeverria
The sequence file writer definitely does it serially as you can only ever write to the end of a file in Hadoop. Doing copyFromLocal could write multiple files in parallel (I'm not sure if it does or not), but a single file would be written serially. -Joey On Tue, May 17, 2011 at 5:44 PM, Mapred

Re: mapper java process not exiting

2011-05-12 Thread Joey Echeverria
Which version of hadoop are you running? Are you running on linux? -Joey On Thu, May 12, 2011 at 1:39 PM, Adi adi.pan...@gmail.com wrote: For one long running job we are noticing that the mapper jvms do not exit even after the mapper is done. Any suggestions on why this could be happening.

Re: mapper java process not exiting

2011-05-12 Thread Joey Echeverria
Hadoop 0.21.0 with some patches. Hadoop 0.21.0 doesn't get much use, so I'm not sure how much help I can be. 2011-05-12 13:52:04,147 WARN org.apache.hadoop.mapreduce.util.ProcessTree: Error executing shell command org.apache.hadoop.util.Shell$ExitCodeException: kill -12545: No such process

Re: is it possible to concatenate output files under many reducers?

2011-05-12 Thread Joey Echeverria
You can control the number of reducers by calling job.setNumReduceTasks() before you launch it. -Joey On Thu, May 12, 2011 at 6:33 PM, Jun Young Kim juneng...@gmail.com wrote: yes. that is a general solution to control counts of output files. however, if you need to control counts of outputs

Re: Error while compiling the program

2011-04-25 Thread Joey Echeverria
Your declaration of the Map class needs to include the input and output types, e.g.: public static class Map extends MapReduceBase implements Mapper<LongWritable, Text, Text, LongWritable> { ... } -Joey On Mon, Apr 25, 2011 at 4:38 AM, praveenesh kumar praveen...@gmail.com wrote: Hi, I am

Re: How do I create a sequence file on my local harddrive?

2011-04-22 Thread Joey Echeverria
Did you try calling fs.setConf(configuration)? On Apr 22, 2011 9:09 PM, W.P. McNeill bill...@gmail.com wrote:
