lots of small jobs

2008-10-26 Thread Shirley Cohen
Hi, I have lots of small jobs and would like to compute the aggregate running time of all the mappers and reducers in my job history rather than tally the numbers by hand through the web interface. I know that the Reporter object can be used to output performance numbers for a single job
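One way to answer this is to script the tally instead of reading the web UI. The sketch below sums task durations from pre-parsed history records; the line format here is hypothetical and purely illustrative (real Hadoop job-history files have their own versioned format that would need a proper parser):

```python
from datetime import datetime

# Hypothetical, simplified history lines: "TASK <type> <start-iso> <end-iso>".
# Real Hadoop job-history files use a different format; this only
# illustrates the summing logic once durations have been extracted.
def total_task_seconds(lines, task_type):
    total = 0.0
    for line in lines:
        parts = line.split()
        if len(parts) == 4 and parts[0] == "TASK" and parts[1] == task_type:
            start = datetime.fromisoformat(parts[2])
            end = datetime.fromisoformat(parts[3])
            total += (end - start).total_seconds()
    return total

history = [
    "TASK MAP 2008-10-26T08:00:00 2008-10-26T08:00:30",
    "TASK MAP 2008-10-26T08:00:05 2008-10-26T08:00:50",
    "TASK REDUCE 2008-10-26T08:01:00 2008-10-26T08:02:00",
]
print(total_task_seconds(history, "MAP"))     # 75.0
print(total_task_seconds(history, "REDUCE"))  # 60.0
```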

Re: dfs i/o stats

2008-09-29 Thread Shirley Cohen
e.org/core/docs/current/api/org/apache/hadoop/dfs/FSNamesystemMetrics.html Hope this is helpful. --Konstantin Shirley Cohen wrote: Hi, I would like to measure the disk i/o performance of our hadoop cluster. However, running iostat on 16 nodes is rather cumbersome. Does dfs keep track of any

Re: dfs i/o stats

2008-09-29 Thread Shirley Cohen
command to get it. Shirley Cohen wrote: Hi, I would like to measure the disk i/o performance of our hadoop cluster. However, running iostat on 16 nodes is rather cumbersome. Does dfs keep track of any stats like the number of blocks or bytes read and written? From scanning the api, I found a

dfs i/o stats

2008-09-29 Thread Shirley Cohen
Hi, I would like to measure the disk i/o performance of our hadoop cluster. However, running iostat on 16 nodes is rather cumbersome. Does dfs keep track of any stats like the number of blocks or bytes read and written? From scanning the api, I found a class called "org.apache.hadoop.fs.F

job details

2008-09-26 Thread Shirley Cohen
Hi, I'm trying to figure out which log files are used by the job tracker's web interface to display the following information: Job Name: my job Job File: hdfs://localhost:9000/tmp/hadoop-scohen/mapred/system/job_200809260816_0001/job.xml Status: Succeeded Started at: Fri Sep 26 08:18:04 CD

Re: output multiple values?

2008-09-10 Thread Shirley Cohen
Thanks Owen! I found the bug in my code: Doing collect twice does work now :)) Shirley On Sep 9, 2008, at 4:19 PM, Owen O'Malley wrote: On Sep 9, 2008, at 12:20 PM, Shirley Cohen wrote: I have a simple reducer that computes the average by doing a sum/count. But I want to output bot

output multiple values?

2008-09-09 Thread Shirley Cohen
I have a simple reducer that computes the average by doing a sum/count. But I want to output both the average and the count for a given key, not just the average. Is it possible to output both values from the same invocation of the reducer? Or do I need two reducer invocations? If I try to
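The resolution in the follow-up was that a single reducer invocation can call collect more than once. A plain-Python sketch of that shape (not Hadoop code; `collect` stands in for `OutputCollector.collect`):

```python
# One reducer invocation emitting both the average and the count for a
# key, mirroring two collect() calls distinguished by a tag in the key.
def reduce_avg_and_count(key, values, collect):
    total = 0.0
    count = 0
    for v in values:
        total += v
        count += 1
    collect((key, "avg"), total / count)   # first emit: the average
    collect((key, "count"), count)         # second emit: the count

out = []
reduce_avg_and_count("x", [2, 4, 6], lambda k, v: out.append((k, v)))
print(out)  # [(('x', 'avg'), 4.0), (('x', 'count'), 3)]
```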

Re: no output from job run on cluster

2008-09-09 Thread Shirley Cohen
rley On Sep 7, 2008, at 8:38 AM, 叶双明 wrote: Are you sure there isn't any error or exception in logs? 2008/9/5, Shirley Cohen <[EMAIL PROTECTED]>: Hi Dmitry, Thanks for your suggestion. I checked and the other systems on the cluster do seem to have java installed. I was also able

Re: no output from job run on cluster

2008-09-04 Thread Shirley Cohen
utput from hadoop. If it help - can you submit bug request ? :) -Original Message- From: Shirley Cohen [mailto:[EMAIL PROTECTED] Sent: Thursday, September 04, 2008 10:07 AM To: core-user@hadoop.apache.org Subject: no output from job run on cluster Hi, I'm running on hadoop-0.18.0.

Re: Output directory already exists

2008-09-04 Thread Shirley Cohen
Thanks, Owen. This fixed my problem! Shirley On Sep 2, 2008, at 8:44 PM, Owen O'Malley wrote: On Tue, Sep 2, 2008 at 10:24 AM, Shirley Cohen <[EMAIL PROTECTED]> wrote: Hi, I'm trying to write the output of two different map-reduce jobs into the same output dire

no output from job run on cluster

2008-09-04 Thread Shirley Cohen
Hi, I'm running on hadoop-0.18.0. I have a m-r job that executes correctly in standalone mode. However, when run on a cluster, the same job produces zero output. It is very bizarre. I looked in the logs and couldn't find anything unusual. All I see are the usual deprecated filesystem name

Output directory already exists

2008-09-02 Thread Shirley Cohen
Hi, I'm trying to write the output of two different map-reduce jobs into the same output directory. I'm using MultipleOutputFormats to set the filename dynamically, so there is no filename collision between the two jobs. However, I'm getting the error "output directory already exists".

Re: MultipleOutputFormat versus MultipleOutputs

2008-08-29 Thread Shirley Cohen
Thanks, Benjamin. Your example saved me a lot of time :)) Shirley On Aug 28, 2008, at 8:03 AM, Benjamin Gufler wrote: Hi Shirley, On 2008-08-28 14:32, Shirley Cohen wrote: Do you have an example that shows how to use MultipleOutputFormat? using MultipleOutputFormat is actually pretty easy

Re: MultipleOutputFormat versus MultipleOutputs

2008-08-28 Thread Shirley Cohen
. With MultipleOutputFormat you can't. (and if I'm not mistaken) If using MultipleOutputFormat in a map you can't have a reduce phase. With MultipleOutputs you can. A On Thu, Aug 28, 2008 at 3:36 AM, Shirley Cohen <[EMAIL PROTECTED]> wrote: Hi, I would like the reducer t

MultipleOutputFormat versus MultipleOutputs

2008-08-27 Thread Shirley Cohen
Hi, I would like the reducer to output to different files based upon the value of the key. I understand that both MultipleOutputs and MultipleOutputFormat can do this. Is that correct? However, I don't understand the differences between these two classes. Can someone explain the differenc

distinct count

2008-08-26 Thread Shirley Cohen
Hi, What is the best way to do a distinct count in m-r? Is there any way of doing it with one reduce instead of two? Thanks, Shirley
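The usual answers are a two-job pipeline (dedup, then count) or a single job where each reducer counts the distinct keys it receives and the partial counts are combined (in Hadoop, typically via a Counter). A toy in-memory model of both, not Hadoop code:

```python
from collections import defaultdict

# Two-job pattern: job 1 makes the value the key so the shuffle dedups;
# job 2 counts the surviving keys.
def distinct_count_two_jobs(records):
    deduped = set(records)          # grouping on the key dedups
    return sum(1 for _ in deduped)  # a trivial second "counting" pass

# One-job alternative: each reducer partition counts the distinct keys
# it sees, and the partial counts are summed (a Counter in Hadoop).
def distinct_count_one_job(records, num_reducers=4):
    partitions = defaultdict(set)
    for v in records:
        partitions[hash(v) % num_reducers].add(v)    # shuffle by value
    return sum(len(s) for s in partitions.values())  # combine partials

data = [1, 2, 2, 3, 3, 3]
print(distinct_count_two_jobs(data))  # 3
print(distinct_count_one_job(data))   # 3
```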

Re: data partitioning question

2008-08-04 Thread Shirley Cohen
stage, because the partitioner only consider the size of blocks in bytes. Instead you can output the intermediate key/value pair as this: key: 1 if C=1,3,5,7. 0 otherwise value: the tuple. In reducer you can have a reducer deal with all the key with c=1,3,5,7. On Mon, Aug 4, 2008 at 3:29 PM, Sh

data partitioning question

2008-08-04 Thread Shirley Cohen
Hi, I want to implement some data partitioning logic where a mapper is assigned a specific range of values. Here is a concrete example of what I have in mind: Suppose I have attributes A, B, C and the following tuples: (A, B, C) (1, 3, 1) (1, 2, 2) (1, 2, 3) (12, 3, 4) (12, 2, 5) (12, 8, 6
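The reply above suggests deriving the intermediate key from attribute C so that one reducer sees all tuples with C in {1, 3, 5, 7}. A toy model of that idea (in Hadoop this key-to-reducer mapping would live in a custom `Partitioner`):

```python
# Tag each tuple with a key derived from attribute C, then route keys
# to partitions. Stand-in for a Hadoop map + custom Partitioner.
SELECTED_C = {1, 3, 5, 7}

def map_tuple(t):
    a, b, c = t
    key = 1 if c in SELECTED_C else 0
    return key, t

def partition(key, num_partitions=2):
    return key % num_partitions

tuples = [(1, 3, 1), (1, 2, 2), (1, 2, 3), (12, 3, 4), (12, 2, 5), (12, 8, 6)]
by_partition = {0: [], 1: []}
for t in tuples:
    k, v = map_tuple(t)
    by_partition[partition(k)].append(v)
print(by_partition[1])  # [(1, 3, 1), (1, 2, 3), (12, 2, 5)]
```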

Could not find any valid local directory for task

2008-08-03 Thread Shirley Cohen
Hi, Does anyone know what the following error means? hadoop-0.16.4/logs/userlogs/task_200808021906_0002_m_14_2]$ cat syslog 2008-08-02 20:28:00,443 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=MAP, sessionId= 2008-08-02 20:28:00,684 INFO org.

No locks available error

2008-08-01 Thread Shirley Cohen
Hi, We're getting the following error when starting up hadoop on the cluster: 2008-08-01 14:42:37,334 INFO org.apache.hadoop.dfs.DataNode: STARTUP_MSG: / STARTUP_MSG: Starting DataNode STARTUP_MSG: host = node5.cube.disc.cias.ut

Re: iterative map-reduce

2008-07-29 Thread Shirley Cohen
write a iterative script. On Tue, Jul 29, 2008 at 9:57 AM, Shirley Cohen <[EMAIL PROTECTED]> wrote: Hi, I want to call a map-reduce program recursively until some condition is met. How do I do that? Thanks, Shirley

iterative map-reduce

2008-07-29 Thread Shirley Cohen
Hi, I want to call a map-reduce program recursively until some condition is met. How do I do that? Thanks, Shirley
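The standard answer is a driver loop outside the framework: submit the job, inspect its output, and resubmit until the condition holds. A minimal sketch, where `run_job` is a placeholder for submitting a real job and reading back its result:

```python
# Outer driver loop for iterative map-reduce: keep launching the job
# until the stopping condition is met. run_job here is a toy stand-in.
def run_job(state):
    return state // 2  # placeholder for one full map-reduce pass

def iterate_until(state, done):
    while not done(state):
        state = run_job(state)
    return state

result = iterate_until(1000, lambda s: s < 10)
print(result)  # 7
```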

partitioning the inputs to the mapper

2008-07-27 Thread Shirley Cohen
How do I partition the inputs to the mapper, such that a mapper processes an entire file or files? What is happening now is that each mapper receives only portions of a file and I want them to receive an entire file. Is there a way to do that within the scope of the framework? Thanks, Sh
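In Hadoop the usual fix is an input format whose `isSplitable` returns false, so each file becomes a single split. A toy Python model of the resulting behavior (each file yields exactly one (filename, contents) record), not Hadoop code:

```python
import os
import tempfile

# Toy model of a non-splittable input format: every file is handed to
# a mapper whole, as one (filename, contents) record.
def whole_file_records(paths):
    for path in paths:
        with open(path) as f:
            yield path, f.read()

tmp = tempfile.mkdtemp()
for name, text in [("a.txt", "alpha"), ("b.txt", "beta")]:
    with open(os.path.join(tmp, name), "w") as f:
        f.write(text)

records = sorted(whole_file_records(
    os.path.join(tmp, n) for n in os.listdir(tmp)))
print([text for _, text in records])  # ['alpha', 'beta']
```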

joins in map reduce

2008-05-21 Thread Shirley Cohen
Hi, How does one do a join operation in map reduce? Is there more than one way to do a join? Which way works better and why? Thanks, Shirley
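The most common pattern is the reduce-side join: map both inputs to (join_key, tagged_row), let the shuffle group by key, and pair the two sides in the reducer. A toy in-memory model of that pattern, not Hadoop code:

```python
from collections import defaultdict

# Reduce-side join: group rows from both inputs by join key, then emit
# the cross product of the two sides within each key group.
def reduce_side_join(left, right, key_fn):
    grouped = defaultdict(lambda: ([], []))
    for row in left:
        grouped[key_fn(row)][0].append(row)   # tag: left input
    for row in right:
        grouped[key_fn(row)][1].append(row)   # tag: right input
    for key, (ls, rs) in grouped.items():     # the "reduce" step
        for l in ls:
            for r in rs:
                yield key, l, r

users = [("u1", "ann"), ("u2", "bob")]
orders = [("u1", "book"), ("u1", "pen"), ("u3", "mug")]
joined = sorted(reduce_side_join(users, orders, lambda row: row[0]))
print(joined)
```

A map-side join (pre-partitioned, sorted inputs merged in the mapper) avoids the shuffle and is faster when its preconditions hold, which is the usual "which works better" trade-off.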

Re: incremental re-execution

2008-04-21 Thread Shirley Cohen
n with something like Pig, where you have a good representation for internal optimizations, it is probably going to be difficult to convert the two MR steps into one pre-aggregation and two final aggregations. On 4/20/08 7:39 AM, "Shirley Cohen" <[EMAIL PROTECTED]> wrote:

Re: incremental re-execution

2008-04-20 Thread Shirley Cohen
n can be used to produce multiple low definition aggregates. I would find it very surprising if you could detect these sorts of situations. On 4/16/08 5:26 PM, "Shirley Cohen" <[EMAIL PROTECTED]> wrote: Dear Hadoop Users, I'm writing to find out what you

incremental re-execution

2008-04-16 Thread Shirley Cohen
Dear Hadoop Users, I'm writing to find out what you think about being able to incrementally re-execute a map reduce job. My understanding is that the current framework doesn't support it and I'd like to know whether, in your opinion, having this capability could help to speed up developme