Getting Job counters on Hadoop 0.20.2

2012-07-23 Thread Prajakta Kalmegh
Hi, I am trying to retrieve job counters on Hadoop 0.20.2 at runtime, or from the job history, using the org.apache.hadoop.mapred API. My program takes a job_id as input and should return the counters for a running job, or pull the information from the job history. I was able to simulate this on the newer
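
A minimal sketch of looking counters up by job id with the old org.apache.hadoop.mapred API, assuming the JobTracker still has the job in memory; the class name and the printed format are illustrative, and reading retired jobs from the job history files would need a different path:

    import java.io.IOException;
    import org.apache.hadoop.mapred.Counters;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.JobID;
    import org.apache.hadoop.mapred.RunningJob;

    public class PrintJobCounters {
        public static void main(String[] args) throws IOException {
            String jobIdStr = args[0];                        // e.g. job_201207230000_0001
            JobClient client = new JobClient(new JobConf());  // picks up the cluster config
            RunningJob job = client.getJob(JobID.forName(jobIdStr));
            if (job == null) {
                System.err.println("JobTracker no longer knows this job; "
                        + "it would have to be read from the job history files instead.");
                return;
            }
            Counters counters = job.getCounters();
            for (Counters.Group group : counters) {
                for (Counters.Counter counter : group) {
                    System.out.println(group.getDisplayName() + "\t"
                            + counter.getDisplayName() + " = " + counter.getCounter());
                }
            }
        }
    }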

Counting records

2012-07-23 Thread Peter Marron
Hi, I am a complete noob with Hadoop and MapReduce and I have a question that is probably silly, but I still don't know the answer. For the purposes of discussion I'll assume that I'm using a standard TextInputFormat. (I don't think that this changes things too much.) To simplify (a fair bit)

RE: Datanode error

2012-07-23 Thread Pablo Musa
I am sorry, but I received an error when I sent the message to the list and all responses went to my junk mail folder. So I tried to send it again, and only then noticed your emails. Please do also share if you're seeing an issue that you think is related to these log messages. My datanodes do

Re: Counting records

2012-07-23 Thread Kai Voigt
Hi, an additional idea is to use the counter API inside the framework. http://diveintodata.org/2011/03/15/an-example-of-hadoop-mapreduce-counter/ has a good example. Kai On 23.07.2012 at 16:25, Peter Marron wrote: I am a complete noob with Hadoop and MapReduce and I have a question that

RE: Counting records

2012-07-23 Thread Dave Shine
You could just use a counter and never emit anything from the Map(). Use getCounter("MyRecords", "RecordTypeToCount").increment(1) whenever you find the type of record you are looking for. Never call output.collect(). Call the job with reduceTasks(0). When the job finishes, you can
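
A minimal sketch of the map-only counting pattern described above, in the old org.apache.hadoop.mapred API; the group and counter names, the match rule, and the job wiring are illustrative, not taken from the thread:

    import java.io.IOException;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.Counters;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.Mapper;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reporter;
    import org.apache.hadoop.mapred.lib.NullOutputFormat;

    public class CountOnlyJob {

        // Counts matching records via a counter and never calls output.collect().
        public static class CountMapper extends MapReduceBase
                implements Mapper<LongWritable, Text, NullWritable, NullWritable> {
            public void map(LongWritable key, Text value,
                            OutputCollector<NullWritable, NullWritable> output,
                            Reporter reporter) throws IOException {
                if (value.toString().contains("ERROR")) {    // hypothetical match rule
                    reporter.getCounter("MyRecords", "RecordTypeToCount").increment(1);
                }
            }
        }

        public static void main(String[] args) throws IOException {
            JobConf conf = new JobConf(CountOnlyJob.class);
            conf.setJobName("count-only");
            conf.setMapperClass(CountMapper.class);
            conf.setNumReduceTasks(0);                       // map-only, as suggested above
            conf.setOutputFormat(NullOutputFormat.class);    // nothing is ever written out
            conf.setOutputKeyClass(NullWritable.class);
            conf.setOutputValueClass(NullWritable.class);
            FileInputFormat.setInputPaths(conf, new Path(args[0]));
            Counters counters = JobClient.runJob(conf).getCounters();
            System.out.println("matched records: "
                    + counters.findCounter("MyRecords", "RecordTypeToCount").getCounter());
        }
    }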

Re: Datanode error

2012-07-23 Thread Harsh J
Pablo, perhaps you've forgotten about it, but you asked the same question last week and you did get responses to it. Please see your earlier thread at http://search-hadoop.com/m/0BOOh17ugmD On Mon, Jul 23, 2012 at 7:27 PM, Pablo Musa pa...@psafe.com wrote: Hey guys, I have a cluster with

Re: Counting records

2012-07-23 Thread Michael Segel
Look at using a dynamic counter. You don't need to set up or declare an enum. The only caveat is that counters are passed back to the JT by each task and are stored in memory. On Jul 23, 2012, at 9:32 AM, Kai Voigt wrote:
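
A brief sketch of the dynamic-counter idea in the same old API: the counter name is built from the record itself at run time, so no enum ever has to be declared. The group name and the field parsing below are illustrative:

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.Mapper;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reporter;

    public class DynamicCounterMapper extends MapReduceBase
            implements Mapper<LongWritable, Text, NullWritable, NullWritable> {
        public void map(LongWritable key, Text value,
                        OutputCollector<NullWritable, NullWritable> output,
                        Reporter reporter) throws IOException {
            // The counter name comes from the record itself (here, its first
            // tab-separated field), so one counter appears per distinct value.
            String recordType = value.toString().split("\t", 2)[0];
            reporter.incrCounter("RecordTypes", recordType, 1);
        }
    }

Every distinct name becomes another counter held in memory at the JobTracker, so this only stays cheap while the set of record types is small, which is the caveat noted above.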

RE: Datanode error

2012-07-23 Thread Pablo Musa
I am sorry, but I received an error when I sent the message to the list and all responses went to my junk mail folder. So I tried to send it again, and only then noticed your emails. Sorry!! -Original Message- From: Harsh J [mailto:ha...@cloudera.com] Sent: Monday, July 23,

Reducer MapFileOutputFormat

2012-07-23 Thread Mike S
If I set my reducer output format to MapFileOutputFormat and the job has, say, 100 reducers, will the output contain 100 different index files (one per reducer) or one index file for all the reducers (basically one index file per job)? If it is one index file per reducer, can I rely on HDFS
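
Each reducer writes its own MapFile (a part directory holding a data file plus an index file), so a 100-reducer job leaves 100 parts, each with its own index; MapFileOutputFormat also provides helpers for looking a key up across all the parts. A minimal lookup sketch with the old API, assuming Text keys and values and the job's default HashPartitioner; the output path and key come from the command line and are placeholders:

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.MapFile;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.io.Writable;
    import org.apache.hadoop.mapred.MapFileOutputFormat;
    import org.apache.hadoop.mapred.Partitioner;
    import org.apache.hadoop.mapred.lib.HashPartitioner;

    public class MapFileLookup {
        public static void main(String[] args) throws IOException {
            Configuration conf = new Configuration();
            Path outDir = new Path(args[0]);                  // the job's output directory
            // One MapFile.Reader per reducer output (part-00000, part-00001, ...).
            MapFile.Reader[] readers =
                    MapFileOutputFormat.getReaders(FileSystem.get(conf), outDir, conf);
            // The same partitioner the job used decides which part holds the key.
            Partitioner<Text, Text> partitioner = new HashPartitioner<Text, Text>();
            Text key = new Text(args[1]);
            Text value = new Text();
            Writable hit = MapFileOutputFormat.getEntry(readers, partitioner, key, value);
            System.out.println(hit == null ? "not found" : key + "\t" + value);
            for (MapFile.Reader reader : readers) {
                reader.close();
            }
        }
    }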

AUTO: Yuan Jin is out of the office. (returning 07/25/2012)

2012-07-23 Thread Yuan Jin
I am out of the office until 07/25/2012. I am out of office. For HAMSTER related things, you can contact Jason(Deng Peng Zhou/China/IBM) For CFM related things, you can contact Daniel(Liang SH Su/China/Contr/IBM) For TMB related things, you can contact Flora(Jun Ying Li/China/IBM) For TWB

Re: AUTO: Yuan Jin is out of the office. (returning 07/25/2012)

2012-07-23 Thread Jean-Daniel Cryans
Fifth offense. Yuan Jin is out of the office. - I will be out of the office starting 06/22/2012 and will not return until 06/25/2012. I am out of Jun 21 Yuan Jin is out of the office. - I will be out of the office starting 04/13/2012 and will not return until 04/16/2012. I am out of

RE: Counting records

2012-07-23 Thread Peter Marron
Yeah, I thought about using counters but I was worried about what happens if a Mapper task fails. Does the counter get adjusted to remove any contributions that the failed Mapper made before another replacement Mapper is started? Otherwise in the case of any Mapper failure I'm going to get an

Re: AUTO: Yuan Jin is out of the office. (returning 07/25/2012)

2012-07-23 Thread Chen He
Just kick this junk mail guy out of the group. On Mon, Jul 23, 2012 at 5:22 PM, Jean-Daniel Cryans jdcry...@apache.orgwrote: Fifth offense. Yuan Jin is out of the office. - I will be out of the office starting 06/22/2012 and will not return until 06/25/2012. I am out of Jun 21

Re: Counting records

2012-07-23 Thread Michael Segel
If the task fails, the counter for that task is not used. So if you have speculative execution turned on and the JT kills a task, it won't affect your end results. Again, the only major caveat is that the counters are kept in memory, so if you have a lot of counters... On Jul 23, 2012, at 4:52

Re: AUTO: Yuan Jin is out of the office. (returning 07/25/2012)

2012-07-23 Thread Jason
Guys, just be nice On Tue, Jul 24, 2012 at 5:59 AM, Chen He airb...@gmail.com wrote: Just kick this junk mail guy out of the group. On Mon, Jul 23, 2012 at 5:22 PM, Jean-Daniel Cryans jdcry...@apache.org wrote: Fifth offense. Yuan Jin is out of the office. - I will be out of the

int read(byte buf[], int off, int len) violates API-level contract when length is 0 at the end of a stream

2012-07-23 Thread Jim Donofrio
The API contract for Java's public int read(byte[] buffer, int off, int len): if len is zero, then no bytes are read and 0 is returned; otherwise, there is an attempt to read at least one byte. If no byte is available because the stream is at end of file, the value -1 is returned; otherwise, at
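
The zero-length case is the one in dispute; as a quick way to see what any particular stream implementation actually does, a small harness such as the following just prints the return values for the two cases the quoted contract distinguishes (a plain ByteArrayInputStream is used here only as a stand-in for the Hadoop stream in question):

    import java.io.ByteArrayInputStream;
    import java.io.IOException;
    import java.io.InputStream;

    public class ReadContractCheck {
        // Prints what a stream returns for the two cases the quoted contract
        // distinguishes: a zero-length read (the contract says 0) and a non-empty
        // read at end of stream (the contract says -1).
        static void report(InputStream in) throws IOException {
            byte[] buf = new byte[8];
            System.out.println("read(buf, 0, 0) -> " + in.read(buf, 0, 0));
            System.out.println("read(buf, 0, 8) -> " + in.read(buf, 0, 8));
        }

        public static void main(String[] args) throws IOException {
            // Stand-in stream that is already at end of stream; substitute the
            // stream under discussion to see its behaviour.
            report(new ByteArrayInputStream(new byte[0]));
        }
    }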

Re: AUTO: Yuan Jin is out of the office. (returning 07/25/2012)

2012-07-23 Thread Chen He
Looks like that guy is your boss, Jason. It was you who asked people to forgive him last time. Tell him to remove the group mailing list from his auto-reply system. It looks like this Yuan has contributed little to the mailing list other than these auto-reply spam emails. On Mon, Jul 23, 2012 at 6:12 PM, Jason

Re: AUTO: Yuan Jin is out of the office. (returning 07/25/2012)

2012-07-23 Thread Chen He
BTW, this is a Hadoop user group. You are welcome to ask questions and give solutions to help people. Please do not pollute this technical environment. To Yuan Jin: DO NOT send your auto-reply emails to my personal mailbox again. It is not fun; it is rude. We will still respect you if you do not send

problem configuring hadoop with s3 bucket

2012-07-23 Thread Alok Kumar
Hello Group, I have a Hadoop setup running locally. Now I want to use Amazon s3://mybucket as my data store, so I set dfs.data.dir=s3://mybucket/hadoop/ in my hdfs-site.xml. Is that the correct way? I'm getting this error: WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Invalid directory
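
For what it's worth, the WARN above is consistent with dfs.data.dir expecting local DataNode directories rather than a bucket URI; the era-typical pattern was instead to reach S3 through Hadoop's s3n:// filesystem, with the credentials configured in core-site.xml. A minimal sketch along those lines, where the bucket name, path, and credentials are placeholders:

    import java.io.IOException;
    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class S3nListing {
        public static void main(String[] args) throws IOException {
            Configuration conf = new Configuration();
            // Placeholder credentials; these would normally live in core-site.xml
            // as fs.s3n.awsAccessKeyId / fs.s3n.awsSecretAccessKey.
            conf.set("fs.s3n.awsAccessKeyId", "YOUR_ACCESS_KEY");
            conf.set("fs.s3n.awsSecretAccessKey", "YOUR_SECRET_KEY");
            FileSystem s3 = FileSystem.get(URI.create("s3n://mybucket/"), conf);
            FileStatus[] listing = s3.listStatus(new Path("s3n://mybucket/hadoop/"));
            if (listing != null) {
                for (FileStatus status : listing) {
                    System.out.println(status.getPath());
                }
            }
        }
    }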