No. of records processed by each Reducer

2010-09-22 Thread Nikhil Sawant
Hi, Is there any way to get the number of records processed by each reducer, without using Hadoop counters? Thanks -Nikhil
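
One possible approach (a sketch, not from the thread itself): have each reducer tally its own input records and report the total in its task log. The environment-variable name `mapred_task_id` and the stderr reporting convention are assumptions about a Hadoop Streaming setup, not something the original poster described.

```python
#!/usr/bin/env python
# Sketch of a Hadoop Streaming reducer that counts its own records without
# Hadoop counters. Assumes Streaming exports the task attempt ID in the
# mapred_task_id environment variable (a convention, verify on your version).
import os
import sys

def run_reducer(lines, out=sys.stdout, log=sys.stderr):
    """Pass records through unchanged while counting them."""
    count = 0
    for line in lines:
        count += 1
        out.write(line)  # normal reducer output, unchanged
    task_id = os.environ.get("mapred_task_id", "unknown-task")
    # One summary line per reducer attempt, visible in that task's stderr log.
    log.write("records_processed\t%s\t%d\n" % (task_id, count))
    return count

if __name__ == "__main__":
    run_reducer(sys.stdin)
```

The per-reducer totals then show up in each task attempt's stderr log on the JobTracker UI, grouped by task ID.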

Re: Strange output in modified TeraSort

2010-09-22 Thread Steve Loughran
On 21/09/10 12:13, Matthew John wrote: Hi all , I am working on a Sort function which takes in records of 40 bytes ( 8 bytes longwritable key and 32 bytes byteswritable key ) and sorts them and outputs them. For this I have got a modified Terasort working (thanks to Jeff !) . Since t

Re: jobtracker: Cannot assign requested address

2010-09-22 Thread Steve Loughran
On 21/09/10 08:17, Jing Tie wrote: I am still suffering from the problem... Did anyone encounter it before? Or any suggestions? Have you considered configuring the JT with the machine's real name, rather than the literal hostname "hostname"? Many thanks in advance! Jing On
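
Concretely, the suggestion amounts to setting the JobTracker address to a name that actually resolves on the network. A minimal sketch, where the hostname and port are placeholders for your own values:

```xml
<!-- mapred-site.xml: point mapred.job.tracker at the machine's real,
     resolvable name. jt-node.example.com:9001 is a hypothetical example. -->
<property>
  <name>mapred.job.tracker</name>
  <value>jt-node.example.com:9001</value>
</property>
```

"Cannot assign requested address" typically means the JT is trying to bind to an address that does not belong to any local interface, which is why the resolved name matters.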

Re: NullPointerException on getBlockLocations

2010-09-22 Thread Medha Atre
I could not understand much from the code snippet, so here are a few quick questions to help you. Is the file in question a logical (normal disk) file or a DFS file? If logical, do you have the same file in the same path on all the nodes? If not, you should either have it replicated in same path o

return a parameter using Map only

2010-09-22 Thread Shi Yu
Dear Hadoopers, I am stuck on a probably very simple problem but can't figure it out. In the Hadoop Map/Reduce framework, I want to search a huge file (which is generated by another Reduce task) for a unique line of record (a value actually). That record is expected to be passed to another f

Re: Questions about BN and CN

2010-09-22 Thread Konstantin Shvachko
The CheckpointNode creates checkpoints of the namespace, but does not keep an up-to-date state of the namespace in memory. If the primary NN fails, the CheckpointNode can only provide an old state of the namespace created during the latest checkpoint. Also CheckpointNode is a replacement for SecondaryNameNode

Re: How to disable secondary node

2010-09-22 Thread Konstantin Shvachko
If a CheckpointNode or a BackupNode is used then the SecondaryNameNode should be disabled. --Konstantin On 9/9/2010 7:59 AM, Edward Capriolo wrote: It is a bad idea to permanently disable the 2NN. The edits file grows very, very large and will not be processed until the NameNode restarts. We had a 1

Re: return a parameter using Map only

2010-09-22 Thread Steve Lewis
What distinguishes this record, and will every mapper know it? It sounds like all you need to do is ignore non-matching records and then run other code in the mapper - I am assuming that across all mappers the code only runs once. On Wed, Sep 22, 2010 at 2:06 PM, Shi Yu wrote: > Dear Hadoopers, > > I

Re: return a parameter using Map only

2010-09-22 Thread Shi Yu
It is not restricted to unique in the real problem; it can be relaxed to a small number of different hits. Maybe I oversimplified the problem in the previous email, but let's consider the unique pattern first. The problem is as follows: to search the splits of a give string (which comes from a para
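
Following Steve Lewis's suggestion, a map-only job that drops non-matching records can be sketched roughly as below (a Hadoop Streaming version in Python, with zero reducers). The target value and tab-separated record layout are assumptions for illustration; in the real setup the target would come from the earlier job's output, e.g. via the job configuration or the DistributedCache.

```python
#!/usr/bin/env python
# Sketch of a map-only Streaming mapper: emit only records matching a target
# value and drop everything else. Run the job with -numReduceTasks 0 (or
# mapred.reduce.tasks=0) so the mapper output is the final job output.
import sys

TARGET_VALUE = "needle"  # hypothetical; would normally come from job config

def map_records(lines, target=TARGET_VALUE):
    """Yield only lines whose first tab-separated field equals `target`."""
    for line in lines:
        value = line.rstrip("\n").split("\t", 1)[0]
        if value == target:
            yield line

if __name__ == "__main__":
    for match in map_records(sys.stdin):
        sys.stdout.write(match)
```

With a unique (or near-unique) match, the job output directory then contains just the few hit records, which a follow-up job can read directly.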

clienttrace messages in datanode log

2010-09-22 Thread Sharma, Avani
How can I prevent the following messages from creating huge datanode logs ? 2010-09-22 00:00:00,030 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /10.110.210.11:50010, dest: /10.110.210.11:34714, bytes: 132096, op: HDFS_READ, cliID: DFSClient_-5111 26949, srvID: DS-139
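
One common way to suppress these (an assumption about the cluster's log4j setup, not from a reply in the thread) is to raise the level of the clienttrace logger, whose name appears verbatim in the log line:

```properties
# log4j.properties on each DataNode: silence the per-read/write clienttrace
# INFO lines by raising that logger above INFO. Requires a DataNode restart
# to take effect.
log4j.logger.org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace=WARN
```

The level can also be changed at runtime with `hadoop daemonlog -setlevel <datanode-host:port> org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace WARN`, though that change does not persist across restarts.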

Re: Questions about BN and CN

2010-09-22 Thread ChingShen
Thanks Konstantin. But my main question is: since the CN can only provide an old state of the namespace, why do we need it? I think the BN is the better solution. Shen On Thu, Sep 23, 2010 at 5:20 AM, Konstantin Shvachko wrote: > The CheckpointNode creates checkpoints of the namespace, bu

Shuffle tasks getting killed

2010-09-22 Thread aniket ray
Hi, I continuously run a series of batch jobs using Hadoop Map Reduce. I also have a managing daemon that moves data around on the HDFS, making way for more jobs to be run. I use the capacity scheduler to schedule many jobs in parallel. I see an issue on the Hadoop web monitoring UI at port 50030 which