Re: decentralized write operation in HDFS

2012-07-11 Thread Harsh J
Not that I know of; block allocations are done by the NameNode for several reasons and edge cases, and that is why the write path is the way it is today. What problem are you trying to solve, though, that led you to ask this? If we have more info about the problem, we can provide better answers.

Re: WholeFileInputFormat format

2012-07-11 Thread Mohammad Tariq
Hello Harsh, Does Hadoop-0.20.205.0 (new API) have Avro support? Regards, Mohammad Tariq On Wed, Jul 11, 2012 at 1:57 AM, Mohammad Tariq wrote: > Hello Harsh, > > I am sorry to pester you with questions. Actually I am kind of > stuck. I have to write my MapReduce job such tha

decentralized write operation in HDFS

2012-07-11 Thread Grandl Robert
Hi, Is it possible to write to an HDFS datanode without relying on the Namenode, i.e. to find the locations of Datanodes from somewhere else? Thanks, Robert

Extra output files from mapper ?

2012-07-11 Thread Connell, Chuck
I am using MapReduce streaming with Python code. It works fine for basic stdin and stdout. But I have a mapper-only application that also emits some other output files. So in addition to stdout, the program also creates files named output1.txt and output2.txt. My code seems to be running c
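The side-file question above can be sketched as a minimal streaming mapper that echoes stdin to stdout (the normal map output) and also writes an extra file. Note the caveat in the comments: a streaming task's working directory is local scratch space, and the env-var name `mapred_work_output_dir` (old-API property naming, exported by streaming with `.` replaced by `_`) and the promotion of side files on task commit are assumptions here, not verified behavior for every Hadoop version.

```python
import os
import sys

def run_mapper(stdin, stdout, side_dir):
    """Echo map input to stdout (the normal streaming key/value output)
    and also write a side file into side_dir, mimicking the output1.txt
    from the question above."""
    with open(os.path.join(side_dir, "output1.txt"), "w") as side:
        for line in stdin:
            stdout.write(line)        # key/value pairs for the framework
            side.write(line.upper())  # extra per-task side output

def main():
    # Hadoop streaming exports job properties as env vars with '.' -> '_';
    # mapred_work_output_dir (an assumed old-API name) points at the task
    # attempt's output dir, whose files survive when the task commits.
    # Files left in the plain working directory are local scratch and are
    # discarded with the task.
    side_dir = os.environ.get("mapred_work_output_dir", ".")
    run_mapper(sys.stdin, sys.stdout, side_dir)

if __name__ == "__main__":
    main()
```

Run it under streaming as the `-mapper` script; outside Hadoop it falls back to writing the side file in the current directory.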

Re: java.lang.NoClassDefFoundError: org/apache/hadoop/mapreduce/v2/app/MRAppMaster

2012-07-11 Thread Andreas Reiter
Hi Subroto, unfortunately that did not help... still looking for a solution :-( The stderr log for the container says, in German: "Fehler: Hauptklasse org.apache.hadoop.mapreduce.v2.app.MRAppMaster konnte nicht gefunden oder geladen werden", which means org.apache.hadoop.mapreduce.v2.app.MRAppMaster could not be found or loaded

Re: Mapper basic question

2012-07-11 Thread Manoj Babu
Thanks All! On 11 Jul 2012 19:07, "Bejoy KS" wrote: > ** > Hi Manoj > > Block size is at the HDFS storage level, whereas split size is the amount of > data processed by each mapper while running a map reduce job (one split is > the data processed by one mapper). One or more HDFS blocks can contribute

Re: java.lang.NoClassDefFoundError: org/apache/hadoop/mapreduce/v2/app/MRAppMaster

2012-07-11 Thread Subroto
Hi Andre, Yup, the problem got solved. What I was facing was that the JobClient code of my application was messing up the Hadoop property yarn.application.classpath. After setting it to a proper value, things now work nicely. The current configuration looks something like this: yarn.application.classpath=

Re: Mapper basic question

2012-07-11 Thread Bejoy KS
Hi Manoj Block size is at the HDFS storage level, whereas split size is the amount of data processed by each mapper while running a map reduce job (one split is the data processed by one mapper). One or more HDFS blocks can contribute to a split. Splits are determined by the InputFormat as well as the
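The block-vs-split relationship Bejoy describes can be sketched with the clamp formula that FileInputFormat-style split sizing uses, max(minSize, min(maxSize, blockSize)). This is a simplified illustration, not the exact code path of any particular Hadoop release (the real computation also honors per-file boundaries and allows some slop on the last split):

```python
def compute_split_size(block_size, min_size=1, max_size=2**63 - 1):
    """Split size = max(minSize, min(maxSize, blockSize)).
    With default min/max, the split size equals the HDFS block size,
    which is why 'one block per mapper' is the common case."""
    return max(min_size, min(max_size, block_size))

def num_splits(file_size, split_size):
    """A file of file_size bytes yields roughly ceil(file_size / split_size)
    splits, i.e. that many mappers."""
    return -(-file_size // split_size)  # ceiling division

# Raising the configured minimum split size above the block size lets one
# mapper cover several blocks without touching the HDFS block size itself:
# compute_split_size(64 * 2**20, min_size=128 * 2**20) -> 128 MB splits.
```

This is also the answer to the "limit the number of mappers" question elsewhere in the thread: push min_size up, and splits (hence mappers) go down.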

Re: equivalent of "-file" option in the programmatic call, (access jobID before submit())

2012-07-11 Thread GUOJUN Zhu
Which method do you refer to? I think DistributedCache.addLocalFiles() only works for files local to the task nodes. What I want is to upload a file into the job-specific directory on HDFS and register it with DistributedCache (and maybe clean it up after the job finishes). Is there

Re: Re: java.lang.NoClassDefFoundError: org/apache/hadoop/mapreduce/v2/app/MRAppMaster

2012-07-11 Thread Andreas Reiter
Hi Subroto, I have the same problem, cannot get my mapreduce jobs to run... The container log says that org.apache.hadoop.mapreduce.v2.app.MRAppMaster cannot be found... :-( Did you solve it already? best regards andre - Original Message - From: Subroto Sent: Tue, 5 Jun 2012 1

Re: Mapper basic question

2012-07-11 Thread Manoj Babu
Hi Tariq / Arun, The no of blocks (splits) = *total file size / HDFS block size * replication value* The no of splits is again nothing but the blocks here. Other than increasing the block size (input splits), is it possible to limit the no of mappers? Cheers! Manoj. On Wed, Jul 11, 2012 at 6

Re: Mapper basic question

2012-07-11 Thread Arun C Murthy
Take a look at CombineFileInputFormat - this will create 'meta splits' which include multiple small splits, thus reducing the number of maps which are run. Arun On Jul 11, 2012, at 5:29 AM, Manoj Babu wrote: > Hi, > > The no of mappers depends on the no of blocks. Is it possible to limit the > no of ma
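Arun's suggestion can be illustrated with a toy version of the packing idea behind CombineFileInputFormat: many small inputs are greedily packed into "meta splits" no larger than a configured maximum, so far fewer map tasks launch. This is a size-only sketch of the concept; the real Hadoop implementation is additionally node- and rack-aware and is not reproduced here.

```python
def combine_splits(file_sizes, max_split_size):
    """Greedily pack per-file byte sizes into meta splits of at most
    max_split_size bytes each (a file larger than the cap gets its own
    split). Returns a list of lists of sizes, one inner list per split."""
    splits, current, current_size = [], [], 0
    for size in file_sizes:
        if current and current_size + size > max_split_size:
            splits.append(current)       # close the full meta split
            current, current_size = [], 0
        current.append(size)
        current_size += size
    if current:
        splits.append(current)
    return splits

# Ten 10 MB files with a 64 MB cap pack into 2 meta splits, so 2 mappers
# run instead of 10.
```

The number of resulting splits falls as the cap rises, which is exactly the #maps reduction the reply describes.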

Re: Mapper basic question

2012-07-11 Thread Mohammad Tariq
Hello Manoj, It is not the block that determines the no of mappers. It is rather based on the no of input splits: no of mappers = no of input splits. And I did not get what you mean by 'no of mapper size'. It is possible to configure the input splits, though. Hope it helps. Regards, Mo

Mapper basic question

2012-07-11 Thread Manoj Babu
Hi, The no of mappers depends on the no of blocks. Is it possible to limit the no of mappers size without increasing the HDFS block size? Thanks in advance. Cheers! Manoj.