Not that I know of; the block allocations are done by the NameNode for
several reasons/edge cases, and that is why the write path is the way it is
today.
What problem are you trying to solve, though, that led you to ask this?
If we have more info about the problem to solve, we can provide better
answers.
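(For what it's worth, the public client API also reflects that block-to-DataNode
mappings come from the NameNode; a minimal read-side sketch, with the path made
up for illustration:)

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.BlockLocation;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class BlockLocationsExample {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);       // talks to the NameNode
            Path file = new Path("/tmp/example.txt");   // hypothetical path
            FileStatus status = fs.getFileStatus(file);
            // The block-to-DataNode mapping is served by the NameNode; there is
            // no supported client path that bypasses it.
            for (BlockLocation loc : fs.getFileBlockLocations(status, 0, status.getLen())) {
                System.out.println(loc);
            }
        }
    }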
Hello Harsh,
Does Hadoop-0.20.205.0 (new API) have Avro support?
Regards,
Mohammad Tariq
On Wed, Jul 11, 2012 at 1:57 AM, Mohammad Tariq wrote:
> Hello Harsh,
>
> I am sorry to be a pest with questions. Actually I am kinda
> stuck. I have to write my MapReduce job such tha
Hi,
Is it possible to write to an HDFS datanode without relying on the Namenode, i.e. to
find the location of the Datanodes from somewhere else?
Thanks,
Robert
I am using MapReduce streaming with Python code. It works fine for basic
stdin and stdout.
But I have a mapper-only application that also emits some other output files.
So in addition to stdout, the program also creates files named output1.txt and
output2.txt. My code seems to be running c
Hi Subroto,
unfortunately that did not help...
still looking for a solution :-(
The stderr log for the container says, in German: "Fehler: Hauptklasse
org.apache.hadoop.mapreduce.v2.app.MRAppMaster konnte nicht gefunden oder
geladen werden",
which means org.apache.hadoop.mapreduce.v2.app.MRAppMaster could not be found or loaded.
Thanks All!
On 11 Jul 2012 19:07, "Bejoy KS" wrote:
> Hi Manoj
>
> Block size is at the HDFS storage level, whereas split size is the amount of
> data processed by each mapper while running a MapReduce job (one split is
> the data processed by one mapper). One or more HDFS blocks can contribute
Hi Andre,
Yup, the problem got solved.
The problem I was facing was that the JobClient code of my application was
messing up the Hadoop property yarn.application.classpath.
After setting it to the proper value, things now work nicely.
Current configuration looks something like this:
yarn.application.classpath=
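(Purely as an illustration, and not the actual value used above, which is not shown
in the thread: one way client code can avoid clobbering this property is to
explicitly reset it to YARN's shipped defaults before submission. The constant
names below are from later Hadoop 2.x YarnConfiguration.)

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.yarn.conf.YarnConfiguration;

    public class ClasspathGuard {
        public static void main(String[] args) {
            Configuration conf = new Configuration();
            // Explicitly (re)set yarn.application.classpath to the stock defaults,
            // so earlier client code cannot leave it broken; an empty or wrong value
            // is one way the MRAppMaster class ends up not being found at runtime.
            conf.setStrings(YarnConfiguration.YARN_APPLICATION_CLASSPATH,
                    YarnConfiguration.DEFAULT_YARN_APPLICATION_CLASSPATH);
            System.out.println(conf.get(YarnConfiguration.YARN_APPLICATION_CLASSPATH));
        }
    }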
Hi Manoj
Block size is at the HDFS storage level, whereas split size is the amount of data
processed by each mapper while running a MapReduce job (one split is the data
processed by one mapper). One or more HDFS blocks can contribute to a split.
Splits are determined by the InputFormat as well as the
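(To make the split-size knob concrete, here is a minimal sketch, not from the
thread, with paths and sizes made up; it assumes the Hadoop 2.x-style new API
with Job.getInstance and the FileInputFormat split-size helpers, which control
split size independently of the HDFS block size.)

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class SplitSizeDemo {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            Job job = Job.getInstance(conf, "split-size-demo"); // identity map/reduce
            job.setJarByClass(SplitSizeDemo.class);

            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));

            // A split will be at least 256 MB even if the HDFS block size is
            // smaller; one mapper runs per split, not per block.
            FileInputFormat.setMinInputSplitSize(job, 256L * 1024 * 1024);

            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }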
Which method do you refer to? I think DistributedCache.addLocalFiles()
only works for files that are local to the task nodes. What I want is to
upload a file into the job-specific directory on HDFS and register it
with DistributedCache (and maybe clean it up after the job finishes). Is
there
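(One common approach, sketched below and not confirmed as the answer in the
thread, is to copy the file to HDFS yourself and then register that HDFS URI
with the cache; the staging path and file names are hypothetical.)

    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.filecache.DistributedCache;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class CacheUploadExample {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);

            // Hypothetical job-specific staging directory on HDFS.
            Path hdfsDir = new Path("/tmp/myjob-staging");
            Path local = new Path("lookup.dat");          // local side file
            Path target = new Path(hdfsDir, "lookup.dat");

            fs.mkdirs(hdfsDir);
            fs.copyFromLocalFile(local, target);          // upload to HDFS

            // Register the HDFS copy with the DistributedCache (old API).
            DistributedCache.addCacheFile(new URI(target.toString()), conf);

            // ... submit the job with this conf, then after it finishes:
            // fs.delete(hdfsDir, true);                  // clean up the staging dir
        }
    }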
Hi Subroto,
I have the same problem; I cannot get my MapReduce jobs to run...
The container log says that org.apache.hadoop.mapreduce.v2.app.MRAppMaster
cannot be found... :-(
Did you solve it already?
best regards
andre
- Original Message -
From: Subroto
Sent: Tue, 5 Jun 2012 1
Hi Tariq / Arun,
The no of blocks (splits) = total file size / HDFS block size * replication
factor.
The no of splits is again nothing but the blocks here.
Other than increasing the block size (input splits), is it possible to limit
the no of mappers?
Cheers!
Manoj.
On Wed, Jul 11, 2012 at 6
Take a look at CombineFileInputFormat - this will create 'meta splits' which
include multiple small splits, thus reducing the number of maps which are run.
Arun
On Jul 11, 2012, at 5:29 AM, Manoj Babu wrote:
> Hi,
>
> The no of mappers depends on the no of blocks. Is it possible to limit the
> no of ma
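(A minimal sketch of the suggestion, assuming a Hadoop release that ships
CombineTextInputFormat; on older releases CombineFileInputFormat has to be
subclassed with a custom RecordReader. All names, paths, and sizes below are
illustrative.)

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.CombineTextInputFormat;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class CombineSplitsExample {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            Job job = Job.getInstance(conf, "combine-splits-demo"); // illustrative name
            job.setJarByClass(CombineSplitsExample.class);

            // Pack many small files/blocks into fewer "meta splits",
            // so fewer map tasks are launched.
            job.setInputFormatClass(CombineTextInputFormat.class);
            CombineTextInputFormat.setMaxInputSplitSize(job, 256L * 1024 * 1024);

            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }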
Hello Manoj,
It is not the block that determines the no of mappers. It is
rather based on the no of input splits. No of mappers = no of input
splits.
And I did not get what you mean by 'no of mapper size'. It is
possible to configure the input splits though. Hope it helps.
Regards,
Mo
Hi,
The no of mappers depends on the no of blocks. Is it possible to limit
the no of mappers size without increasing the HDFS block size?
Thanks in advance.
Cheers!
Manoj.