hello hadoop-users!
We are currently facing a frustrating Hadoop Streaming memory problem. Our
setup:
- our compute nodes have about 7 GB of RAM
- Hadoop Streaming starts a bash script which uses about 4 GB of RAM
- therefore it is only possible to start one and only one task per node
out of the box
Hi,
When I try to run a MapReduce job on Hadoop 2, I am facing the issue below.
What could be the problem?
14/02/24 02:24:05 INFO mapreduce.Job: Job job_1392973982912_14477 running
in uber mode : false
14/02/24 02:24:05 INFO mapreduce.Job: map 0% reduce 0%
14/02/24 02:24:14 INFO mapreduce.Job: map
Hi Anil,
I think multiple clients/tasks are trying to write to the same file with
overwrite enabled.
The second client is overwriting the first client's file, and the first
client is getting the exception mentioned below.
Please check ..
Regards,
Vinayakumar B
From: AnilKumar B
You can configure the heap size of the mappers with the following parameter
(in mapred-site.xml):
mapred.map.child.java.opts=-Xmx3200m
Also, setting the number of map tasks is not useful. You should set the
number of map slots per node instead:
mapred.tasktracker.map.tasks.maximum=6
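For reference, a minimal mapred-site.xml sketch with the values above (the
numbers are just the ones suggested here; tune them to your nodes):

<property>
  <name>mapred.map.child.java.opts</name>
  <value>-Xmx3200m</value>
</property>
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>6</value>
</property>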
Regards,
Dieter
Thanks a ton Dieter
On Mon, Feb 24, 2014 at 3:45 PM, Dieter De Witte drdwi...@gmail.com wrote:
You can configure the heap size of the mappers with the following
parameter (in mapred-site.xml)
mapred.map.child.java.opts=-Xmx3200m
Also, setting the number of map tasks is not useful. You
Hello!! I want to get information about Hadoop development. Where can I
get the actual procedure to solve the issues?
No problem, it's not easy to learn about all of Hadoop's configuration
options. Definitely consider looking into the reference (Tom White's
Hadoop: The Definitive Guide).
2014-02-24 11:20 GMT+01:00 Raj hadoop raj.had...@gmail.com:
Thanks a ton Dieter
On Mon, Feb 24, 2014 at 3:45 PM, Dieter De Witte drdwi...@gmail.com wrote:
Thanks Vinay.
I am checking my code, but this exception is coming after map 100%. That's
why I cannot figure out where the issue could be.
14/02/24 02:24:05 INFO mapreduce.Job: Job job_1392973982912_14477 running
in uber mode : false
14/02/24 02:24:05 INFO mapreduce.Job: map 0% reduce 0%
14/02/24
Your question was rather vaguely described, but you may check out the
links below:
- http://wiki.apache.org/hadoop/HowToSetupYourDevelopmentEnvironment
- http://vichargrave.com/create-a-hadoop-build-and-development-environment-for-hadoop/
-
Hi Anil,
I think the Avro output emitted in the reducers is written to the same file
from different tasks?
I am pretty sure that this problem occurs only in that case, because the
previous writer is fenced off by the new writer.
To find out:
1. Enable hdfs-audit logs for the namenode (if not done)
2.
Hi All,
I am new to Hadoop. I am using Hadoop 2.2.0. I have simple client code which
reads a file from HDFS on a single-node cluster. When I run my code using
java -jar mytest.jar, it throws the error 'Wrong FS: hdfs://localhost'.
When I run the same code with hadoop jar test.jar it works
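A minimal sketch of such a client (the path and NameNode port are
hypothetical). Note that with plain java -jar the Hadoop conf directory is
not on the classpath, so core-site.xml is not loaded, fs.defaultFS stays at
the local file system, and FileSystem.get(conf) mismatches an hdfs:// path;
deriving the FileSystem from the path's own URI sidesteps that:

import java.io.InputStream;
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class HdfsCat {
  public static void main(String[] args) throws Exception {
    // Hypothetical file and NameNode address; adjust to your cluster.
    String uri = "hdfs://localhost:9000/user/test/hello.txt";
    Configuration conf = new Configuration();
    // Bind the FileSystem to the URI itself rather than to fs.defaultFS,
    // which is unset when core-site.xml is not on the classpath.
    FileSystem fs = FileSystem.get(URI.create(uri), conf);
    try (InputStream in = fs.open(new Path(uri))) {
      IOUtils.copyBytes(in, System.out, 4096, false);
    }
  }
}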
Hi Vinay,
Actually, when I use multiple outputs with AvroKeyOutputFormat, I am
facing that issue. I just removed the multiple outputs and used the plain
context.write(), and it's working now.
I still need to debug this issue; there may be some issue in my code.
Thanks for your inputs, I will debug with your
Hi everyone
I am still new to Hadoop.
Are there any benchmarks or 'performance heuristics' for Hadoop?
Is it possible to say something like 'You can process X lines of gzipped
log file on a medium AWS server in Y minutes'? I would like to get an
idea of what kind of workflow is possible.
Hi,
The terasort benchmark is probably the most common. Its mappers and
reducers do 'nothing'; this way you only exercise the framework's
merge-sort functionality.
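For example (a sketch; the examples jar path, row count, and HDFS paths are
illustrative and vary by install; teragen writes 100-byte rows, so
10,000,000 rows is roughly 1 GB):

hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar teragen 10000000 /benchmarks/tera-in
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar terasort /benchmarks/tera-in /benchmarks/tera-out
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar teravalidate /benchmarks/tera-out /benchmarks/tera-report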
Regards, Dieter
2014-02-24 16:42 GMT+01:00 Thomas Bentsen t...@bentzn.com:
Hi everyone
I am still new to Hadoop.
Is
Can you try setting yarn.nodemanager.resource.memory-mb (the amount of
physical memory, in MB, that can be allocated for containers) to, say,
1024, and also set mapreduce.map.memory.mb to 1024?
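For example (a sketch; 1024 is just the value suggested above):

In yarn-site.xml:
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>1024</value>
</property>

In mapred-site.xml:
<property>
  <name>mapreduce.map.memory.mb</name>
  <value>1024</value>
</property>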
On Mon, Feb 24, 2014 at 1:27 AM, Patrick Boenzli patrick.boen...@soom-it.ch
wrote:
hello hadoop-users!
We
This is on the wrong mailing list, hence the non-activity.
+user@hive
bcc:user@hadoop
Thanks
+Vinod
On Feb 23, 2014, at 10:16 PM, orahad bigdata oracle...@gmail.com wrote:
Can somebody help me please?
Thanks
On Sun, Feb 23, 2014 at 3:27 AM, orahad bigdata oracle...@gmail.com wrote:
Thanks Dieter!
I'll look into it.
Still... It would be nice to hear something from the real world. Would
any of you working with Hadoop in a prod env be willing to share
something?
/th
On Mon, 2014-02-24 at 16:56 +0100, Dieter De Witte wrote:
Hi,
The terasort benchmark is probably the
Hi,
I set up a cluster with:
machine 1: namenode and datanode
machine 2: datanode
A simple HDFS copy is not working. Can someone help with this issue?
Several folks have posted this error on the web, but I have not seen a
good reason or solution.
command:
bin/hadoop fs -copyFromLocal ~/hello
Can you please try with mapreduce.map.memory.mb=5124 and
mapreduce.map.child.java.opts=-Xmx1024m?
This way the map JVM gets 1024 MB and about 4 GB remains available in the
container.
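In mapred-site.xml that would look something like the following (a sketch,
assuming the Hadoop 2 property name mapreduce.map.java.opts for the map
JVM options):

<property>
  <name>mapreduce.map.memory.mb</name>
  <value>5124</value>
</property>
<property>
  <name>mapreduce.map.java.opts</name>
  <value>-Xmx1024m</value>
</property>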
Hope that helps.
Arun
On Feb 24, 2014, at 1:27 AM, Patrick Boenzli patrick.boen...@soom-it.ch wrote:
hello hadoop-users!
We
Yes, it's a recent addition to the CapacityScheduler.
It's available from hadoop-2.2.0 onwards. See
https://issues.apache.org/jira/browse/YARN-569.
Arun
On Feb 21, 2014, at 8:28 AM, ricky l rickylee0...@gmail.com wrote:
Does Hadoop capacity scheduler support preemption in this scenario?
Based on
Apologies for the late reply.
The concept of reservation is used to prevent starvation.
For example, let's say you have 2 machines with 8G each. Now each of those
is running containers which take up 6G on nodeA and 7G on nodeB.
Another application comes in and then asks for a single container of
Thanks, Arun. The scenario makes sense!! thx.
On Mon, Feb 24, 2014 at 5:56 PM, Arun C Murthy a...@hortonworks.com wrote:
Apologies for the late reply.
The concept of reservation is used to prevent starvation.
For example, let's say you have 2 machines with 8G each. Now each of those is
running
Thanks, Arun, for the update. That's very helpful - I was going to switch
to the fair scheduler for preemption purposes.
On Mon, Feb 24, 2014 at 5:51 PM, Arun C Murthy a...@hortonworks.com wrote:
Yes, it's a recent addition to CapacityScheduler.
It's available in hadoop-2.2.0 onwards. See
I use HBase 0.94 and Hadoop 2.0.
I installed one HDFS cluster that has 15 datanodes. If the network bandwidth
of two datanodes is saturated (for example, at 100m/s), the write performance
of the entire HDFS cluster becomes slow.
I think the slow datanodes affect the write performance of the entire
cluster.
Hi
Can one of the implementors comment on what conditions trigger this error?
All the data nodes show up as commissioned, and there are no errors during
startup.
If I google for this error, there are several posts reporting the issue,
but most of the answers have weak solutions like reformatting and
Generally, this is caused by insufficient space.
Please check the total capacity of your cluster and the used and remaining
ratios, and check dfs.datanode.du.reserved in hdfs-site.xml.
If this value is larger than your remaining capacity, then you get this
exception.
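For example, in hdfs-site.xml (the value is in bytes, reserved per volume;
10 GB here is purely illustrative):

<property>
  <name>dfs.datanode.du.reserved</name>
  <value>10737418240</value>
</property>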
On Tue, Feb 25, 2014 at 10:35
Hello,
As per the various articles I have gone through to date, the file(s) are
split into chunks/blocks. On the same note, I would like to ask a few things:
1. The no. of mappers is decided as Total_File_Size / Max_Block_Size. Thus,
if the file is smaller than the block size, only one mapper will be
One more thing to ask: no. of blocks = no. of mappers. Thus, the map()
function will be called that many times, right?
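For example (illustrative numbers): a 1 GB file with a 128 MB block size is
stored as 8 blocks, so the job gets 8 input splits and hence 8 map tasks;
within each split, map() is then invoked once per record.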
--
Thanks Regards,
Sugandha Naolekar
On Tue, Feb 25, 2014 at 11:27 AM, Sugandha Naolekar
sugandha@gmail.com wrote:
Hello,
As per the various articles I went
You are really confused :) Please read these:
http://developer.yahoo.com/hadoop/tutorial/module4.html#closer
http://wiki.apache.org/hadoop/HowManyMapsAndReduces
Warm Regards,
Shashwat Shriparv
Hello,
Irrespective of how the file blocks are placed in HDFS, I want my map() to
be called/invoked in a customized manner. For example, I want to process a
huge JSON file (a single file). Now this file is definitely less than the
default block size (128 MB). Thus, ideally, only one mapper will be called.
Means,
Use WholeFileInputFormat/WholeFileRecordReader (The Hadoop Definitive
Guide, Tom White, page 240) to read the file name as the key and the
contents of the file as its value into the mapper; see the sketch below.
Before getting into this, better read up on the HDFS architecture and the
MapReduce flow
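A minimal sketch of that input format (assumptions: the new
org.apache.hadoop.mapreduce API; the book's version uses NullWritable keys,
while this one emits the file name as the key, as described above):

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;

public class WholeFileInputFormat extends FileInputFormat<Text, BytesWritable> {

  @Override
  protected boolean isSplitable(JobContext context, Path file) {
    return false; // never split: the whole file goes to one mapper
  }

  @Override
  public RecordReader<Text, BytesWritable> createRecordReader(
      InputSplit split, TaskAttemptContext context) {
    return new WholeFileRecordReader();
  }

  // Emits exactly one record: (file name, file contents).
  static class WholeFileRecordReader extends RecordReader<Text, BytesWritable> {
    private FileSplit split;
    private Configuration conf;
    private final Text key = new Text();
    private final BytesWritable value = new BytesWritable();
    private boolean processed = false;

    @Override
    public void initialize(InputSplit split, TaskAttemptContext context) {
      this.split = (FileSplit) split;
      this.conf = context.getConfiguration();
    }

    @Override
    public boolean nextKeyValue() throws IOException {
      if (processed) {
        return false;
      }
      Path file = split.getPath();
      byte[] contents = new byte[(int) split.getLength()];
      FileSystem fs = file.getFileSystem(conf);
      FSDataInputStream in = null;
      try {
        in = fs.open(file);
        IOUtils.readFully(in, contents, 0, contents.length);
      } finally {
        IOUtils.closeStream(in);
      }
      key.set(file.toString());                // file name as the key
      value.set(contents, 0, contents.length); // whole file as the value
      processed = true;
      return true;
    }

    @Override public Text getCurrentKey() { return key; }
    @Override public BytesWritable getCurrentValue() { return value; }
    @Override public float getProgress() { return processed ? 1.0f : 0.0f; }
    @Override public void close() { }
  }
}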
Each node has a tasktracker with a number of map slots. A map slot hosts a
mapper, and a mapper executes map tasks. If there are more map tasks than
slots, there will obviously be multiple rounds of mapping.
The map function is called once for each input record. A block is typically
64MB and can