Hadoop YARN 2.2.0 Streaming Memory Limitation

2014-02-24 Thread Patrick Boenzli
hello hadoop-users! We are currently facing a frustrating Hadoop Streaming memory problem. Our setup: our compute nodes have about 7 GB of RAM; Hadoop Streaming starts a bash script which uses about 4 GB of RAM; therefore it is only possible to start one and only one task per node out of the box

job failed on hadoop 2

2014-02-24 Thread AnilKumar B
Hi, When I try to run MapReduce job on Hadoop 2, I am facing below issue. What could be the problem? 14/02/24 02:24:05 INFO mapreduce.Job: Job job_1392973982912_14477 running in uber mode : false 14/02/24 02:24:05 INFO mapreduce.Job: map 0% reduce 0% 14/02/24 02:24:14 INFO mapreduce.Job: map

RE: job failed on hadoop 2

2014-02-24 Thread Vinayakumar B
Hi Anil, I think multiple clients/tasks are trying to write to the same file with overwrite enabled. The second client is overwriting the first client's file, and the first client is getting the below-mentioned exception. Please check. Regards, Vinayakumar B From: AnilKumar B

Re: heap space error

2014-02-24 Thread Dieter De Witte
You can configure the heap size of the mappers with the following parameter (in mapred-site.xml): mapred.map.child.java.opts=-Xmx3200m. Also, setting the number of map tasks is not useful; you should set the number of map slots per node instead: mapred.tasktracker.map.tasks.maximum=6. Regards, Dieter
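For reference, a minimal mapred-site.xml sketch of the two settings mentioned above (the values are just the ones from this thread; tune them to your nodes' RAM and core count):

<configuration>
  <property>
    <name>mapred.map.child.java.opts</name>
    <value>-Xmx3200m</value>   <!-- heap for each map task's JVM -->
  </property>
  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>6</value>           <!-- concurrent map slots per node (MRv1/tasktracker) -->
  </property>
</configuration>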

Re: heap space error

2014-02-24 Thread Raj hadoop
Thanks a ton Dieter On Mon, Feb 24, 2014 at 3:45 PM, Dieter De Witte drdwi...@gmail.com wrote: You can configure the heap size of the mappers with the following parameter (in mapred-site.xml): mapred.map.child.java.opts=-Xmx3200m. Also, setting the number of map tasks is not useful. You

query

2014-02-24 Thread Banty Sharma
Hello! I want to get information about Hadoop development. Where can I get the actual procedure to solve the issues?

Re: heap space error

2014-02-24 Thread Dieter De Witte
No problem, it's not easy to learn about all of Hadoop's configuration options. Definitely consider looking into the reference (Tom White). 2014-02-24 11:20 GMT+01:00 Raj hadoop raj.had...@gmail.com: Thanks a ton Dieter On Mon, Feb 24, 2014 at 3:45 PM, Dieter De Witte drdwi...@gmail.com wrote:

Re: job failed on hadoop 2

2014-02-24 Thread AnilKumar B
Thanks Vinay. I am checking my code, but this exception comes after map 100%, which is why I cannot figure out where the issue could be. 14/02/24 02:24:05 INFO mapreduce.Job: Job job_1392973982912_14477 running in uber mode : false 14/02/24 02:24:05 INFO mapreduce.Job: map 0% reduce 0% 14/02/24

Re: query

2014-02-24 Thread Richard Chen
Your question was rather vaguely described, but you may check out the links below: - http://wiki.apache.org/hadoop/HowToSetupYourDevelopmentEnvironment - http://vichargrave.com/create-a-hadoop-build-and-development-environment-for-hadoop/ -

RE: job failed on hadoop 2

2014-02-24 Thread Vinayakumar B
Hi Anil, I think the Avro output emitted in the reducers is being written to the same file from different tasks? I am fairly sure this problem only occurs in that case, because the previous writer is fenced by the new writer. To find out: 1. Enable hdfs-audit logs for the namenode (if not done) 2.
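A hedged sketch of step 1: HDFS audit logging is normally enabled in the NameNode's log4j.properties; the logger and appender names below are the ones shipped in the stock Hadoop log4j.properties, but your distribution may wire this up differently (e.g. via HDFS_AUDIT_LOGGER in hadoop-env.sh):

# log4j.properties on the NameNode
hdfs.audit.logger=INFO,RFAAUDIT
log4j.logger.org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit=${hdfs.audit.logger}

Each audit line records the user, the command (create, delete, rename, ...) and the source path, so you can see which task overwrote which file.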

Wrong FS: hdfs://localhost:9000; expected: file:///

2014-02-24 Thread Chirag Dewan
Hi All, I am new to Hadoop. I am using Hadoop 2.2.0. I have a simple client program which reads a file from HDFS on a single-node cluster. Now when I run my code using java -jar mytest.jar it throws the error Wrong FS: hdfs://localhost. When I run the same code with hadoop jar test.jar it works
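The usual cause is that a plain java -jar launch does not have the cluster's core-site.xml on the classpath, so fs.defaultFS silently falls back to file:/// and any hdfs:// path is "the wrong FS". A minimal sketch of the two common workarounds (the config path and file name below are assumptions, not taken from the original post):

import java.io.InputStream;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsReadSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Option 1: load the cluster config explicitly (path is an assumption).
        conf.addResource(new Path("/etc/hadoop/conf/core-site.xml"));
        // Option 2: set the default filesystem directly.
        conf.set("fs.defaultFS", "hdfs://localhost:9000");
        FileSystem fs = FileSystem.get(conf);
        // Hypothetical file; replace with the real HDFS path.
        try (InputStream in = fs.open(new Path("/user/test/sample.txt"))) {
            int b;
            while ((b = in.read()) != -1) {
                System.out.write(b);
            }
        }
        System.out.flush();
    }
}

Running with hadoop jar works because the launcher script puts the config directory and the HDFS client jars on the classpath for you.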

Re: job failed on hadoop 2

2014-02-24 Thread AnilKumar B
Hi Vinay, Actually, when I use MultipleOutputs with AvroKeyOutputFormat I face that issue; I just removed MultipleOutputs, used the plain context.write(), and it's working now. I need to debug this issue, maybe there is some issue in my code. Thanks for your inputs, I will debug with your

Performance

2014-02-24 Thread Thomas Bentsen
Hi everyone, I am still new to Hadoop. Are there any benchmarks or 'performance heuristics' for Hadoop? Is it possible to say something like 'You can process X lines of gzipped log file on a medium AWS server in Y minutes'? I would like to get an idea of what kind of workflow is possible.

Re: Performance

2014-02-24 Thread Dieter De Witte
Hi, The TeraSort benchmark is probably the most common. Its mappers and reducers do 'nothing', so you exercise only the framework's merge-sort machinery. Regards, Dieter 2014-02-24 16:42 GMT+01:00 Thomas Bentsen t...@bentzn.com: Hi everyone I am still beginning Hadoop. Is
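For the archives, running TeraSort boils down to three jobs from the examples jar; the jar location and the row count below are assumptions and vary by installation:

# generate 10 million 100-byte rows (~1 GB), sort them, then validate the ordering
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar teragen 10000000 /bench/tera-in
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar terasort /bench/tera-in /bench/tera-out
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar teravalidate /bench/tera-out /bench/tera-report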

Re: Hadoop YARN 2.2.0 Streaming Memory Limitation

2014-02-24 Thread Anfernee Xu
Can you try setting yarn.nodemanager.resource.memory-mb (amount of physical memory, in MB, that can be allocated for containers), say to 1024, and also set mapreduce.map.memory.mb to 1024? On Mon, Feb 24, 2014 at 1:27 AM, Patrick Boenzli patrick.boen...@soom-it.ch wrote: hello hadoop-users! We
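For reference, the first key lives in yarn-site.xml on every NodeManager and the second in mapred-site.xml (or per job via -D); the 1024 values below are simply the ones suggested above, not a general recommendation:

<!-- yarn-site.xml -->
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>1024</value>   <!-- total MB the NodeManager may hand out to containers -->
</property>

<!-- mapred-site.xml -->
<property>
  <name>mapreduce.map.memory.mb</name>
  <value>1024</value>   <!-- container size requested for each map task -->
</property>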

Re: Hiveserver2 + OpenLdap Authentication issue

2014-02-24 Thread Vinod Kumar Vavilapalli
This is on the wrong mailing list, hence the non-activity. +user@hive bcc:user@hadoop Thanks +Vinod On Feb 23, 2014, at 10:16 PM, orahad bigdata oracle...@gmail.com wrote: Can somebody help me please? Thanks On Sun, Feb 23, 2014 at 3:27 AM, orahad bigdata oracle...@gmail.com wrote:

Re: Performance

2014-02-24 Thread Thomas Bentsen
Thanks Dieter! I'll look into it. Still... It would be nice to hear something from the real world. Would any of you working with Hadoop in a prod env be willing to share something? /th On Mon, 2014-02-24 at 16:56 +0100, Dieter De Witte wrote: Hi, The terasort benchmark is probably the

hadoop 2.2.0 cluster setup error : could only be replicated to 0 nodes instead of minReplication (=1)

2014-02-24 Thread Manoj Khangaonkar
Hi, I set up a cluster with machine1: namenode and datanode, machine2: datanode. A simple HDFS copy is not working. Can someone help with this issue? Several folks have posted this error on the web, but I have not seen a good reason or solution. command: bin/hadoop fs -copyFromLocal ~/hello

Re: Hadoop YARN 2.2.0 Streaming Memory Limitation

2014-02-24 Thread Arun C Murthy
Can you pls try with mapreduce.map.memory.mb = 5124 mapreduce.map.child.java.opts=-Xmx1024 ? This way the map jvm gets 1024 and 4G is available for the container. Hope that helps. Arun On Feb 24, 2014, at 1:27 AM, Patrick Boenzli patrick.boen...@soom-it.ch wrote: hello hadoop-users! We
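Expressed as a streaming invocation, that suggestion would look roughly like the following; the jar path is an assumption, and note that in Hadoop 2.x the canonical key for the JVM options is mapreduce.map.java.opts:

# container asks for ~5 GB: ~1 GB for the map JVM heap plus ~4 GB headroom for the bash script
hadoop jar $HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming-2.2.0.jar \
    -D mapreduce.map.memory.mb=5124 \
    -D mapreduce.map.java.opts=-Xmx1024m \
    -input /data/in -output /data/out \
    -mapper ./heavy_script.sh -file heavy_script.sh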

Re: Capacity Scheduler capacity vs. maximum-capacity

2014-02-24 Thread Arun C Murthy
Yes, it's a recent addition to CapacityScheduler. It's available in hadoop-2.2.0 onwards. See https://issues.apache.org/jira/browse/YARN-569. Arun On Feb 21, 2014, at 8:28 AM, ricky l rickylee0...@gmail.com wrote: Does Hadoop capacity scheduler support preemption in this scenario? Based on

Re: meaning or usage of reserved containers in YARN Capacity scheduler

2014-02-24 Thread Arun C Murthy
Apologies for the late reply. The concept of reservation is used to prevent starvation. For example, let's say you have 2 machines with 8G each. Now each of those is running containers which take up 6G on nodeA and 7G on nodeB. Another application comes in and then asks for a single container of
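To make that concrete (the request size here is an assumed number, since the message is cut off): suppose the new application asks for a 4G container. Neither nodeA (2G free) nor nodeB (1G free) can host it right now, and if the scheduler kept handing freed-up memory to smaller requests, the large request could wait forever. Instead, the CapacityScheduler reserves space on one node and lets its running containers drain until the 4G container can be allocated there.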

Re: meaning or usage of reserved containers in YARN Capacity scheduler

2014-02-24 Thread ricky l
Thanks, Arun. The scenario makes sense!! thx. On Mon, Feb 24, 2014 at 5:56 PM, Arun C Murthy a...@hortonworks.com wrote: Apologies for the late reply. The concept of reservation is used to prevent starvation. For e.g. let's say you have 2 machines with 8G each. Now each of those are running

Re: Capacity Scheduler capacity vs. maximum-capacity

2014-02-24 Thread ricky l
Thanks Arun for the update. That's very helpful - I was going to switch to the fair scheduler for the preemption feature. On Mon, Feb 24, 2014 at 5:51 PM, Arun C Murthy a...@hortonworks.com wrote: Yes, it's a recent addition to CapacityScheduler. It's available in hadoop-2.2.0 onwards. See

HDFS Client write data is slow

2014-02-24 Thread lei liu
I use HBase 0.94 and Hadoop 2.0. I installed one HDFS cluster that has 15 datanodes. If the network bandwidth of two datanodes is saturated (for example 100m/s), the write performance of the entire HDFS cluster is slow. I think the slow datanodes affect the write performance of the entire cluster.

Re: hadoop 2.2.0 cluster setup error : could only be replicated to 0 nodes instead of minReplication (=1)

2014-02-24 Thread Manoj Khangaonkar
Hi, Can one of the implementors comment on what conditions trigger this error? All the data nodes show up as commissioned, and there were no errors during startup. If I google for this error there are several posts reporting the issue, but most of the answers have weak solutions like reformatting and

Re: hadoop 2.2.0 cluster setup error : could only be replicated to 0 nodes instead of minReplication (=1)

2014-02-24 Thread Azuryy Yu
Generally, this is caused by insufficient space. Please check the total capacity of your cluster and the used and remaining ratios, and check dfs.datanode.du.reserved in hdfs-site.xml; if this value is larger than your remaining capacity, then you get this exception. On Tue, Feb 25, 2014 at 10:35
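A quick way to do that check (the command is standard; the reserved value below is illustrative):

# per-datanode configured capacity, DFS used and DFS remaining
hdfs dfsadmin -report

<!-- hdfs-site.xml: bytes each datanode keeps back for non-DFS use -->
<property>
  <name>dfs.datanode.du.reserved</name>
  <value>1073741824</value>   <!-- 1 GB -->
</property>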

Mappers vs. Map tasks

2014-02-24 Thread Sugandha Naolekar
Hello, As per the various articles I have read to date, a file is split into chunks/blocks. On the same note, I would like to ask a few things: 1. The number of mappers is decided as: Total_File_Size / Max. Block Size. Thus, if the file is smaller than the block size, only one mapper will be

Re: Mappers vs. Map tasks

2014-02-24 Thread Sugandha Naolekar
One more thing to ask: No. of blocks = no. of mappers. Thus, the map() function will be called that many times, right? -- Thanks Regards, Sugandha Naolekar On Tue, Feb 25, 2014 at 11:27 AM, Sugandha Naolekar sugandha@gmail.com wrote: Hello, As per the various articles I went

Re: Mappers vs. Map tasks

2014-02-24 Thread shashwat shriparv
You are really confused :) Please read these: http://developer.yahoo.com/hadoop/tutorial/module4.html#closer http://wiki.apache.org/hadoop/HowManyMapsAndReduces Warm Regards, Shashwat Shriparv

Reading a file in a customized way

2014-02-24 Thread Sugandha Naolekar
Hello, Irrespective of how the file blocks are placed in HDFS, I want my map() to be called/invoked in a customized manner. For example, I want to process a huge JSON file (a single file). Now this file is definitely smaller than the default block size (128 MB). Thus, ideally, only one mapper will be called. Means,

Re: Reading a file in a customized way

2014-02-24 Thread sudhakara st
Use a WholeFileInputFormat/WholeFileRecordReader (The Hadoop Definitive Guide, Tom White, page 240) to read the file name as the key and the contents of the file as its value in the mapper. Before getting into this, it is better to read up on the HDFS architecture and the MapReduce flow
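For anyone searching the archives, here is a minimal sketch in the spirit of the book's example: isSplitable() returns false, so each file becomes exactly one record and one mapper sees the whole file. The class names mirror the book, but the body is written from memory and is illustrative only; the book's version emits NullWritable keys and leaves putting the file name into the key to the mapper (via the FileSplit).

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;

public class WholeFileInputFormat
        extends FileInputFormat<NullWritable, BytesWritable> {

    @Override
    protected boolean isSplitable(JobContext context, Path file) {
        return false; // never split: one file -> one split -> one map task
    }

    @Override
    public RecordReader<NullWritable, BytesWritable> createRecordReader(
            InputSplit split, TaskAttemptContext context) {
        return new WholeFileRecordReader();
    }

    static class WholeFileRecordReader
            extends RecordReader<NullWritable, BytesWritable> {

        private FileSplit fileSplit;
        private Configuration conf;
        private final BytesWritable value = new BytesWritable();
        private boolean processed = false;

        @Override
        public void initialize(InputSplit split, TaskAttemptContext context) {
            this.fileSplit = (FileSplit) split;
            this.conf = context.getConfiguration();
        }

        @Override
        public boolean nextKeyValue() throws IOException {
            if (processed) {
                return false; // the single record was already emitted
            }
            byte[] contents = new byte[(int) fileSplit.getLength()];
            Path file = fileSplit.getPath();
            FileSystem fs = file.getFileSystem(conf);
            FSDataInputStream in = null;
            try {
                in = fs.open(file);
                IOUtils.readFully(in, contents, 0, contents.length);
                value.set(contents, 0, contents.length);
            } finally {
                IOUtils.closeStream(in);
            }
            processed = true;
            return true;
        }

        @Override
        public NullWritable getCurrentKey() { return NullWritable.get(); }

        @Override
        public BytesWritable getCurrentValue() { return value; }

        @Override
        public float getProgress() { return processed ? 1.0f : 0.0f; }

        @Override
        public void close() { /* stream already closed in nextKeyValue() */ }
    }
}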

Re: Mappers vs. Map tasks

2014-02-24 Thread Dieter De Witte
Each node has a tasktracker with a number of map slots. A map slot hosts a mapper, and a mapper executes map tasks. If there are more map tasks than slots, there will obviously be multiple rounds of mapping. The map function is called once for each input record. A block is typically 64MB and can
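A quick worked example of the arithmetic above (the numbers are assumed, not from this thread): a 1 GB file stored with a 128 MB block size yields 8 blocks, hence 8 input splits and 8 map tasks by default. Within each of those tasks, map() is then called once per input record, e.g. once per line with TextInputFormat, not once per block.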