virtual memory consumption

2014-09-11 Thread Jakub Stransky
Hello hadoop users, I am facing the following issue when running an M/R job during the reduce phase: Container [pid=22961,containerID=container_1409834588043_0080_01_10] is running beyond virtual memory limits. Current usage: 636.6 MB of 1 GB physical memory used; 2.1 GB of 2.1 GB virtual memory

Re: Writing output from streaming task without dealing with key/value

2014-09-11 Thread Dmitry Sivachenko
After a streaming job outputs some data to stdout, some Hadoop code receives it and splits it into a key/value pair before it reaches TextOutputFormat. Can anyone point me to that piece of code please? Thanks! On 11 Sept. 2014, at 0:37, Dmitry Sivachenko trtrmi...@gmail.com wrote: On 10 Sept.
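The splitting asked about above can be illustrated with a minimal sketch. It is not the Hadoop code itself; it only mimics the default behaviour described in the Hadoop Streaming documentation, where the text up to the first tab is the key and the rest of the line is the value. The function name `split_streaming_line` is hypothetical.

```python
def split_streaming_line(line, separator="\t"):
    """Mimic the documented default: key is everything before the first
    separator, value is everything after it. Hypothetical helper, not
    part of Hadoop."""
    line = line.rstrip("\n")
    key, sep, value = line.partition(separator)
    if not sep:
        # No separator found: the whole line is the key, the value is null.
        return line, None
    return key, value


if __name__ == "__main__":
    print(split_streaming_line("user42\tclicked\tbutton\n"))
    print(split_streaming_line("a line without any tab"))
```

Only the first tab matters, so any later tabs stay inside the value.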

Re: S3 with Hadoop 2.5.0 - Not working

2014-09-11 Thread Dhiraj
Hi Harsh, I am a newbie to Hadoop. I am able to start the nodes with Hadoop 1.1.x and 1.2.x versions for the following property, but not with 2.5.0 (fs.defaultFS). For 1.*.* releases I don't need to specify hdfs:// like you suggested; it works with s3:// property

Re: virtual memory consumption

2014-09-11 Thread Susheel Kumar Gadalay
Your physical memory is 1 GB on this node. What are the other containers (map tasks) running on it? You have given map memory as 768M, reduce memory as 1024M, and AM memory as 1024M. With the AM and a single map task that is about 1.7 GB, so another container for the reducer cannot start. Reduce these values and
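The arithmetic in this reply can be checked with a short sketch. The container sizes (768M map, 1024M reduce, 1024M AM) come from the message; the node capacity `NODE_MB` is an assumption, set here to the 2 GB that the original poster reports for the NodeManager in his follow-up, and the helper name is hypothetical.

```python
AM_MB = 1024      # ApplicationMaster container, from the message
MAP_MB = 768      # one map container, from the message
REDUCE_MB = 1024  # one reduce container, from the message
NODE_MB = 2048    # assumption: memory available to the NodeManager


def remaining_mb(node_mb, *containers_mb):
    """Memory left on the node after the given containers are placed."""
    return node_mb - sum(containers_mb)


used = AM_MB + MAP_MB                          # 1792 MB
left = remaining_mb(NODE_MB, AM_MB, MAP_MB)    # 256 MB
reducer_fits = left >= REDUCE_MB               # False

if __name__ == "__main__":
    print(f"used={used} MB, left={left} MB, reducer fits: {reducer_fits}")
```

Under that assumption the AM and one map task leave 256 MB, which is less than the 1024M requested for a reducer.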

RE: Error when executing a WordCount Program

2014-09-11 Thread YIMEN YIMGA Gael
Hello dear all, Regarding the issue below, I succeeded in fixing the following warning: 14/09/10 15:00:24 WARN mapred.JobClient: No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String). But the main error still persists. 14/09/10 15:00:24 ERROR

Re: virtual memory consumption

2014-09-11 Thread Jakub Stransky
Hi, thanks for the reply. The machine is pretty small, as it has 4 GB of total memory. We reserved 1 GB for the OS and 1 GB for HBase (according to the recommendation), so 2 GB remains, and that is what the nodemanager claims. Actually it is a cluster of 5 machines: 2 name-nodes and 3 data nodes. All machines have similar parameters

Re: virtual memory consumption

2014-09-11 Thread Tsuyoshi OZAWA
Hi Jakub, You have 2 options: 1. Turn off the virtual memory check, as you mentioned. 2. Make yarn.nodemanager.vmem-pmem-ratio larger. Option 1 is a reasonable choice if you cannot predict virtual memory usage in advance or you don't have any applications to check virtual memory. Thanks, - Tsuyoshi
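A small sketch of how the second option relates to the numbers in the original error message: the virtual memory limit of a container is its physical memory allocation multiplied by yarn.nodemanager.vmem-pmem-ratio, whose default is 2.1. The function name `vmem_limit_mb` and the example ratio of 4 are assumptions for illustration only.

```python
def vmem_limit_mb(container_pmem_mb, vmem_pmem_ratio=2.1):
    """Virtual memory limit for a container: physical allocation times
    the vmem-pmem ratio (2.1 is the YARN default)."""
    return container_pmem_mb * vmem_pmem_ratio


if __name__ == "__main__":
    # A 1 GB container with the default ratio gives roughly the 2.1 GB
    # limit reported in the error message.
    print(vmem_limit_mb(1024) / 1024, "GB")
    # A larger ratio (4 is an arbitrary example) raises the limit.
    print(vmem_limit_mb(1024, 4) / 1024, "GB")
```

Raising the ratio leaves the physical allocation unchanged and only relaxes the virtual memory ceiling.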

[no subject]

2014-09-11 Thread Sheena O'Connell
Hi, I have a Python script that throws an obvious error (NameError: name 'asooasdhoasdhio' is not defined), and I'm using that script as both the mapper and reducer for a streaming task. /usr/local/hadoop/bin/hadoop jar

Change Default S3 domain/server.

2014-09-11 Thread Dhiraj
Hi, is it possible to change the default S3 domain for my bucket from s3.amazonaws.com to something local? I have a local object store server running on my machine and I want the requests to be directed to my server instead of Amazon. E.g., when I do the following: # hadoop fs -ls s3://bucket1/

Hadoop Smoke Test

2014-09-11 Thread arthur.hk.c...@gmail.com
Hi, I am trying the smoke test for Hadoop, “terasort”. During the map phase, I found “Container killed by the ApplicationMaster”. Should I stop this job and try to run it again, or just let it continue? 14/09/11 21:27:53 INFO mapreduce.Job: map 22% reduce 0% 14/09/11 21:31:33 INFO

Re: Writing output from streaming task without dealing with key/value

2014-09-11 Thread Dmitry Sivachenko
Okay, FWIW I found the solution: https://issues.apache.org/jira/browse/MAPREDUCE-6085 Thanks to all who replied. On 11 Sept. 2014, at 11:16, Dmitry Sivachenko trtrmi...@gmail.com wrote: After streaming job outputs some data to stdout, some hadoop code receives it and splits into

task slowness

2014-09-11 Thread Jakub Stransky
Hello experienced hadoop users, I have a data pipeline consisting of two Java MR jobs coordinated by the Oozie scheduler. Both of them process the same data, but the first one is more than 10 times slower than the second one. Job counters on the RM page are not much help in that matter. I have

Enable Debug logging for a job

2014-09-11 Thread Siddhi Mehta
I am using Hadoop 2.3.0 with MR2. I tried enabling debug logging for map/reduce tasks by setting mapreduce.map.log.level and mapreduce.reduce.log.level to DEBUG in mapred-site.xml. Am I missing something, or is this a known issue? Appreciate your response. Thanks, Siddhi

Change master node with Ambari?

2014-09-11 Thread Charles Robertson
I'm playing around with a 2-node cluster running Hortonworks Data Platform 2.1, running on AWS EC2 instances. Today I discovered that for some reason my master node had decided to no longer recognise my private key, so I could no longer SSH in to the host to work on the project. In this case it's

RE: Change master node with Ambari?

2014-09-11 Thread Liu, Yi A
1. If your cluster is still alive, then you can build a new cluster and use distcp to migrate all data to the new cluster. 2. Supposing the master node you mean is the NameNode, the second approach needs you to copy the fsimage/edit logs from the master node; I think it doesn’t work for

MultipleTextOutputFormat in new api of 1.2.1?

2014-09-11 Thread Li Li
I want to output different key ranges to different directories. In the old API there is a MultipleTextOutputFormat, and I just need to override generateFileNameForKeyValue. But I can't find it in the new API. There is a MultipleOutputs, but it's not that good because it needs to predefine keys by