Re: Hadoop-MapReduce

2013-12-17 Thread Shekhar Sharma
Hello Ranjini, this error occurs when you mix and match the newer and older APIs. You might have written your program using the newer API while the XML input format uses the older API. The older API has the package structure org.apache.hadoop.mapred; the newer API has the package structure of
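
For reference, a minimal sketch of the two API families (package names only; the import an InputFormat compiles against determines which family it belongs to, so a job must use one family consistently):

    // Older API: everything lives under org.apache.hadoop.mapred
    //   import org.apache.hadoop.mapred.Mapper;    // an interface; uses OutputCollector/Reporter
    //   import org.apache.hadoop.mapred.JobConf;   // job configuration
    // Newer API: everything lives under org.apache.hadoop.mapreduce
    //   import org.apache.hadoop.mapreduce.Mapper; // an abstract class; uses a Context object
    //   import org.apache.hadoop.mapreduce.Job;    // job configuration
    // A new-API Job cannot take an InputFormat written against
    // org.apache.hadoop.mapred -- that is the mismatch described above.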

Re: Hadoop-MapReduce

2013-12-17 Thread Ranjini Rathinam
Hi, in the driver class and my Mapper class I have used org.apache.hadoop.mapreduce.lib, and in the XmlInputFormat.java class I have also used org.apache.hadoop.mapreduce.lib, but I am still getting this error. Please suggest. Thanks in advance, Ranjini On Tue, Dec 17, 2013 at 2:07 PM, Shekhar

Re: Hadoop-MapReduce

2013-12-17 Thread Ranjini Rathinam
Hi, I want to know when I should use a Mapper, a Reducer, and a Combiner, and what methods each of them has. Please suggest material for studying them in detail, as I am a fresher. Thanks in advance, Ranjini On Tue, Dec 17, 2013 at 2:34 PM, unmesha sreeveni unmeshab...@gmail.com wrote: Ranjini, can you please check
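
For orientation, a minimal self-contained word-count sketch (new API) showing where the Mapper, Reducer, and Combiner each fit; the Reducer doubles as the Combiner here because summing counts is associative and commutative:

    import java.io.IOException;
    import java.util.StringTokenizer;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class WordCount {
      // map(): called once per input record; emits (word, 1) pairs
      public static class TokenMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();
        protected void map(LongWritable key, Text value, Context ctx)
            throws IOException, InterruptedException {
          StringTokenizer it = new StringTokenizer(value.toString());
          while (it.hasMoreTokens()) {
            word.set(it.nextToken());
            ctx.write(word, ONE);
          }
        }
      }

      // reduce(): called once per key with all its values; the same class
      // works as a combiner, pre-aggregating map output locally
      public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        protected void reduce(Text key, Iterable<IntWritable> vals, Context ctx)
            throws IOException, InterruptedException {
          int sum = 0;
          for (IntWritable v : vals) sum += v.get();
          ctx.write(key, new IntWritable(sum));
        }
      }

      public static void main(String[] args) throws Exception {
        Job job = new Job(new Configuration(), "wordcount");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenMapper.class);
        job.setCombinerClass(SumReducer.class); // optional local pre-aggregation
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }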

Re: Yarn -- one of the daemons getting killed

2013-12-17 Thread Krishna Kishore Bonagiri
Hi Jeff, I have run the ResourceManager in the foreground, without nohup, and here are the messages from when it was killed; it says Killed but doesn't say why! 13/12/17 03:14:54 INFO capacity.CapacityScheduler: Application appattempt_1387266015651_0258_01 released container

Re: issue when using HDFS

2013-12-17 Thread Geelong Yao
Hi, I checked with the jps command. On the namenode:
8426 ResourceManager
23861 Jps
23356 SecondaryNameNode
23029 NameNode
On the datanode:
25104 NodeManager
25408 Jps
Obviously the DataNode was not running, and after I formatted HDFS with hadoop namenode -format, the problem still remains. The latest log from the

XmlInputFormat Hadoop -Mapreduce

2013-12-17 Thread Ranjini Rathinam
Hi, I am trying to process XML via MapReduce, and the output should be in text format. I am using Hadoop 0.20. The following error has occurred; the code is at the link provided: https://github.com/studhadoop/xmlparsing-hadoop/blob/master/XmlParser11.java I have used the package org.apache.hadoop.mapreduce.lib.

Re: pipes on hadoop 2.2.0 crashes

2013-12-17 Thread Silvina Caíno Lores
I'm having similar problems with pipes, mostly because of issues with the native shared libraries that leave the job stuck either at 0%-0% or before launch (because the resource manager gets stuck as well and crashes). I found that out by looking at the stderr logs by the way. Let us know if you

Re: Estimating the time of my hadoop jobs

2013-12-17 Thread Azuryy Yu
Hi Kandoi, It depends on: how many cores are on each VNode, and how complicated your analysis application is. But I don't think it's normal to spend 3 hours processing 30 GB of data, even on your *not good* hardware. On Tue, Dec 17, 2013 at 6:39 PM, Kandoi, Nikhil nikhil.kan...@emc.com wrote: Hello everyone,

RE: Estimating the time of my hadoop jobs

2013-12-17 Thread Kandoi, Nikhil
I know it is foolish of me to ask this, because there are a lot of factors that affect this, but why is it taking so much time? Can anyone suggest possible reasons for it, or has anyone faced such an issue before? Thanks, Nikhil Kandoi P.S. - I am on Hadoop 1.0.3 for this application, so I wonder

Why other process can't see the change after calling hdfsHFlush unless hdfsCloseFile is called?

2013-12-17 Thread Xiaobin She
Hi, I'm using libhdfs to work with HDFS in a C++ program, and I have encountered a problem. Here is the scenario:
1. First I call hdfsOpenFile with the O_WRONLY flag to open a file.
2. I call hdfsWrite to write some data.
3. I call hdfsHFlush to flush the data; according to the header hdfs.h,
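
For reference, a hedged Java sketch of the equivalent semantics (path is hypothetical; assumes a running HDFS at fs.defaultFS). hflush() on FSDataOutputStream makes written bytes visible to new readers, but the file length reported by the NameNode may stay stale until the file is closed or a block completes, which matches the behavior described:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.*;

    public class HflushDemo {
      public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path p = new Path("/tmp/hflush-demo.txt"); // hypothetical path
        try (FSDataOutputStream out = fs.create(p, true)) {
          out.writeBytes("hello\n");
          out.hflush(); // data is now readable by a *new* reader...
          // ...but the NameNode-reported length may still be 0 here:
          System.out.println("reported length: " + fs.getFileStatus(p).getLen());
        }
        // After close(), the reported length is up to date.
        System.out.println("final length: " + fs.getFileStatus(p).getLen());
      }
    }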

How to add a new node to a secure cluster without namenode/jobtracker restart?

2013-12-17 Thread Rainer Toebbicke
Hello, how do you add a new DataNode to a secure cluster without restarting the NameNode? In order to prevent identity theft of the mapred or hdfs principals, a secure cluster needs to carefully maintain auth_to_local in core-site.xml, as far as I understand, typically with lines such as

Re: Estimating the time of my hadoop jobs

2013-12-17 Thread Devin Suiter RDX
Nikhil, one of the problems you run into with Hadoop in virtual machine environments is performance degradation when the VMs are all running on the same physical host. With a VM, even though you are giving each one 4 GB of RAM and a virtual CPU and disk, if the virtual machines are sharing physical

RE: HDFS short-circuit reads

2013-12-17 Thread John Lilley
Thanks! I do call FileSystem.getFileBlockLocations() now to map tasks to local data blocks; is there any advantage to using listLocatedStatus() instead? I guess one call instead of two... John From: Chris Nauroth [mailto:cnaur...@hortonworks.com] Sent: Monday, December 16, 2013 6:07 PM To:

Re: XmlInputFormat Hadoop -Mapreduce

2013-12-17 Thread Shekhar Sharma
Hello Ranjini, PFA the source code for the XML input format. Also find the output and the input which I have used. ATTACHED FILES DESCRIPTION:
(1) emp.xml --- input data for testing
(2) emp_op.tar.gz --- output; results of the map-only job (I have set the number of reducers to 0)
(3) src.tar --- the source
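
For context, a hedged sketch of the driver for such a map-only job; XmlInputFormat and XmlMapper below are stand-ins for the attached classes, and the paths mirror the attachment names:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class XmlDriver {
      public static void main(String[] args) throws Exception {
        Job job = new Job(new Configuration(), "xml-parse");
        job.setJarByClass(XmlDriver.class);
        job.setInputFormatClass(XmlInputFormat.class); // new-API custom input format (attached)
        job.setMapperClass(XmlMapper.class);           // stand-in for the attached mapper
        job.setNumReduceTasks(0);                      // map-only: mapper output is written directly
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path("emp.xml"));
        FileOutputFormat.setOutputPath(job, new Path("emp_op"));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }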

Hadoop 2.2.0 Documentation wrong?

2013-12-17 Thread Hiran Chaudhuri
Hello there. I downloaded Hadoop 2.2.0 and installed it across two servers. Uploading files into HDFS worked via a small Java client that I wrote. Now I am trying to run a MapReduce job, and for this I am following this document:

Re: Estimating the time of my hadoop jobs

2013-12-17 Thread Shekhar Sharma
Apart from what Devin has suggested, there are other factors worth noting when you are running your Hadoop cluster on virtual machines. (1) How many map and reduce slots are there in the cluster? Since you have not mentioned it: you are using a 4-node Hadoop cluster, so a total of
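
A hedged back-of-envelope illustration of why slot counts matter (assuming Hadoop 1.x defaults of 2 map slots per node and a 64 MB block size, neither of which is stated in the thread):

    30 GB / 64 MB per split ≈ 480 map tasks
    4 nodes × 2 map slots   = 8 tasks running at once
    480 / 8                 = 60 sequential waves of map tasks

Per-task startup overhead therefore gets multiplied across many waves, which alone can stretch a job into hours on contended virtual hardware.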

Re: pipes on hadoop 2.2.0 crashes

2013-12-17 Thread Mauro Del Rio
Ok, I had some problems with configuration and host resolution, and I fixed them. I was able to run the simple wordcount example successfully, but I failed running wordcount-nopipe.cc. This is the stack trace: Error: java.io.IOException: pipe child exception at

System ulimit for hadoop jobtracker node

2013-12-17 Thread Viswanathan J
Hi, what ulimit value will be fair enough for the JobTracker node? If it's too high, will that cause thread blocking or any other issue in the JobTracker? Please help.

Re: Yarn -- one of the daemons getting killed

2013-12-17 Thread Vinod Kumar Vavilapalli
That's good info. It is more than likely that it is the OOM killer. See http://stackoverflow.com/questions/726690/who-killed-my-process-and-why for example. Thanks, +Vinod On Dec 17, 2013, at 1:26 AM, Krishna Kishore Bonagiri write2kish...@gmail.com wrote: Hi Jeff, I have run the

Re: System ulimit for hadoop jobtracker node

2013-12-17 Thread rtejac
It depends on various factors, like the number of users, the number of tasks, the size of MR jobs, etc. 32K-64K is a safe bet. 128K is a bad number, but shouldn't harm. On Dec 17, 2013, at 9:10 AM, Viswanathan J jayamviswanat...@gmail.com wrote: Hi, What value(ulimit) will be fair enough for

DistributedCache.addArchiveToClassPath doesn't seem to work

2013-12-17 Thread John Conwell
I've got a tar.gz file that has many 3rd party jars in it that my MR job requires. This tar.gz file is located on hdfs. When configuring my MR job, I call DistributedCache.addArchiveToClassPath(), passing in the hdfs path to the tar.gz file. When the Mapper executes I get a
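
For concreteness, a hedged sketch of the call pattern described (the path is hypothetical). One caveat worth checking: the archive is unpacked and its directory is added to the task classpath, but a JVM classpath directory entry only loads .class files, not jars sitting inside that directory, which may explain a ClassNotFoundException for a tar.gz full of jars:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.filecache.DistributedCache;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.Job;

    public class CacheSetup {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // hypothetical HDFS path to the bundle of third-party jars
        DistributedCache.addArchiveToClassPath(new Path("/libs/deps.tar.gz"), conf);
        Job job = new Job(conf, "cache-demo");
        // ... set mapper, input/output paths, then job.waitForCompletion(true)
      }
    }

Under that assumption, adding each jar individually with DistributedCache.addFileToClassPath() is a common workaround.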

Permission problem in Junit test - Hadoop 2.2.0

2013-12-17 Thread Karim Awara
Hi, I am running a JUnit test on Hadoop 2.2.0 in Eclipse on Mac OS X. Whenever I run the test, I am faced with the following error; it seems there is a problem with the permissions on the test data dir. Please advise. 2013-12-18 02:09:19,326 ERROR hdfs.MiniDFSCluster

Re: Permission problem in Junit test - Hadoop 2.2.0

2013-12-17 Thread Ted Yu
Have you set umask to 022? See https://issues.apache.org/jira/browse/HDFS-2556 Cheers On Tue, Dec 17, 2013 at 3:12 PM, Karim Awara karim.aw...@kaust.edu.sa wrote: Hi, I am running Junit test on hadoop 2.2.0 on eclipse on mac os x. Whenever I run the test, I am faced with the following

Re: HDFS short-circuit reads

2013-12-17 Thread Chris Nauroth
Both of these methods return the same underlying data type that you're ultimately interested in. This is the BlockLocation object, which contains the hosts that have a replica of the block. Depending on your usage pattern, one of these methods might be more convenient than the other. If your
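
For concreteness, a hedged sketch of both approaches (the directory path is hypothetical); both end at BlockLocation, as described:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.*;

    public class BlockLocationDemo {
      public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path dir = new Path("/data"); // hypothetical input directory

        // Option 1: two calls per file -- listStatus, then getFileBlockLocations.
        for (FileStatus st : fs.listStatus(dir)) {
          for (BlockLocation loc : fs.getFileBlockLocations(st, 0, st.getLen())) {
            System.out.println(st.getPath() + " -> " + String.join(",", loc.getHosts()));
          }
        }

        // Option 2: one call -- listLocatedStatus returns LocatedFileStatus,
        // which already carries the block locations.
        RemoteIterator<LocatedFileStatus> it = fs.listLocatedStatus(dir);
        while (it.hasNext()) {
          LocatedFileStatus st = it.next();
          for (BlockLocation loc : st.getBlockLocations()) {
            System.out.println(st.getPath() + " -> " + String.join(",", loc.getHosts()));
          }
        }
      }
    }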

Re: Permission problem in Junit test - Hadoop 2.2.0

2013-12-17 Thread Karim Awara
Yes, but nothing yet. I should mention that I compiled Hadoop 2.2 from the source using Maven on a single machine (Mac OS X). It seems that whatever I do with the permissions, the error persists. -- Best Regards, Karim Ahmed Awara On Wed, Dec 18, 2013 at 2:24 AM, Ted Yu yuzhih...@gmail.com wrote: Have you set

Re: Permission problem in Junit test - Hadoop 2.2.0

2013-12-17 Thread Karim Awara
What is weird is that the directories below it have the right permissions, but for some reason the permissions at build/test/data/dfs/data ended up with no access at all. -- Best Regards, Karim Ahmed Awara On Wed, Dec 18, 2013 at 2:35 AM, Karim Awara karim.aw...@kaust.edu.sa wrote: Yes.

Re: Permission problem in Junit test - Hadoop 2.2.0

2013-12-17 Thread Andre Kelpe
You have to start Eclipse from an environment that has the correct umask set, otherwise it will not inherit the settings. Open a terminal, run umask 022, then launch eclipse from that same terminal, and re-try running the tests. - André On Wed, Dec 18, 2013 at 12:35 AM, Karim Awara karim.aw...@kaust.edu.sa wrote: Yes. Nothing

Re: System ulimit for hadoop jobtracker node

2013-12-17 Thread Viswanathan J
Hi, I have a ulimit of 99; should that be a problem? If yes, what problems will arise? On Dec 18, 2013 12:05 AM, rtejac rte...@gmail.com wrote: Depends on various factors like number of users, number of tasks, size of MR jobs etc. 32K-64K is a safe bet. 128K is a bad number, but shouldn't

Re: System ulimit for hadoop jobtracker node

2013-12-17 Thread Tao Xiao
Set the following contents in /etc/security/limits.d/90-nproc.conf:
hadoop-user soft nproc 32000
hadoop-user hard nproc 32000
hadoop-user soft nofile 65535
hadoop-user hard nofile 65535
2013/12/18 Viswanathan J

HDFS Data Integrity in copyToLocal

2013-12-17 Thread Korb, Michael [USA]
Hello, How can I verify the integrity of files copied to local from HDFS? Does HDFS store MD5s of full files anywhere? From what I can find, FileSystem.getFileChecksum() is relevant to replication and not comparison across filesystems
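
For what it's worth, a hedged sketch of one workable approach (paths are hypothetical): stream both the HDFS file and the local copy through MD5 client-side and compare digests. Note that FileChecksum from getFileChecksum() is an MD5-of-CRCs rather than a plain file MD5, so it is not directly comparable to a local md5sum:

    import java.io.InputStream;
    import java.security.MessageDigest;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.*;

    public class Md5Compare {
      // Computes an MD5 over the full contents of a file on any FileSystem.
      static byte[] md5(FileSystem fs, Path p) throws Exception {
        MessageDigest md = MessageDigest.getInstance("MD5");
        try (InputStream in = fs.open(p)) {
          byte[] buf = new byte[8192];
          int n;
          while ((n = in.read(buf)) > 0) md.update(buf, 0, n);
        }
        return md.digest();
      }

      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        byte[] a = md5(FileSystem.get(conf), new Path("/data/file.bin"));     // HDFS side
        byte[] b = md5(FileSystem.getLocal(conf), new Path("/tmp/file.bin")); // local copy
        System.out.println(MessageDigest.isEqual(a, b) ? "match" : "MISMATCH");
      }
    }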

Re: Permission problem in Junit test - Hadoop 2.2.0

2013-12-17 Thread Karim Awara
Still the same problem. If you notice, the unit test actually created the directories up to $HOME_HDFS/build/test/data/dfs without any problems at all. I think that because MiniDFSCluster is emulating a cluster of one namenode and two datanodes, it tries to create a dir for the datanodes, and this is