Fwd: Bulk Loading DFS Space issue in Hbase

2013-01-23 Thread Vikas Jadhav
-- Forwarded message -- From: Vikas Jadhav vikascjadha...@gmail.com Date: Tue, Jan 22, 2013 at 5:23 PM Subject: Bulk Loading DFS Space issue in Hbase To: u...@hbase.apache.org Hi, I am trying to bulk load 700m CSV data with 31 columns into HBase. I have written a MapReduce program for
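
A minimal sketch of how a bulk-load driver of that era is commonly wired up, assuming the HBase 0.9x mapreduce API; the table name, paths, and the CSV-to-Put mapper are hypothetical stand-ins, not the poster's actual code:

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
    import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat;
    import org.apache.hadoop.hbase.util.Bytes;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class CsvBulkLoadDriver {

        // Hypothetical mapper: turns one CSV line into a Put keyed on column 0.
        public static class CsvToPutMapper
                extends Mapper<LongWritable, Text, ImmutableBytesWritable, Put> {
            @Override
            protected void map(LongWritable offset, Text line, Context ctx)
                    throws IOException, InterruptedException {
                String[] cols = line.toString().split(",");
                byte[] row = Bytes.toBytes(cols[0]);
                Put put = new Put(row);
                for (int i = 1; i < cols.length; i++) {
                    put.add(Bytes.toBytes("d"), Bytes.toBytes("c" + i), Bytes.toBytes(cols[i]));
                }
                ctx.write(new ImmutableBytesWritable(row), put);
            }
        }

        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            Job job = new Job(conf, "csv-bulk-load");
            job.setJarByClass(CsvBulkLoadDriver.class);
            job.setMapperClass(CsvToPutMapper.class);
            job.setMapOutputKeyClass(ImmutableBytesWritable.class);
            job.setMapOutputValueClass(Put.class);
            FileInputFormat.addInputPath(job, new Path("/input/data.csv"));  // hypothetical path
            FileOutputFormat.setOutputPath(job, new Path("/tmp/hfiles"));    // HFile staging dir
            HTable table = new HTable(conf, "mytable");                      // hypothetical table
            // Wires in the reducer, partitioner and HFileOutputFormat so the job
            // emits HFiles sorted per region, ready for completebulkload:
            HFileOutputFormat.configureIncrementalLoad(job, table);
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }

Writing HFiles and moving them in with LoadIncrementalHFiles avoids the write-path overhead of millions of individual Puts, which is usually the point of bulk loading at this scale.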

Spilled records

2013-01-23 Thread Ajay Srivastava
Hi, I was tuning a mapred job to reduce the number of spills and reached a stage where the following numbers are the same - Spilled Records in map = Spilled Records in reduce = Combine Output Records = Reduce Input Records. I do not see any lines in the mapper logs with the following strings - 1. Spilling map
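
Worth noting for anyone hitting the same wall: each map task's final output is itself written out as a spill, so even a perfectly sized buffer leaves Spilled Records equal to Map output records; the equality above, with no extra "Spilling map output" lines, suggests exactly one spill per task, which is as good as it gets. A sketch of the knobs involved, with illustrative values and Hadoop 1.x property names:

    import org.apache.hadoop.mapred.JobConf;

    public class SpillTuning {
        public static void main(String[] args) {
            JobConf conf = new JobConf();
            conf.setInt("io.sort.mb", 256);                  // in-memory map sort buffer, in MB
            conf.setFloat("io.sort.spill.percent", 0.80f);   // fill fraction that triggers a background spill
            conf.setFloat("io.sort.record.percent", 0.05f);  // share of the buffer reserved for record accounting
        }
    }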

EOF when Combiner works

2013-01-23 Thread s2323
Hi! When I run a job with these options: -Dmapred.map.child.java.opts=-Xmx2048M -Dio.sort.mb=1424 -Dio.sort.record.percent=0.08 all tasks fail on the combiner step with: ... 2013-01-23 12:20:28,143 INFO org.apache.hadoop.mapred.MapTask: io.sort.mb = 1424 2013-01-23 12:23:03,772 INFO
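
The arithmetic here may be part of the story: io.sort.mb is allocated inside the map task's heap, so a 1424 MB buffer under -Xmx2048M leaves roughly 624 MB for everything else, and io.sort.record.percent=0.08 reserves about 114 MB of that buffer for record accounting rather than data. A sketch of more conservative settings (illustrative values, not a diagnosis of the EOF itself):

    import org.apache.hadoop.mapred.JobConf;

    public class SortBufferHeadroom {
        public static void main(String[] args) {
            JobConf conf = new JobConf();
            conf.set("mapred.map.child.java.opts", "-Xmx2048M");
            conf.setInt("io.sort.mb", 512);  // leaves ~1.5 GB of heap for map and combiner code
        }
    }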

Re: Understanding hadoop - help needed

2013-01-23 Thread Harsh J
Iterating on Bharath's responses, my answers to each of your questions inline: On Wed, Jan 23, 2013 at 2:54 PM, Dibyendu Karmakar dibyendu.d...@gmail.comwrote: Hi, I am doing some performance testing in HADOOP. But while testing, I faced a situation. I need your help. My HADOOP cluster :

Re: EOF when Combiner works

2013-01-23 Thread Harsh J
Can you add in what exactly is your combiner logic performing? On Wed, Jan 23, 2013 at 3:09 PM, s2323 s2...@land.ru wrote: Hi! When I run job with this options: -Dmapred.map.child.java.opts=-Xmx2048M -Dio.sort.mb=1424 -Dio.sort.record.percent=0.08 all tasks fails on combiner step

Re: NameNode low on available disk space

2013-01-23 Thread Mohit Vadhera
Can somebody answer me on this please? On Wed, Jan 23, 2013 at 11:44 AM, Mohit Vadhera project.linux.p...@gmail.com wrote: Thanks guys. As you said, the level is already pretty low, i.e. 100 MB, but in my case the root fs / has 14 G available. What can be the root cause then?

Re: NameNode low on available disk space

2013-01-23 Thread Harsh J
Mohit, When do you specifically get the error at the NN? Does your NN consistently not start with that error? Your local disk space availability can certainly fluctuate if you use the same disk for MR and other activity which creates temporary files. On Wed, Jan 23, 2013 at 9:01 PM, Mohit

Hadoop Nutch Mkdirs failed to create file

2013-01-23 Thread 吴靖
hi, everyone! I want to use Nutch to crawl web pages, but a problem shows up in the log as below. I think it may be a permissions problem, but I am not sure. Any help will be appreciated, thank you. 2013-01-23 07:37:21,809 ERROR mapred.FileOutputCommitter - Mkdirs failed to create

Re: NameNode low on available disk space

2013-01-23 Thread Mohit Vadhera
The NN switches randomly into safemode, and then I run a command to leave safemode manually. I never got alerts for low disk space at the machine level, and I didn't see the space fluctuate from GBs into MBs. On Wed, Jan 23, 2013 at 9:10 PM, Harsh J ha...@cloudera.com wrote: Mohit, When do you

Re: NameNode low on available disk space

2013-01-23 Thread Harsh J
A random switching behavior can only be explained by fluctuating disk space, I'd think. Are you running MR operations on the same disk (i.e. is it part of mapred.local.dir as well)? On Wed, Jan 23, 2013 at 9:24 PM, Mohit Vadhera project.linux.p...@gmail.com wrote: NN switches randomly into

Re: Hadoop Nutch Mkdirs failed to create file

2013-01-23 Thread Harsh J
What version of Hadoop are you using, and is your use of the local (non-cluster) job runner mode intentional? On Wed, Jan 23, 2013 at 9:23 PM, 吴靖 qhwj2...@126.com wrote: hi, everyone! I want to use Nutch to crawl web pages, but a problem shows up in the log. I think it may be some

Re: NameNode low on available disk space

2013-01-23 Thread Mohit Vadhera
MR operations are running on the same machine. I checked for the parameter mapred.local.dir in my installed directory /etc/hadoop/ but didn't find it. One question: is the disk space reserved size displayed in the logs in KB or MB? I am a layman on Hadoop. The link I followed to install is given below
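
Rather than grepping /etc/hadoop, the effective value can be read back through the API, since JobConf loads mapred-default.xml and mapred-site.xml from the classpath; a small sketch:

    import org.apache.hadoop.mapred.JobConf;

    public class ShowLocalDir {
        public static void main(String[] args) {
            JobConf conf = new JobConf();
            // Prints the directories MR actually uses for intermediate data.
            System.out.println("mapred.local.dir = " + conf.get("mapred.local.dir"));
        }
    }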

Re: NameNode low on available disk space

2013-01-23 Thread Harsh J
The logs display it in simple bytes. If the issue begins to occur when you start using Hadoop, then it's most certainly MR using up the disk space temporarily. You could lower the threshold, or you could perhaps use a bigger disk for your trials/more nodes. On Wed, Jan 23, 2013 at 10:25 PM,

Re: NameNode low on available disk space

2013-01-23 Thread Harsh J
On Wed, Jan 23, 2013 at 10:37 PM, Mohit Vadhera project.linux.p...@gmail.com wrote: 51200. 51200 *bytes* is only 50 KB. 50 MB is 50*1024*1024, which is 52428800. You can verify changes to the config by visiting the http://NNHOST:50070/conf page and searching for the config key name to see if the NN has
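
The threshold under discussion is, as far as I can tell, dfs.namenode.resource.du.reserved in hdfs-site.xml, which is read in plain bytes; a sketch of setting it to a true 50 MB (the key name is my assumption, not stated in the thread):

    import org.apache.hadoop.conf.Configuration;

    public class ReservedBytes {
        public static void main(String[] args) {
            long fiftyMb = 50L * 1024 * 1024;  // 52428800 bytes, not 51200
            Configuration conf = new Configuration();
            conf.setLong("dfs.namenode.resource.du.reserved", fiftyMb);
            System.out.println(conf.getLong("dfs.namenode.resource.du.reserved", 0));
        }
    }

After editing hdfs-site.xml and restarting, the live value can be confirmed on the http://NNHOST:50070/conf page as suggested above.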

Re: ISSUE with Hadoop JobTracker Web UI under CDH4

2013-01-23 Thread Arun C Murthy
Also, pls stop the cc. Tx. On Jan 23, 2013, at 9:06 AM, Harsh J wrote: Again, moving to cdh-u...@cloudera.org. Please use the https://groups.google.com/a/cloudera.org/forum/?fromgroups=#!forum/cdh-user forum for CDH-specific issues as this list is for help with Apache Hadoop

Trouble starting up Task Tracker

2013-01-23 Thread Corbett Martin
Question: We're trying out Cloudera Manager and CDH4 in a clustered deployment and are having trouble getting the Task Trackers to start up. The error says (full stack trace below) 2013-01-23 10:48:37,443 ERROR org.apache.hadoop.mapred.TaskTracker: Can not start task tracker because

Re: Trouble starting up Task Tracker

2013-01-23 Thread Harsh J
This is the problem: drwx------ 4 hdfs hdfs 4096 Jan 15 16:37 .. Your /data/1 directory seems to be owned by hdfs and restricted only to it (700). I'm not sure this is necessary and you can perhaps make it 755 at least. Or perhaps what you may have is a misconfig wherein you've set your DN
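
If the diagnosis above holds, the fix is an ordinary chmod; for completeness, the same change through Hadoop's own utility class (path and mode taken from the thread, run as a user allowed to change the directory):

    import org.apache.hadoop.fs.FileUtil;

    public class FixDataDirPerms {
        public static void main(String[] args) throws Exception {
            // Equivalent to `chmod 755 /data/1`.
            FileUtil.chmod("/data/1", "755");
        }
    }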

Hadoop-1.1.0 on EC2

2013-01-23 Thread Daniel Buchan
Hi, We've recently built a Hadoop package that we're somewhat happy with, which we'd now like to deploy on Amazon's EC2. However, we built against the Hadoop release 1.1.0 and there doesn't appear to be a public AMI image for Hadoop 1.1.0. Will we have to build our own AMI or is there another

Re: Hadoop-1.1.0 on EC2

2013-01-23 Thread Harsh J
Pardon if my use of AMZN jargon is wrong here cause I don't quite use it much: I don't think we carry/maintain an AMI. However, there's the Apache Whirr project that deals with Hadoop over Cloud and you can probably take a look/ask there: http://whirr.apache.org? On Wed, Jan 23, 2013 at 11:55 PM,

Need help with cluster setup for performance [Impala]

2013-01-23 Thread Steven Wong
My apologies for sending this message to this group, but I'm having trouble sending to the right group. From: Steven Wong Sent: Wednesday, January 23, 2013 11:15 AM To: impala-u...@cloudera.org Subject: RE: Need help with cluster setup for performance Thanks

Re: Spring for hadoop

2013-01-23 Thread Panshul Whisper
Hello Radim, Your solution sounds interesting. Is it possible for me to try the solution before I buy it? Thnx, Regards On Wed, Jan 23, 2013 at 1:07 AM, Radim Kolar h...@filez.com wrote: I have a solution integrating spring beans and spring batch directly into hadoop core. It's far more

Error after upgrading from CDH3 to CDH4

2013-01-23 Thread Nataraj Rashmi - rnatar
Hi, I am getting this error when I run the java application that uses HDFS API to transfer files to HDFS remotely. This used to work fine with CDH3 and now we are using CDH4. Exception in thread main java.io.IOException: No FileSystem for scheme: hdfs at
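
This error commonly means the client classpath is missing the CDH4 hadoop-hdfs jar, since HDFS moved into its own artifact in that line; adding the jar is the clean fix, and pinning the implementation class explicitly is a common workaround. A sketch, with a hypothetical NameNode address and paths:

    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsCopy {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Workaround: name the implementation class directly so the
            // hdfs:// scheme resolves even if service discovery fails.
            conf.set("fs.hdfs.impl", "org.apache.hadoop.hdfs.DistributedFileSystem");
            FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:8020/"), conf);
            fs.copyFromLocalFile(new Path("/tmp/local.txt"), new Path("/user/foo/remote.txt"));
        }
    }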

Re:Re: Hadoop Nutch Mkdirs failed to create file

2013-01-23 Thread 吴靖
My Hadoop version is hadoop-1.1.1, and it runs in local mode! At 2013-01-24 00:43:53, Harsh J ha...@cloudera.com wrote: What version of Hadoop are you using, and is your use of the local (non-cluster) job runner mode intentional? On Wed, Jan 23, 2013 at 9:23 PM, 吴靖 qhwj2...@126.com
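
For anyone checking the same thing: in Hadoop 1.x the job runs in the LocalJobRunner when mapred.job.tracker is left at its default of "local", and in that mode output directories are created on the local filesystem, where Mkdirs can fail on plain Unix permissions. A small sketch for confirming which mode is in effect:

    import org.apache.hadoop.mapred.JobConf;

    public class CheckRunner {
        public static void main(String[] args) {
            JobConf conf = new JobConf();
            // "local" means no cluster is used; anything else is a JobTracker address.
            System.out.println("mapred.job.tracker = " + conf.get("mapred.job.tracker", "local"));
        }
    }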

hdfs du periodicity and hdfs not respond at that time

2013-01-23 Thread Xibin Liu
hi all, I found that HDFS runs du periodically (every hour), and because my disks are big (the smallest one is 15T), when HDFS executes du the datanode does not respond for about 3 minutes because of IO load. This causes a lot of problems. Does anybody know why HDFS does this and how to disable it? -- Thanks Regards

Re: MulitpleOutputs outputs just one line

2013-01-23 Thread Harsh J
Hi Barak, As instructed on http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapred/lib/MultipleOutputs.html, do you also make sure to call the mos.close() function at the end of Mapper (in its cleanup stage)? On Thu, Jan 24, 2013 at 12:40 PM, Barak Yaish barak.ya...@gmail.com

Re: MulitpleOutputs outputs just one line

2013-01-23 Thread Barak Yaish
Yes, I'm calling mos.close() in Mapper.cleanup(). Are there any logs that I can turn on to troubleshoot this issue? On Thu, Jan 24, 2013 at 9:36 AM, Harsh J ha...@cloudera.com wrote: Hi Barak, As instructed on
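
For reference, the overall shape being discussed, written against the newer mapreduce API as a sketch (the named output and class names are hypothetical; the older mapred API linked above follows the same open-in-setup, close-in-cleanup pattern):

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;

    public class MultiOutMapper extends Mapper<LongWritable, Text, Text, Text> {
        private MultipleOutputs<Text, Text> mos;

        @Override
        protected void setup(Context context) {
            mos = new MultipleOutputs<Text, Text>(context);
        }

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // "side" must be declared on the job beforehand with
            // MultipleOutputs.addNamedOutput(job, "side", TextOutputFormat.class,
            //                                Text.class, Text.class);
            mos.write("side", new Text("k"), value);
        }

        @Override
        protected void cleanup(Context context) throws IOException, InterruptedException {
            mos.close();  // without this, buffered records never reach the output files
        }
    }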

Re: hdfs du periodicity and hdfs not respond at that time

2013-01-23 Thread Harsh J
Hi, HDFS does this to estimate space reports. Perhaps the discussion here may help you: http://search-hadoop.com/m/LLBgUiH0Bg2 On Thu, Jan 24, 2013 at 12:51 PM, Xibin Liu xibin.liu...@gmail.com wrote: hi all, I found hdfs du periodicity(one hour), and because my disk is big, the smallest one

Re: hdfs du periodicity and hdfs not respond at that time

2013-01-23 Thread Xibin Liu
Thanks, http://search-hadoop.com/m/LLBgUiH0Bg2 is my issue, but I still don't know how to solve this problem. Not responding for 3 minutes once an hour is a big problem for me. Any clue for this? 2013/1/24 Harsh J ha...@cloudera.com Hi, HDFS does this to estimate space reports. Perhaps the
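
One knob that may help, assuming Hadoop 1.x: the du refresh period is fs.du.interval (milliseconds, default 600000), normally set in core-site.xml on the datanodes; raising it spaces the stalls further apart, though it does not shrink the cost of a single du pass over a 15T disk. Shown through the API for brevity:

    import org.apache.hadoop.conf.Configuration;

    public class DuInterval {
        public static void main(String[] args) {
            Configuration conf = new Configuration();
            conf.setLong("fs.du.interval", 6L * 60 * 60 * 1000);  // illustrative: every 6 hours
        }
    }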