Re: when Standby Namenode is doing checkpoint, the Active NameNode is slow.

2013-08-13 Thread lei liu
The fsimage file size is 1658934155 2013/8/13 Harsh J ha...@cloudera.com How large are your checkpointed fsimage files? On Mon, Aug 12, 2013 at 3:42 PM, lei liu liulei...@gmail.com wrote: When Standby Namenode is doing checkpoint, upload the image file to Active NameNode, the Active

Re: Jobtracker page hangs ..again.

2013-08-13 Thread Patai Sangbutsarakum
Thanks Harsh, Appreciate your input, as always. On Aug 12, 2013, at 20:01, Harsh J ha...@cloudera.com wrote: If you're not already doing it, run a local name caching daemon (such as nscd, etc.) on each cluster node. Hadoop does a lot of lookups and a local cache would do good in reducing the

Re: Exceptions in Name node and Data node logs

2013-08-13 Thread Vimal Jain
Sorry for not giving version details I am using Hadoop version - 1.1.2 and Hbase version - 0.94.7 On Tue, Aug 13, 2013 at 1:53 PM, Vimal Jain vkj...@gmail.com wrote: Hi, I have configured Hadoop and Hbase in pseudo distributed mode. So far things were working fine , but suddenly i started

Re: Exceptions in Name node and Data node logs

2013-08-13 Thread Vimal Jain
Along with these exceptions i am seeing some exceptions in hbase logs too. Here it is : *Exception in Master log :* 2013-07-31 15:51:04,694 WARN org.apache.hadoop.hbase.util.Sleeper: We slept 1266874891120ms instead of 1ms, this is likely due to a long garbage collecting pause and it's

Re: Exceptions in Name node and Data node logs

2013-08-13 Thread Jitendra Yadav
Hi, One of your DNs is marked as dead because the NN is not able to get heartbeat messages from it, but the NN is still getting block information from the dead node. This error is similar to a bug, *HDFS-1250*, reported 2 years back and fixed in the 0.20 release. Can you please check the status of the DNs in the cluster.

Re: Exceptions in Name node and Data node logs

2013-08-13 Thread Vimal Jain
Hi Jitendra, Thanks for your reply. Currently my hadoop/hbase is down in production as it had filled up the disk space with above exceptions in log files and had to be brought down. Also i am using hadoop/hbase in pseudo distributed mode , so there is only one node which hosts all 6 processes (

Re: when Standby Namenode is doing checkpoint, the Active NameNode is slow.

2013-08-13 Thread lei liu
I wrote a program to test NameNode performance. Please see EditLogPerformance.java. I used 60 threads to execute the EditLogPerformance.java code; the test result is below: 2013-08-13 17:43:01,479 INFO my.EditLogPerformance (EditLogPerformance.java:run(37)) - totalCount:10392810

Re: when Standby Namenode is doing checkpoint, the Active NameNode is slow.

2013-08-13 Thread Harsh J
Perhaps turning on fsimage compression may help. See documentation of dfs.image.compress at http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml. You can also try to throttle the bandwidth it uses via dfs.image.transfer.bandwidthPerSec. On Tue, Aug 13, 2013 at
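As a rough sketch of the suggestion above, the two settings could be added to hdfs-site.xml along these lines. The property names are the ones documented in hdfs-default.xml; the bandwidth value is only an illustrative assumption, not a figure from this thread, and should be tuned to the cluster's network.

```xml
<!-- hdfs-site.xml fragment (sketch; values are assumptions, not recommendations) -->
<configuration>
  <!-- Compress the fsimage so the checkpoint upload transfers fewer bytes. -->
  <property>
    <name>dfs.image.compress</name>
    <value>true</value>
  </property>
  <!-- Throttle image transfers, in bytes per second.
       10485760 (10 MB/s) is an example value; 0 means no throttling. -->
  <property>
    <name>dfs.image.transfer.bandwidthPerSec</name>
    <value>10485760</value>
  </property>
</configuration>
```

Compression trades extra CPU during checkpointing for a smaller transfer, while throttling lengthens the transfer but leaves more bandwidth for client traffic on the Active NameNode.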

Maven Cloudera Configuration problem

2013-08-13 Thread Pavan Sudheendra
Hi, I'm currently using maven to build the jars necessary for my map-reduce program to run and it works for a single node cluster.. For a multi node cluster, how do i specify my map-reduce program to ingest the cluster settings instead of localhost settings? I don't know how to specify this using

Re: when Standby Namenode is doing checkpoint, the Active NameNode is slow.

2013-08-13 Thread Jitendra Yadav
Hi, I agree with Harsh's comment on image file compression and the transfer bandwidth parameter for optimizing the checkpoint process. In addition, I'm not able to correlate your performance program log timings (less than 10) and the file transfer log timings on the active/standby nodes. Thanks On Tue, Aug 13,

Re: Maven Cloudera Configuration problem

2013-08-13 Thread Shahab Yunus
You need to configure your namenode and jobtracker information in the configuration files within your application. Only set the relevant properties in the copy of the files that you are bundling in your job. For the rest, the default values would be used from the default configuration files

Re: Maven Cloudera Configuration problem

2013-08-13 Thread sandy . ryza
Hi Pavan, Configuration properties generally aren't included in the jar itself unless you explicitly set them in your java code. Rather they're picked up from the mapred-site.xml file located in the Hadoop configuration directory on the host you're running your job from. Is there an issue

Re: Exceptions in Name node and Data node logs

2013-08-13 Thread Vimal Jain
Hi, As Jitendra pointed out, this issue was fixed in the 0.20 version. I am using Hadoop 1.1.2, so why is it occurring again? Please help here. On Tue, Aug 13, 2013 at 2:56 PM, Vimal Jain vkj...@gmail.com wrote: Hi Jitendra, Thanks for your reply. Currently my hadoop/hbase is down in production as

Re: Maven Cloudera Configuration problem

2013-08-13 Thread Pavan Sudheendra
Hi Shahab and Sandy, The thing is we have a 6 node cloudera cluster running.. For development purposes, i was building a map-reduce application on a single node apache distribution hadoop with maven.. To be frank, i don't know how to deploy this application on a multi node cloudera cluster. I am

Re: YARN with local filesystem

2013-08-13 Thread Rod Paulk
I was able to execute the example by running the job as the yarn user. For example the following successfully completes: sudo -u yarn yarn org.apache.hadoop.examples.RandomWriter /tmp/random-out Whereas this fails with the local user rpaulk: yarn org.apache.hadoop.examples.RandomWriter

Re: Maven Cloudera Configuration problem

2013-08-13 Thread Pavan Sudheendra
When i actually run the job on the multi node cluster, logs shows it uses localhost configurations which i don't want.. I just have a pom.xml which lists all the dependencies like standard hadoop, standard hbase, standard zookeeper etc., Should i remove these dependencies? I want the cluster

Requesting set of containers on a single node

2013-08-13 Thread Krishna Kishore Bonagiri
Hi, My application has a group of processes that need to communicate with each other either through Shared Memory or TCP/IP depending on where the containers are allocated, on the same machine or on different machines. Obviously I would like them to get them allocated on the same node whenever

Re: Maven Cloudera Configuration problem

2013-08-13 Thread Brad Cox
I've been stuck on the same question lately so don't take this as definitive, just my best guess at what's required. Using maven as your hadoop source is going to give you a vanilla hadoop; one that runs on localhost. You need one that you've customized to point to your remote cluster and you

Re: Maven Cloudera Configuration problem

2013-08-13 Thread sandy . ryza
Nothing in your pom.xml should affect the configurations your job runs with. Are you running your job from a node on the cluster? When you say localhost configurations, do you mean it's using the LocalJobRunner? -sandy (iphnoe tpying) On Aug 13, 2013, at 9:07 AM, Pavan Sudheendra

Re: Maven Cloudera Configuration problem

2013-08-13 Thread Pavan Sudheendra
Yes Sandy, I'm referring to LocalJobRunner. I'm actually running the job on one datanode.. What changes should i make so that my application would take advantage of the cluster as a whole? On Tue, Aug 13, 2013 at 10:33 PM, sandy.r...@cloudera.com wrote: Nothing in your pom.xml should affect

Re: Maven Cloudera Configuration problem

2013-08-13 Thread Shahab Yunus
You should not use LocalJobRunner. Make sure that the mapred.job.tracker property does not point to 'local' and instead to your job-tracker host and port. *But before that* as Sandy said, your client machine (from where you will be kicking off your jobs and apps) should be using config files which
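A minimal sketch of the mapred.job.tracker setting described above, as it might appear in the client's mapred-site.xml for an MR1 cluster. The host name and port are placeholders (assumptions for illustration) and must be replaced with the cluster's actual JobTracker address.

```xml
<!-- mapred-site.xml fragment on the client machine (sketch) -->
<configuration>
  <!-- When this is left at 'local', jobs run in-process via LocalJobRunner.
       jobtracker.example.com:8021 is a hypothetical host:port placeholder. -->
  <property>
    <name>mapred.job.tracker</name>
    <value>jobtracker.example.com:8021</value>
  </property>
</configuration>
```

With a real JobTracker address in place of the placeholder, jobs submitted from that client should go to the cluster rather than run locally.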

Re: Maven Cloudera Configuration problem

2013-08-13 Thread Brad Cox
That link got my hopes up. But Cloudera Manager (what I'm running; on CDH4) does not offer an Export Client Config option. What am I missing? On Aug 13, 2013, at 4:04 PM, Shahab Yunus shahab.yu...@gmail.com wrote: You should not use LocalJobRunner. Make sure that the mapred.job.tracker

Re: Maven Cloudera Configuration problem

2013-08-13 Thread Shahab Yunus
In our Cloudera 4.2.0 cluster, I log in with the *admin* user (do you have appropriate permissions by the way?) Then I click on any one of the 3 services (hbase, mapred, hdfs and excluding zookeeper) from the top-leftish menu. Then for each of these I can click the *Configuration* tab which is in the

Re: Maven Cloudera Configuration problem

2013-08-13 Thread Suresh Srinivas
Folks, can you please take this thread to CDH related mailing list? On Tue, Aug 13, 2013 at 3:07 PM, Brad Cox bradj...@gmail.com wrote: That link got my hopes up. But Cloudera Manager (what I'm running; on CDH4) does not offer an Export Client Config option. What am I missing? On Aug 13,

updated to 1.2.1, map completed percentage keeps oscillating

2013-08-13 Thread kaveh minooie
Hi everyone, I recently updated my cluster to 1.2.1 and now the percentage of completed map tasks while the job is running keeps changing: 13/08/13 16:53:01 INFO mapred.JobClient: Running job: job_201308131452_0007 13/08/13 16:53:02 INFO mapred.JobClient: map 0% reduce 0% 13/08/13 16:53:19

Calling a MATLAB library in map reduce program

2013-08-13 Thread Chandra Mohan, Ananda Vel Murugan
Hi, I have to run some analytics on the files present in HDFS using a MATLAB code. I am thinking of compiling the MATLAB code into a C++ library and calling it in map reduce code. How can I implement this? I read Hadoop streaming or Hadoop pipes can be used for this. But I have not tried it on

Reduce Task Clarification

2013-08-13 Thread Sam Garrett
I am working on a MapReduce job where I would like to have the output sorted by a LongWritable value. I read the Anatomy of a MapReduce Run in the Definitive Guide and it didn't say explicitly whether reduce() gets called only once per map output key. If it does get called only once I was thinking