RE: Can you help me to install HDFS Federation and test?

2013-09-19 Thread Sandeep L
No, it's not appearing from the other name node. Here is the procedure I followed: In NameNode1 I ran the following commands: bin/hdfs dfs -mkdir test; bin/hdfs dfs -put dummy.txt test. When I ran the bin/hdfs -ls test command from NameNode1 it listed the file in HDFS, but if I ran the same command from NameNode2 output

Re: Issue: Max block location exceeded for split error when running hive

2013-09-19 Thread Murtaza Doctor
We are using the default replication factor of 3. When new files are put on HDFS we never override the replication factor. When there is more data involved it fails on a larger split size. On Wed, Sep 18, 2013 at 6:34 PM, Harsh J ha...@cloudera.com wrote: Do your input files carry a

Stable version of Hadoop

2013-09-19 Thread hadoop hive
Hi folks, I want to use HBase for my data storage on top of HDFS. Please help me find the best version to use, such as CDH4. My data size would be around 500 GB to 5 TB. My operations would be write-intensive. Thanks

HDFS file-create performance

2013-09-19 Thread John Lilley
Are there any rough numbers one can give me regarding the latency of creating, writing, and closing a small HDFS-based file? Does replication have a big impact? I am trying to decide whether to communicate some modestly-sized (~200KB) information via HDFS files or go to the trouble of

How to make hadoop use all nodes?

2013-09-19 Thread Vandecreme, Antoine
Hi all, I am working with Hadoop 2.0.5 (I plan to migrate to 2.1.0 soon). When I am starting a Job, I notice that some nodes are not used or partially used. For example, if my nodes can hold 2 containers, I notice that some nodes are not running any or just 1 while others are running 2. All my

Re: Issue: Max block location exceeded for split error when running hive

2013-09-19 Thread Edward Capriolo
We have this job submit property buried in hive that defaults to 10. We should make that configurable. On Wed, Sep 18, 2013 at 9:34 PM, Harsh J ha...@cloudera.com wrote: Do your input files carry a replication factor of 10+? That could be one cause behind this. On Thu, Sep 19, 2013 at 6:20

Re: Issue: Max block location exceeded for split error when running hive

2013-09-19 Thread Murtaza Doctor
It used to throw a warning in 1.03 and now has become an IOException. I was more trying to figure out why it is exceeding the limit even though the replication factor is 3. Also Hive may use CombineInputSplit or some version of it, are we saying it will always exceed the limit of 10? On Thu, Sep

YARN MapReduce 2 concepts

2013-09-19 Thread Mohit Anchlia
I am going through the concepts of resource manager, application master and node manager. As I understand it, the resource manager receives the job submission and launches the application master. It also launches a node manager to monitor the application master. My questions are: 1. Is the node manager long-lived and

Re: HDFS federation Configuration

2013-09-19 Thread Suresh Srinivas
Have you looked at - http://hadoop.apache.org/docs/r2.1.0-beta/hadoop-project-dist/hadoop-hdfs/Federation.html Let me know if the document is not clear or needs improvements. Regards, Suresh On Thu, Sep 19, 2013 at 12:01 PM, Manickam P manicka...@outlook.com wrote: Guys, I need some

HDFS federation Configuration

2013-09-19 Thread Manickam P
Guys, I need some tutorials to configure federation. Can you please suggest some? Thanks, Manickam P

[no subject]

2013-09-19 Thread Indrajeet, Verma

Task status query

2013-09-19 Thread John Lilley
How does a YARN application master typically query ongoing status (like percentage completion) of its tasks? I would like to be able to ultimately relay information to the user like: 100 tasks are scheduled; 10 tasks are complete; 4 tasks are running and they are (4%, 10%, 50%, 70%) complete. But,

Re: Name node High Availability in Cloudera 4.1.1

2013-09-19 Thread Suresh Srinivas
Please do not cross-post these emails to hdfs-user. The relevant email list is only cdh-user. On Thu, Sep 19, 2013 at 1:44 AM, Pavan Kumar Polineni smartsunny...@gmail.com wrote: Hi all, are Name Node High Availability and Job Tracker high availability there in Cloudera 4.1.1? If not,

Re: Issue: Max block location exceeded for split error when running hive

2013-09-19 Thread Rahul Jain
I am assuming you have looked at this already: https://issues.apache.org/jira/browse/MAPREDUCE-5186 You do have a workaround here: increase the *mapreduce.job.max.split.locations* value in the Hive configuration. Or do we need more than that here? -Rahul On Thu, Sep 19, 2013 at 11:00 AM, Murtaza

Re: Yarn Exception while getting JobStatus

2013-09-19 Thread Siddhi Mehta
Hey Harsh, Here is the more complete stacktrace. I had truncated it earlier since it was application specific. Let me know if this helps. Thrown: java.lang.RuntimeException java.io.IOException Thrown-StackTrace: at HadoopJobUpdaterProcess.execute(HadoopJobUpdaterProcess.java:37) at

Re: Issue: Max block location exceeded for split error when running hive

2013-09-19 Thread Matt Davies
What are the ramifications of setting a hard-coded value in our scripts and then changing parameters that influence the input data size? For example, one day I want to run across 1 day's worth of data, then a different day I want to run against 30 days. On Thu, Sep 19, 2013 at 3:11 PM, Rahul Jain

Re: Issue: Max block location exceeded for split error when running hive

2013-09-19 Thread Rahul Jain
Matt, It would be better for you to do a global config update: set *mapreduce.job.max.split.locations* to at least the number of datanodes in your cluster, either in hive-site.xml or mapred-site.xml. In either case, this is a sensible configuration update if you are going to use
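
A minimal sketch of the property Rahul describes, as it might appear inside the `<configuration>` element of mapred-site.xml or hive-site.xml. The value 50 is an assumption that stands in for a hypothetical cluster of 50 datanodes; substitute the actual datanode count (or higher) for your cluster.

```xml
<!-- Assumed example value: 50 is a placeholder for a hypothetical
     50-datanode cluster, not a recommended setting. -->
<property>
  <name>mapreduce.job.max.split.locations</name>
  <value>50</value>
</property>
```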

Re: How to make hadoop use all nodes?

2013-09-19 Thread Omkar Joshi
Hi, Let me clarify a few things. 1) You are making container requests which are not explicitly looking for certain nodes (no whitelisting). 2) All nodes are identical in terms of resources (memory/cores) and every container requires the same amount of resources. 3) All nodes have capacity to run say

RE: HDFS performance with and without replication

2013-09-19 Thread John Lilley
Thanks, that makes sense. john -----Original Message----- From: Harsh J [mailto:ha...@cloudera.com] Sent: Sunday, September 15, 2013 12:39 PM To: user@hadoop.apache.org Subject: Re: HDFS performance with and without replication Write performance improves with fewer replicas (as a result of

Re: YARN MapReduce 2 concepts

2013-09-19 Thread Sandy Ryza
Hi Mohit, answers inline. On Fri, Sep 20, 2013 at 1:33 AM, Mohit Anchlia mohitanch...@gmail.com wrote: I am going through the concepts of resource manager, application master and node manager. As I understand resource manager receives the job submission and launches application master. It also

Re: Issue: Max block location exceeded for split error when running hive

2013-09-19 Thread Matt Davies
Thanks Rahul. Our ops people have implemented the config change. On Thursday, September 19, 2013, Rahul Jain wrote: Matt, It would be better for you to do a global config update: set *mapreduce.job.max.split.locations* to at least the number of datanodes in your cluster, either in

Re: Task status query

2013-09-19 Thread Harsh J
Hi John, YARN tasks can be more than simple executables. In the case of MR, for example, tasks talk to the AM and report their individual progress and counters back to it, via a specific protocol (over the network), giving the AM more data to compute a near-accurate global progress. On Fri, Sep 20,
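
As a rough illustration of how per-task reports could be rolled up into the kind of summary asked about in this thread, here is a small Python sketch. It is not YARN or MapReduce code: `TaskSummary`, `overall_progress` and `describe` are hypothetical names, and it assumes that "scheduled" means the total number of tasks and that each running task reports a completion fraction between 0 and 1.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TaskSummary:
    """Hypothetical roll-up of task reports held by an application master."""
    scheduled: int                  # total number of tasks (assumption)
    complete: int                   # tasks that have finished
    running_progress: List[float]   # completion fraction of each running task


def overall_progress(s: TaskSummary) -> float:
    """Finished tasks count as 1.0 each; running tasks count their fraction."""
    if s.scheduled == 0:
        return 0.0
    return (s.complete + sum(s.running_progress)) / s.scheduled


def describe(s: TaskSummary) -> str:
    """Format the summary in the style of the question in this thread."""
    running = ", ".join(f"{p:.0%}" for p in s.running_progress)
    return (
        f"{s.scheduled} tasks are scheduled, "
        f"{s.complete} tasks are complete, "
        f"{len(s.running_progress)} tasks are running ({running}), "
        f"{overall_progress(s):.1%} overall"
    )


# Numbers taken from the example in the original question.
summary = TaskSummary(scheduled=100, complete=10,
                      running_progress=[0.04, 0.10, 0.50, 0.70])
print(describe(summary))
```

Weighting every task equally is the simplest choice; an application with uneven task sizes would probably want to weight each task by its input size instead.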