Re: HBase installation

2013-03-22 Thread Harsh J
Please use the right lists for your questions. An HBase question does not fit on the mapreduce-user@ lists :) I've moved your question to the u...@hbase.apache.org lists and have put my comments below. Please use this list instead for all HBase-related questions. On Fri, Mar 22, 2013 at 11:18 AM,

Re: Bug in LocalJobRunner?

2013-03-22 Thread Alex Baranau
Hi Harsh J, Thanks for taking a look. I created https://issues.apache.org/jira/browse/MAPREDUCE-5097 and attached a patch. I also provided an (ugly, sorry) example of how to reproduce the error. Alex Baranau On Thu, Mar 21, 2013 at 5:58 AM, Harsh J ha...@cloudera.com wrote: Hi Alex, This seems to make

TupleWritable value in mapper not getting cleaned up (using CompositeInputFormat)

2013-03-22 Thread devansh kumar
Hi, I am trying to do an outer join on two input files. Can anyone help me find the problem here? While joining, the TupleWritable value in the mapper is not getting cleaned up, so it carries over the previous values from a different key. The code I used is as follows (‘plist’ contains
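A minimal sketch of such a mapper (old mapred API, which CompositeInputFormat belongs to; the field positions, types, and class name here are assumptions, not taken from the truncated post). The point it illustrates is that the framework reuses the TupleWritable object between map() calls, so a slot that is absent for the current key can still hold the previous key's value unless you guard each access with has(i) and copy anything you keep:

    import java.io.IOException;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.Mapper;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reporter;
    import org.apache.hadoop.mapred.join.TupleWritable;

    public class OuterJoinMapper extends MapReduceBase
        implements Mapper<Text, TupleWritable, Text, Text> {

      @Override
      public void map(Text key, TupleWritable value,
                      OutputCollector<Text, Text> output, Reporter reporter)
          throws IOException {
        // The TupleWritable is reused between calls, so check has(i) before
        // reading slot i and copy the data out; otherwise an absent side of
        // the outer join silently shows the previous record's value.
        String left  = value.has(0) ? value.get(0).toString() : "";
        String right = value.has(1) ? value.get(1).toString() : "";
        output.collect(key, new Text(left + "\t" + right));
      }
    }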

Re: Error starting ResourceManager with hadoop-2.0.3-alpha

2013-03-22 Thread Krishna Kishore Bonagiri
Thanks Hitesh, it worked. I just copied the capacity-scheduler.xml from that link and added a yarn.scheduler.capacity.child.queues property, similar to the yarn.scheduler.capacity.root.queues property which is already there. Thanks again, Kishore On Thu, Mar 21, 2013 at 10:55 PM, Hitesh Shah

MapReduce Failed and Killed

2013-03-22 Thread Jinchun Kim
Hi, All. I'm trying to create category-based splits of the Wikipedia dataset (41 GB) and the training data set (5 GB) using Mahout. I'm using the following command. $MAHOUT_HOME/bin/mahout wikipediaDataSetCreator -i wikipedia/chunks -o wikipediainput -c $MAHOUT_HOME/examples/temp/categories.txt I had no

Fwd: Need Help on Hadoop cluster Setup

2013-03-22 Thread Munnavar Sk
Hi, I am new to Hadoop and I have been fighting with this for the last 20 days; somehow I have gathered some very good material on Hadoop. But some questions are still going around in my head, and I hope I can get the answers from your end...! I have set up a cluster in distributed mode with 5 nodes. I have configured the Namenode and

Need Help on Hadoop cluster Setup

2013-03-22 Thread Munnavar Sk
Hi Techies, I am new to Hadoop and I have been fighting with this for the last 20 days; somehow I have gathered some very good material on Hadoop. But some questions are still going around in my head, and I hope I can get the answers from your end...! I have set up a cluster in distributed mode with 5 nodes. I have configured the Namenode and

Re: Need Help on Hadoop cluster Setup

2013-03-22 Thread Mohammad Tariq
Hello Munavvar, It depends on your configuration where your DNs and TTs will run. If you have configured all your slaves to run both processes, then they should. If they are not running, then there is definitely some problem. Could you please check your DN logs once and see if you find

Re: Need Help on Hadoop cluster Setup

2013-03-22 Thread MShaik
Hi, the DataNode is not started on all the nodes, whereas the TaskTracker is started on all the nodes. Please find the DataNode log below, and please let me know the solution. 2013-03-22 19:52:27,380 INFO org.apache.hadoop.ipc.RPC: Server at n1.hc.com/192.168.1.110:54310 not available yet, Z...

Re: Need Help on Hadoop cluster Setup

2013-03-22 Thread Mohammad Tariq
Have you reformatted HDFS? If that is the case, it was, I think, not done properly. Were the nodes which you attached serving some other cluster earlier? Your logs show that you are facing problems because of a mismatch between the IDs of the NN and the IDs which the DNs have. To overcome this problem you can

Group names for custom Counters

2013-03-22 Thread Tony Burton
Hi list, I'm using Hadoop 1.0.3 and creating some custom Counters in my Mapper. I've got an enum that defines the list of counters, and I'm incrementing them in the map function using context.getCounter(counter name).increment(1). I see that there's another implementation of context.getCounter()
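For reference, both overloads the thread is about exist on the new-API Context (the enum, group, and counter names below are illustrative, not from the post): getCounter(Enum) derives the group name from the enum's class name, while getCounter(String group, String name) lets you choose the group yourself, which is also the mechanism behind the dynamic counters suggested in the reply further down.

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    public class CountingMapper
        extends Mapper<LongWritable, Text, Text, LongWritable> {

      // With the enum overload, the counter group shown in the UI is the
      // enum's class name.
      public enum MyCounters { GOOD_RECORDS, BAD_RECORDS }

      @Override
      protected void map(LongWritable key, Text value, Context context)
          throws IOException, InterruptedException {
        // Enum-based counter: the group name comes from the MyCounters class.
        context.getCounter(MyCounters.GOOD_RECORDS).increment(1);

        // String-based ("dynamic") counter: the group name is whatever you pass.
        context.getCounter("My Custom Group", "Records seen").increment(1);

        context.write(value, new LongWritable(1));
      }
    }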

Re: Need Help on Hadoop cluster Setup

2013-03-22 Thread MShaik
Thank you, Tariq. After changing the namespaceID on the datanodes, all datanodes are started. Thank you once again...! -Original Message- From: Mohammad Tariq donta...@gmail.com To: user user@hadoop.apache.org Sent: Fri, Mar 22, 2013 8:29 pm Subject: Re: Need Help on Hadoop cluster Setup

RE: problem running multiple native mode map reduce processes concurrently

2013-03-22 Thread Derrick H. Karimi
Thank you for the response. Hadoop 0.20.2-cdh3u3 --Derrick H. Karimi --Software Developer, SEI Innovation Center --Carnegie Mellon University -Original Message- From: Harsh J [mailto:ha...@cloudera.com] Sent: Friday, March 22, 2013 1:32 AM To: user@hadoop.apache.org Subject: Re:

Re: Need Help on Hadoop cluster Setup

2013-03-22 Thread Mohammad Tariq
You are welcome. Warm Regards, Tariq https://mtariq.jux.com/ cloudfront.blogspot.com On Fri, Mar 22, 2013 at 8:48 PM, MShaik mshai...@aol.com wrote: Thank you, Tariq. After changing the namespaceID on the datanodes, all datanodes are started. Thank you once again...! -Original

Re: disk used percentage is not symmetric on datanodes (balancer)

2013-03-22 Thread Алексей Бабутин
2013/3/20 Tapas Sarangi tapas.sara...@gmail.com Thanks for your reply. Some follow-up questions below: On Mar 20, 2013, at 5:35 AM, Алексей Бабутин zorlaxpokemon...@gmail.com wrote: dfs.balance.bandwidthPerSec in hdfs-site.xml. I think the balancer can't help you, because it makes all the

Re: Group names for custom Counters

2013-03-22 Thread Michel Segel
Just a suggestion, look at dynamic counters... For the group, just create a group name and you are done. Sent from a remote device. Please excuse any typos... Mike Segel On Mar 22, 2013, at 11:17 AM, Tony Burton tbur...@sportingindex.com wrote: Hi list, I'm using Hadoop 1.0.3 and creating

Re: About running a simple wordcount mapreduce

2013-03-22 Thread Abdelrahman Shettia
Hi Redwane, It is possible that the hosts which are running tasks do not have enough space. Those dirs are configured in mapred-site.xml On Fri, Mar 22, 2013 at 8:42 AM, Redwane belmaati cherkaoui reduno1...@googlemail.com wrote: -- Forwarded message -- From: Redwane

Re: About running a simple wordcount mapreduce

2013-03-22 Thread Serge Blazhievsky
Check in the web UI how much space you have on HDFS. Sent from my iPhone On Mar 22, 2013, at 11:41 AM, Abdelrahman Shettia ashet...@hortonworks.com wrote: Hi Redwane, It is possible that the hosts which are running tasks do not have enough space. Those dirs are configured in

Re: About running a simple wordcount mapreduce

2013-03-22 Thread reduno1985
Thanks. Each host has 8 GB, but Hadoop is estimating too much space; the estimated number is too big for any host in the world ;). My input data are simple text files that do not exceed 20 MB. I do not know why Hadoop is estimating that much. Sent from Samsung Mobile. Abdelrahman Shettia

Re: About running a simple wordcount mapreduce

2013-03-22 Thread reduno1985
I have my hosts running on OpenStack virtual machine instances; each instance has a 10 GB hard disk. Is there a way to see how much space is in HDFS without the web UI? Sent from Samsung Mobile. Serge Blazhievsky hadoop...@gmail.com wrote: Check in the web UI how much space you have on HDFS. Sent

Re: What happens when you have fewer input files than mapper slots?

2013-03-22 Thread jeremy p
Apologies -- I don't understand this advice: If evenness is the goal, you can also write your own input format that returns empty locations for each split and read the small files in the map task directly. How would manually reading the files into the map task help me? Hadoop would still spawn
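As a rough illustration of what the quoted advice seems to mean (a sketch under assumptions; the class name is invented and this is not code from the thread): an input format can hand back splits whose getLocations() is empty, so the scheduler has no data-local preference and the map tasks tend to spread across nodes rather than pile onto the few hosts that hold the blocks.

    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.List;
    import org.apache.hadoop.mapreduce.InputSplit;
    import org.apache.hadoop.mapreduce.JobContext;
    import org.apache.hadoop.mapreduce.lib.input.FileSplit;
    import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

    public class NoLocalityTextInputFormat extends TextInputFormat {
      @Override
      public List<InputSplit> getSplits(JobContext job) throws IOException {
        List<InputSplit> splits = new ArrayList<InputSplit>();
        for (InputSplit split : super.getSplits(job)) {
          FileSplit fs = (FileSplit) split;
          // Rebuild each split with an empty host list instead of the block
          // hosts, removing the locality hint the scheduler would otherwise use.
          splits.add(new FileSplit(fs.getPath(), fs.getStart(), fs.getLength(),
                                   new String[0]));
        }
        return splits;
      }
    }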

Re: About running a simple wordcount mapreduce

2013-03-22 Thread Abdelrahman Shettia
Hi Redwane, Please run the following command as the hdfs user on any datanode; the output will be something like this. Hope this helps. hadoop dfsadmin -report Configured Capacity: 81075068925 (75.51 GB) Present Capacity: 70375292928 (65.54 GB) DFS Remaining: 69895163904 (65.09 GB) DFS Used:

Re: What happens when you have fewer input files than mapper slots?

2013-03-22 Thread jeremy p
Is there a way to force an even spread of data? On Fri, Mar 22, 2013 at 2:14 PM, jeremy p athomewithagroove...@gmail.com wrote: Apologies -- I don't understand this advice: If evenness is the goal, you can also write your own input format that returns empty locations for each split and read

Cluster lost IP addresses

2013-03-22 Thread John Meza
I have an 18 node cluster that had to be physically moved. Unfortunately, all the IP addresses were lost (recreated). This must have happened to someone before. Nothing else on the machines has been changed. Most importantly, the data in HDFS is still sitting there. Is there a way to recover this

Re: The most newbie question ever

2013-03-22 Thread Keith Thomas
OK. I have kept battling through, guessing at the gaps in the getting started page, but the final command to run the hadoop-examples.jar has blocked me. As far as I can tell, there is no hadoop-examples.jar file in the distribution. At a higher level, I must be doing something wrong. The path I've

Re: Capacity Scheduler question

2013-03-22 Thread jeremy p
Thanks for the help. Sadly, I don't think the Fair Scheduler will help me here. It will let you specify the number of concurrent task slots for a pool, but that applies to the entire cluster. For a given pool, I need to set the maximum number of task slots per machine. On Fri, Mar 22, 2013 at

Re: Cluster lost IP addresses

2013-03-22 Thread பாலாஜி நாராயணன்
Assuming you are using hostnames and not IP addresses in your config files, what happens when you start the cluster? If you are using IP addresses in your configs, just update them and start. It should work with no issues. On Friday, March 22, 2013, John Meza wrote: I have an 18 node cluster that had

Re: Cluster lost IP addresses

2013-03-22 Thread Azuryy Yu
It has issues: the namenode saves block-to-node mappings using IP addresses if your slaves config file uses IP addresses instead of hostnames. On Mar 23, 2013 10:14 AM, Balaji Narayanan (பாலாஜி நாராயணன்) li...@balajin.net wrote: Assuming you are using hostnames and not IP addresses in your config files, what happens

how to control (or understand) the memory usage in hdfs

2013-03-22 Thread Ted
Hi, I'm new to Hadoop/HDFS and I'm just running some tests on my local machines in a single-node setup. I'm encountering out-of-memory errors on the JVM running my data node. I'm pretty sure I can just increase the heap size to fix the errors, but my question is about how memory is actually used.

Re: Setup/Cleanup question

2013-03-22 Thread Harsh J
Assuming you speak of MRv1 (1.x/0.20.x versions), there is just 1 Job Setup and 1 Job Cleanup task additionally run for each Job. On Sat, Mar 23, 2013 at 9:10 AM, Sai Sai saigr...@yahoo.in wrote: When running an MR job/program, assuming there are 'n' (=100) Mappers triggered, then my question is

Re: Cluster lost IP addresses

2013-03-22 Thread Harsh J
NameNode does not persist block locations; so this is still recoverable if the configs are changed to use the new set of hostnames to bind to/look up. On Sat, Mar 23, 2013 at 9:01 AM, Azuryy Yu azury...@gmail.com wrote: it has issues, namenode save blockid-nodes, using ip addr if your slaves

Re: how to control (or understand) the memory usage in hdfs

2013-03-22 Thread Harsh J
I run a 128 MB heap size DN for my simple purposes on my Mac and it runs well for what load I apply on it. A DN's primary, growing memory consumption comes from the # of blocks it carries. All of these blocks' file paths are mapped and kept in the RAM during its lifetime. If your DN has acquired

Re: Cluster lost IP addresses

2013-03-22 Thread Chris Embree
Hey John, Make sure your /etc/hosts (or DNS) is up to date and any topology scripts are updated. Unfortunately, NN is pretty dumb about IPs vs. hostnames. BTW, NN devs. Seriously? You rely on IP addr instead of hostname? Someone should probably be shot or at least be responsible for

Re: Cluster lost IP addresses

2013-03-22 Thread Harsh J
Hi Chris, Where exactly are you seeing issues with change of NN/DN IPs? I've never encountered trouble on IP changes (I keep moving across networks every day and the HDFS plus MR I run both stand tall without requiring a restart). We do not store (generally) nor rely on IP addresses. An exclusion

Re: Cluster lost IP addresses

2013-03-22 Thread Azuryy Yu
Oh yes, it's not persisted, only in memory, so there is no issue. On Mar 23, 2013 1:13 PM, Harsh J ha...@cloudera.com wrote: NameNode does not persist block locations; so this is still recoverable if the configs are changed to use the new set of hostnames to bind to/look up. On Sat, Mar 23,

Re: Setup/Cleanup question

2013-03-22 Thread Sai Sai
Thanks Harsh. So the setup/cleanup are for the Job and not the Mappers, I take it. Thanks. From: Harsh J ha...@cloudera.com To: user@hadoop.apache.org user@hadoop.apache.org; Sai Sai saigr...@yahoo.in Sent: Friday, 22 March 2013 10:05 PM Subject: Re:
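To make the distinction concrete: the Job Setup/Cleanup tasks Harsh refers to run once per job (they drive the job-level OutputCommitter setup/cleanup work), while the setup()/cleanup() hooks on the new-API Mapper still run once in every map task. A minimal, assumed illustration of the per-task hooks (class and types invented for the example):

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    public class PerTaskLifecycleMapper
        extends Mapper<LongWritable, Text, Text, LongWritable> {

      @Override
      protected void setup(Context context) {
        // Runs once per map task attempt (so n times for n mappers),
        // unlike the single Job Setup task that runs once per job.
      }

      @Override
      protected void map(LongWritable key, Text value, Context context)
          throws IOException, InterruptedException {
        context.write(value, new LongWritable(1));
      }

      @Override
      protected void cleanup(Context context) {
        // Runs once per map task attempt, after the last map() call.
      }
    }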

Question for committers

2013-03-22 Thread Azuryy Yu
Is there a way to separate HDFS 2 from Hadoop 2? I want to use HDFS 2 with MapReduce 1.0.4 and exclude YARN, because I need HDFS HA.

Re: Dissecting MR output article

2013-03-22 Thread Sai Sai
Just wondering if there is any step-by-step explanation/article of the MR output we get when we run a job, either in Eclipse or Ubuntu. Any help is appreciated. Thanks, Sai

Re: Dissecting MR output article

2013-03-22 Thread Azuryy Yu
The Hadoop: The Definitive Guide PDF should be helpful; there is a chapter for this, but only for MRv1. On Mar 23, 2013 1:50 PM, Sai Sai saigr...@yahoo.in wrote: Just wondering if there is any step-by-step explanation/article of the MR output we get when we run a job, either in Eclipse or Ubuntu. Any

Re: Dissecting MR output article

2013-03-22 Thread Harsh J
+1 for Hadoop: The Definitive Guide and other books. Sidenote: The 3rd Edition of Tom White's Hadoop: The Definitive Guide does have good details on MRv2 and YARN. On Sat, Mar 23, 2013 at 11:22 AM, Azuryy Yu azury...@gmail.com wrote: hadoop definition guide.pdf should be helpful. there is a