Re: Secondary namenode is crashing and complaining about non-existing files

2013-08-08 Thread Jameson Li
Refer to https://issues.apache.org/jira/browse/HDFS-2827: running the operation hadoop fs -mv /a/b / may reproduce this issue. 2012/5/10 Alex Levin ale...@gmail.com Hi, I have an issue with the secondary namenode crashing due to a simple move operation. Appreciate any ideas on the

Does the RM require a lot of memory?

2013-08-08 Thread ch huang
Does the YARN ResourceManager require a lot of memory, like the JobTracker?

Why LineRecordWriter.write(..) is synchronized

2013-08-08 Thread Sathwik B P
Hi, LineRecordWriter.write(..) is synchronized. I did not find any other RecordWriter implementation that defines write as synchronized. Is there any specific reason for this? regards, sathwik
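For context, the pattern under discussion looks roughly like the following standalone sketch (names are illustrative, not the actual Hadoop source): the whole key/separator/value/newline sequence is written under a single lock, so concurrent callers cannot interleave partial lines.

    import java.io.DataOutputStream;
    import java.io.FileOutputStream;
    import java.io.IOException;

    // Standalone sketch of the LineRecordWriter-style pattern; class and field
    // names are illustrative and do not come from the Hadoop source tree.
    class SimpleLineWriter {
        private static final byte[] SEPARATOR = "\t".getBytes();
        private static final byte[] NEWLINE = "\n".getBytes();
        private final DataOutputStream out;

        SimpleLineWriter(DataOutputStream out) { this.out = out; }

        // Synchronized so that one key/value pair is always emitted as one
        // contiguous line, even if several threads share this writer.
        public synchronized void write(String key, String value) throws IOException {
            out.write(key.getBytes());
            out.write(SEPARATOR);
            out.write(value.getBytes());
            out.write(NEWLINE);
        }

        public static void main(String[] args) throws IOException {
            try (DataOutputStream out = new DataOutputStream(new FileOutputStream("part-00000"))) {
                SimpleLineWriter w = new SimpleLineWriter(out);
                w.write("hello", "1");
            }
        }
    }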

Re: Why LineRecordWriter.write(..) is synchronized

2013-08-08 Thread Azuryy Yu
Because we may use multiple threads to write a single file. On Aug 8, 2013 2:54 PM, Sathwik B P sath...@apache.org wrote: Hi, LineRecordWriter.write(..) is synchronized. I did not find any other RecordWriter implementation that defines write as synchronized. Is there any specific reason for this?

Re: Why LineRecordWriter.write(..) is synchronized

2013-08-08 Thread Sathwik B P
Hi, Thanks for your reply. May I know where Hadoop forks multiple threads that use a single RecordWriter? regards, sathwik On Thu, Aug 8, 2013 at 7:06 AM, Azuryy Yu azury...@gmail.com wrote: Because we may use multiple threads to write a single file. On Aug 8, 2013 2:54 PM, Sathwik B P

RE: Is it OK to build a Hadoop cluster on KVM in a production environment?

2013-08-08 Thread Sourygna Luangsay
Hi, In my company we sometimes use KVM to launch test or small demo clusters. Every developer also has a KVM pseudo-distributed cluster on their computer. Nonetheless, I would not recommend using KVM for production clusters. Check this link about all the theory of Hadoop with

Issue about Hadoop hardware choice

2013-08-08 Thread ch huang
Hi all: My company needs to build a 10-node Hadoop cluster (2 namenodes and 8 datanode/node manager nodes, for both data storage and data analysis). We have HBase and Hive on the Hadoop cluster, with a 10 GB data increment per day. We use CDH4.3 (for dual-namenode HA). My plan is

Re: Why LineRecordWriter.write(..) is synchronized

2013-08-08 Thread Azuryy Yu
It's not Hadoop that forks the threads; we may create a line record writer and then call this writer concurrently. On Aug 8, 2013 4:00 PM, Sathwik B P sathwik...@gmail.com wrote: Hi, Thanks for your reply. May I know where Hadoop forks multiple threads that use a single RecordWriter? regards,

Re: Issue about Hadoop hardware choice

2013-08-08 Thread Azuryy Yu
If you want HA, do you also want to deploy the journal nodes on the DNs? On Aug 8, 2013 5:09 PM, ch huang justlo...@gmail.com wrote: Hi all: My company needs to build a 10-node Hadoop cluster (2 namenodes and 8 datanode/node manager nodes, for both data storage and data analysis). We have HBase

Re: Oozie ssh action error

2013-08-08 Thread Kasa V Varun Tej
Hey Jitendra, I checked both things you mentioned, but I'm still facing the same issue. Regards, Kasa On Wed, Aug 7, 2013 at 7:32 PM, Jitendra Yadav jeetuyadav200...@gmail.com wrote: Hi, I hope the points below help you. *Approach #1:* You need to change the sshd_config file on the

Re: Oozie ssh action error

2013-08-08 Thread Kasa V Varun Tej
*logs:* 2013-08-08 06:03:51,627 INFO org.apache.oozie.command.wf.ActionStartXCommand: USER[root] GROUP[-] TOKEN[] APP[clickstream-wf] JOB[044-130719141217337-oozie-oozi-W] ACTION[044-130719141217337-oozie-oozi-W@:start:] Start action [044-130719141217337-oozie-oozi-W@:start:]

Re: Why LineRecordWriter.write(..) is synchronized

2013-08-08 Thread Harsh J
While we don't fork by default, we do provide a MultithreadedMapper implementation that would require such synchronization. But if you are asking whether it is necessary, then perhaps the answer is no. On Aug 8, 2013 3:43 PM, Azuryy Yu azury...@gmail.com wrote: It's not Hadoop that forks the threads; we may
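A hedged sketch of wiring up that MultithreadedMapper, so that several threads run inside one map task and therefore share its single RecordWriter (MyMapper is a placeholder for your own mapper class):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.lib.map.MultithreadedMapper;

    // Sketch only: job wiring follows the 2.x mapreduce API.
    public class MultithreadedJobSetup {
        public static class MyMapper extends Mapper<Object, Text, Text, IntWritable> {
            // map() omitted; any ordinary mapper works as long as it is thread-safe
        }

        public static Job configure(Configuration conf) throws Exception {
            Job job = Job.getInstance(conf, "multithreaded example");
            job.setMapperClass(MultithreadedMapper.class);           // framework-side wrapper
            MultithreadedMapper.setMapperClass(job, MyMapper.class);  // the real user mapper
            MultithreadedMapper.setNumberOfThreads(job, 8);           // threads per map task
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            return job;
        }
    }

With this setup all eight threads call context.write() against the same output, which is exactly where the synchronized write in LineRecordWriter matters.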

Re: Why LineRecordWriter.write(..) is synchronized

2013-08-08 Thread Niels Basjes
I may be nitpicking here, but if "perhaps the answer is no", then I conclude: perhaps the other implementations of RecordWriter are a race condition/file corruption waiting to happen. On Thu, Aug 8, 2013 at 12:50 PM, Harsh J ha...@cloudera.com wrote: While we don't fork by default, we do provide a

Why does the FairScheduler prefer to schedule MR jobs onto the same node?

2013-08-08 Thread devdoer bird
Hi: I configured the FairScheduler with default settings and my job has 19 reduce tasks. I found that all the reduce tasks are scheduled to run on one node, while with the default FIFO scheduler the 19 reduce tasks are scheduled onto different nodes. How can I configure the FairScheduler to load more

Re: Datanode doesn't connect to Namenode

2013-08-08 Thread Felipe Gutierrez
Thanks. In all the files I changed it to the master (cloud6), and I removed the property <name>hadoop.tmp.dir</name>. Felipe On Wed, Aug 7, 2013 at 3:20 PM, Shekhar Sharma shekhar2...@gmail.com wrote: Disable the firewall on the data node and namenode machines. Regards, Som Shekhar Sharma +91-8197243810

Re: Datanode doesn't connect to Namenode

2013-08-08 Thread Shekhar Sharma
If you have removed this property from the slave machines, then your DN information will be created under the /tmp folder, and once you reboot your data node machines the information will be lost. Sorry, I had not seen the logs, but you don't have to play around with the properties... see, the datanode will not

Re: Why LineRecordWriter.write(..) is synchronized

2013-08-08 Thread Jay Vyas
Then is this a bug? Synchronization in the absence of any race condition is normally considered bad. In any case I'd like to know why this writer is synchronized whereas the others are not. That is, I think, the point at issue: either the other writers should be synchronized, or else this one

Re: Datanode doesn't connect to Namenode

2013-08-08 Thread Shekhar Sharma
Keep the configuration the same on the datanodes as well for the time being. The only thing a data node or slave machine needs to know is the masters file (that is, who the master is), and you need to tell the slave machine where your namenode is running, which you specify in the property
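The property referred to here is the default filesystem URI (fs.default.name in Hadoop 1.x, fs.defaultFS in 2.x), normally set in core-site.xml on every node. A hedged sketch that checks the same setting programmatically; the hostname cloud6 comes from this thread and the port is illustrative:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;

    // Sketch only: shows which namenode URI the client-side configuration resolves to.
    public class NamenodeAddressCheck {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            conf.set("fs.defaultFS", "hdfs://cloud6:9000");  // port is illustrative
            FileSystem fs = FileSystem.get(conf);
            System.out.println("Default filesystem: " + fs.getUri());
        }
    }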

Re: Datanode doesn't connect to Namenode

2013-08-08 Thread Felipe Gutierrez
Thanks for the hints Shekhar. My cluster is running well. Felipe On Thu, Aug 8, 2013 at 8:56 AM, Shekhar Sharma shekhar2...@gmail.com wrote: Keep the configuration the same on the datanodes as well for the time being. The only thing a data node or slave machine needs to know is the masters file (that

Re: Why LineRecordWriter.write(..) is synchronized

2013-08-08 Thread Sathwik B P
Hi Harsh, Do you want me to raise a JIRA for this? regards, sathwik On Thu, Aug 8, 2013 at 5:23 PM, Jay Vyas jayunit...@gmail.com wrote: Then is this a bug? Synchronization in the absence of any race condition is normally considered bad. In any case I'd like to know why this writer is

Re: Issue about Hadoop hardware choice

2013-08-08 Thread Mirko Kämpf
Hello Ch Huang, Do you know this book? Hadoop Operations http://shop.oreilly.com/product/0636920025085.do I think it answers most of the questions in detail. For a production cluster you should consider MRv1, and I suggest you go with more hard drives per slave node to get higher IO

Re: Why LineRecordWriter.write(..) is synchronized

2013-08-08 Thread Niels Basjes
I would say yes, make this a JIRA. The actual change could go (as proposed by Jay) in one of two directions: put synchronization into all implementations, OR take it out of all of them. I think the first thing to determine is why the synchronization was put into the LineRecordWriter in the first

Re: Why LineRecordWriter.write(..) is synchronized

2013-08-08 Thread Azuryy Yu
The SequenceFile writer is also synchronized; I don't think this is bad. If you call the HDFS API to write concurrently, then it's necessary. On Aug 8, 2013 7:53 PM, Jay Vyas jayunit...@gmail.com wrote: Then is this a bug? Synchronization in the absence of any race condition is normally considered bad. In

Re: Why does the FairScheduler prefer to schedule MR jobs onto the same node?

2013-08-08 Thread Sandy Ryza
Hi devdoer, What version are you using? -Sandy On Thu, Aug 8, 2013 at 4:25 AM, devdoer bird devd...@gmail.com wrote: Hi: I configured the FairScheduler with default settings and my job has 19 reduce tasks. I found that all the reduce tasks are scheduled to run on one node, while with

Hosting Hadoop

2013-08-08 Thread Dhaval Shah
We are exploring the possibility of hosting Hadoop outside of our data centers. I am aware that Hadoop in general isn't exactly designed to run on virtual hardware. So a few questions: 1. Are there any providers out there who would host Hadoop on dedicated physical hardware?  2. Has anyone had

Re: Hosting Hadoop

2013-08-08 Thread Marcos Luis Ortiz Valmaseda
Well, it all depends, because many companies use cloud computing platforms like Amazon EMR, VMware, or Rackspace Cloud for Hadoop hosting: http://aws.amazon.com/elasticmapreduce http://www.vmware.com/company/news/releases/vmw-mapr-hadoop-062013.html http://bitrefinery.com/services/hadoop-hosting

Re: Hosting Hadoop

2013-08-08 Thread Dhaval Shah
Thanks for the list Marcos. I will go through the slides/links. I think that's helpful   Regards, Dhaval From: Marcos Luis Ortiz Valmaseda marcosluis2...@gmail.com To: Dhaval Shah prince_mithi...@yahoo.co.in Cc: user@hadoop.apache.org Sent: Thursday, 8 August

scripts for mapred.healthChecker option?

2013-08-08 Thread Jeff Kubina
Are there any standard or recommended scripts for the mapred.healthChecker options in the mapred-site.xml configuration file for a Linux box? -Jeff

Converting a Path to a full URI String and preserving special characters

2013-08-08 Thread Public Network Services
Is there a reliable way of converting an HDFS Path object into a String? Invoking path.toUri().toString() does not work with special characters (e.g., if there are spaces in the original path string). So, for instance, in the following example String address = ...; // Path string without the
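A hedged sketch of one possible approach, shown with plain java.net.URI (no claim that this is what the poster's code prints): the multi-argument URI constructor percent-encodes characters such as spaces in the path component, while getPath() hands back the decoded form. The hostname, port, and path below are illustrative.

    import java.net.URI;
    import java.net.URISyntaxException;

    // Sketch only: rebuild a URI from its components so that illegal characters
    // (e.g. spaces) are percent-encoded in the String representation.
    public class UriEncodingSketch {
        public static void main(String[] args) throws URISyntaxException {
            String rawPath = "/user/demo/my file.txt";   // hypothetical path with a space

            URI encoded = new URI("hdfs", "namenode:8020", rawPath, null, null);
            System.out.println(encoded.toASCIIString()); // hdfs://namenode:8020/user/demo/my%20file.txt
            System.out.println(encoded.getPath());       // /user/demo/my file.txt (decoded again)
        }
    }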

Re: Does the RM require a lot of memory?

2013-08-08 Thread Marcos Luis Ortiz Valmaseda
Remember that in YARN, the main responsibilities of the JobTracker are split into separate components: - Resource management, by the ResourceManager (a global component) - Job scheduling and monitoring, by the per-application ApplicationMaster (while the NodeManager is the per-node agent) - Resource negotiation and task

Fwd: Mapreduce for beginner

2013-08-08 Thread Olivier Austina
Hi, I am starting to learn about MapReduce with Hadoop via the WordCount example. I am a bit confused about the boundary between the map and reduce programs. Is there a standard format for the map output and the reduce input? Is there a full explanation of the Java classes used somewhere? I would also appreciate

Re: Does the RM require a lot of memory?

2013-08-08 Thread ch huang
So, from a performance perspective, I need to separate the RM from the NN, because each of them is memory hungry. On Fri, Aug 9, 2013 at 8:00 AM, Marcos Luis Ortiz Valmaseda marcosluis2...@gmail.com wrote: Remember that in YARN, the main responsibilities of the JobTracker are split into separate

Re: alternative to $HADOOP_HOME/lib

2013-08-08 Thread Sanjeev Verma
On 08/08/2013 09:23 PM, John Hancock wrote: Where else might one put .jar files that a map/reduce job will need? Why do you need an alternative location? Is there a constraint on being able to place your library jars under $HADOOP_HOME/lib?

Re: Mapreduce for beginner

2013-08-08 Thread Shahab Yunus
Given that your questions are very broad and high level, I would suggest that you pick up a book to go through them. Hadoop: The Definitive Guide by Tom White is a great book to start with. Meanwhile, some links to start with:
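A minimal WordCount sketch illustrating the boundary asked about above: the map output key/value types (Text, IntWritable) must match the reduce input key/value types, and the framework groups the map outputs by key before handing them to reduce. Class names here are illustrative.

    import java.io.IOException;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;

    public class WordCountSketch {

        public static class TokenMapper extends Mapper<Object, Text, Text, IntWritable> {
            private static final IntWritable ONE = new IntWritable(1);
            private final Text word = new Text();

            @Override
            protected void map(Object key, Text value, Context context)
                    throws IOException, InterruptedException {
                for (String token : value.toString().split("\\s+")) {
                    if (!token.isEmpty()) {
                        word.set(token);
                        context.write(word, ONE);      // map output: (word, 1)
                    }
                }
            }
        }

        public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
            @Override
            protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                    throws IOException, InterruptedException {
                int sum = 0;
                for (IntWritable v : values) {         // reduce input: (word, [1, 1, ...])
                    sum += v.get();
                }
                context.write(key, new IntWritable(sum));
            }
        }
    }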

Re: Why LineRecordWriter.write(..) is synchronized

2013-08-08 Thread Harsh J
I suppose I should have been clearer. There's no problem out of the box if people stick to the libraries we offer :) Yes, the LRW was marked synchronized at some point over 8 years ago [1] in support of multi-threaded maps, but the framework has changed much since then. The MultithreadedMapper/etc.

Re: issue about resource manager HA

2013-08-08 Thread Harsh J
You are partially incorrect: the NameNode is no longer an SPOF in 2.x releases. Please look at the docs that cover HA in the release you use, such as http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/HDFSHighAvailabilityWithQJM.html. RM HA is not yet present, but is incoming.

Re: problem about DN/NM directory design

2013-08-08 Thread Harsh J
This is appropriate. You are making use of both disk mounts and have the directories for each service isolated as well. On Fri, Aug 9, 2013 at 7:37 AM, ch huang justlo...@gmail.com wrote: Hi all: I plan to put the DN together with the NM. I want to use 2x1TB disks, one disk mounted on /data/1 and

Re: alternative to $HADOOP_HOME/lib

2013-08-08 Thread Harsh J
John, I assume you do not wish to use the DistributedCache (or an HDFS location for the DistributedCache), which is the ideal way to ship jars. You can place your jars onto the TT classpaths by putting them at an arbitrary location such as /opt/jars and editing the TT's hadoop-env.sh to
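For the DistributedCache route, a hedged sketch using the 2.x Job API; the HDFS path /libs/mylib.jar is illustrative:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.Job;

    // Sketch only: the jar is staged on HDFS and added to the task classpath per
    // job, so nothing needs to live under $HADOOP_HOME/lib on the nodes.
    public class ShipJarExample {
        public static Job configure(Configuration conf) throws Exception {
            Job job = Job.getInstance(conf, "job with extra jars");
            // Jar previously copied to HDFS, e.g. with: hadoop fs -put mylib.jar /libs/
            job.addFileToClassPath(new Path("/libs/mylib.jar"));
            return job;
        }
    }

The -libjars option of the standard hadoop jar driver (handled by GenericOptionsParser/Tool) gets to the same place from the command line.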

MutableCounterLong and MutableCounterLong class difference in metrics v2

2013-08-08 Thread lei liu
I use hadoop-2.0.5; there are MutableCounterLong and MutableCounterLong classes in metrics v2. I am studying the metrics v2 code. What is the difference between the MutableCounterLong and MutableCounterLong classes? I find that MutableCounterLong is used to calculate throughput; is that right? How does the metrics
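A hedged sketch of creating and bumping a metrics2 counter (class and method names recalled from the 2.x API; the registry and counter names below are illustrative). A MutableCounterLong only ever increases; throughput/rates are usually derived from it by the sinks, or tracked separately with MutableRate.

    import org.apache.hadoop.metrics2.lib.MetricsRegistry;
    import org.apache.hadoop.metrics2.lib.MutableCounterLong;

    // Sketch only: counter creation and increments without registering a full metrics source.
    public class MetricsCounterSketch {
        public static void main(String[] args) {
            MetricsRegistry registry = new MetricsRegistry("ExampleSource");
            MutableCounterLong bytesWritten =
                registry.newCounter("BytesWritten", "Total bytes written", 0L);

            bytesWritten.incr(4096);   // add a delta
            bytesWritten.incr();       // add one

            System.out.println("counter = " + bytesWritten.value());
        }
    }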