Re: RAID vs. JBOD

2009-01-12 Thread David B. Ritch
Thank you - yes, I'm fairly confident that it will work either way. I'm trying to find out whether there is an established best practice, and the performance impact of the decision between RAID 0 and JBOD. I'll check out the noatime and nodiratime for their effect on our performance - thanks for

Re: RAID vs. JBOD

2009-01-12 Thread Brian Vargas
David, As I understand it, you will theoretically get better performance from a JBOD configuration than a RAID configuration. In a RAID configuration, you have to wait for the slowest disk in the array to complete before the entire IO operation can complete, making the average IO time equivalent
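Brian's "wait for the slowest disk" argument can be made concrete with a toy calculation. This is a sketch under a simplified model, not a benchmark: it assumes every RAID-0 request is striped across all disks and finishes only when the slowest member finishes, while a JBOD request is served by one disk. The class name `StripeLatency` and the latency samples in the usage note are hypothetical.

```java
import java.util.Arrays;

/** Toy latency model (hypothetical names and numbers) following the reasoning in Brian's message. */
final class StripeLatency {
    private StripeLatency() {}

    /** RAID-0: a striped request completes only when its slowest member disk completes. */
    static double raid0RequestMs(double[] perDiskMs) {
        return Arrays.stream(perDiskMs).max().getAsDouble();
    }

    /** Mean RAID-0 latency; each row holds one request's service time on every disk. */
    static double meanRaid0Ms(double[][] requests) {
        return Arrays.stream(requests)
                .mapToDouble(StripeLatency::raid0RequestMs)
                .average()
                .getAsDouble();
    }

    /** JBOD: each request is served by a single disk, so every sample counts as one whole request. */
    static double meanJbodMs(double[][] requests) {
        return Arrays.stream(requests)
                .flatMapToDouble(Arrays::stream)
                .average()
                .getAsDouble();
    }
}
```

With made-up samples `{{5, 7, 12}, {6, 6, 9}}` ms, the RAID-0 mean is (12 + 9) / 2 = 10.5 ms while the JBOD mean is 45 / 6 = 7.5 ms. The model deliberately ignores that RAID-0 moves less data per disk per request, which is where striping gains on large sequential transfers.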

Re: Problem with Hadoop and concatenated gzip files

2009-01-12 Thread Tom White
I've opened https://issues.apache.org/jira/browse/HADOOP-5014 for this. Do you get this behaviour when you use the native libraries? Tom On Sat, Jan 10, 2009 at 12:26 AM, Oscar Gothberg oscar.gothb...@platform-a.com wrote: Hi, I'm having trouble with Hadoop (tested with 0.17 and 0.19) not
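For context on the format at issue: a concatenated gzip file is several complete gzip members written back to back (what `cat a.gz b.gz > ab.gz` produces), which the gzip format permits. The sketch below builds such a stream in memory with `java.util.zip` and decompresses it. On current JDKs `GZIPInputStream` continues into the following members; that is JDK behaviour only and says nothing about the Hadoop codec path reported in HADOOP-5014. The class name `ConcatGzip` is hypothetical, and `readAllBytes` assumes Java 9 or later.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Hypothetical helper: builds a multi-member ("concatenated") gzip stream and reads it back. */
final class ConcatGzip {
    private ConcatGzip() {}

    /** Compresses one string into a single, complete gzip member (header, deflate data, trailer). */
    static byte[] gzipMember(String text) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bytes)) {
            gz.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Same effect as `cat a.gz b.gz`: members are appended with nothing in between. */
    static byte[] concat(byte[]... members) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte[] m : members) {
            out.write(m, 0, m.length);
        }
        return out.toByteArray();
    }

    /** Returns everything GZIPInputStream yields from the given bytes. */
    static String gunzip(byte[] data) {
        try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(data))) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A reader that stops after the first member would return only `"hello "` for the two-member stream built from `"hello "` and `"world\n"`, which is the symptom a truncated-output bug report typically describes.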

Re: RAID vs. JBOD

2009-01-12 Thread Colin Evans
Currently, Hadoop does round-robin allocation of blocks and data across multiple JBOD disks. We did some testing and found that there weren't significant differences between RAID-0 and JBOD. We went with JBOD because we figured that RAID-0 has a higher failure rate than JBOD -- any disk
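The round-robin placement Colin mentions can be sketched as follows. This is an illustrative model, not the DataNode's actual code: the class name `RoundRobinDirs` and the free-space check are assumptions. In the 0.18/0.19 releases the per-disk directories are the comma-separated list in `dfs.data.dir`.

```java
import java.util.List;

/** Hypothetical round-robin chooser over JBOD data directories (one directory per physical disk). */
final class RoundRobinDirs {
    private final List<String> dirs;
    private int next = 0; // index of the directory to try first on the next call

    RoundRobinDirs(List<String> dirs) {
        if (dirs.isEmpty()) {
            throw new IllegalArgumentException("no data directories");
        }
        this.dirs = List.copyOf(dirs);
    }

    /**
     * Returns the next directory in rotation with room for blockSize bytes.
     * freeBytes[i] is the (assumed) free space on dirs.get(i).
     */
    synchronized String choose(long blockSize, long[] freeBytes) {
        for (int tried = 0; tried < dirs.size(); tried++) {
            int i = (next + tried) % dirs.size();
            if (freeBytes[i] >= blockSize) {
                next = (i + 1) % dirs.size(); // resume after the chosen disk next time
                return dirs.get(i);
            }
        }
        throw new IllegalStateException("no directory has " + blockSize + " bytes free");
    }
}
```

With three roomy disks, successive calls return `/d1`, `/d2`, `/d3`, `/d1`; a full disk is simply skipped. Spreading consecutive blocks across spindles is roughly what lets JBOD approach RAID-0 throughput when many blocks are being read or written at once.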

RE: Problem with Hadoop and concatenated gzip files

2009-01-12 Thread Oscar Gothberg
Thanks Tom, yes, assuming I got native libraries correctly enabled... I get: 09/01/12 11:33:19 INFO util.NativeCodeLoader: Loaded the native-hadoop library 09/01/12 11:33:19 INFO zlib.ZlibFactory: Successfully loaded initialized native-zlib library ...at startup, and then I try without by

Re: 0.18.1 datanode pseudo deadlock problem

2009-01-12 Thread Raghu Angadi
Sagar Naik wrote: Hi Raghu, The periodic du and block report threads thrash the disk. (Block reports take about 21 mins on average.) And I think all the datanode threads are not able to do much and freeze. Yes, that is the known problem we talked about in the earlier mails in this thread.

Re: 0.18.1 datanode pseudo deadlock problem

2009-01-12 Thread Jason Venner
There is no reason to do the block scans. All of the modern kernels will provide you notification when a file or directory is altered. This could be readily handled with a native application that writes structured data to a receiver in the Datanode, or via JNA/JNI for pure java or mixed

Dynamic Node Removal and Addition

2009-01-12 Thread Hargraves, Alyssa
Hello everyone, I have a question and was hoping someone on the mailing list could offer some pointers. I'm working on a project with another student, and for part of this project we are trying to create something that will allow nodes to be added to and removed from the Hadoop cluster at will. The

Re: Storing/retrieving time series with hadoop

2009-01-12 Thread Robert Zubek
We use Hadoop to warehouse time series data, and run analytics on them. Being able to parallelize our analytics jobs, and scale up the cluster as needed for the data, turned out to be a big win. However, we rolled our own storage solution. At the time when we started on this project, there

Re: 0.18.1 datanode pseudo deadlock problem

2009-01-12 Thread Jason Venner
The thought is that the notifier would stat each file as it was notified about it, and thus would also have real-time disk usage information. There would be no need for the current du task or the block task after startup (i.e., do it once to compute the current blocks and space). After
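Jason's proposal (replace the periodic du and block scans with change notifications, stat-ing each file as it is reported) can be sketched with the JDK's `java.nio.file.WatchService`, which on Linux is backed by inotify. This is a swapped-in technique: Jason describes a native or JNA/JNI notifier, and `WatchService` only arrived in Java 7, after this thread. The class `UsageTracker` is hypothetical; it watches a single flat directory (no recursion), and event handling is factored into `apply` so it can be exercised without a live watcher.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardWatchEventKinds;
import java.nio.file.WatchEvent;
import java.nio.file.WatchKey;
import java.nio.file.WatchService;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.Stream;

/** Hypothetical tracker: keeps per-file sizes for one flat directory current from change events. */
final class UsageTracker {
    private final Path dir;
    private final Map<Path, Long> sizes = new ConcurrentHashMap<>();

    UsageTracker(Path dir) {
        this.dir = dir;
    }

    /** Full scan: run once at startup, and again only if events were lost. */
    void rescan() {
        sizes.clear();
        try (Stream<Path> files = Files.list(dir)) {
            files.forEach(p -> apply(StandardWatchEventKinds.ENTRY_CREATE, p));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Applies one event: stat the file on create/modify, forget it on delete. */
    void apply(WatchEvent.Kind<?> kind, Path file) {
        if (kind == StandardWatchEventKinds.ENTRY_DELETE || !Files.isRegularFile(file)) {
            sizes.remove(file);
            return;
        }
        try {
            sizes.put(file, Files.size(file));
        } catch (IOException e) {
            sizes.remove(file); // the file vanished between the event and the stat
        }
    }

    /** Total bytes tracked: the figure a periodic du would otherwise recompute. */
    long totalBytes() {
        return sizes.values().stream().mapToLong(Long::longValue).sum();
    }

    /** Blocking event loop; intended to run on its own thread. */
    void watchForever() throws IOException, InterruptedException {
        try (WatchService ws = FileSystems.getDefault().newWatchService()) {
            dir.register(ws, StandardWatchEventKinds.ENTRY_CREATE,
                    StandardWatchEventKinds.ENTRY_MODIFY, StandardWatchEventKinds.ENTRY_DELETE);
            rescan(); // baseline taken after registering, so later changes are not missed
            while (true) {
                WatchKey key = ws.take();
                for (WatchEvent<?> event : key.pollEvents()) {
                    if (event.kind() == StandardWatchEventKinds.OVERFLOW) {
                        rescan(); // events were lost; fall back to a full scan
                        continue;
                    }
                    apply(event.kind(), dir.resolve((Path) event.context()));
                }
                if (!key.reset()) {
                    return; // the directory is no longer accessible
                }
            }
        }
    }
}
```

The one full scan at startup matches Jason's "do it one time" suggestion; after that, the cost is proportional to the number of changes rather than the number of blocks on disk.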

Re: RAID vs. JBOD

2009-01-12 Thread David Ritch
Thank you! I'm glad to hear that you have actually tested this. I believe that a failure of any disk, even with JBOD, will cause the DataNode to bring the node down. Presumably, we could bring it right back up, but this does somewhat diminish the availability argument for JBOD. Sounds like it's
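The availability trade-off discussed here can be put into numbers under a simple assumption of independent failures: if each disk fails in some period with probability p, an n-disk RAID-0 volume is lost when any member fails, with probability 1 - (1 - p)^n. Under David's premise that any disk failure takes the DataNode down, a JBOD node goes offline with that same probability; the difference is that JBOD loses only the failed disk's blocks rather than the whole volume. The class name `DiskFailureModel` and the value of p in the usage note are hypothetical.

```java
/** Toy reliability model (hypothetical): independent disk failures with probability p per period. */
final class DiskFailureModel {
    private DiskFailureModel() {}

    /** P(at least one of n disks fails): RAID-0 volume loss, or (per David) a JBOD node going down. */
    static double anyDiskFails(double p, int n) {
        return 1.0 - Math.pow(1.0 - p, n);
    }

    /** Expected fraction of the node's data lost with RAID-0: any failure destroys the whole stripe set. */
    static double expectedLossRaid0(double p, int n) {
        return anyDiskFails(p, n);
    }

    /** Expected fraction lost with JBOD: n disks, each holding 1/n of the data, so n * p * (1/n) = p. */
    static double expectedLossJbod(double p, int n) {
        return p;
    }
}
```

With p = 0.05 and n = 4, both layouts take the node down with probability 1 - 0.95^4 ≈ 0.185, but the expected fraction of the node's data to re-replicate is about 0.185 for RAID-0 versus 0.05 for JBOD. With HDFS replication neither case loses data cluster-wide on its own; the difference shows up as recovery traffic.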

Re: 0.18.1 datanode pseudo deadlock problem

2009-01-12 Thread Jason Venner
Here is some simple code I wrote using JNA to handle Linux INOTIFY. This code was my first and only attempt to use JNA. The JNA jars are available from https://jna.dev.java.net/ Raghu Angadi wrote: Jason Venner wrote: There is no reason to do the block scans. All of the modern kernels

Re: Storing/retrieving time series with hadoop

2009-01-12 Thread Chris K Wensel
Hey Brock, I used Cascading quite extensively with time series data. Along with the standard function/filter/aggregator operations in the Cascading processing model, there is what we call a buffer. It's really just a user-friendly Reduce that integrates well with other operations and offers
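Chris describes a buffer as a reduce that works over a whole grouping. As a plain-Java illustration of why that suits time series (this is not Cascading's API; `TimeSeriesBuffer` and `Point` are hypothetical names, and records assume Java 16+), the sketch takes one group's points, orders them by timestamp, and emits a trailing moving average, a computation that needs the group's values in order rather than one at a time.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical reduce-side helper: trailing moving average over one group's time-ordered points. */
final class TimeSeriesBuffer {
    private TimeSeriesBuffer() {}

    record Point(long timestampMillis, double value) {}

    /**
     * Sorts the group's points by time and returns, for each point, the mean of that point
     * and up to window - 1 preceding points. Output order matches the sorted input.
     */
    static List<Point> movingAverage(List<Point> group, int window) {
        if (window < 1) {
            throw new IllegalArgumentException("window must be >= 1");
        }
        List<Point> sorted = new ArrayList<>(group);
        sorted.sort(Comparator.comparingLong(Point::timestampMillis));
        List<Point> out = new ArrayList<>(sorted.size());
        double runningSum = 0;
        for (int i = 0; i < sorted.size(); i++) {
            runningSum += sorted.get(i).value();
            if (i >= window) {
                runningSum -= sorted.get(i - window).value(); // drop the point leaving the window
            }
            int count = Math.min(i + 1, window);
            out.add(new Point(sorted.get(i).timestampMillis(), runningSum / count));
        }
        return out;
    }
}
```

Fed the unordered points (3000, 30), (1000, 10), (2000, 20) with a window of 2, it yields averages 10, 15 and 25 at timestamps 1000, 2000 and 3000.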

Job Tracker mbeans

2009-01-12 Thread Nick Bailey
I'm wondering if Hadoop creates any JobTracker MBeans by default. I'm looking to get some of the counter info for jobs through JMX. When connecting to the JobTracker through jconsole, all I see are generic Java MBeans. I am running Hadoop 0.15.3. Does anyone know how to get this data or if
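One way to see exactly which MBeans a daemon registers, rather than browsing in jconsole, is to list every `ObjectName` over a JMX connection grouped by domain. The sketch uses only the standard `javax.management` API; the class name `MBeanLister` is hypothetical, and it makes no claim about which Hadoop MBeans, if any, exist in 0.15.3. For a remote JobTracker you would obtain the connection with `JMXConnectorFactory.connect(new JMXServiceURL(...)).getMBeanServerConnection()`, which requires remote JMX to be enabled on that JVM.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;

/** Hypothetical helper: groups all registered MBean names by JMX domain. */
final class MBeanLister {
    private MBeanLister() {}

    static Map<String, Set<String>> namesByDomain(MBeanServerConnection conn) {
        try {
            Map<String, Set<String>> byDomain = new TreeMap<>();
            // A null name pattern and a null query match every registered MBean.
            for (ObjectName name : conn.queryNames(null, null)) {
                byDomain.computeIfAbsent(name.getDomain(), d -> new TreeSet<>())
                        .add(name.getCanonicalName());
            }
            return byDomain;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

If the only domains printed are JVM ones such as `java.lang` and `JMImplementation`, the daemon is not registering any application MBeans of its own.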

Re: General questions about Map-Reduce

2009-01-12 Thread Philip (flip) Kromer
On Sun, Jan 11, 2009 at 9:05 PM, tienduc_dinh tienduc_d...@yahoo.com wrote: Is there any article which describes it? There's also Tom White's in-progress Hadoop: The Definitive Guide: http://my.safaribooksonline.com/9780596521974 flip -- http://www.infochimps.org Connected Open Free Data

Re: General questions about Map-Reduce

2009-01-12 Thread Stuart White
On Sun, Jan 11, 2009 at 9:05 PM, tienduc_dinh tienduc_d...@yahoo.com wrote: Is there any article which describes it? I'd also recommend Google's MapReduce whitepaper: http://labs.google.com/papers/mapreduce.html
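For a quick feel of the model the Google paper describes, here is an in-memory word count in plain Java: a map step emits (word, 1) pairs, a grouping step stands in for the shuffle, and a reduce step sums each group. It runs in a single JVM and does not use Hadoop's API; `WordCountModel` is a hypothetical name.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical single-JVM illustration of map, shuffle (group by key) and reduce. */
final class WordCountModel {
    private WordCountModel() {}

    /** Map: one input line -> (word, 1) pairs. */
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\W+")) {
            if (!word.isEmpty()) {
                out.add(new SimpleEntry<>(word, 1));
            }
        }
        return out;
    }

    /** Reduce: sum all counts emitted for one key. */
    static int reduce(String word, List<Integer> counts) {
        int total = 0;
        for (int c : counts) {
            total += c;
        }
        return total;
    }

    /** Runs map over every line, groups values by key (the "shuffle"), then reduces each group. */
    static Map<String, Integer> run(List<String> lines) {
        Map<String, List<Integer>> grouped = new TreeMap<>(); // sorted keys, like reducer input
        for (String line : lines) {
            for (Map.Entry<String, Integer> kv : map(line)) {
                grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
            }
        }
        Map<String, Integer> result = new TreeMap<>();
        grouped.forEach((word, counts) -> result.put(word, reduce(word, counts)));
        return result;
    }
}
```

For the input lines `"the cat sat"` and `"The dog sat."`, `run` returns `{cat=1, dog=1, sat=2, the=2}`. In a real cluster the map calls, the grouping and the reduce calls are spread across machines, but each function sees the same kind of input.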

stop the running job?

2009-01-12 Thread Samuel Guo
Hi all, Is there any method that I can use to stop or suspend a running job in Hadoop? Regards, Samuel

Re: stop the running job?

2009-01-12 Thread Edward J. Yoon
You can kill jobs using the job command: ./bin/hadoop job -kill job-id /Edward On Tue, Jan 13, 2009 at 11:10 AM, Samuel Guo guosi...@gmail.com wrote: Hi all, Is there any method that I can use to stop or suspend a running job in Hadoop? Regards, Samuel -- Best Regards, Edward J. Yoon @

Re: stop the running job?

2009-01-12 Thread Lohit
Try ./bin/hadoop job -h Lohit On Jan 12, 2009, at 6:10 PM, Samuel Guo guosi...@gmail.com wrote: Hi all, Is there any method that I can use to stop or suspend a running job in Hadoop? Regards, Samuel