Thank you - yes, I'm fairly confident that it will work either way. I'm
trying to find out whether there is an established best practice, and
the performance impact of the decision between RAID 0 and JBOD.
I'll check out the noatime and nodiratime for their effect on our
performance - thanks for
David,
As I understand it, you will theoretically get better performance from a
JBOD configuration than a RAID configuration. In a RAID configuration,
you have to wait for the slowest disk in the array to complete before
the entire IO operation can complete, making the average IO time
equivalent
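A rough way to see the "slowest disk" argument above is a small simulation. This is only a sketch: the exponential latency model, the function name `simulate_mean_latency`, and the disk count are assumptions made for illustration. It also ignores the transfer-time benefit of splitting one request across stripes, so it says nothing about real throughput.

```python
import random


def simulate_mean_latency(num_disks=4, trials=10000, seed=0):
    """Return (mean striped latency, mean single-disk latency).

    Assumption: each disk's service time is an independent exponential
    draw with mean 1.0. Real disks do not behave this simply.
    """
    rng = random.Random(seed)
    striped_total = 0.0
    single_total = 0.0
    for _ in range(trials):
        latencies = [rng.expovariate(1.0) for _ in range(num_disks)]
        # Striped (RAID-0 style) request: done only when the slowest disk is done.
        striped_total += max(latencies)
        # JBOD style request: served by one disk, independent of the others.
        single_total += latencies[0]
    return striped_total / trials, single_total / trials


if __name__ == "__main__":
    striped, single = simulate_mean_latency()
    print("striped mean:", striped, "single-disk mean:", single)
```

Under these assumptions the striped mean comes out larger than the single-disk mean, which is the effect described; whether it matters in practice depends on the workload.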
I've opened https://issues.apache.org/jira/browse/HADOOP-5014 for this.
Do you get this behaviour when you use the native libraries?
Tom
On Sat, Jan 10, 2009 at 12:26 AM, Oscar Gothberg
oscar.gothb...@platform-a.com wrote:
Hi,
I'm having trouble with Hadoop (tested with 0.17 and 0.19) not
Currently, Hadoop does round-robin allocation of blocks and data
across multiple JBOD disks. We did some testing and found that there
weren't significant differences between RAID-0 and JBOD. We went with
JBOD because we figured that RAID-0 has a higher failure rate than
JBOD -- any disk
Thanks Tom,
yes, assuming I got native libraries correctly enabled... I get:
09/01/12 11:33:19 INFO util.NativeCodeLoader: Loaded the native-hadoop library
09/01/12 11:33:19 INFO zlib.ZlibFactory: Successfully loaded & initialized
native-zlib library
...at startup, and then I try without by
Sagar Naik wrote:
Hi Raghu,
The periodic du and block report threads thrash the disk. (Block
reports take about 21 minutes on average.)
and I think all the datanode threads are not able to do much and freeze
yes, that is the known problem we talked about in the earlier mails in
this thread.
There is no reason to do the block scans. All of the modern kernels will
provide you notification when a file or directory is altered.
This could be readily handled with a native application that writes
structured data to a receiver in the Datanode, or via JNA/JNI for pure
java or mixed
Hello everyone,
I have a question and was hoping someone on the mailing list could offer some
pointers. I'm working on a project with another student and for part of this
project we are trying to create something that will allow nodes to be added and
removed from the hadoop cluster at will. The
We use Hadoop to warehouse time series data, and run analytics on them.
Being able to parallelize our analytics jobs, and scale up the cluster
as needed for the data, turned out to be a big win.
However, we rolled our own storage solution. At the time when we started
on this project, there
The thought is that the notifier would stat each file as it was notified
about it, and thus would have real-time disk usage information as well.
There would be no need for the current du task or the block task after
startup (i.e., do it one time to compute the current blocks and space).
After
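The idea above (scan once at startup, then update usage per notification) can be sketched as follows. This is an illustration only, not Hadoop code: the class `UsageTracker` and its `on_change`/`on_delete` hooks are hypothetical names, and the notifier that would call them (for example one driven by inotify) is assumed rather than shown.

```python
import os


class UsageTracker:
    """Keeps a running byte total for a directory tree, updated per event."""

    def __init__(self, root):
        self.sizes = {}  # path -> last known size in bytes
        self.total = 0
        # One-time startup scan, standing in for the initial du pass.
        for dirpath, _dirs, files in os.walk(root):
            for name in files:
                self.on_change(os.path.join(dirpath, name))

    def on_change(self, path):
        """Call when the notifier reports that `path` was created or modified."""
        try:
            new_size = os.stat(path).st_size
        except FileNotFoundError:
            # The file vanished between the event and the stat call.
            self.on_delete(path)
            return
        self.total += new_size - self.sizes.get(path, 0)
        self.sizes[path] = new_size

    def on_delete(self, path):
        """Call when the notifier reports that `path` was removed."""
        self.total -= self.sizes.pop(path, 0)
```

Each event costs a single stat call, so the total stays current without rescanning the tree.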
Thank you! I'm glad to hear that you have actually tested this.
I believe that a failure of any disk - even with JBOD - will cause the DataNode
to bring the node down. Presumably, we could bring it right back up, but
this does sort of diminish the availability argument for JBOD.
Sounds like it's
Here is some simple code I wrote using JNA to handle Linux inotify.
This code was my first and only attempt to use JNA.
The JNA jars are available from https://jna.dev.java.net/
Raghu Angadi wrote:
Jason Venner wrote:
There is no reason to do the block scans. All of the modern kernels
Hey Brock
I used Cascading quite extensively with time series data.
Along with the standard function/filter/aggregator operations in the
Cascading processing model, there is what we call a buffer.
It's really just a user-friendly Reduce that integrates well with other
operations and offers
I'm wondering if hadoop creates any jobtracker mbeans by default. I'm looking
to get some of the counter info for jobs through jmx. When connecting to the
job tracker through jconsole, all I see are generic java mbeans. I am running
hadoop 0.15.3. Does anyone know how to get this data or if
On Sun, Jan 11, 2009 at 9:05 PM, tienduc_dinh tienduc_d...@yahoo.com wrote:
Is there any article which describes it ?
There's also Tom White's in-progress Hadoop: The Definitive Guide:
http://my.safaribooksonline.com/9780596521974
flip
--
http://www.infochimps.org
Connected Open Free Data
On Sun, Jan 11, 2009 at 9:05 PM, tienduc_dinh tienduc_d...@yahoo.com wrote:
Is there any article which describes it ?
I'd also recommend Google's MapReduce whitepaper:
http://labs.google.com/papers/mapreduce.html
Hi all,
Is there any method that I can use to stop or suspend a running job in
Hadoop?
Regards,
Samuel
You can kill jobs using the job command.
./bin/hadoop job -kill job-id
/Edward
On Tue, Jan 13, 2009 at 11:10 AM, Samuel Guo guosi...@gmail.com wrote:
Hi all,
Is there any method that I can use to stop or suspend a running job in
Hadoop?
Regards,
Samuel
--
Best Regards, Edward J. Yoon @
Try
./bin/hadoop job -h
Lohit
On Jan 12, 2009, at 6:10 PM, Samuel Guo guosi...@gmail.com wrote:
Hi all,
Is there any method that I can use to stop or suspend a running job in
Hadoop?
Regards,
Samuel