Re: Hadoop throughput question

2013-01-03 Thread Michael Katzenellenbogen
Loaded question indeed. How are you measuring that 30 MB/s? Is that per machine / NIC? HDFS throughput? Some other metric? -Michael On Jan 3, 2013, at 5:01 PM, Artem Ervits are9...@nyp.org wrote: Hello all, I’d like to pick the community brain on average throughput speeds for a moderately

RE: Hadoop throughput question

2013-01-03 Thread Artem Ervits
, January 03, 2013 5:08 PM To: user@hadoop.apache.org Subject: Re: Hadoop throughput question Loaded question indeed. How are you measuring that 30 MB/s? Is that per machine / NIC? HDFS throughput? Some other metric? -Michael On Jan 3, 2013, at 5:01 PM, Artem Ervits are9...@nyp.org

RE: Hadoop throughput question

2013-01-03 Thread Artem Ervits
: Hadoop throughput question Let's suppose you are doing a read-intensive job like, for example, counting records. This will be disk-bandwidth limited. On a 4-node cluster with 2 local SATA disks on each node you should easily read 400 MB/sec in aggregate. When you are running the Hadoop cluster
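
The estimate quoted above can be sanity-checked with simple arithmetic. A minimal sketch, assuming every disk contributes equally and is read in parallel; the 50 MB/sec per-disk figure is not stated in the thread, it is only what 400 MB/sec across 4 nodes with 2 disks each implies, and the function name is hypothetical:

```python
def aggregate_read_mb_per_sec(nodes, disks_per_node, per_disk_mb_per_sec):
    """Ideal aggregate sequential-read rate when every disk is read in parallel."""
    return nodes * disks_per_node * per_disk_mb_per_sec

# Per-disk rate implied by the thread's numbers (an inference, not a measurement).
implied_per_disk = 400 / (4 * 2)

estimate = aggregate_read_mb_per_sec(4, 2, implied_per_disk)

# Figure quoted earlier in the thread; how it was measured is exactly what
# the first reply asks, so treat the comparison as indicative only.
observed = 30

print(f"implied per-disk rate: {implied_per_disk:.0f} MB/sec")
print(f"ideal aggregate: {estimate:.0f} MB/sec, reported: {observed} MB/sec")
```

The gap between the ideal aggregate and the reported figure is why the replies ask how the number was measured before suggesting any tuning.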

RE: Hadoop throughput question

2013-01-03 Thread John Lilley
user@hadoop.apache.org Subject: RE: Hadoop throughput question Let's suppose you are doing a read-intensive job like, for example, counting records. This will be disk-bandwidth limited. On a 4-node cluster with 2 local SATA disks on each node you should easily read 400 MB/sec in aggregate

RE: Hadoop throughput question

2013-01-03 Thread Artem Ervits
processing? Thank you. From: John Lilley [mailto:john.lil...@redpoint.net] Sent: Thursday, January 03, 2013 6:09 PM To: user@hadoop.apache.org Subject: RE: Hadoop throughput question Unless the Hadoop processing and the OneFS storage are co-located, MapReduce can't schedule tasks so as to take

Re: Hadoop throughput question

2013-01-03 Thread Aaron Eng
? Thank you. From: John Lilley [mailto:john.lil...@redpoint.net] Sent: Thursday, January 03, 2013 6:09 PM To: user@hadoop.apache.org Subject: RE: Hadoop throughput question Unless the Hadoop processing and the OneFS storage are co-located, MapReduce

Re: Hadoop throughput question

2013-01-03 Thread Michael Segel
/sec in aggregate is far worse than what I’d expect Isilon to deliver, even over a single 1 Gb connection. john From: Artem Ervits [mailto:are9...@nyp.org] Sent: Thursday, January 03, 2013 4:02 PM To: user@hadoop.apache.org Subject: RE: Hadoop throughput question Hadoop is using

Re: Hadoop throughput question

2013-01-03 Thread Michael Katzenellenbogen
john.lil...@redpoint.net] Sent: Thursday, January 03, 2013 6:09 PM To: user@hadoop.apache.org Subject: RE: Hadoop throughput question Unless the Hadoop processing and the OneFS storage are co-located, MapReduce can’t schedule tasks so as to take advantage of data locality. You

RE: Hadoop throughput question

2013-01-03 Thread Artem Ervits
Setting the property to 64k raised the throughput to 36 MB/sec, and to 39 MB/sec at 128k. Thank you for the tip. From: Michael Katzenellenbogen [mailto:mich...@cloudera.com] Sent: Thursday, January 03, 2013 7:28 PM To: user@hadoop.apache.org Subject: Re: Hadoop throughput question What is the value

RE: Hadoop throughput question

2013-01-03 Thread John Lilley
Katzenellenbogen [mailto:mich...@cloudera.com] Sent: Thursday, January 03, 2013 7:28 PM To: user@hadoop.apache.org Subject: Re: Hadoop throughput question What is the value of the io.file.buffer.size property? Try tuning it up to 64k or 128k and see if this improves performance when reading
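
The effect of a read buffer size can be illustrated outside Hadoop. A rough sketch, assuming a local scratch file stands in for the data: it does not touch HDFS or read io.file.buffer.size, the helper name and file size are hypothetical, the 4k case is included only as a small baseline, and timings will vary with the machine and page cache state:

```python
import os
import tempfile
import time

def read_with_buffer(path, buffer_size):
    """Read the whole file in buffer_size chunks; return (bytes_read, seconds)."""
    total = 0
    start = time.perf_counter()
    # buffering=0 disables Python's own buffering, so the chunk size is what varies.
    with open(path, "rb", buffering=0) as f:
        while True:
            chunk = f.read(buffer_size)
            if not chunk:
                break
            total += len(chunk)
    return total, time.perf_counter() - start

# 8 MiB scratch file (size chosen arbitrarily for the illustration).
FILE_SIZE = 8 * 1024 * 1024
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(os.urandom(FILE_SIZE))
    scratch_path = tmp.name

bytes_read_by_size = {}
for size in (4 * 1024, 64 * 1024, 128 * 1024):
    nbytes, seconds = read_with_buffer(scratch_path, size)
    bytes_read_by_size[size] = nbytes
    rate = nbytes / (1024 * 1024) / max(seconds, 1e-9)
    print(f"{size // 1024:>4}k buffer: {rate:.0f} MiB/sec")

os.remove(scratch_path)
```

Larger chunks mean fewer read calls for the same data, which is the general reason a bigger buffer can help; whether it helps on a given cluster has to be measured, as the poster did with 64k and 128k.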