Re: Why single thread for HDFS?

2010-07-07 Thread Steve Loughran
elton sky wrote: Steve, Seems HP has done block based parallel reading from different datanodes. yes; very much like IBM's GPFS, only with JBOD storage and the option of running code near the data when appropriate. Though not from disk level, they achieve 4Gb/s rate with 9 readers

Re: Why single thread for HDFS?

2010-07-07 Thread elton sky
Steve, I do have access to that code if I can get at the right bit of the repository, if you really want me to look at it in detail ask, with the caveats that I'm away for the rest of the month and somewhat busy. Apart from that there's no reason why I shouldn't be able to make the changes to

Re: Why single thread for HDFS?

2010-07-06 Thread Gautam Singaraju
To add to Jay Booth's points, adding multi-threaded capability to HDFS will bring down the performance. Consider a production server where 4-5 jobs are running on a low-end commodity server. Currently, that is 4 threads reading and writing from the hard disk. Making it a multi-threaded read and

RE: Why single thread for HDFS?

2010-07-06 Thread Michael Segel
. If you're reading from the cloud and then writing to the UNIX file system, you want to write the blocks in serial order. (KISS). JMHO -Mike Date: Tue, 6 Jul 2010 00:30:06 -0700 Subject: Re: Why single thread for HDFS? From: gautam.singar...@gmail.com To: general@hadoop.apache.org To add to Jay

Re: Why single thread for HDFS?

2010-07-06 Thread Steve Loughran
Michael Segel wrote: Uhm... That's not really true. It gets a bit more complicated than that. If you're talking about M/R jobs, you don't want to do threads in your map() routine. While this is possible, it's going to be really hard to justify the extra parallelism along with the need to wait

Re: Why single thread for HDFS?

2010-07-06 Thread elton sky
Steve, Seems HP has done block based parallel reading from different datanodes. Though not from disk level, they achieve 4Gb/s rate with 9 readers (500Mb/s each). I didn't see anywhere I can download their code to play around, pity~ BTW, can we specify which disk to read from with Java? On Wed,

Re: Why single thread for HDFS?

2010-07-05 Thread Bernd Fondermann
On Mon, Jul 5, 2010 at 07:47, Bardia Afshin brandon...@gmail.com wrote: What's the unsubscribe link? To unsubscribe, send mail to general-unsubscr...@hadoop.apache.org Many Apache MLs have an unsubscribe footer. Anyone volunteering to make this happen for this list, too? Bernd

Re: Why single thread for HDFS?

2010-07-05 Thread elton sky
: Friday, July 02, 2010 2:56 AM To: general@hadoop.apache.org Subject: Re: Why single thread for HDFS? Hi, Can you please post this on hdfs-...@hadoop.apache.org ? I suspect the most qualified people to answer this question would all be on that list. Hemanth On Fri, Jul 2

Re: Why single thread for HDFS?

2010-07-05 Thread Todd Lipcon
splitter to increase this and then get more parallelism. HTH -Mike -Original Message- From: Hemanth Yamijala [mailto:yhema...@gmail.com] Sent: Friday, July 02, 2010 2:56 AM To: general@hadoop.apache.org Subject: Re: Why single thread for HDFS? Hi

Re: Why single thread for HDFS?

2010-07-05 Thread elton sky
To: general@hadoop.apache.org Subject: Re: Why single thread for HDFS? Hi, Can you please post this on hdfs-...@hadoop.apache.org ? I suspect the most qualified people to answer this question would all be on that list. Hemanth On Fri, Jul 2, 2010 at 11:43

Re: Why single thread for HDFS?

2010-07-05 Thread Allen Wittenauer
On Jul 5, 2010, at 5:01 PM, elton sky wrote: Well, this sounds good when you have many small files and you concat() them into a big one. I am talking about splitting a big file into blocks and copying a few blocks in parallel. Basically, your point is that hadoop dfs -cp is relatively slow and

Re: Why single thread for HDFS?

2010-07-05 Thread elton sky
Basically, your point is that hadoop dfs -cp is relatively slow and could be made faster. If HDFS had a more multi-threaded design, it would make cp operations faster. What I mean is, if we have the size of a file we can parallelize by calculating blocks. Otherwise we couldn't. On Tue, Jul 6, 2010
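
The idea in this message (derive block boundaries from a known file size, then fetch the blocks concurrently) can be illustrated with a minimal sketch. It is not HDFS code: it uses a local file as a stand-in, and the names block_ranges, read_range and parallel_copy are hypothetical helpers invented for this illustration. Blocks are read by several threads but written out in serial order, in line with the point made elsewhere in the thread about writing blocks in order.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def block_ranges(file_size, block_size):
    """Return (offset, length) pairs that cover file_size bytes.

    This only works because the file size is known up front, which is
    the precondition stated in the message above.
    """
    return [(offset, min(block_size, file_size - offset))
            for offset in range(0, file_size, block_size)]


def read_range(path, offset, length):
    """Read one block-sized byte range through its own file handle."""
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)


def parallel_copy(src, dst, block_size, readers=4):
    """Copy src to dst, reading blocks in parallel (hypothetical helper).

    Assumption: a local file stands in for an HDFS file; no Hadoop API
    is used. Returns the number of blocks copied.
    """
    ranges = block_ranges(os.path.getsize(src), block_size)
    with ThreadPoolExecutor(max_workers=readers) as pool:
        # Executor.map yields results in submission order, so the
        # output is written sequentially even if reads finish out of order.
        chunks = pool.map(lambda r: read_range(src, r[0], r[1]), ranges)
        with open(dst, "wb") as out:
            for chunk in chunks:
                out.write(chunk)
    return len(ranges)
```

Whether this helps in practice depends on the concerns raised in the replies: on a single shared disk the extra readers may only add seeks, while reads spread across different datanodes are the case where parallelism is more likely to pay off. The sketch may also buffer several blocks in memory at once, which a real implementation would need to bound.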

Re: Why single thread for HDFS?

2010-07-04 Thread Bardia Afshin
one map/reduce job per block. You can write your own splitter to increase this and then get more parallelism. HTH -Mike -Original Message- From: Hemanth Yamijala [mailto:yhema...@gmail.com] Sent: Friday, July 02, 2010 2:56 AM To: general@hadoop.apache.org Subject: Re: Why single

Why single thread for HDFS?

2010-07-02 Thread elton sky
I guess this question was ignored, so I am posting it again. From my understanding, HDFS uses a single thread to do read and write. Since a file is composed of many blocks and each block is stored as a file in the underlying FS, we can do some parallelism on a per-block basis. When read across

RE: Why single thread for HDFS?

2010-07-02 Thread Segel, Mike
parallelism. HTH -Mike -Original Message- From: Hemanth Yamijala [mailto:yhema...@gmail.com] Sent: Friday, July 02, 2010 2:56 AM To: general@hadoop.apache.org Subject: Re: Why single thread for HDFS? Hi, Can you please post this on hdfs-...@hadoop.apache.org ? I suspect the most

Re: Why single thread for HDFS?

2010-07-02 Thread Jay Booth
-Mike -Original Message- From: Hemanth Yamijala [mailto:yhema...@gmail.com] Sent: Friday, July 02, 2010 2:56 AM To: general@hadoop.apache.org Subject: Re: Why single thread for HDFS? Hi, Can you please post this on hdfs-...@hadoop.apache.org ? I suspect the most qualified