elton sky wrote:
Steve,
Seems HP has done block based parallel reading from different datanodes.
yes; very much like IBM's GPFS, only with JBOD storage and the option of
running code near the data when appropriate.
Though not from disk level, they achieve 4Gb/s rate with 9 readers
Steve,
I do have access to that code if I can get at the right bit of the
repository, if you really want me to look at it in detail ask, with the
caveats that I'm away for the rest of the month and somewhat busy. Apart
from that there's no reason why I shouldn't be able to make the changes to
To add to Jay Booth's points, adding multi-threaded capability to HDFS
will bring down the performance. Consider a production server where
4-5 jobs are running on a low-end commodity server. Currently, that is
4 threads reading and writing from the hard disk. Making it a
multi-threaded read and
If you're reading from the cloud and then writing to the UNIX file system, you
want to write the blocks in serial order. (KISS).
JMHO
-Mike
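A rough sketch of the "fetch in parallel, write in serial order" idea, for illustration only. Nothing here is HDFS API: `fetch_block` is a hypothetical callable standing in for whatever reads one block from a remote node, and `fetch_then_write_in_order` is a made-up name. It relies on `Executor.map` returning results in input order, whatever order the fetches finish in.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_then_write_in_order(fetch_block, block_ids, out, workers=4):
    """Fetch blocks concurrently, but write them to `out` in block order.

    fetch_block: hypothetical callable, block_id -> bytes (assumption).
    out: any binary file-like object opened for writing.
    Returns the total number of bytes written.
    """
    written = 0
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in the order of block_ids, so the local
        # file only ever sees sequential appends.
        for data in pool.map(fetch_block, block_ids):
            out.write(data)
            written += len(data)
    return written
```

Note that all fetches are submitted up front, so blocks that finish early sit in memory until their turn to be written. A real tool would want to bound that.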
Date: Tue, 6 Jul 2010 00:30:06 -0700
Subject: Re: Why single thread for HDFS?
From: gautam.singar...@gmail.com
To: general@hadoop.apache.org
To add to Jay
Michael Segel wrote:
Uhm...
That's not really true. It gets a bit more complicated than that.
If you're talking about M/R jobs, you don't want to do threads in your map() routine. While this is possible, it's going to be really hard to justify the extra parallelism along with the need to wait
Steve,
Seems HP has done block based parallel reading from different datanodes.
Though not from disk level, they achieve 4Gb/s rate with 9 readers (500Mb/s
each).
I didn't see anywhere I can download their code to play around, pity~
BTW, can we specify which disk to read from with Java?
On Wed,
On Mon, Jul 5, 2010 at 07:47, Bardia Afshin brandon...@gmail.com wrote:
What's the unsubscribe link?
To unsubscribe, send mail to
general-unsubscr...@hadoop.apache.org
Many Apache MLs have an unsubscribe footer.
Anyone volunteering to make this happen for this list, too?
Bernd
Sent: Friday, July 02, 2010 2:56 AM
To: general@hadoop.apache.org
Subject: Re: Why single thread for HDFS?
Hi,
Can you please post this on hdfs-...@hadoop.apache.org ? I suspect the
most qualified people to answer this question would all be on that
list.
Hemanth
On Fri, Jul 2
splitter to increase this and then get more parallelism.
HTH
-Mike
-Original Message-
From: Hemanth Yamijala [mailto:yhema...@gmail.com]
Sent: Friday, July 02, 2010 2:56 AM
To: general@hadoop.apache.org
Subject: Re: Why single thread for HDFS?
Hi
To: general@hadoop.apache.org
Subject: Re: Why single thread for HDFS?
Hi,
Can you please post this on hdfs-...@hadoop.apache.org ? I suspect the
most qualified people to answer this question would all be on that
list.
Hemanth
On Fri, Jul 2, 2010 at 11:43
On Jul 5, 2010, at 5:01 PM, elton sky wrote:
Well, this sounds good when you have many small files and you concat() them
into a big one. I am talking about splitting a big file into blocks and
copying a few blocks in parallel.
Basically, your point is that hadoop dfs -cp is relatively slow and could
be made faster. If HDFS had a more multi-threaded design, it would make cp
operations faster.
What I mean is, if we have the size of a file, we can parallelize by
calculating the blocks. Otherwise we couldn't.
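To make the point above concrete, here is a minimal sketch that uses plain local files as a stand-in for HDFS. It does not use any Hadoop API, and `block_ranges`, `copy_block` and `parallel_copy` are names made up for illustration. Given the file size and a block size, the (offset, length) of every block can be computed up front and each block copied by a separate thread.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def block_ranges(file_size, block_size):
    """Return (offset, length) pairs covering file_size bytes."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return [(off, min(block_size, file_size - off))
            for off in range(0, file_size, block_size)]


def copy_block(src_path, dst_path, offset, length):
    """Copy one block; each call opens its own file handles."""
    with open(src_path, "rb") as src, open(dst_path, "r+b") as dst:
        src.seek(offset)
        data = src.read(length)
        dst.seek(offset)
        dst.write(data)
    return len(data)


def parallel_copy(src_path, dst_path, block_size, workers=4):
    """Copy src to dst block by block with a thread pool (local files only)."""
    size = os.path.getsize(src_path)
    # Pre-size the destination so every block can be written at its offset.
    with open(dst_path, "wb") as dst:
        dst.truncate(size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(copy_block, src_path, dst_path, off, length)
                   for off, length in block_ranges(size, block_size)]
        return sum(f.result() for f in futures)
```

Without the size there is no way to know how many blocks to schedule or where the last one ends, which is why the calculation needs it. Whether this is actually faster on a single disk is a separate question, as raised elsewhere in this thread.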
On Tue, Jul 6, 2010
one map/reduce job per
block. You can write your own splitter to increase this and then
get more parallelism.
HTH
-Mike
-Original Message-
From: Hemanth Yamijala [mailto:yhema...@gmail.com]
Sent: Friday, July 02, 2010 2:56 AM
To: general@hadoop.apache.org
Subject: Re: Why single
I guess this question was ignored, so I'll just post it again.
From my understanding, HDFS uses a single thread to do read and write.
Since a file is composed of many blocks and each block is stored as a file
in the underlying FS, we can do some parallelism on a per-block basis.
When read across
parallelism.
HTH
-Mike
-Original Message-
From: Hemanth Yamijala [mailto:yhema...@gmail.com]
Sent: Friday, July 02, 2010 2:56 AM
To: general@hadoop.apache.org
Subject: Re: Why single thread for HDFS?
Hi,
Can you please post this on hdfs-...@hadoop.apache.org ? I suspect the
most