hi
Structured data always risks being split across different blocks, even in the middle of a unit such as a word or line.
A MapReduce task reads HDFS data with the unit of a *line*: it will read the whole line, from the end of the previous block to the start of the subsequent one, to obtain the complete line record. So you don't need to worry about a line being split across block boundaries.
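A minimal sketch of the rule Denny describes, in Python rather than Hadoop's actual Java `LineRecordReader`: each split's reader skips a leading partial line (unless it is the first split) and reads past its own boundary to finish its last line, so every line is read exactly once even when splits cut lines in half. The data and split size here are made-up examples.

```python
def read_lines_for_split(data: bytes, start: int, length: int):
    """Yield the complete lines 'owned' by the split [start, start+length).

    Mirrors the rule Hadoop's line reader uses: a split owns every line
    that *begins* inside it, so it skips a leading partial line (except
    in the first split) and reads past its end to finish the last line.
    """
    end = start + length
    pos = start
    if start != 0:
        # Skip the tail of a line that began in the previous split.
        nl = data.find(b"\n", start)
        if nl == -1:
            return
        pos = nl + 1
    while pos < end:
        nl = data.find(b"\n", pos)
        if nl == -1:
            tail = data[pos:]
            if tail:
                yield tail        # last line of the file, no newline
            return
        yield data[pos:nl]        # may extend beyond 'end'
        pos = nl + 1

data = b"alpha\nbravo\ncharlie\ndelta\n"
split_size = 8  # deliberately cuts lines in the middle
lines = []
for s in range(0, len(data), split_size):
    lines.extend(read_lines_for_split(data, s, split_size))
print(lines)  # [b'alpha', b'bravo', b'charlie', b'delta']
```

No split boundary falls exactly on a line boundary here, yet every line comes out whole and exactly once: that is why a mapper sometimes has to fetch a few bytes from the block (possibly on another DataNode) that follows its own split.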
I agree with Ted's argument that 3x replication is way better than 2x. But
I do have to point out that, since 0.20.204, the loss of a disk no longer
causes the loss of a whole node (thankfully!) unless it's the system disk.
So in the example given, if you estimate a disk failure every 2 hours,
Thanks Denny!
So that means each map task may have to read from another DataNode in order
to read the end of a line that started in the previous block?
Cheers,
Donal
2011/11/11 Denny Ye denny...@gmail.com
Hi Bejoy,
I don't understand why it's impossible to have half of a line in one block,
since the file is split into fixed-size blocks.
My scenario is that I have lots of files from a High Energy Physics
experiment.
These files are in binary format, about 2 GB each, but basically they are
composed by
hi Steve,
What's your HDFS release version? From the error log and the HDFS 0.21
code, I guess that the file does not have any replicas. You may focus on
the missing replicas of this file.
Pay attention to the NameNode log entries with that block id and track the
replica distribution. Or check the
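Tracking a block id through the NameNode log, as Denny suggests, amounts to filtering the log for that id and collecting the DataNodes mentioned. A hedged sketch follows; the log lines here are simplified, made-up stand-ins for real NameNode output, so adapt the matching to your actual log format.

```python
import re

def replicas_for_block(log_lines, block_id):
    """Collect DataNode addresses from log lines that mention block_id.

    The log format used below is a simplified, invented example, not the
    exact NameNode output; adjust the substring and regex to your logs.
    """
    nodes = set()
    for line in log_lines:
        if block_id in line:
            # Grab anything that looks like a host:port address.
            nodes.update(re.findall(r"\d+\.\d+\.\d+\.\d+:\d+", line))
    return sorted(nodes)

log = [
    "INFO BlockStateChange: blk_123_1 added to 10.0.0.1:50010",
    "INFO BlockStateChange: blk_123_1 added to 10.0.0.2:50010",
    "INFO BlockStateChange: blk_999_1 added to 10.0.0.3:50010",
]
print(replicas_for_block(log, "blk_123_1"))
# ['10.0.0.1:50010', '10.0.0.2:50010']
```

On a live cluster, `hadoop fsck <path> -files -blocks -locations` reports block-to-DataNode mappings directly, which is usually quicker than grepping logs.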
Hi,
Please also feel free to contact me. I'm working with STAR project at
Brookhaven Lab, and we are trying to build a MR workflow for analysis of
particle data. I've done some preliminary experiments running Root and other
nuclear physics analysis software in MR and have been looking at
Hi Donal
I don't have much exposure to the domain you are pointing to, but in
plain MapReduce developer terms, this would be my way of looking at
processing such a data format with MapReduce:
- If the data is flowing in continuously, then I'd use Flume to
collect
Thanks Bejoy, that helps a lot!
2011/11/11, Bejoy KS bejoy.had...@gmail.com:
Sorry Bejoy, I'd typed that URL out from memory.
Fixed link is: http://wiki.apache.org/hadoop/HadoopMapReduce
2011/11/11 Bejoy KS bejoy.had...@gmail.com:
Thanks Harsh for correcting me with that wonderful piece of information.
Cleared up a wrong assumption about HDFS storage.
As Todd said, HDFS isn't suited to this. You could take a look at Gluster
though. It seems like it would fit your needs better.
-Ivan
Thanks Harsh!
2011/11/11 Harsh J ha...@cloudera.com
Matt,
Thanks for pointing that out. I was talking about machine chassis failure
since it is the more serious case, but should have pointed out that losing
single disks is subject to the same logic with smaller amounts of data.
If, however, an installation uses RAID-0 for higher read speed then
I understand that with 0.20.204, loss of a disk doesn't lose the node. But
if we have to replace that lost disk, it again means scheduling the whole
node down, kicking off replication
From: Matt Foley [mailto:mfo...@hortonworks.com]
Sent: Friday, November 11, 2011 1:58 AM
To:
Nope; hot swap :-)
On Nov 11, 2011, at 9:59 AM, Steve Ed sediso...@gmail.com wrote:
On Fri, Nov 11, 2011 at 10:15 AM, Matt Foley mfo...@hortonworks.com wrote:
Nope; hot swap :-)
AFAIK you can't re-add the marked-dead disk to the DN, can you?
But yeah, you can hot-swap the disk, then kick the DN process, which
should take less than 10 minutes. That means the NN won't ever notice
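Harsh's "less than 10 minutes" lines up with how the NameNode decides a DataNode is dead. A back-of-envelope sketch, assuming the commonly documented formula and defaults (2 × `dfs.namenode.heartbeat.recheck-interval` + 10 × `dfs.heartbeat.interval`); treat the property names and default values as assumptions to verify against your release.

```python
# Back-of-envelope: how long a DataNode can stay silent before the
# NameNode marks it dead. Formula and defaults are assumptions taken
# from common Hadoop documentation; verify against your release.
heartbeat_interval_s = 3       # dfs.heartbeat.interval, default 3 s
recheck_interval_s = 5 * 60    # dfs.namenode.heartbeat.recheck-interval, default 5 min

dead_node_timeout_s = 2 * recheck_interval_s + 10 * heartbeat_interval_s
print(dead_node_timeout_s, "s =", dead_node_timeout_s / 60, "min")
# 630 s = 10.5 min
```

So if the DN process is swapped and restarted inside that ~10.5-minute window, the NameNode never declares the node dead and no re-replication storm is triggered.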