Re: Sizing help

2011-11-11 Thread Todd Lipcon
On Fri, Nov 11, 2011 at 10:15 AM, Matt Foley wrote: > Nope; hot swap :-) AFAIK you can't re-add the marked-dead disk to the DN, can you? But yeah, you can hot-swap the disk, then kick the DN process, which should take less than 10 minutes. That means the NN won't ever notice it's down, and you wo
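
For context, the "less than 10 minutes" window lines up with the namenode's dead-node heuristic: a DN is declared dead after roughly 2 * heartbeat.recheck.interval + 10 * dfs.heartbeat.interval. A minimal sketch of that arithmetic, assuming the 0.20-era property names and default values:

    public class DeadNodeWindow {
        public static void main(String[] args) {
            long recheckMs   = 5 * 60 * 1000; // heartbeat.recheck.interval, default 5 min (assumed)
            long heartbeatMs = 3 * 1000;      // dfs.heartbeat.interval, default 3 s (assumed)
            long deadAfterMs = 2 * recheckMs + 10 * heartbeatMs;
            // Restarting the DN inside this window means the NN never marks it dead.
            System.out.println(deadAfterMs / 60000.0 + " minutes"); // 10.5 minutes
        }
    }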

Re: Sizing help

2011-11-11 Thread Matt Foley
Nope; hot swap :-) On Nov 11, 2011, at 9:59 AM, Steve Ed wrote: I understand that with 0.20.204, loss of a disk doesn't lose the node. But if we have to replace that lost disk, it's again scheduling the whole node down, kicking replication *From:* Matt Foley [mailto:mfo...@hortonworks.com] *

RE: Sizing help

2011-11-11 Thread Steve Ed
I understand that with 0.20.204, loss of a disk doesn't lose the node. But if we have to replace that lost disk, it's again scheduling the whole node down, kicking replication From: Matt Foley [mailto:mfo...@hortonworks.com] Sent: Friday, November 11, 2011 1:58 AM To: hdfs-user@hadoop.apache.o

Re: Sizing help

2011-11-11 Thread Ted Dunning
Matt, Thanks for pointing that out. I was talking about machine chassis failure since it is the more serious case, but should have pointed out that losing single disks is subject to the same logic with smaller amounts of data. If, however, an installation uses RAID-0 for higher read speed, then a

Re: Sizing help

2011-11-11 Thread Koji Noguchi
Another factor to consider: when a disk is bad you may have corrupted blocks, which may only get detected by the periodic DataBlockScanner check. I believe each datanode tries to finish the entire scan within the dfs.datanode.scan.period.hours (3 weeks default) period. So with 2x replication and some undetec
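
A minimal sketch of inspecting that knob, assuming dfs.datanode.scan.period.hours is the property name and 504 hours (= the 3 weeks Koji mentions) the default; in practice it is set in each datanode's hdfs-site.xml:

    import org.apache.hadoop.conf.Configuration;

    public class ScanPeriod {
        public static void main(String[] args) {
            // Loads *-site.xml from the classpath, as a datanode would.
            Configuration conf = new Configuration();
            int hours = conf.getInt("dfs.datanode.scan.period.hours", 504);
            System.out.println("DataBlockScanner period: " + hours
                    + " hours (~" + hours / 24 + " days)");
        }
    }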

Re: structured data split

2011-11-11 Thread Bejoy KS
Thanks Harsh!... 2011/11/11 Harsh J > Sorry Bejoy, I'd typed that URL out from memory. > Fixed link is: http://wiki.apache.org/hadoop/HadoopMapReduce > > 2011/11/11 Bejoy KS : > > Thanks Harsh for correcting me with that wonderful piece of information. > > Cleared a wrong

Re: Using HDFS to store few MB files for file sharing purposes

2011-11-11 Thread Ivan Kelly
As Todd said, HDFS isn't suited to this. You could take a look at Gluster though. It seems like it would fit your needs better. -Ivan

Re: structured data split

2011-11-11 Thread Harsh J
Sorry Bejoy, I'd typed that URL out from memory. Fixed link is: http://wiki.apache.org/hadoop/HadoopMapReduce 2011/11/11 Bejoy KS : > Thanks Harsh for correcting me with that wonderful piece of information. > Cleared a wrong assumption on hdfs storage fundamentals today. > >

Re: structured data split

2011-11-11 Thread 臧冬松
Thanks Bejoy, that helps a lot! 2011/11/11, Bejoy KS : > Hi Donal > I don't have much exposure to the domain which you are > pointing to, but in plain map reduce developer terms this would be > my way of looking into processing such a data format with map reduce > - If the data i

Re: structured data split

2011-11-11 Thread Bejoy KS
Hi Donal I don't have much exposure to the domain which you are pointing to, but in plain map reduce developer terms this would be my way of looking into processing such a data format with map reduce - If the data is kind of flowing in continuously then I'd use Flume to collect t

Re: structured data split

2011-11-11 Thread Bejoy KS
Thanks Harsh for correcting me with that wonderful piece of information. Cleared a wrong assumption on hdfs storage fundamentals today. Sorry Donal for confusing you on the same. Harsh, looks like the link is broken; it'd be great if you could post the URL once more. Thanks a lot Rega

Re: structured data split

2011-11-11 Thread Charles Earl
Hi, Please also feel free to contact me. I'm working with the STAR project at Brookhaven Lab, and we are trying to build an MR workflow for analysis of particle data. I've done some preliminary experiments running ROOT and other nuclear physics analysis software in MR and have been looking at various

Re: Could not obtain block

2011-11-11 Thread Denny Ye
hi Steve, What's your HDFS release version? From the error log and the HDFS 0.21 code, I guess that the file does not have any replicas. You may focus on the missing replicas of this file. Pay attention to the NameNode log with that block id and track the replica distribution. Or check the Na

Re: structured data split

2011-11-11 Thread Will Maier
Hi Donal- On Fri, Nov 11, 2011 at 10:12:44PM +0800, 臧冬松 wrote: > My scenario is that I have lots of files from a High Energy Physics experiment. > These files are in binary format, about 2G each, but basically they are > composed of lots of "Event"s; each Event is independent of the others. The > phy

Re: structured data split

2011-11-11 Thread 臧冬松
Hi Bejoy, I don't understand why it's impossible to have half of a line in one block, since the file is split into fixed-size blocks. My scenario is that I have lots of files from a High Energy Physics experiment. These files are in binary format, about 2G each, but basically they are composed of
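
One common way to handle binary records like these (a hedged sketch, not something proposed in the thread): mark each file non-splittable so no Event ever straddles a split boundary, at the cost of intra-file parallelism. The class name is hypothetical; the new-API FileInputFormat is assumed:

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.BytesWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.mapreduce.JobContext;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

    // Hypothetical: one map task per ~2G physics file, so an Event can
    // never be cut in half by a split boundary.
    public abstract class WholeFileEventInputFormat
            extends FileInputFormat<LongWritable, BytesWritable> {
        @Override
        protected boolean isSplitable(JobContext context, Path file) {
            return false; // never split this file
        }
        // createRecordReader(...) would return a reader that walks Event
        // boundaries; its implementation depends on the binary layout.
    }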

Re: structured data split

2011-11-11 Thread Harsh J
Bejoy, This is incorrect. As Denny had explained earlier, blocks are split along byte sizes alone. The writer does not concern itself with newlines and such. When reading, the record readers align themselves to read till the end of lines by communicating with the next block if they have to. Th
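
A simplified sketch of the reader-side alignment Harsh describes, modeled loosely on Hadoop's LineRecordReader (illustrative, not the actual class): a record belongs to the split in which it starts, so every reader but the first skips its opening partial line, and the last line of a split is read to completion even if its bytes live in the next block.

    import java.io.IOException;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.util.LineReader;

    public class AlignedLineScan {
        // Scans the lines "owned" by the byte range [start, end).
        public static void scan(FSDataInputStream in, long start, long end)
                throws IOException {
            in.seek(start);
            LineReader lines = new LineReader(in);
            Text line = new Text();
            long pos = start;
            if (start != 0) {
                // Not the first split: the opening (possibly partial) line
                // belongs to the previous split, so skip it.
                pos += lines.readLine(line);
            }
            while (pos < end) {
                int consumed = lines.readLine(line);
                if (consumed == 0) break; // end of file
                pos += consumed;
                // The final line may run past 'end', i.e. into the next
                // block; the HDFS client fetches those bytes transparently.
                // ... process 'line' here ...
            }
        }
    }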

Re: structured data split

2011-11-11 Thread bejoy . hadoop
Donal, in Hadoop that hardly happens. When you are storing data in hdfs it would be split into blocks depending on end of lines, in the case of normal files. It won't be like you'd have half of a line in one block and the rest in the next one. You don't need to worry about that.

Re: structured data split

2011-11-11 Thread 臧冬松
Thanks Bejoy! It's better to process the data blocks locally and separately. I just want to know how to deal with a structure (i.e. a word, a line) that is split into two blocks. Cheers, Donal On Nov 11, 2011 at 7:01 PM, Bejoy KS wrote: > Hi Donal > You can configure your map tasks the way you like t

Re: structured data split

2011-11-11 Thread Bejoy KS
Hi Donal You can configure your map tasks the way you like to process your input. If you have a file of size 100 MB, it would be divided into two blocks and stored in hdfs (if your dfs.block.size is the default 64 MB). It is your choice how you process the same using map reduce - With th
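
A quick worked version of those numbers (illustrative arithmetic only):

    public class BlockCount {
        public static void main(String[] args) {
            long fileSize  = 100L * 1024 * 1024; // 100 MB file
            long blockSize =  64L * 1024 * 1024; // default dfs.block.size
            long blocks = (fileSize + blockSize - 1) / blockSize; // ceiling
            long lastMb = (fileSize - (blocks - 1) * blockSize) / (1024 * 1024);
            // 2 blocks: 64 MB + 36 MB; with FileInputFormat defaults that
            // generally means one map task per block.
            System.out.println(blocks + " blocks, last block = " + lastMb + " MB");
        }
    }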

Re: structured data split

2011-11-11 Thread 臧冬松
Thanks Denny! So that means each map task will have to read from another DataNode in order to read the end line of the previous block? Cheers, Donal 2011/11/11 Denny Ye > hi > Structured data is always being split into different blocks, like a > word or line. > MapReduce task read HDFS da

Re: Sizing help

2011-11-11 Thread Matt Foley
I agree with Ted's argument that 3x replication is way better than 2x. But I do have to point out that, since 0.20.204, the loss of a disk no longer causes the loss of a whole node (thankfully!) unless it's the system disk. So in the example given, if you estimate a disk failure every 2 hours, ea
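
The knob behind this behavior is, as far as I know, dfs.datanode.failed.volumes.tolerated, introduced around that release; its default of 0 still shuts the datanode down on a single failed data volume, so it has to be raised to get the keep-running behavior. A minimal sketch of reading it (in practice it is set in each datanode's hdfs-site.xml):

    import org.apache.hadoop.conf.Configuration;

    public class FailedVolumes {
        public static void main(String[] args) {
            Configuration conf = new Configuration();
            // 0 (the assumed default) = any failed volume kills the DN;
            // e.g. 1 lets the DN survive one dead disk.
            int tolerated = conf.getInt("dfs.datanode.failed.volumes.tolerated", 0);
            System.out.println("Tolerated failed volumes: " + tolerated);
        }
    }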

Re: structured data split

2011-11-11 Thread Denny Ye
hi Structured data is always being split into different blocks, like a word or line. A MapReduce task reads HDFS data with the unit - *line* - it will read the whole line from the end of the previous block to the start of the subsequent one to obtain that part of the line record. So you do not need to worry about the I
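
To make that concrete: the tail of a line that spills into the next block is fetched transparently by the HDFS client, wherever that block lives. A hedged sketch with a hypothetical path and offsets:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.util.LineReader;

    public class CrossBlockLine {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());
            // Hypothetical file; assume a 64 MB block size.
            FSDataInputStream in = fs.open(new Path("/data/sample.txt"));
            in.seek(64L * 1024 * 1024 - 100); // just before the block boundary
            LineReader lines = new LineReader(in);
            Text line = new Text();
            lines.readLine(line); // discard the partial line we seeked into
            lines.readLine(line); // this line may span both blocks; the tail
                                  // is read from the next block's datanode
            System.out.println(line);
            in.close();
        }
    }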