Re: DataBlockScanner scan period

2010-11-23 Thread Brian Bockelman
On Nov 23, 2010, at 7:41 PM, Thanh Do wrote: > Sorry for digging up this old thread. > > Brian, is this the reason you want to add a "data-level" scan > to HDFS, as in HDFS-221? > > It seems to me that a very rarely read block could > be silently corrupted, because the DataBlockScanner > never

Re: DataBlockScanner scan period

2010-11-23 Thread Thanh Do
Sorry for digging up this old thread. Brian, is this the reason you want to add a "data-level" scan to HDFS, as in HDFS-221? It seems to me that a very rarely read block could be silently corrupted, because the DataBlockScanner never finishes its scanning job in 3 weeks... On Wed, Oct 13, 2010 at

Re: DataBlockScanner scan period

2010-10-13 Thread Thanh Do
Oh, now I see the problem. The implication here is that some blocks might not be scanned for a very long time, because the scanner may not finish scanning all the blocks within 3 weeks; after that, it starts over again, ... Interesting, thanks for the prompt reply, Brian. Thanh On Wed, Oct 13, 2010

Re: DataBlockScanner scan period

2010-10-13 Thread Brian Bockelman
On Oct 13, 2010, at 7:29 PM, Thanh Do wrote: > Hi Brian, > > If this is the case, then is there any chance that > somehow the DataBlockScanner cannot finish > the verification of all the blocks in three weeks > (e.g., a node has a very large number of blocks)? > Yes. At some point, I'd rea

Re: DataBlockScanner scan period

2010-10-13 Thread Thanh Do
Hi Brian, If this is the case, then is there any chance that somehow the DataBlockScanner cannot finish the verification of all the blocks in three weeks (e.g., a node has a very large number of blocks)? Thanh On Wed, Oct 13, 2010 at 7:18 PM, Brian Bockelman wrote: > Hi Thanh, > > That is co

Re: DataBlockScanner scan period

2010-10-13 Thread Brian Bockelman
Hi Thanh, That is correct. Last time I read the code, Hadoop scheduled the block verifications randomly throughout the period in order to avoid periodic effects (i.e., high load every N minutes). Brian On Oct 13, 2010, at 7:14 PM, Thanh Do wrote: > Brian, > > When you say *attempt* to compl
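The random scheduling described above can be illustrated with a minimal sketch. This is not the DataBlockScanner code; the class and method names are hypothetical, and the only idea taken from the thread is that each block's verification is placed at a random point within the scan period so that load does not spike periodically.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical illustration only; not taken from the Hadoop source.
final class RandomScanSchedule {

    /**
     * Picks one random start offset (in milliseconds from the beginning of
     * the scan period) for each block, and returns the offsets sorted so
     * they can be processed in time order.
     */
    static long[] randomOffsetsMillis(int blockCount, long periodMillis, Random rng) {
        if (blockCount < 0 || periodMillis <= 0) {
            throw new IllegalArgumentException("blockCount must be >= 0 and periodMillis > 0");
        }
        long[] offsets = new long[blockCount];
        for (int i = 0; i < blockCount; i++) {
            // floorMod maps any long into [0, periodMillis); the modulo bias
            // is negligible for this illustration.
            offsets[i] = Math.floorMod(rng.nextLong(), periodMillis);
        }
        Arrays.sort(offsets);
        return offsets;
    }
}
```

With a three-week period expressed in milliseconds, the offsets spread the verifications over the whole period instead of clustering them at its start.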

Re: DataBlockScanner scan period

2010-10-13 Thread Thanh Do
Brian, When you say *attempt* to complete an *entire* node scan, you mean, for example, that if a node has 100 block files, it will try to verify all 100 blocks every 3 weeks? That is, on average, one block is scanned every (3 weeks / 100) time interval? Thanks Thanh On Wed, Oct 13, 2010 at 7:07 PM, Bri
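The arithmetic in this question can be worked through with a small sketch. The constant is the one quoted from the code elsewhere in the thread; the class and method names are hypothetical, and the calculation assumes verifications are spread evenly over the period.

```java
// Hypothetical helper for the arithmetic in the question; not HDFS code.
final class ScanInterval {

    // Value quoted in the thread: 21 * 24 hours, i.e. three weeks.
    static final long DEFAULT_SCAN_PERIOD_HOURS = 21 * 24L;

    /**
     * Average number of hours between two consecutive block verifications
     * on a node, assuming every block is verified once per scan period.
     */
    static double averageHoursBetweenVerifications(long blockCount) {
        if (blockCount <= 0) {
            throw new IllegalArgumentException("blockCount must be positive");
        }
        return DEFAULT_SCAN_PERIOD_HOURS / (double) blockCount;
    }
}
```

For the 100-block example, 504 hours divided by 100 blocks gives one verification roughly every 5 hours on average.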

Re: DataBlockScanner scan period

2010-10-13 Thread Brian Bockelman
Hi Thanh, The scan period is the period in which Hadoop *attempts* to complete an entire node scan. That is, if it's set to 3 weeks, HDFS will try to scan each block once every 3 weeks. Obviously, depending on the bandwidth you have made available to the scanning thread, you can specify impossibl
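Whether a given scan period is achievable can be estimated with a rough sketch like the one below. It assumes the scanning thread is throttled to a fixed number of bytes per second and ignores all other overhead; the class, method, and parameter names are hypothetical and do not come from HDFS.

```java
// Hypothetical estimate, not HDFS code: can one full pass over the node's
// block data fit inside the scan period at a given scanner bandwidth?
final class ScanFeasibility {

    // Value quoted in the thread: 21 * 24 hours, i.e. three weeks.
    static final long DEFAULT_SCAN_PERIOD_HOURS = 21 * 24L;

    /** Hours needed to read totalBytes once at bytesPerSecond. */
    static double hoursForFullScan(long totalBytes, long bytesPerSecond) {
        if (totalBytes < 0 || bytesPerSecond <= 0) {
            throw new IllegalArgumentException("totalBytes must be >= 0 and bytesPerSecond > 0");
        }
        return totalBytes / (double) bytesPerSecond / 3600.0;
    }

    /** True if a full scan at the given bandwidth fits within periodHours. */
    static boolean fitsInPeriod(long totalBytes, long bytesPerSecond, long periodHours) {
        return hoursForFullScan(totalBytes, bytesPerSecond) <= periodHours;
    }
}
```

Under these assumptions, a node holding 2 TB of block data scanned at 1 MB/s would need about 556 hours for one pass, which is longer than the 504-hour default, so the period could not be met.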

DataBlockScanner scan period

2010-10-13 Thread Thanh Do
Hi again, Could anybody explain to me the scan period policy of DataBlockScanner? That is, how often does it wake up and scan a block file? When looking at the code, I found static final long DEFAULT_SCAN_PERIOD_HOURS = 21*24L; // three weeks but it definitely does not wake up and pick a