Hi Tom

Therefore, if I write a file to HDFS but access it two years later, the
checksum will be computed only twice: at the beginning of the two years
and again at the end when a client connects?  Correct?  As long as no
process ever accesses the file between now and two years from now, the
checksum is never redone and compared against the two-year-old checksum
in the fsimage?

Yes, exactly.  Unless the data is read, the checksum is not verified; it
is checked only when the data is written and when the data is read.
If the checksum is mismatched, there is no way to correct it; you will
have to re-write that file.
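
For illustration, here is a minimal sketch of that read path in Java
(the path and class name are made up; the calls are the standard Hadoop
FileSystem API).  Pulling the bytes through the stream is what triggers
the per-chunk CRC verification, and the aggregate file checksum can also
be requested explicitly:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileChecksum;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class ChecksumOnRead {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path file = new Path("/data/example.txt");  // made-up path

        // The read itself verifies the stored CRCs; a bad replica
        // surfaces here as a ChecksumException.
        try (FSDataInputStream in = fs.open(file)) {
            IOUtils.copyBytes(in, System.out, 4096, false);
        }

        // Aggregate checksum of the whole file, as stored by HDFS:
        FileChecksum sum = fs.getFileChecksum(file);
        System.out.println(sum);
    }
}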

When a datanode is added back in, there is no real read operation on the
files themselves.  The datanode just reports its blocks but doesn't
actually read the blocks that are there to re-verify the files and ensure
consistency?

Yes, exactly.  The datanode maintains the list of files and their blocks,
which it reports along with total disk size and used size.
The namenode only has the list of blocks; unless the datanodes are
connected, it won't know where the blocks are stored.
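
To make that concrete, a toy sketch of what a block report carries (this
is not Hadoop's real classes or wire format, just the shape of the idea):

import java.util.List;

public class BlockReportSketch {
    // Toy model only -- not Hadoop's actual block report protocol.
    record Report(String datanodeId, long capacityBytes,
                  long usedBytes, List<Long> blockIds) {}

    public static void main(String[] args) {
        // The namenode knows block IDs from its namespace metadata; it
        // learns WHERE each block lives only from reports like this one.
        Report r = new Report("dn-01", 4_000_000_000_000L,
                1_250_000_000_000L, List.of(1073741825L, 1073741826L));
        System.out.println(r.datanodeId() + " reports "
                + r.blockIds().size() + " blocks, "
                + r.usedBytes() + "/" + r.capacityBytes() + " bytes used");
    }
}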

Regards
-Sanjeev


On Wed, 21 Oct 2020 at 18:31, TomK <tomk...@mdevsys.com> wrote:

> Hey Sanjeev,
>
> Thank you very much again.  This confirms my suspicion.
>
> Therefore, if I write a file to HDFS but access it two years later, the
> checksum will be computed only twice: at the beginning of the two years
> and again at the end when a client connects?  Correct?  As long as no
> process ever accesses the file between now and two years from now, the
> checksum is never redone and compared against the two-year-old checksum
> in the fsimage?
>
> When a datanode is added back in, there is no real read operation on the
> files themselves.  The datanode just reports its blocks but doesn't
> actually read the blocks that are there to re-verify the files and ensure
> consistency?
>
> Thx,
> TK
>
>
>
> On 10/21/2020 12:38 AM, संजीव (Sanjeev Tripurari) wrote:
>
> Hi Tom,
>
> Every datanode sends a heartbeat to the namenode with the list of blocks
> it has.
>
> When a datanode has been disconnected for a while, it will, after
> reconnecting, send a heartbeat to the namenode with the list of blocks it
> has (until then, the namenode will show under-replicated blocks).
> As soon as the datanode is connected to the namenode, the
> under-replicated blocks are cleared.
>
> *When a client connects to read or write a file, it will run a checksum
> to validate the file.*
>
> There is no independent process running to do checksums, as that would be
> a heavy process on each node.
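>
> As a hedged sketch of the write side (path made up; the calls are the
> standard FileSystem API): the client computes a CRC for every chunk of
> dfs.bytes-per-checksum bytes (512 by default) as it streams the data
> out, and the datanodes store those CRCs alongside the block, so no
> separate pass over the data is needed:
>
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.fs.FSDataOutputStream;
> import org.apache.hadoop.fs.FileSystem;
> import org.apache.hadoop.fs.Path;
>
> public class ChecksumOnWrite {
>     public static void main(String[] args) throws Exception {
>         FileSystem fs = FileSystem.get(new Configuration());
>         // Made-up path; the checksum is computed transparently as the
>         // stream is written, not by any background job.
>         try (FSDataOutputStream out = fs.create(new Path("/tmp/demo.txt"))) {
>             out.writeUTF("hello hdfs");
>         }
>     }
> }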
>
> Regards
> -Sanjeev
>
> On Wed, 21 Oct 2020 at 00:18, Tom <t...@mdevsys.com> wrote:
>
>> Thank you.  That part I understand and am OK with.
>>
>> What I would like to know next is: when is the CRC32C checksum run again
>> and checked against the fsimage to verify that the block file has not
>> changed or become corrupted?
>>
>> For example, if I take a datanode out and, within 15 minutes, plug it
>> back in, does HDFS rerun the CRC32C on all data disks on that node to
>> make sure the blocks are OK?
>>
>> Cheers,
>> TK
>>
>> Sent from my iPhone
>>
>> On Oct 20, 2020, at 1:39 PM, संजीव (Sanjeev Tripurari) <
>> sanjeevtripur...@gmail.com> wrote:
>>
>> It's done as soon as a file is stored on disk.
>>
>> Sanjeev
>>
>> On Tuesday, 20 October 2020, TomK <tomk...@mdevsys.com> wrote:
>>
>>> Thanks again.
>>>
>>> At what points is the checksum validated (checked) after that?  For
>>> example, is it done on a daily basis or is it done only when the file is
>>> accessed?
>>>
>>> Thx,
>>> TK
>>>
>>> On 10/20/2020 10:18 AM, संजीव (Sanjeev Tripurari) wrote:
>>>
>>> As soon as the file is written the first time, the checksum is
>>> calculated and updated in the fsimage (first in the edit logs), and the
>>> same is replicated to the other replicas.
>>>
>>>
>>>
>>> On Tue, 20 Oct 2020 at 19:15, TomK <tomk...@mdevsys.com> wrote:
>>>
>>>> Hi Sanjeev,
>>>>
>>>> Thank you.  It does help.
>>>>
>>>> At what points is the checksum calculated?
>>>>
>>>> Thx,
>>>> TK
>>>>
>>>> On 10/20/2020 3:03 AM, संजीव (Sanjeev Tripurari) wrote:
>>>>
>>>> For missing blocks and corrupted blocks, do check that all the datanode
>>>> services are up, that none of the disks where HDFS data is stored is
>>>> inaccessible or has issues, and that the hosts are reachable from the
>>>> namenode.
>>>>
>>>> If you are able to re-generate the data and write it again, great;
>>>> otherwise Hadoop cannot correct itself.
>>>>
>>>> Could you please elaborate on this?  Does it mean I have to
>>>> continuously access a file for HDFS to be able to detect corrupt blocks and
>>>> correct itself?
>>>>
>>>> *"Does HDFS check that the data node is up, data disk is mounted, path
>>>> to the file exists and file can be read?"*
>>>> -- yes, only after it fails it will say missing blocks.
>>>>
>>>>
>>>> *Or does it also do a filesystem check on that data disk as well as
>>>> perhaps a checksum to ensure block integrity?*
>>>> -- Yes; a checksum is maintained for every file and cross-checked, and
>>>> if it fails it will report corrupted blocks.
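>>>>
>>>> For what it's worth, a small hedged probe along those lines (the
>>>> exception types are real Hadoop classes; the program itself is made
>>>> up).  Pulling every byte of a file through a reader distinguishes the
>>>> two failure modes, which is roughly the split that hdfs fsck reports:
>>>>
>>>> import org.apache.hadoop.conf.Configuration;
>>>> import org.apache.hadoop.fs.ChecksumException;
>>>> import org.apache.hadoop.fs.FSDataInputStream;
>>>> import org.apache.hadoop.fs.FileSystem;
>>>> import org.apache.hadoop.fs.Path;
>>>> import org.apache.hadoop.hdfs.BlockMissingException;
>>>>
>>>> public class ReadProbe {
>>>>     public static void main(String[] args) throws Exception {
>>>>         FileSystem fs = FileSystem.get(new Configuration());
>>>>         byte[] buf = new byte[8192];
>>>>         try (FSDataInputStream in = fs.open(new Path(args[0]))) {
>>>>             while (in.read(buf) != -1) { }  // pull all bytes through
>>>>             System.out.println("readable, checksums OK");
>>>>         } catch (BlockMissingException e) {
>>>>             // no live datanode could serve a block -> missing block
>>>>             System.err.println("missing: " + e.getMessage());
>>>>         } catch (ChecksumException e) {
>>>>             // bytes disagreed with the stored CRC on every replica
>>>>             // tried -> corrupt block (with a good replica left, the
>>>>             // client fails over silently instead).
>>>>             System.err.println("corrupt: " + e.getMessage());
>>>>         }
>>>>     }
>>>> }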
>>>>
>>>> hope this helps.
>>>>
>>>> -Sanjeev
>>>>
>>>>
>>>> On Tue, 20 Oct 2020 at 09:52, TomK <tomk...@mdevsys.com> wrote:
>>>>
>>>>> Hello,
>>>>>
>>>>> HDFS Missing Blocks / Corrupt Blocks Logic:  What are the specific
>>>>> checks done to determine a block is bad and needs to be replicated?
>>>>>
>>>>> Does HDFS check that the data node is up, data disk is mounted, path
>>>>> to
>>>>> the file exists and file can be read?
>>>>>
>>>>> Or does it also do a filesystem check on that data disk as well as
>>>>> perhaps a checksum to ensure block integrity?
>>>>>
>>>>> I've googled on this quite a bit.  I don't see the exact answer I'm
>>>>> looking for.  I would like to know exactly what happens during file
>>>>> integrity verification that then constitutes missing blocks or corrupt
>>>>> blocks in the reports.
>>>>>
>>>>> --
>>>>> Thank You,
>>>>> TK.
>>>>>
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: user-unsubscr...@hadoop.apache.org
>>>>> For additional commands, e-mail: user-h...@hadoop.apache.org
>>>>>
>>>>>
>>>>
>>> --
>>> Thx,
>>> TK.
>>>
>>
> --
> Thx,
> TK.
>
