Hi Gera,

Thanks for your input. I have a fairly large amount of data, and if I go the -cat route followed by an md5sum calculation, it becomes a time-consuming process.
I could understand from the code that the Hadoop checksum is essentially an MD5 of MD5s of CRC32Cs, which is then returned as the output. I would be more curious to know: if I have to create the checksum manually, the way Hadoop does it internally, how do I do that? Is there any document or link available that explains how this checksum calculation works behind the scenes? I have sketched my current understanding below the quoted thread; please let me know if it is on the right track.

Thanks
Shashi

On Sat, Aug 8, 2015 at 8:00 AM, Gera Shegalov <g...@apache.org> wrote:

> The fs checksum output has more info, like bytes per CRC and CRCs per block.
> See e.g.:
> https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/MD5MD5CRC32FileChecksum.java
>
> In order to avoid dealing with different formatting or byte order, you
> could use md5sum for the remote file as well, if the file is reasonably small:
>
> hadoop fs -cat /abc.txt | md5sum
>
> On Fri, Aug 7, 2015 at 3:35 AM Shashi Vishwakarma <
> shashi.vish...@gmail.com> wrote:
>
>> Hi
>>
>> I have a small confusion regarding checksum verification. Let's say I
>> have a file abc.txt and I transferred this file to HDFS. How do I ensure
>> data integrity?
>>
>> I followed the steps below to check that the file was correctly transferred.
>>
>> *On Local File System:*
>>
>> md5sum abc.txt
>>
>> 276fb620d097728ba1983928935d6121 TestFile
>>
>> *On Hadoop Cluster:*
>>
>> hadoop fs -checksum /abc.txt
>>
>> /abc.txt MD5-of-0MD5-of-512CRC32C
>> 000002000000000000000000911156a9cf0d906c56db7c8141320df0
>>
>> Both outputs look different to me. Let me know if I am doing anything
>> wrong.
>>
>> How do I verify whether my file was transferred properly into HDFS?
>>
>> Thanks
>> Shashi
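
For reference, here is my current (possibly wrong) reading of MD5MD5CRC32FileChecksum: compute a CRC32C for every 512-byte chunk, take the MD5 of the concatenated per-chunk CRCs within each block, and then take the MD5 of the concatenated per-block MD5s. The sketch below tries to reproduce that on a local file. The class name, the 512 bytesPerCrc, the 128 MB block size, and the 4-byte big-endian CRC encoding are all my assumptions, not something I have confirmed in the Hadoop code, so please correct anything that is off.

import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.zip.CRC32C;

public class ManualHdfsChecksum {
    // Assumed defaults; the real values come from the cluster configuration.
    private static final int BYTES_PER_CRC = 512;
    private static final long BLOCK_SIZE = 128L * 1024 * 1024;

    public static void main(String[] args) throws IOException, NoSuchAlgorithmException {
        MessageDigest fileMd5 = MessageDigest.getInstance("MD5");   // MD5 over the per-block MD5s
        MessageDigest blockMd5 = MessageDigest.getInstance("MD5");  // MD5 over this block's chunk CRCs
        byte[] chunk = new byte[BYTES_PER_CRC];
        long bytesInBlock = 0;

        try (InputStream in = new FileInputStream(args[0])) {
            int n;
            while ((n = in.readNBytes(chunk, 0, chunk.length)) > 0) {
                CRC32C crc = new CRC32C();                // CRC32C per 512-byte chunk (Java 9+)
                crc.update(chunk, 0, n);
                // Assumption: each CRC is fed to MD5 as 4 big-endian bytes.
                blockMd5.update(ByteBuffer.allocate(4).putInt((int) crc.getValue()).array());
                bytesInBlock += n;
                if (bytesInBlock >= BLOCK_SIZE) {         // assumed end of an HDFS block
                    fileMd5.update(blockMd5.digest());
                    blockMd5.reset();
                    bytesInBlock = 0;
                }
            }
        }
        if (bytesInBlock > 0) {                           // partial last block
            fileMd5.update(blockMd5.digest());
        }

        StringBuilder hex = new StringBuilder();
        for (byte b : fileMd5.digest()) {
            hex.append(String.format("%02x", b));
        }
        System.out.println(hex);                          // hoping to compare this with fs -checksum
    }
}

If I read the fs -checksum output from my original mail correctly, the leading 00000200 looks like bytesPerCrc = 512 and the trailing 32 hex characters look like the final MD5, so that trailing part is what I would try to match against, but I have not been able to verify the exact byte layout yet.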