The DataNode's BlockScanner periodically verifies the checksums of all the blocks
on each datanode.
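The scan interval is configurable. A minimal hdfs-site.xml sketch (the value here is an example, not a recommendation):

```xml
<!-- How often the DataNode block scanner re-verifies each block's checksum.
     504 hours = 3 weeks; set to 0 to disable scanning. -->
<property>
  <name>dfs.datanode.scan.period.hours</name>
  <value>504</value>
</property>
```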
Koji
On Wed, Oct 21, 2020 at 10:26 AM संजीव (Sanjeev Tripurari) <
sanjeevtripur...@gmail.com> wrote:
> Hi Tom
>
> Therefore, if I write a file to HDFS but access it two years later, then
> the checksum wil
> Otherwise, I will go for a custom script to delete all directories (and content)
> older than 2 or 3 days…
>
The TaskTracker (or NodeManager in 2.*) keeps the list of distributed cache
entries in memory.
So if an external process (like your script) starts deleting dist cache files,
there would be an inconsistency.
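Rather than deleting files externally, the NodeManager can be told to clean its own cache. A sketch for yarn-site.xml (values are illustrative defaults, not tuned recommendations):

```xml
<!-- Target size of the NodeManager's local distributed-cache, in MB.
     The localizer deletes least-recently-used entries above this size. -->
<property>
  <name>yarn.nodemanager.localizer.cache.target-size-mb</name>
  <value>10240</value>
</property>
<!-- How often (ms) the cache-cleanup service runs. -->
<property>
  <name>yarn.nodemanager.localizer.cache.cleanup.interval-ms</name>
  <value>600000</value>
</property>
```

Because the NodeManager itself does the deletion, its in-memory entry list stays consistent with what is on disk.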
Create a dump.sh script on HDFS:
$ hadoop dfs -cat /user/knoguchi/dump.sh
#!/bin/sh
hadoop dfs -put myheapdump.hprof /tmp/myheapdump_knoguchi/${PWD//\//_}.hprof
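The `${PWD//\//_}` expression is plain bash pattern substitution: it replaces every "/" in the task's working-directory path with "_", so each task attempt uploads its heap dump under a unique, flat filename. A standalone demo:

```shell
#!/bin/sh
# Demonstrate the ${PWD//\//_} substitution used in dump.sh:
# every "/" in the current directory's path becomes "_",
# flattening the path into a single safe filename component.
cd /tmp
echo "${PWD//\//_}.hprof"
# prints: _tmp.hprof
```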
Run your job with
-Dmapred.create.symlink=yes
-Dmapred.cache.files=hdfs:///user/knoguchi/dump.sh#dump.sh
-Dmapred.reduce.child.java.opts='-Xm
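The last option is cut off in the archive. One way the child JVM options are typically written for this trick (the heap size and the exact flag ordering here are assumptions, not the original text) is:

```shell
# Sketch only -- the archived message is truncated; values are assumptions.
# On OutOfMemoryError the JVM writes a heap dump into the task's working
# directory, then runs the symlinked dump.sh, which uploads it to HDFS.
hadoop jar myjob.jar \
  -Dmapred.create.symlink=yes \
  -Dmapred.cache.files=hdfs:///user/knoguchi/dump.sh#dump.sh \
  -Dmapred.reduce.child.java.opts='-Xmx1024m \
      -XX:+HeapDumpOnOutOfMemoryError \
      -XX:HeapDumpPath=./myheapdump.hprof \
      -XX:OnOutOfMemoryError=./dump.sh'
```

The symlink (`#dump.sh`) makes the cached script visible in each task's working directory, which is why dump.sh can be invoked with a bare relative path.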