Basically I want to list all the files in a .har file and compare the file list/sizes to an existing directory in HDFS. The problem is that running commands like: hdfs dfs -ls -R <path to har file> is orders of magnitude slower than running the same command against a live HDFS file system.
How much slower? I've estimated it will take ~19 days to list all the files in 250TB worth of content spread between 2 .har files. Is this normal? Can I do this faster (write a map/reduce job, etc.)?

-- 
Aaron Turner
https://synfin.net/  Twitter: @synfinatic
Those who would give up essential Liberty, to purchase a little temporary Safety, deserve neither Liberty nor Safety. -- Benjamin Franklin
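One workaround worth trying before writing a map/reduce job: a HAR archive stores its metadata in a plain-text _index file at the archive root, so a single `hdfs dfs -cat /path/to/archive.har/_index` is one round trip instead of one per entry through the har:// filesystem. The sketch below is a minimal parser, assuming the usual space-separated index layout (URL-encoded path, dir/file marker, part file, offset, length); the field order and the sample lines are assumptions, so spot-check a few lines of your own _index first.

```python
from urllib.parse import unquote

def parse_har_index(lines):
    """Parse HAR _index lines into {path: size} for file entries.

    Assumed line layout (verify against your own _index file):
      <url-encoded path> <dir|file> <part file> <offset> <length> ...
    Directory entries are skipped; only file sizes are collected.
    """
    sizes = {}
    for line in lines:
        fields = line.strip().split(" ")
        if len(fields) < 5:
            continue
        path, kind, _part, _offset, length = fields[:5]
        if kind == "file":
            sizes[unquote(path)] = int(length)
    return sizes

# Synthetic (hypothetical) index lines for illustration only:
index_lines = [
    "%2F dir none 0 0 data",
    "%2Fdata%2Fa.txt file part-0 0 1024 extra",
    "%2Fdata%2Fb.txt file part-0 1024 2048 extra",
]
sizes = parse_har_index(index_lines)
```

You could then pipe `hdfs dfs -cat .../archive.har/_index` into this and diff the resulting path/size map against the output of `hdfs dfs -ls -R` on the live directory.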