On Fri, Mar 7, 2014 at 7:37 AM, hwpstorage <hwpstor...@gmail.com> wrote:
> Hello,
>
> It looks like the HDFS caching does not work well.
> The cached log file is around 200MB. The hadoop cluster has 3 nodes, each
> has 4GB memory.
>
> -bash-4.1$ hdfs cacheadmin -addPool wptest1
> Successfully added cache pool wptest1.
>
> -bash-4.1$ /hadoop/hadoop-2.3.0/bin/hdfs cacheadmin -listPools
> Found 1 result.
> NAME     OWNER  GROUP  MODE       LIMIT      MAXTTL
> wptest1  hdfs   hdfs   rwxr-xr-x  unlimited  never
>
> -bash-4.1$ hdfs cacheadmin -addDirective -path hadoop003.log -pool wptest1
> Added cache directive 1
>
> -bash-4.1$ time /hadoop/hadoop-2.3.0/bin/hadoop fs -tail hadoop003.log
> real 0m2.796s
> user 0m4.263s
> sys 0m0.203s
>
> -bash-4.1$ time /hadoop/hadoop-2.3.0/bin/hadoop fs -tail hadoop003.log
> real 0m3.050s
> user 0m4.176s
> sys 0m0.192s
>
> It is weird that the cache status shows 0 byte cached:
> -bash-4.1$ /hadoop/hadoop-2.3.0/bin/hdfs cacheadmin -listDirectives -stats -path hadoop003.log -pool wptest1
> Found 1 entry
> ID  POOL     REPL  EXPIRY  PATH                      BYTES_NEEDED  BYTES_CACHED  FILES_NEEDED  FILES_CACHED
> 1   wptest1  1     never   /user/hdfs/hadoop003.log  209715206     0             1             0

If you look at this output, you can see that nothing is actually cached. One way to find out why is to look at the NameNode and DataNode logs. Some of the relevant messages are logged at DEBUG or TRACE level, so you may need to raise the log level. The CacheReplicationMonitor and FsDatasetCache classes are good places to start.

Also be sure to check that you have set dfs.datanode.max.locked.memory.

As Andrew commented, "hadoop tail" is not a good command for measuring performance, since you have a few seconds of Java startup time, followed by any HDFS setup time, followed by reading a single kilobyte of data. If you want to use the shell, the simplest thing to do is to use cat and read a large file, so that those startup costs don't dominate the measurement.

best,
Colin

>
> -bash-4.1$ file /hadoop/hadoop-2.3.0/lib/native/libhadoop.so.1.0.0
> /hadoop/hadoop-2.3.0/lib/native/libhadoop.so.1.0.0: ELF 64-bit LSB shared
> object, x86-64, version 1 (SYSV), dynamically linked, not stripped
>
> I also tried the word count example with the same file. The execution time
> is always 40 seconds. (The map/reduce job without cache is 42 seconds)
> Is there anything wrong?
> Thanks a lot
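
A rough sketch of the locked-memory check and the cat-based timing suggested above. The helper name can_cache is made up for illustration. It assumes dfs.datanode.max.locked.memory is given in bytes, that "ulimit -l" prints kilobytes (or "unlimited"), and, conservatively, that a single DataNode would have to hold all of BYTES_NEEDED. With several DataNodes the blocks may be spread out, so treat a "no" as a hint rather than a verdict.

```shell
#!/bin/sh
# can_cache LIMIT_KB CONFIGURED_BYTES NEEDED_BYTES  (hypothetical helper)
#   LIMIT_KB          what `ulimit -l` prints for the DataNode user
#   CONFIGURED_BYTES  value of dfs.datanode.max.locked.memory
#   NEEDED_BYTES      BYTES_NEEDED from `hdfs cacheadmin -listDirectives -stats`
can_cache() {
    limit_kb=$1
    configured=$2
    needed=$3

    # The configured cache budget must be able to hold the data.
    if [ "$configured" -lt "$needed" ]; then
        echo "no: dfs.datanode.max.locked.memory ($configured) < needed ($needed)"
        return 1
    fi

    # The OS locked-memory limit must be at least the configured budget.
    if [ "$limit_kb" != "unlimited" ] && [ $((limit_kb * 1024)) -lt "$configured" ]; then
        echo "no: ulimit -l ($limit_kb KB) is below the configured value"
        return 1
    fi

    echo "yes"
}

# On a DataNode host, as the user that runs the DataNode, one might run:
#   can_cache "$(ulimit -l)" \
#       "$(hdfs getconf -confKey dfs.datanode.max.locked.memory)" 209715206
#
# For timing, read the whole file instead of the last kilobyte:
#   time hadoop fs -cat hadoop003.log > /dev/null
```

If the configured value comes back as 0, the check reports "no", which would be consistent with BYTES_CACHED staying at 0 in the output above.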