Hello,

HDFS caching does not seem to be working for me.
The file I am caching is about 200 MB. The Hadoop cluster has 3 nodes, each
with 4 GB of memory.

-bash-4.1$ hdfs cacheadmin -addPool wptest1
Successfully added cache pool wptest1.

-bash-4.1$ /hadoop/hadoop-2.3.0/bin/hdfs cacheadmin -listPools
Found 1 result.
NAME     OWNER  GROUP  MODE            LIMIT  MAXTTL
wptest1  hdfs   hdfs   rwxr-xr-x   unlimited   never

-bash-4.1$ hdfs cacheadmin -addDirective -path hadoop003.log -pool wptest1
Added cache directive 1

-bash-4.1$  time /hadoop/hadoop-2.3.0/bin/hadoop fs -tail hadoop003.log
real    0m2.796s
user    0m4.263s
sys     0m0.203s

-bash-4.1$  time /hadoop/hadoop-2.3.0/bin/hadoop fs -tail hadoop003.log
real    0m3.050s
user    0m4.176s
sys     0m0.192s

Strangely, the cache status shows 0 bytes cached:

-bash-4.1$ /hadoop/hadoop-2.3.0/bin/hdfs cacheadmin -listDirectives -stats -path hadoop003.log -pool wptest1
Found 1 entry
 ID POOL     REPL EXPIRY  PATH                      BYTES_NEEDED  BYTES_CACHED  FILES_NEEDED  FILES_CACHED
  1 wptest1     1 never   /user/hdfs/hadoop003.log     209715206             0             1             0
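
In case it is relevant to the 0 bytes cached above, here is a rough sketch of two checks that may matter. It rests on my assumption (not verified on this cluster) that the DataNodes can only cache blocks when dfs.datanode.max.locked.memory is non-zero and the user running the DataNode is allowed to lock at least that much memory. The variable name locked_limit is just for illustration.

```shell
# Sketch only: inspect two settings that HDFS caching may depend on.
# Assumption: run this on a DataNode host, as the user that starts the DataNode.

# Maximum memory this user may lock, in KB (or "unlimited").
locked_limit=$(ulimit -l)
echo "ulimit -l: ${locked_limit}"

# Cache budget per DataNode in bytes, as seen by the local configuration.
# A value of 0 would mean the DataNode is not allowed to cache anything.
if command -v hdfs >/dev/null 2>&1; then
    hdfs getconf -confKey dfs.datanode.max.locked.memory
else
    echo "hdfs not found on PATH"
fi
```

If the second value is 0, or larger than what ulimit -l permits, that might explain why BYTES_CACHED stays at 0, but I have not confirmed this.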

-bash-4.1$ file /hadoop/hadoop-2.3.0/lib/native/libhadoop.so.1.0.0
/hadoop/hadoop-2.3.0/lib/native/libhadoop.so.1.0.0: ELF 64-bit LSB shared
object, x86-64, version 1 (SYSV), dynamically linked, not stripped

I also tried the word count example on the same file. With the cache
directive in place the job consistently takes 40 seconds, compared with
42 seconds for the same map/reduce job without caching.
Is there anything wrong with my setup?
Thanks a lot.
