Hi folks, I have a quick question about how HDFS handles caching. In this lab experiment I have a 4-node Hadoop cluster (2.x), and each node has fairly large memory (96 GB). There is a single 256 MB HDFS file, which fits in one HDFS block. The local filesystem is Linux.
Now, from one of the DataNodes, I started 10 Hadoop client processes that repeatedly read the above file, on the assumption that HDFS will cache the 256 MB in memory, so (after the first read) the reads involve no disk I/O anymore. My question is: *how many COPIES of the 256 MB will be in this DataNode's memory? 10 or 1?* And what if the 10 client processes run on a 5th Linux box that is not part of the cluster? Will there be 10 copies of the 256 MB, or just 1? Many thanks; I appreciate your help on this. Demai
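For reference, the experiment is run roughly like this (a sketch, not the exact script; the HDFS path /user/demai/bigfile.dat is a hypothetical stand-in for the real file, and `hdfs dfs -cat` is the standard Hadoop 2.x CLI read):

```shell
# Launch 10 client processes, each repeatedly reading the same 256 MB file.
for i in $(seq 1 10); do
  (
    for n in $(seq 1 100); do
      # Read the whole file and discard the bytes; we only care about I/O.
      hdfs dfs -cat /user/demai/bigfile.dat > /dev/null
    done
  ) &
done
wait  # block until all 10 readers finish

# After the first pass, one can inspect the DataNode's Linux page cache,
# e.g. with vmtouch against the local block file (path is illustrative):
#   vmtouch /data/hdfs/dn/current/.../blk_*
```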