Never mind. I found my stupid mistake: I didn't reset a variable… a fact that had escaped me for the past two days.
From: "Avery, John" <jav...@akamai.com>
Date: Wednesday, December 27, 2017 at 4:20 PM
To: "user@hadoop.apache.org" <user@hadoop.apache.org>
Subject: Help me understand hadoop caching behavior

I'm writing a program using the C API for Hadoop. I have a 4-node cluster, set up according to https://www.tutorialspoint.com/hadoop/hadoop_multi_node_cluster.htm. Of the 4 nodes, one is the namenode and a datanode; the others are datanodes (with one also acting as a secondary namenode).

I've already managed to write about 1.5TB of data to the cluster, stored as roughly 20,000 files of 60-80MB each. My issue is reading the data back; specifically, it's too fast. *Way* too fast, and I don't understand how or why. When I read the files back (7 files in parallel) I get read speeds in excess of 75GB/s. That is obviously DRAM speed, and here's the problem: each of the 4 nodes has only 32GB of RAM, and I'm asking Hadoop to re-read over 400GB of data. I do use the data that is read back, so it isn't the compiler optimizing something out; even with optimization flags turned off, it still runs 10x faster than the network and disks on this box can run. Specifically:

- 2x 10Gb network ports, bonded: maximum network input 2.5GB/s (test verified)
- 16x 4TB hard drives: 2GB/s maximum throughput (test verified, outside of Hadoop)

As for how I'm reading my data: hdfsOpenFile(…, O_RDONLY) and hdfsRead(). So at best I should get 4.5GB/s, and that's in a perfect world. But during my tests I see no network traffic and very little (~30-70MB/s) disk IO, yet it manages to return 300GB of unique data to me (the data is real, not a pattern, and not something particularly compressible or dedupable).
I'm at a complete loss as to how 300GB of data is getting sent to me so quickly. I feel like I'm overlooking something trivial: I'm specifically asking for 10x the system's memory (and over 2x the cluster's memory!) precisely to *prevent* caching from polluting my numbers, yet it's doing something that should be impossible. I fully expect to facepalm at the end of this.

Oh, and here's the really weird part (to me). If I request all 20,000 files, it zooms past the 5,000 I have cached from my 400GB read test and then slows down to a more realistic 2GB/s for the rest of the files. Until I re-run the program a second time; then it returns a result in something like 35 seconds instead of 5 minutes. !!!