Hi, we are running Spark jobs on an Alluxio cluster that is serving 13 gigabytes of data, with 99% of the data in memory. I was hoping to speed up the Spark jobs by reading the in-memory data in Alluxio, but found that the Alluxio local hit rate is only 1.68%, while the Alluxio remote hit rate is 98.32%. Monitoring network I/O across all worker nodes with the "dstat" command, I found that only two nodes sent or received about 1 GB during the whole process, and that this traffic occurred during the Spark shuffle stage. Are there any metrics I could check or configuration settings I could tune?
Best, Jerry