Hello, In Spark we can use *newAPIHadoopRDD* to access different distributed systems such as HDFS, HBase, and MongoDB via different InputFormats. Is it possible to access the *InputSplit* in Spark directly? Since Spark can cache data in local memory, performing local computation/aggregation on each local InputSplit could improve overall performance.
Thanks a lot