Hi, perhaps I have completely misunderstood your problem, but why "bother" with Cassandra for storage in the first place?
If your MR job on Hadoop only runs once per file (as you wrote above), why not copy the data directly to HDFS, run your MR job, and use Cassandra as the sink? Since HDFS and YARN are largely independent of each other, you could use the "master" as the ResourceManager (YARN) as well as the NameNode and DataNode (HDFS), launch your MR job directly, and, as mentioned, use Cassandra as the sink for the reduced data. That way you won't need dedicated hardware: you only need HDFS temporarily, so you can process the files and delete them afterwards.

Best wishes,
Wilm
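P.S. Here is a rough sketch of what putting the NameNode and ResourceManager on the same "master" box might look like in the Hadoop site files. The hostname "master" and port 8020 are assumptions for illustration; adjust them to your cluster. dfs.replication is set to 1 only because this setup has a single DataNode, and the files are deleted after processing anyway.

```xml
<!-- core-site.xml: clients find the NameNode on the (hypothetical) host "master" -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://master:8020</value>
  </property>
</configuration>

<!-- hdfs-site.xml: single DataNode, so keep only one replica per block (assumption) -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>

<!-- yarn-site.xml: run the YARN ResourceManager on the same "master" host -->
<configuration>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>master</value>
  </property>
</configuration>
```

The reducer output would then go to Cassandra through whatever Hadoop output format your Cassandra version ships, rather than back into HDFS.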