> Could you please share how much data you store on the cluster and what
> is HW configuration of the nodes?
These nodes are dedicated HW, 24 CPUs and 50GB RAM each. Each node holds a few TBs of data (you don't want to go much beyond that) in raid50 (we're migrating over to JBOD). Each c* node runs 2.0.11 and is configured with an 8GB heap, 2GB new gen, on jdk1.7.0_55. Hadoop (2.2.0) tasktrackers and dfs run on these nodes as well; all up they use up to 12GB RAM, leaving ~30GB for the kernel and page cache. Data-locality is an important goal for us: in the worst cases we've seen it make a four-fold difference in throughput. HDFS, which for us is just volatile hadoop-internals space, sits on SSDs and gives strong m/r performance. (The commitlog is of course also on SSD. We made the mistake of putting it on the same SSD to begin with; don't do that, the commitlog gets its own SSD.)

> I am really impressed that you are
> able to read 100M records in ~4minutes on 4 nodes. It makes something
> like 100k reads per node, which is something we are quite far away from.

These are not individual reads, nor the number of partition keys, but m/r records (i.e. CQL rows). But yes, the performance of Spark against Cassandra is impressive.

> It leads me to question, whether reading from Spark goes through
> Cassandra's JVM and thus go through normal read path, or if it reads the
> sstables directly from disks sequentially and possibly filters out
> old/tombstone values by itself?

Both the Hadoop-Cassandra integration and the Spark-Cassandra connector go through the normal read path, just like any other CQL read query. In our m/r jobs each task works with just one partition key at a time, doing repeated column-slice reads through that partition key according to the ConfigHelper.rangeBatchSize setting, which we have set to 100 (a rough sketch of the job setup is at the end of this mail). These hadoop jobs use a custom-written CqlInputFormat because of the poor performance the stock CqlInputFormat has today against a vnodes setup; our customisation is pretty much the same as the patch on offer in CASSANDRA-6091. We haven't hit this vnodes problem with the Spark connector. I presume that, like the hadoop integration, Spark also bulk-reads (column slices) from each partition key (a minimal connector example is also below). Otherwise this is useful reading: http://wiki.apache.org/cassandra/HadoopSupport#Troubleshooting

> This is also a cluster that serves requests to web applications that
> need low latency.

Let it be said this isn't something I'd recommend, just the path we had to take because of our small initial dedicated-HW cluster. (You really want to separate online and offline datacenters, so that you can maximise the offline clusters for the heavy batch reads.)
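For the curious, the hadoop side boils down to something like the sketch below. It's from memory and simplified: keyspace, table and host names are made up, our custom input format and the mapper/reducer wiring are left out, and exact class names vary a little between Cassandra versions.

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.mapreduce.Job
    import org.apache.cassandra.hadoop.ConfigHelper
    import org.apache.cassandra.hadoop.cql3.CqlInputFormat

    object SliceReadJob {
      def main(args: Array[String]): Unit = {
        val job = Job.getInstance(new Configuration(), "c* bulk slice reads")
        val conf = job.getConfiguration

        // Where the input comes from (placeholder keyspace/table/host names).
        ConfigHelper.setInputColumnFamily(conf, "my_keyspace", "my_table")
        ConfigHelper.setInputInitialAddress(conf, "cass-node-1")
        ConfigHelper.setInputPartitioner(conf, "Murmur3Partitioner")

        // Each map task works through one partition key at a time, slicing
        // cassandra.range.batch.size columns per read. We run with 100.
        ConfigHelper.setRangeBatchSize(conf, 100)

        // Stock input format shown here; in practice we swap in our own
        // variant patched along the lines of CASSANDRA-6091 so that splits
        // are generated sanely with vnodes.
        job.setInputFormatClass(classOf[CqlInputFormat])

        // mapper, reducer and output configuration elided ...
        job.waitForCompletion(true)
      }
    }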
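On the spark side, a full-table read with the spark-cassandra-connector is not much more than the following. Again the host, keyspace and table names are placeholders, and the tuning knobs (split size, fetch size) are left at their defaults.

    import org.apache.spark.{SparkConf, SparkContext}
    import com.datastax.spark.connector._

    object SparkCassandraRead {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf()
          .setAppName("c* full-table read")
          // The connector discovers the rest of the ring from this host and
          // schedules tasks on the nodes that own the data (data-locality).
          .set("spark.cassandra.connection.host", "cass-node-1")

        val sc = new SparkContext(conf)

        // One Spark task per (group of) token ranges; each task pages CQL
        // rows in bulk through the normal read path.
        val rows = sc.cassandraTable("my_keyspace", "my_table")

        // The "100M records in ~4 minutes" above are CQL rows like these,
        // not individual partition keys.
        println(s"rows read: ${rows.count()}")

        sc.stop()
      }
    }

Nothing exotic there; most of the win comes from data-locality and the connector's bulk reads.

~mck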