I use Spark and spark-cassandra-connector with a custom Cassandra writer (spark-cassandra-connector doesn't support DELETE). The writer basically works as follows:
- Bind each row in the Spark RDD to either an INSERT or a DELETE PreparedStatement
- Create a BatchStatement for multiple rows
- Write the batch to Cassandra

I know using CQLBulkOutputFormat would be better, but it doesn't support DELETE either.

On Thu, Sep 24, 2015 at 1:27 PM, Gerard Maas <gerard.m...@gmail.com> wrote:

> How are you loading the data? I mean, what insert method are you using?
>
> On Thu, Sep 24, 2015 at 9:58 PM, Benyi Wang <bewang.t...@gmail.com> wrote:
>
>> I have a Cassandra cluster that provides data to a web service, and there is
>> a daily batch load writing data into the cluster.
>>
>> - Without the batch load, the service's latency 99th percentile is 3 ms,
>> but during the load it jumps to 90 ms.
>> - The Cassandra keyspace's ReadLatency.99thPercentile jumps to 1 ms
>> from 600 microseconds.
>> - The service's Cassandra Java driver request 99th percentile is 90 ms
>> during the load.
>>
>> The Java driver takes most of the time. I know the Cassandra servers are
>> busy writing, but I want to know what kinds of metrics can identify where
>> the bottleneck is so that I can tune it.
>>
>> I'm using Cassandra 2.1.8 and Cassandra Java Driver 2.1.5.
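
For what it's worth, the bind-then-batch pattern described at the top of this message can be sketched roughly as follows. This is only an illustration with Spark and the driver stripped out: `Row`, `toCql`, and `toBatches` are stand-in names I made up, not connector or driver APIs; in the real writer, `toCql` would be `session.prepare(...).bind(...)` and each batch would go to `session.execute(batch)`.

```java
import java.util.ArrayList;
import java.util.List;

public class BatchWriterSketch {
    enum Op { INSERT, DELETE }

    // Stand-in for a row coming out of the Spark RDD.
    static class Row {
        final String key;
        final String value;
        final Op op;
        Row(String key, String value, Op op) {
            this.key = key;
            this.value = value;
            this.op = op;
        }
    }

    // In the real writer this is a driver PreparedStatement bind;
    // here we just render the CQL a bound statement corresponds to.
    static String toCql(Row row) {
        return row.op == Op.INSERT
            ? "INSERT INTO t (k, v) VALUES ('" + row.key + "', '" + row.value + "')"
            : "DELETE FROM t WHERE k = '" + row.key + "'";
    }

    // Group the bound statements into fixed-size batches, mirroring the
    // BatchStatement built before each write to Cassandra.
    static List<List<String>> toBatches(List<Row> rows, int batchSize) {
        List<List<String>> batches = new ArrayList<>();
        List<String> current = new ArrayList<>();
        for (Row row : rows) {
            current.add(toCql(row));
            if (current.size() == batchSize) {
                batches.add(current);
                current = new ArrayList<>();
            }
        }
        if (!current.isEmpty()) {
            batches.add(current);
        }
        return batches;
    }
}
```

In the actual job this logic runs per partition, so each Spark executor prepares its statements once and flushes a batch whenever it reaches the configured size.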