I use Spark with spark-cassandra-connector and a customized Cassandra
writer (spark-cassandra-connector doesn't support DELETE). Basically, the
writer works as follows:

   - Bind each row in the Spark RDD to either an INSERT or DELETE
   PreparedStatement
   - Group multiple bound rows into a BatchStatement
   - Write the batch to Cassandra.

I know using CQLBulkOutputFormat would be better, but it doesn't support
DELETE.
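The batching step above could be sketched roughly like this. This is a minimal illustration, not the actual writer: `RowOp`, the batch size, and all names are assumptions, and the real DataStax driver calls (`session.prepare(...)`, `BatchStatement`) appear only in comments since they need a live cluster.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal sketch of the custom writer's batching logic. In the real
// writer each row would be bound to a prepared statement, e.g.
//   BoundStatement bs = (op.isDelete ? deleteStmt : insertStmt).bind(op.values);
// and each group would back one BatchStatement executed via the session.
public class BatchingWriter {

    // A row plus the operation to apply to it (INSERT or DELETE).
    public static class RowOp {
        public final Object[] values;
        public final boolean isDelete;

        public RowOp(boolean isDelete, Object... values) {
            this.isDelete = isDelete;
            this.values = values;
        }

        // Which prepared statement this row should be bound to.
        public String statementKind() {
            return isDelete ? "DELETE" : "INSERT";
        }
    }

    // Group rows into batches of at most batchSize; each sub-list is
    // the unit that would become one BatchStatement.
    public static <T> List<List<T>> partition(List<T> rows, int batchSize) {
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < rows.size(); i += batchSize) {
            batches.add(new ArrayList<>(
                rows.subList(i, Math.min(i + batchSize, rows.size()))));
        }
        return batches;
    }
}
```

Keeping the batch size small (and ideally grouping rows by partition key) matters here: large multi-partition batches put extra load on the coordinator, which is one plausible contributor to read-latency spikes during a bulk load.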

On Thu, Sep 24, 2015 at 1:27 PM, Gerard Maas <gerard.m...@gmail.com> wrote:

> How are you loading the data? I mean, what insert method are you using?
>
> On Thu, Sep 24, 2015 at 9:58 PM, Benyi Wang <bewang.t...@gmail.com> wrote:
>
>> I have a Cassandra cluster that provides data to a web service, and there
>> is a daily batch load writing data into the cluster.
>>
>>    - Without the batch loading, the service’s Latency 99thPercentile is
>>    3ms. But during the load, it jumps to 90ms.
>>    - I checked the Cassandra keyspace's ReadLatency.99thPercentile, which
>>    jumps from 600 microseconds to 1ms.
>>    - The service's Cassandra Java driver request 99thPercentile was 90ms
>>    during the load.
>>
>> The Java driver took the most time. I know the Cassandra servers are busy
>> writing, but I want to know what kinds of metrics can identify where the
>> bottleneck is so that I can tune it.
>>
>> I’m using Cassandra 2.1.8 and Cassandra Java Driver 2.1.5.
>>
>>
>
>
