> EC2 m1.large node
You will have a much happier time if you use an m1.xlarge. 

> We set MAX_HEAP_SIZE="6G" and HEAP_NEWSIZE="400M"  
That's a pretty low new heap size for a 6G heap; a small new generation means 
more frequent minor GCs and more short-lived objects getting promoted into the 
old generation.

> checks for new entries (in "Entries" CF, with indexed column status=1), 
> processes them, and sets the status to 2 when done
This is not the best data model. 
You may be better off with one CF for unprocessed entries and one for 
processed entries. Or, if you really need a queue, use something like Kafka. 
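
If you go with the two-CF approach, the hand-off could look something like the 
sketch below (Astyanax, the same client you are already using). The CF names, 
columns, and serializers are made up for illustration, so adapt them to your 
schema; the processing server writes the finished entry to a "processed" CF and 
deletes the "unprocessed" row in the same batch, so the online server only 
reads the processed CF and no secondary index is needed:

import com.netflix.astyanax.Keyspace;
import com.netflix.astyanax.MutationBatch;
import com.netflix.astyanax.connectionpool.exceptions.ConnectionException;
import com.netflix.astyanax.model.ColumnFamily;
import com.netflix.astyanax.serializers.StringSerializer;

public class EntryHandoff {
    // Hypothetical CFs: string row keys, string column names.
    private static final ColumnFamily<String, String> CF_UNPROCESSED =
            new ColumnFamily<String, String>("UnprocessedEntries",
                    StringSerializer.get(), StringSerializer.get());
    private static final ColumnFamily<String, String> CF_PROCESSED =
            new ColumnFamily<String, String>("ProcessedEntries",
                    StringSerializer.get(), StringSerializer.get());

    // Write the result and remove the pending row in one batch call.
    // (The batch is not isolated across rows, but both mutations travel
    // together, so the window where both rows exist is small.)
    public static void markProcessed(Keyspace keyspace, String rowKey,
            String resultText) throws ConnectionException {
        MutationBatch m = keyspace.prepareMutationBatch();
        m.withRow(CF_PROCESSED, rowKey).putColumn("result", resultText, null);
        m.withRow(CF_UNPROCESSED, rowKey).delete();
        m.execute();
    }
}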

> I would appreciate any advice on how to speed the writes up,
Writes are available for reading as soon as they complete. 
The first thing I would do is find out where the delay is. Use nodetool 
cfstats to see the local write latency, or track the write latency from the 
client's perspective. 
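
nodetool cfstats prints a per-CF "Write Latency", which is the local, on-node 
latency. On the client side, Astyanax exposes the latency of each call on its 
OperationResult. A minimal sketch, assuming you already have a Keyspace handle 
and reusing the "Entries" CF and status column from your mail:

import java.util.concurrent.TimeUnit;

import com.netflix.astyanax.Keyspace;
import com.netflix.astyanax.MutationBatch;
import com.netflix.astyanax.connectionpool.OperationResult;
import com.netflix.astyanax.connectionpool.exceptions.ConnectionException;
import com.netflix.astyanax.model.ColumnFamily;
import com.netflix.astyanax.serializers.StringSerializer;

public class WriteLatencyCheck {
    private static final ColumnFamily<String, String> CF_ENTRIES =
            new ColumnFamily<String, String>("Entries",
                    StringSerializer.get(), StringSerializer.get());

    // Time a single status update as seen from the client.
    public static void timedStatusUpdate(Keyspace keyspace, String rowKey)
            throws ConnectionException {
        MutationBatch m = keyspace.prepareMutationBatch();
        m.withRow(CF_ENTRIES, rowKey).putColumn("status", 2, null);
        OperationResult<Void> result = m.execute();
        System.out.println("client-side write latency: "
                + result.getLatency(TimeUnit.MILLISECONDS) + " ms");
    }
}

If that number is small but the online server still waits 20-40 seconds, the 
time is being lost on the read side or in the polling logic, not in the write 
path.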

If you are looking for near-real-time / continuous-computation style 
processing, take a look at http://storm-project.net/ and register for this 
talk from Brian O'Neill, one of my fellow DataStax MVPs: 
http://learn.datastax.com/WebinarCEPDistributedProcessingonCassandrawithStorm_Registration.html

Cheers
  
-----------------
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 9/01/2013, at 5:48 AM, Vitaly Sourikov <vitaly.souri...@gmail.com> wrote:

> Hi,
> we are currently at an early stage of our project and have only one Cassandra 
> 1.1.7 node hosted on an EC2 m1.large instance, where the data is written to the 
> ephemeral disk, and /var/lib/cassandra/data is just a soft link to it. Commit 
> logs and caches are still on /var/lib/cassandra/. We set MAX_HEAP_SIZE="6G" 
> and HEAP_NEWSIZE="400M". 
> 
> On the client side, we use Astyanax 1.56.18 to access the data. We have a 
> processing server that writes to Cassandra, and an online server that reads 
> from it. The former wakes up every 0.5-5 sec., checks for new entries (in 
> "Entries" CF, with indexed column status=1), processes them, and sets the 
> status to 2 when done. The online server checks once a second if an entry 
> that should be processed got the status 2 and sends it to its client side for 
> display. Processing takes 5-10 seconds and updates various columns in the 
> "Entries" CF few times on the way. One of these columns may contain ~12KB of 
> textual data, others are just short strings or numbers.
> 
> Now, our problem is that it takes 20-40 seconds before the online server 
> actually sees the change - and that is way too long, since this process is 
> supposed to be nearly real-time. Moreover, if I perform a similar update in 
> cqlsh, it is immediately seen in the following select results, but the 
> updates from the back-end server still do not appear for 20-40 seconds. 
> 
> I tried switching the row caches for that table, and in the yaml, on and 
> off. I tried commitlog_sync: batch with commitlog_sync_batch_window_in_ms: 50. 
> Nothing helped. 
> 
> I would appreciate any advice on how to speed the writes up, or at least an 
> explanation why this happens.
> 
> thanks,
> Vitaly
