Yes, I'm concerned about the latency. Throughput can be high even when using Python: http://datastax.github.io/python-driver/performance.html. But in my scenarios I need to run queries sequentially, so latency matters. Cassandra also requires issuing more queries than SQL databases, so these latencies can add up to a significant amount.

I was running the Asyncore event loop, because it looks like libev isn't supported on PyPy, which I'm using. I switched to CPython and LibevConnection for a moment and didn't notice a major speedup; the minimum latency is still 1 ms.

Overall, it seems to me that the issue is not that important, because using multi-master, multi-DC databases always involves higher and somewhat unpredictable latencies, so relying on sub-millisecond latencies on production clusters is not very realistic.


On 03/27/2015 04:28 PM, Tyler Hobbs wrote:
Just to check, are you concerned about minimizing that latency or
maximizing throughput?

I'll assume that latency is what you're actually concerned about.  A fair
amount of that latency is probably happening in the python driver.
Although it can easily execute ~8k operations per second (using
cpython), in some scenarios it can be difficult to guarantee sub-ms
latency for an individual query due to how some of the internals work.
In particular, it uses python's Conditions for cross-thread signalling
(from the event loop thread to the application thread).  Unfortunately,
python's Condition implementation includes a loop with a minimum sleep
of 1ms if the Condition isn't already set when you start the wait()
call.  This is why, with a single application thread, you will typically
see a minimum of 1ms latency.
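
To make the effect concrete, here is a minimal sketch of a polling wait
with a doubling sleep, modelled on the behaviour described above. It is
not the driver's code or the standard library's; the names polling_wait
and wakeup_latency are made up for illustration. It assumes the Python 2
style of timed wait; as far as I know, newer Python 3 releases use a
timed lock acquire instead and do not show this 1 ms floor.

```python
import threading
import time


def polling_wait(is_set, timeout):
    """Poll is_set() with a doubling sleep; return (signalled, sleeps)."""
    endtime = time.monotonic() + timeout
    delay = 0.0005  # doubled before the first sleep, so the first sleep is 1 ms
    sleeps = 0
    while True:
        if is_set():
            return True, sleeps
        remaining = endtime - time.monotonic()
        if remaining <= 0:
            return False, sleeps
        # Sleep a little at first, longer later, never more than 50 ms.
        delay = min(delay * 2, remaining, 0.05)
        time.sleep(delay)
        sleeps += 1


def wakeup_latency(signal_after=0.0001, timeout=1.0):
    """Signal from another thread and time how long the waiter takes.

    Returns (signalled, sleeps, elapsed_seconds).
    """
    event = threading.Event()
    timer = threading.Timer(signal_after, event.set)
    start = time.perf_counter()
    timer.start()
    signalled, sleeps = polling_wait(event.is_set, timeout)
    elapsed = time.perf_counter() - start
    timer.join()
    return signalled, sleeps, elapsed


if __name__ == "__main__":
    signalled, sleeps, elapsed = wakeup_latency()
    print("signalled=%s sleeps=%d elapsed=%.3f ms"
          % (signalled, sleeps, elapsed * 1000))
```

If the signal arrives after the first check, the waiter has already
committed to a sleep of at least 1 ms, even though the signal itself
came after roughly 0.1 ms.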

Another source of similar latencies for the python driver is the
Asyncore event loop, which is used when libev isn't available.  I would
make sure that you can use the LibevConnection class with the driver to
avoid this.
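
A rough sketch of how one might select the libev event loop and fall
back to Asyncore when the libev extension is missing (for example on
PyPy). The helper names pick_connection_class and make_cluster are
hypothetical, and the 127.0.0.1 contact point is only an assumption for
a local node; the connection_class attribute and the two connection
classes come from the python driver.

```python
def pick_connection_class():
    """Return (name, class) for the best available event loop.

    Returns (None, None) if the driver itself cannot be imported.
    """
    try:
        # Importing this module fails when the libev C extension was not built.
        from cassandra.io.libevreactor import LibevConnection
        return "libev", LibevConnection
    except Exception:
        pass
    try:
        from cassandra.io.asyncorereactor import AsyncoreConnection
        return "asyncore", AsyncoreConnection
    except Exception:
        return None, None


def make_cluster(contact_points=("127.0.0.1",)):
    """Build a Cluster using the chosen event loop (does not connect yet)."""
    from cassandra.cluster import Cluster

    name, conn_class = pick_connection_class()
    cluster = Cluster(list(contact_points))
    if conn_class is not None:
        cluster.connection_class = conn_class
    return name, cluster


if __name__ == "__main__":
    name, _ = pick_connection_class()
    print("event loop in use: %s" % name)
```

Printing the chosen name before running a benchmark is a cheap way to
confirm which event loop the measurements actually used.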

On Fri, Mar 27, 2015 at 6:24 AM, Artur Siekielski <a...@vhex.net> wrote:

    I'm running Cassandra locally and I see that the execution time for
    the simplest queries is 1-2 milliseconds. By a simple query I mean
    either INSERT or SELECT from a small table with short keys.

    While this number is not high, it's about 10-20 times slower than
    Postgresql (even if INSERTs are wrapped in transactions). I know
    that the nature of Cassandra compared to Postgresql is different,
    but for some scenarios this difference can matter.

    The question is: is it normal for Cassandra to have a minimum
    latency of 1 millisecond?

    I'm using Cassandra 2.1.2, python-driver.
