Another thing: Is the py_stress traffic definitely non-determinstic such that each client will generate a definitely unique series of requests? If all clients are deterministically requesting the same sequence of keys, it would otherwise be plausible that they end up in effective lock-step, if the bottleneck is serialized but the result of it goes out in parallel to all clients.
(This is probably not the case but I'm not sure off hand about whether there is anything in Cassandra's implementation that might be compatible with this hypothesis, in terms of how concurrent requests for the same key will be handled.) -- / Peter Schuller