Re: Soliciting thoughts on possible read optimization

Edward Capriolo Wed, 11 Aug 2010 08:46:23 -0700

On Wed, Aug 11, 2010 at 11:37 AM, Ryan King <r...@twitter.com> wrote:
> On Tue, Aug 10, 2010 at 8:43 PM, Arya Asemanfar <aryaaseman...@gmail.com> wrote:
>> I mentioned this today to a couple folks at Cassandra Summit, and thought
>> I'd solicit some more thoughts here.
>> Currently, the read stage includes checking the row cache. So if your
>> concurrent reads setting is N and N reads are already reading from disk, the
>> next read will block until a disk read finishes, even if its row is in the
>> row cache. Would it make sense
>> to isolate disk reads from cache reads? To either make the read stage be
>> only used on misses, or to make 2 read stages CacheRead and DiskRead? Of
>> course, we'd have to go to DiskRead for mmap since we wouldn't know until we
>> asked the OS.
>> My thought is that stages should be based on resources rather than
>> semantics, but that may be wrong. Logically, I don't think it would make
>> sense to have the read stage bounded in a hypothetical system where there is
>> no IO; it's most likely because of the disk and subsequent IO contention
>> that that cap was introduced.
>> As a possible bonus with this change, you can make other optimizations like
>> batching row reads from disk where the keys were in key cache (does this
>> even make sense? I'm not too sure how that would work).
>
> I think this is a reasonable analysis. The idea of stages in the
> SEDA research is to put bounds around scarce resources. I wouldn't
> call reading from the row cache a scarce resource. I'd expect this
> change to have significant performance improvements for workloads that
> are heavily rowcache-able.
>
> -ryan
>


I think that makes sense. If I understand correctly, the only reads
that could be served purely from the row cache would be CL.ONE reads, so
reads at QUORUM or ALL would skip this stage.
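
For illustration, the separation discussed in this thread could be sketched
roughly as below. This is a minimal Python sketch, not Cassandra code:
ReadPath, disk_stage, and read_from_disk are hypothetical names, a plain dict
stands in for the row cache, and replication and consistency levels are not
modeled. It only shows the idea that the bound (concurrent reads = N) applies
to disk reads, while a row cache hit is answered without taking a slot in the
bounded stage.

```python
import threading
from concurrent.futures import Future, ThreadPoolExecutor


class ReadPath:
    """Hypothetical read path: cache hits bypass the bounded disk-read stage."""

    def __init__(self, concurrent_reads, row_cache, read_from_disk):
        self.row_cache = row_cache            # dict standing in for the row cache
        self.read_from_disk = read_from_disk  # callable: key -> row (assumed)
        # Only disk reads are bounded, since disk IO is the scarce resource.
        self.disk_stage = ThreadPoolExecutor(max_workers=concurrent_reads)

    def read(self, key):
        """Return a Future for the row; cache hits are already completed."""
        if key in self.row_cache:
            done = Future()
            done.set_result(self.row_cache[key])
            return done
        # Cache miss: queue the read on the bounded disk stage.
        return self.disk_stage.submit(self._read_and_cache, key)

    def _read_and_cache(self, key):
        row = self.read_from_disk(key)
        self.row_cache[key] = row
        return row


def demo():
    release = threading.Event()

    def slow_disk(key):
        # Simulated disk read that blocks until the demo releases it.
        release.wait(timeout=5)
        return "row-" + key

    path = ReadPath(2, {"hot": "row-hot"}, slow_disk)
    # Saturate the disk stage: N = 2 reads are now stuck "on disk".
    pending = [path.read("cold1"), path.read("cold2")]
    # A cached row is still served while every disk slot is busy.
    hit = path.read("hot")
    served_while_blocked = hit.done() and not any(f.done() for f in pending)
    release.set()
    results = [f.result(timeout=5) for f in pending]
    path.disk_stage.shutdown()
    return hit.result(), served_while_blocked, results
```

In this sketch the cache lookup runs on the caller's thread; a separate
unbounded (or much larger) CacheRead stage would be another way to get the
same effect. The mmap case raised earlier in the thread is not covered, since
there the process cannot tell a cached page from a disk read without asking
the OS.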
