I'd like to move forward on this idea, and would like to get more advice
on it.

I already have a hacked-up version of batched read requests. It
implements a MultiReadCommand: the coordinator node groups the read
commands that share the same first endpoint and constructs a new
MultiReadCommand for each group. When a data node receives a
MultiReadCommand, a new MultiReadVerbHandler executes the read commands
sequentially and returns the results.
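
To make the grouping step concrete, here is a minimal sketch of what the
coordinator does. It is not the actual patch: ReadCommand here is a
hypothetical stand-in (not Cassandra's real class), and endpoints are plain
strings, assumed to be ordered so that the first entry is the preferred
replica.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public final class MultiReadGroupingSketch {

    /** Hypothetical stand-in for a single-partition read; not Cassandra's real class. */
    static final class ReadCommand {
        final String key;
        final List<String> replicas; // assumed: ordered, first entry is the preferred replica

        ReadCommand(String key, List<String> replicas) {
            this.key = key;
            this.replicas = replicas;
        }
    }

    /**
     * Groups read commands by their first endpoint. Each resulting bucket
     * would be wrapped in one MultiReadCommand and sent as a single message.
     */
    static Map<String, List<ReadCommand>> groupByFirstEndpoint(List<ReadCommand> commands) {
        // LinkedHashMap keeps the order in which endpoints were first seen.
        Map<String, List<ReadCommand>> groups = new LinkedHashMap<>();
        for (ReadCommand cmd : commands) {
            String firstEndpoint = cmd.replicas.get(0);
            groups.computeIfAbsent(firstEndpoint, k -> new ArrayList<>()).add(cmd);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<ReadCommand> commands = Arrays.asList(
                new ReadCommand("k1", Arrays.asList("10.0.0.1", "10.0.0.2")),
                new ReadCommand("k2", Arrays.asList("10.0.0.2", "10.0.0.3")),
                new ReadCommand("k3", Arrays.asList("10.0.0.1", "10.0.0.3")));

        // Print how many reads would be batched for each endpoint.
        groupByFirstEndpoint(commands).forEach((endpoint, batch) ->
                System.out.println(endpoint + " <- " + batch.size() + " read(s)"));
    }
}
```

With this shape, a fanout read of N keys produces at most one message per
distinct first endpoint instead of N messages.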

There is still a lot of work to be done, such as:
1. Add speculative retry support for MultiReadCommand.
2. Use a thread pool to execute the read commands instead of executing
them sequentially.
3. Group by partition instead of by first endpoint.
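
For item 2, one possible shape is sketched below. This is only an
illustration using a plain java.util.concurrent thread pool, not how
Cassandra's read stage actually works; executeAll and the readOne function
are hypothetical names, and a real handler would also need timeouts and
per-read error handling.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

public final class ParallelMultiReadSketch {

    /**
     * Runs one read per key on the given pool and returns the results in the
     * same order as the keys. readOne is a hypothetical single-read function.
     */
    static <K, R> List<R> executeAll(List<K> keys, Function<K, R> readOne, ExecutorService pool) {
        List<Callable<R>> tasks = new ArrayList<>();
        for (K key : keys) {
            tasks.add(() -> readOne.apply(key));
        }
        List<R> results = new ArrayList<>();
        try {
            // invokeAll blocks until every task is done and returns the
            // futures in the same order as the submitted tasks.
            for (Future<R> future : pool.invokeAll(tasks)) {
                results.add(future.get());
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while reading", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a read failed", e.getCause());
        }
        return results;
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            // Fake "read": just derives a value from the key.
            List<String> rows = executeAll(Arrays.asList("k1", "k2", "k3"),
                    key -> "row-for-" + key, pool);
            System.out.println(rows);
        } finally {
            pool.shutdown();
        }
    }
}
```

The trade-off is that one failed or slow read holds up the whole batch in
this sketch, which is part of why speculative retry (item 1) needs thought.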

Before I move forward, I'd like to collect some advice from you all. Does
it make sense to you? Any thoughts?

Thanks
Dikang.

On Wed, Oct 19, 2016 at 3:29 PM, Dikang Gu <dikan...@gmail.com> wrote:

> I created a new JIRA ticket to track this: CASSANDRA-12814, which is
> linked to CASSANDRA-10414.
>
> @Nate, agree. And it would be great if we can batch the reads from
> different partitions but still on the same physical host as well, which
> will be valuable for our existing and potential use cases.
>
> Thanks
> Dikang.
>
> On Wed, Oct 19, 2016 at 3:01 PM, Nate McCall <zznat...@gmail.com> wrote:
>
>> I see a few slightly different things here (equally valuable) in
>> conjunction with CASSANDRA-10414:
>> - Wanting a small number of specific, non-sequential rows out of the
>> same partition (this is common, IME) and grouping those
>> - Extending batch semantics to reads with the same understanding with
>> mutate that if you put different partitions in the same batch it will
>> be slow
>>
>> (I think Eric's IN(..) sorta fits with either of those).
>>
>> Interesting!
>>
>> On Thu, Oct 20, 2016 at 4:26 AM, Tyler Hobbs <ty...@datastax.com> wrote:
>> > There's a similar ticket focusing on range reads and secondary index
>> > queries, but the work for these could be done together:
>> > https://issues.apache.org/jira/browse/CASSANDRA-10414
>> >
>> > On Tue, Oct 18, 2016 at 5:59 PM, Dikang Gu <dikan...@gmail.com> wrote:
>> >
>> >> Hi there,
>> >>
>> >> We have a couple of use cases that do fanout reads for their data,
>> >> meaning a single read request from the client contains multiple keys
>> >> that live on different physical hosts. (I know it's not the
>> >> recommended way to access C*.)
>> >>
>> >> Right now, the coordinator issues separate read commands even
>> >> though they go to the same physical host, which I think causes a
>> >> lot of overhead.
>> >>
>> >> I'm wondering whether it would be valuable to provide a new read
>> >> command, so that the coordinator can batch the reads for one data
>> >> node and send them in one message, and the data node returns the
>> >> results for all the keys that belong to it?
>> >>
>> >> Any similar ideas before?
>> >>
>> >>
>> >> --
>> >> Dikang
>> >>
>> >
>> >
>> >
>> > --
>> > Tyler Hobbs
>> > DataStax <http://datastax.com/>
>>
>
>
>
> --
> Dikang
>
>


-- 
Dikang
