I'd like to move forward on this idea and would like more advice on it.
Currently, I have a hacked version of batched read requests. It implements a MultiReadCommand: the coordinator groups together the read commands that go to the same first endpoint and constructs a new MultiReadCommand from them. When a data node receives a MultiReadCommand, a new MultiReadVerbHandler executes the read commands sequentially and returns the results.

There is still a lot of work to be done, such as:
1. Speculative retry support for MultiReadCommand.
2. A thread pool to execute the read commands instead of executing them sequentially.
3. Grouping by partition instead of by first endpoint.

Before I move forward, I'd like to collect some advice from you all. Does this make sense to you? Any thoughts?

Thanks
Dikang.

On Wed, Oct 19, 2016 at 3:29 PM, Dikang Gu <dikan...@gmail.com> wrote:
> I created a new JIRA to track this: CASSANDRA-12814, which is linked to
> CASSANDRA-10414.
>
> @Nate, agreed. It would also be great if we could batch reads from
> different partitions that live on the same physical host, which would
> be valuable for our existing and potential use cases.
>
> Thanks
> Dikang.
>
> On Wed, Oct 19, 2016 at 3:01 PM, Nate McCall <zznat...@gmail.com> wrote:
>
>> I see a few slightly different things here (equally valuable) in
>> conjunction with CASSANDRA-10414:
>> - Wanting a small number of specific, non-sequential rows out of the
>> same partition (this is common, IME) and grouping those
>> - Extending batch semantics to reads, with the same understanding as
>> with mutations: if you put different partitions in the same batch, it
>> will be slow
>>
>> (I think Eric's IN(..) sorta fits with either of those.)
>>
>> Interesting!
>>
>> On Thu, Oct 20, 2016 at 4:26 AM, Tyler Hobbs <ty...@datastax.com> wrote:
>> > There's a similar ticket focusing on range reads and secondary index
>> > queries, but the work for these could be done together:
>> > https://issues.apache.org/jira/browse/CASSANDRA-10414
>> >
>> > On Tue, Oct 18, 2016 at 5:59 PM, Dikang Gu <dikan...@gmail.com> wrote:
>> >
>> >> Hi there,
>> >>
>> >> We have a couple of use cases that do fan-out reads of their data,
>> >> meaning a single read request from the client contains multiple keys
>> >> that live on different physical hosts. (I know it's not the
>> >> recommended way to access C*.)
>> >>
>> >> Right now, the coordinator issues separate read commands even when
>> >> they go to the same physical host, which I think causes a lot of
>> >> overhead.
>> >>
>> >> I'm wondering whether it would be valuable to provide a new read
>> >> command, so that the coordinator can batch the reads for one data
>> >> node and send them in one message, and the data node returns the
>> >> results for all keys that belong to it.
>> >>
>> >> Has anyone had similar ideas before?
>> >>
>> >> --
>> >> Dikang
>> >
>> > --
>> > Tyler Hobbs
>> > DataStax <http://datastax.com/>
>
> --
> Dikang

--
Dikang
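For illustration, here is a minimal Java sketch of the batching described at the top of this thread, under stated assumptions: the coordinator groups reads by their first endpoint (as the hacked MultiReadCommand does), and the data node runs each batch on a thread pool instead of sequentially (work item 2). `ReadRequest`, `groupByEndpoint`, and `executeBatch` are hypothetical names, not Cassandra internals; real MultiReadCommand/MultiReadVerbHandler code would go through Cassandra's messaging and ReadCommand types, which this sketch does not model.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

public class MultiReadSketch {

    // Hypothetical stand-in for a single-partition read: the partition key
    // plus the first replica endpoint the coordinator would contact for it.
    record ReadRequest(String key, String firstEndpoint) {}

    // Coordinator side: group reads that share a first endpoint, so each
    // endpoint gets one batched message instead of one message per key.
    // LinkedHashMap keeps endpoints and keys in request order.
    static Map<String, List<ReadRequest>> groupByEndpoint(List<ReadRequest> reads) {
        Map<String, List<ReadRequest>> batches = new LinkedHashMap<>();
        for (ReadRequest r : reads) {
            batches.computeIfAbsent(r.firstEndpoint(), e -> new ArrayList<>()).add(r);
        }
        return batches;
    }

    // Data node side: submit every read in the batch to a thread pool
    // instead of running them one after another, then collect the results
    // in the same order as the batch.
    static List<String> executeBatch(List<ReadRequest> batch,
                                     Function<String, String> localRead,
                                     ExecutorService pool)
            throws InterruptedException, ExecutionException {
        List<Future<String>> futures = new ArrayList<>();
        for (ReadRequest r : batch) {
            futures.add(pool.submit(() -> localRead.apply(r.key())));
        }
        List<String> results = new ArrayList<>();
        for (Future<String> f : futures) {
            results.add(f.get()); // blocks until this particular read finishes
        }
        return results;
    }

    public static void main(String[] args) throws Exception {
        List<ReadRequest> reads = List.of(
                new ReadRequest("k1", "10.0.0.1"),
                new ReadRequest("k2", "10.0.0.2"),
                new ReadRequest("k3", "10.0.0.1"));

        Map<String, List<ReadRequest>> batches = groupByEndpoint(reads);
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            for (Map.Entry<String, List<ReadRequest>> e : batches.entrySet()) {
                // In a real cluster each batch would travel as one network message;
                // here the "local read" just fabricates a row string from the key.
                List<String> results = executeBatch(e.getValue(), key -> "row:" + key, pool);
                System.out.println(e.getKey() + " -> " + results);
            }
        } finally {
            pool.shutdown();
        }
        // prints:
        // 10.0.0.1 -> [row:k1, row:k3]
        // 10.0.0.2 -> [row:k2]
    }
}
```

Because the futures are collected in submission order, each batch's results line up with its keys, so the coordinator can map them back to the original request. Grouping by partition (item 3) would mainly change the key passed to `computeIfAbsent`.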