This doesn't really belong to this topic, but I have also experienced what Ben describes. I was migrating (and still am) tons of data from MySQL to Cassandra. I measured several approaches (async parallel, prepared statements, sync with unlogged batches), and it turned out that batches were really fast and caused fewer problems with overloading the cluster when materialized views (MVs) are involved.
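For illustration only, here is a minimal sketch of the unlogged-batch approach discussed in this thread: one DELETE per partition-key tuple, grouped into small batches. The helper names (`chunked`, `unlogged_delete_batch`) and the batch size of 2 are assumptions made for this example, not something taken from the thread; the table and column names are the ones from the quoted question. It only renders CQL text for integer keys. Real code would more likely bind values through prepared statements in the driver.

```python
from itertools import islice


def chunked(items, size):
    """Yield successive lists of at most `size` items."""
    it = iter(items)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def unlogged_delete_batch(table, pk_columns, pk_tuples):
    """Render one UNLOGGED BATCH with a single DELETE per partition-key tuple."""
    lines = ["BEGIN UNLOGGED BATCH"]
    for values in pk_tuples:
        if len(values) != len(pk_columns):
            raise ValueError("each tuple needs one value per partition key column")
        # int() restricts this sketch to integer keys; real code should bind
        # values via prepared statements instead of rendering literals.
        where = " AND ".join(
            f"{col} = {int(val)}" for col, val in zip(pk_columns, values)
        )
        lines.append(f"  DELETE FROM {table} WHERE {where};")
    lines.append("APPLY BATCH;")
    return "\n".join(lines)


# Key tuples from the quoted example; batch size 2 is an arbitrary assumption.
keys = [(1, 2), (1, 3), (2, 3), (3, 4)]
batches = [
    unlogged_delete_batch("ks.cf", ["partitionkey1", "partitionkey2"], chunk)
    for chunk in chunked(keys, 2)
]
print(batches[0])
```

Whether batches of this kind beat individual token-aware requests depends on the cluster and workload, as the replies quoted below point out, so the batch size is something to measure rather than assume.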
2017-02-09 11:28 GMT+01:00 Ben Slater <ben.sla...@instaclustr.com>:

> That’s a very good point from Sylvain that I forgot/missed. That said,
> we’ve seen plenty of scenarios where overall system throughput is improved
> through unlogged batches. One of my colleagues did quite a bit of
> benchmarking on this topic for his talk at last year’s C* summit:
> http://www.slideshare.net/DataStax/microbatching-highperformance-writes-adam-zegelin-instaclustr-cassandra-summit-2016
>
> On Thu, 9 Feb 2017 at 20:52 Benjamin Roth <benjamin.r...@jaumo.com> wrote:
>
>> Ok got it.
>>
>> But it's interesting that this is supported:
>> DELETE/SELECT FROM ks.cf WHERE (pk1) IN ((1), (2), (3));
>>
>> This is technically mostly the same (Token awareness,
>> coordination/routing, read performance, ...), right?
>>
>> 2017-02-09 10:43 GMT+01:00 Sylvain Lebresne <sylv...@datastax.com>:
>>
>> This is a statement on multiple partitions and there is really no
>> optimization the code internally does on that. In fact, I strongly advise
>> you to not use a batch but rather simply do a for loop client side and send
>> statements individually. That way, your driver will be able to use proper
>> token-awareness for each request (while if you send a batch, one
>> coordinator will be picked and will have to forward most statements,
>> doing more network hops at the end of the day). The only case where using a
>> batch is indeed legit is if you care about all the statements being atomic,
>> but in that case it's a logged batch you want.
>>
>> That's btw more or less why we never bothered implementing that: it's
>> totally doable technically, but it's not really such a good idea
>> performance wise in practice most of the time, and you can easily work
>> around it with a batch if you need atomicity.
>>
>> Which is not saying it will never be and shouldn't be supported btw,
>> there is something to be said for the consistency of the CQL language in
>> general.
>> But it's why no-one took time to do it so far.
>>
>> On Thu, Feb 9, 2017 at 10:36 AM, Benjamin Roth <benjamin.r...@jaumo.com>
>> wrote:
>>
>> Yes, that's the workaround - I'll try that.
>>
>> Would you agree it would be better for internal optimizations to process
>> this within a single statement?
>>
>> 2017-02-09 10:32 GMT+01:00 Ben Slater <ben.sla...@instaclustr.com>:
>>
>> Yep, that makes it clear. I think an unlogged batch of prepared
>> statements with one statement per PK tuple would be roughly equivalent? And
>> probably no more complex to generate in the client?
>>
>> On Thu, 9 Feb 2017 at 20:22 Benjamin Roth <benjamin.r...@jaumo.com>
>> wrote:
>>
>> Maybe that makes it clear:
>>
>> DELETE FROM ks.cf WHERE (partitionkey1, partitionkey2) IN ((1, 2), (1, 3), (2, 3), (3, 4));
>>
>> I want to delete or select a bunch of records identified by their
>> multi-partitionkey tuples.
>>
>> 2017-02-09 10:18 GMT+01:00 Ben Slater <ben.sla...@instaclustr.com>:
>>
>> Are you looking for this to be equivalent to (PK1=1 AND PK2=2) or are you
>> looking for (PK1 IN (1,2) AND PK2 IN (1,2)) or something else?
>>
>> Cheers
>> Ben
>>
>> On Thu, 9 Feb 2017 at 20:09 Benjamin Roth <benjamin.r...@jaumo.com>
>> wrote:
>>
>> Hi Guys,
>>
>> CQL says this is not allowed:
>>
>> DELETE FROM ks.cf WHERE (pk1, pk2) IN ((1, 2));
>>
>> 1. Is there a reason for it? There shouldn't be a performance penalty, it
>> is a PK lookup, the same thing works with a single pk column
>> 2. Is there a known workaround for it?
>>
>> It would be much of a help to have it for daily business, IMHO it's a
>> waste of resources to run multiple queries just to fetch a bunch of records
>> by a PK.
>>
>> Thanks in advance for any reply
>>
>> --
>> Benjamin Roth
>> Prokurist
>>
>> Jaumo GmbH · www.jaumo.com
>> Wehrstraße 46 · 73035 Göppingen · Germany
>> Phone +49 7161 304880-6 · Fax +49 7161 304880-1
>> AG Ulm · HRB 731058 · Managing Director: Jens Kammerer
>>
>> --
>> ————————
>> Ben Slater
>> Chief Product Officer
>> Instaclustr: Cassandra + Spark - Managed | Consulting | Support
>> +61 437 929 798

--
Benjamin Roth
Prokurist

Jaumo GmbH · www.jaumo.com
Wehrstraße 46 · 73035 Göppingen · Germany
Phone +49 7161 304880-6 · Fax +49 7161 304880-1
AG Ulm · HRB 731058 · Managing Director: Jens Kammerer