Re: DELETE/SELECT with multi-column PK and IN

Benjamin Roth Thu, 09 Feb 2017 02:44:13 -0800

This is somewhat off-topic, but I have also experienced what Ben
describes.
I was migrating (and still am) tons of data from MySQL to CS. I measured
several approaches (async parallel, prepared stmt, sync with unlogged
batches) and it turned out that batches were really fast and caused fewer
problems with cluster overloading with MVs.
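
For illustration, the "one statement per PK tuple" workaround from the quoted
thread could be sketched roughly like this in Python. This is only a minimal
sketch under assumptions, not the code used in the migration: the helper name
expand_tuple_in is hypothetical, and it only builds the CQL string and the
bind values without talking to a cluster.

```python
def expand_tuple_in(table, key_columns, key_tuples):
    """Expand a multi-column IN list into one CQL string plus per-row bind values.

    Hypothetical helper for illustration only; it builds strings and tuples
    and does not connect to a cluster.
    """
    if not key_columns:
        raise ValueError("at least one key column is required")
    # One equality relation per partition key column, joined with AND.
    where = " AND ".join(f"{col} = ?" for col in key_columns)
    cql = f"DELETE FROM {table} WHERE {where}"
    bound = []
    for key in key_tuples:
        if len(key) != len(key_columns):
            raise ValueError(
                f"expected {len(key_columns)} values, got {len(key)}"
            )
        bound.append(tuple(key))
    return cql, bound


# The tuples from the example in this thread.
cql, bound = expand_tuple_in(
    "ks.cf",
    ["partitionkey1", "partitionkey2"],
    [(1, 2), (1, 3), (2, 3), (3, 4)],
)
print(cql)
for values in bound:
    print(values)
```

With a real driver, the statement would be prepared once and each tuple bound
to it. Whether the bound statements are then sent one by one or grouped into
an unlogged batch is exactly the trade-off discussed in this thread.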


2017-02-09 11:28 GMT+01:00 Ben Slater <ben.sla...@instaclustr.com>:

> That’s a very good point from Sylvain that I forgot/missed. That said,
> we’ve seen plenty of scenarios where overall system throughput is improved
> through unlogged batches. One of my colleagues did quite a bit of
> benchmarking on this topic for his talk at last year’s C* summit:
> http://www.slideshare.net/DataStax/microbatching-highperformance-writes-adam-zegelin-instaclustr-cassandra-summit-2016
>
> On Thu, 9 Feb 2017 at 20:52 Benjamin Roth <benjamin.r...@jaumo.com> wrote:
>
>> Ok got it.
>>
>> But it's interesting that this is supported:
>> DELETE/SELECT FROM ks.cf WHERE (pk1) IN ((1), (2), (3));
>>
>> This is technically mostly the same (Token awareness,
>> coordination/routing, read performance, ...), right?
>>
>> 2017-02-09 10:43 GMT+01:00 Sylvain Lebresne <sylv...@datastax.com>:
>>
>> This is a statement on multiple partitions, and there is really no
>> optimization the code does internally for that. In fact, I strongly advise
>> you not to use a batch but rather to simply do a for loop client side and
>> send the statements individually. That way, your driver will be able to use
>> proper token-awareness for each request (whereas if you send a batch, one
>> coordinator will be picked and will have to forward most statements,
>> doing more network hops at the end of the day). The only case where using a
>> batch is indeed legit is if you care about all the statements being atomic,
>> but in that case it's a logged batch you want.
>>
>> That's btw more or less why we never bothered implementing that: it's
>> totally doable technically, but it's not really such a good idea
>> performance-wise in practice most of the time, and you can easily work
>> around it with a batch if you need atomicity.
>>
>> Which is not saying it will never be and shouldn't be supported btw,
>> there is something to be said for the consistency of the CQL language in
>> general. But it's why no-one took time to do it so far.
>>
>> On Thu, Feb 9, 2017 at 10:36 AM, Benjamin Roth <benjamin.r...@jaumo.com>
>> wrote:
>>
>> Yes, that's the workaround - I'll try that.
>>
>> Would you agree it would be better for internal optimizations to process
>> this within a single statement?
>>
>> 2017-02-09 10:32 GMT+01:00 Ben Slater <ben.sla...@instaclustr.com>:
>>
>> Yep, that makes it clear. I think an unlogged batch of prepared
>> statements with one statement per PK tuple would be roughly equivalent? And
>> probably no more complex to generate in the client?
>>
>> On Thu, 9 Feb 2017 at 20:22 Benjamin Roth <benjamin.r...@jaumo.com>
>> wrote:
>>
>> Maybe that makes it clear:
>>
>> DELETE FROM ks.cf WHERE (partitionkey1, partitionkey2) IN ((1, 2), (1,
>> 3), (2, 3), (3, 4));
>>
>> I want to delete or select a bunch of records identified by their
>> multi-partitionkey tuples.
>>
>> 2017-02-09 10:18 GMT+01:00 Ben Slater <ben.sla...@instaclustr.com>:
>>
>> Are you looking for this to be equivalent to (PK1=1 AND PK2=2) or are you
>> looking for (PK1 IN (1,2) AND PK2 IN (1,2)) or something else?
>>
>> Cheers
>> Ben
>>
>> On Thu, 9 Feb 2017 at 20:09 Benjamin Roth <benjamin.r...@jaumo.com>
>> wrote:
>>
>> Hi Guys,
>>
>> CQL says this is not allowed:
>>
>> DELETE FROM ks.cf WHERE (pk1, pk2) IN ((1, 2));
>>
>> 1. Is there a reason for it? There shouldn't be a performance penalty: it
>> is a PK lookup, and the same thing works with a single PK column.
>> 2. Is there a known workaround for it?
>>
>> It would be a great help to have it for daily business. IMHO it's a
>> waste of resources to run multiple queries just to fetch a bunch of records
>> by a PK.
>>
>> Thanks in advance for any reply
>>
>> --
>> Benjamin Roth
>> Prokurist
>>
>> Jaumo GmbH · www.jaumo.com
>> Wehrstraße 46 · 73035 Göppingen · Germany
>> Phone +49 7161 304880-6 · Fax +49 7161 304880-1
>> AG Ulm · HRB 731058 · Managing Director: Jens Kammerer
>>
>> --
>> ————————
>> Ben Slater
>> Chief Product Officer
>> Instaclustr: Cassandra + Spark - Managed | Consulting | Support
>> +61 437 929 798
>



-- 
Benjamin Roth
Prokurist

Jaumo GmbH · www.jaumo.com
Wehrstraße 46 · 73035 Göppingen · Germany
Phone +49 7161 304880-6 · Fax +49 7161 304880-1
AG Ulm · HRB 731058 · Managing Director: Jens Kammerer
