From that perspective, you could also use a frozen collection. That takes away the ability to append, but overwriting it shouldn't generate a tombstone.
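To make the write shapes discussed in this thread concrete, here is a minimal sketch in Python that only builds CQL strings; it does not talk to a cluster. The table name `events`, the columns `id` and `attrs`, and all helper names are hypothetical and not taken from the original posts. The tombstone notes in the comments restate what is said in this thread for Cassandra 2.2 and are worth verifying on your own version.

```python
def cql_text(value: str) -> str:
    """Quote a Python string as a CQL text literal (single quotes are doubled)."""
    return "'" + value.replace("'", "''") + "'"


def map_literal(entries: dict) -> str:
    """Render a str -> str dict as a CQL map literal, keys sorted for stable output."""
    pairs = [f"{cql_text(k)}: {cql_text(v)}" for k, v in sorted(entries.items())]
    return "{" + ", ".join(pairs) + "}"


def create_table(table: str, frozen: bool) -> str:
    """DDL for a hypothetical table whose map column is either frozen or not."""
    map_type = "frozen<map<text, text>>" if frozen else "map<text, text>"
    return f"CREATE TABLE {table} (id text PRIMARY KEY, attrs {map_type});"


def overwrite_map(table: str, key: str, entries: dict) -> str:
    # Whole-collection write. On a non-frozen map this is the case that, per the
    # thread, inserts a range tombstone before the new collection. On a frozen
    # map the same statement rewrites a single cell.
    return (
        f"INSERT INTO {table} (id, attrs) "
        f"VALUES ({cql_text(key)}, {map_literal(entries)});"
    )


def append_to_map(table: str, key: str, entries: dict) -> str:
    # Append form for non-frozen maps: new keys win over existing ones and,
    # per the thread, no tombstone is written first.
    return (
        f"UPDATE {table} SET attrs = attrs + {map_literal(entries)} "
        f"WHERE id = {cql_text(key)};"
    )


if __name__ == "__main__":
    print(create_table("events", frozen=False))
    print(overwrite_map("events", "k1", {"source": "betty"}))
    print(append_to_map("events", "k1", {"source": "betty"}))
```

The append form only applies to a non-frozen map; with a frozen map the whole value has to be supplied on every write, which is the trade-off mentioned above.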
On Wed, Jun 1, 2016, 5:54 PM kurt Greaves <k...@instaclustr.com> wrote:

> Is there anything stopping you from using JSON instead of a collection?
>
> On 27 May 2016 at 15:20, Eric Stevens <migh...@gmail.com> wrote:
>
>> If you aren't removing elements from the map, you should instead be able
>> to use an UPDATE statement and append to the map. It will have the same
>> effect as overwriting it, because all the new keys will take precedence
>> over the existing keys. But it'll happen without generating a tombstone
>> first.
>>
>> If you do have to remove elements from the collection during this
>> process, you are either facing tombstones or having to surgically figure
>> out which elements ought to be removed (which also involves tombstones,
>> though at least not range tombstones, so a bit cheaper).
>>
>> On Fri, May 27, 2016, 5:39 AM Matthias Niehoff <matthias.nieh...@codecentric.de> wrote:
>>
>>> We are processing events in Spark and store the resulting entries
>>> (containing a map) in Cassandra. The results can be new (no entry for this
>>> key in Cassandra) or an update (there is already an entry with this key in
>>> Cassandra). We use the spark-cassandra-connector to store the data in
>>> Cassandra.
>>>
>>> The connector will always do an insert of the data and will rely on the
>>> upsert capabilities of Cassandra. So every time an event is updated, the
>>> complete map is replaced, with all the problems of tombstones.
>>>
>>> Seems like we have to implement our own persist logic in which we check
>>> if an element already exists and, if yes, update the map manually. That
>>> would require a read before write, which would be nasty. Another option
>>> would be not to use a collection but (clustering) columns. Do you have
>>> another idea of doing this?
>>>
>>> (The conclusion of this whole thing for me would be: use upsert, but do
>>> specific updates on collections, as an upsert might replace the whole
>>> collection and generate tombstones.)
>>>
>>> 2016-05-25 17:37 GMT+02:00 Tyler Hobbs <ty...@datastax.com>:
>>>
>>>> If you replace an entire collection, whether it's a map, set, or list,
>>>> a range tombstone will be inserted followed by the new collection. If you
>>>> only update a single element, no tombstones are generated.
>>>>
>>>> On Wed, May 25, 2016 at 9:48 AM, Matthias Niehoff <matthias.nieh...@codecentric.de> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> we have a table with a map field. We do not delete anything in this
>>>>> table, but do updates on the values including the map field (most of the
>>>>> time a new value for an existing key, rarely adding new keys). We now
>>>>> encounter a huge amount of tombstones for this table.
>>>>>
>>>>> We used sstable2json to take a look into the sstables:
>>>>>
>>>>> {"key": "Betty_StoreCatalogLines:7",
>>>>> "cells": [["276-1-6MPQ0RI-276110031802001001:","",1463820040628001],
>>>>> ["276-1-6MPQ0RI-276110031802001001:last_modified","2016-05-21 08:40Z",1463820040628001],
>>>>> ["276-1-6MPQ0RI-276110031802001001:last_modified_by_source:_","276-1-6MPQ0RI-276110031802001001:last_modified_by_source:!",1463040069753999,"t",1463040069],
>>>>> ["276-1-6MPQ0RI-276110031802001001:last_modified_by_source:_","276-1-6MPQ0RI-276110031802001001:last_modified_by_source:!",1463120708590002,"t",1463120708],
>>>>> ["276-1-6MPQ0RI-276110031802001001:last_modified_by_source:_","276-1-6MPQ0RI-276110031802001001:last_modified_by_source:!",1463145700735007,"t",1463145700],
>>>>> ["276-1-6MPQ0RI-276110031802001001:last_modified_by_source:_","276-1-6MPQ0RI-276110031802001001:last_modified_by_source:!",1463157430862000,"t",1463157430],
>>>>> ["276-1-6MPQ0RI-276110031802001001:last_modified_by_source:_","276-1-6MPQ0RI-276110031802001001:last_modified_by_source:!",1463164595291002,"t",1463164595],
>>>>>
>>>>> . . .
>>>>>
>>>>> ["276-1-6MPQ0RI-276110031802001001:last_modified_by_source:_","276-1-6MPQ0RI-276110031802001001:last_modified_by_source:!",1463820040628000,"t",1463820040],
>>>>> ["276-1-6MPQ0RI-276110031802001001:last_modified_by_source:62657474795f73746f72655f636174616c6f675f6c696e6573","00000154d265c6b0",1463820040628001],
>>>>> ["276-1-6MPQ0RI-276110031802001001:payload","{\"payload\":{\"Article Id\":\"276110031802001001\",\"Row Id\":\"1-6MPQ0RI\",\"Article #\":\"31802001001\",\"Quote Item Id\":\"1-6MPWPVC\",\"Country Code\":\"276\"}}",1463820040628001]
>>>>>
>>>>> Looking at the SSTables, it seems like every update of a value in a map
>>>>> breaks down to a delete and an insert in the corresponding SSTable (see
>>>>> all the tombstone flags "t" in the sstable2json extract above).
>>>>>
>>>>> We are using Cassandra 2.2.5.
>>>>>
>>>>> Can you confirm this behavior?
>>>>>
>>>>> Thanks!
>>>>> --
>>>>> Matthias Niehoff | IT-Consultant | Agile Software Factory | Consulting
>>>>> codecentric AG | Zeppelinstr 2 | 76185 Karlsruhe | Germany
>>>>> tel: +49 (0) 721.9595-681 | fax: +49 (0) 721.9595-666 | mobile: +49 (0) 172.1702676
>>>>> www.codecentric.de | blog.codecentric.de | www.meettheexperts.de | www.more4fi.de
>>>>>
>>>>> Registered office: Solingen | HRB 25917 | Amtsgericht Wuppertal
>>>>> Executive board: Michael Hochgürtel . Mirko Novakovic . Rainer Vehns
>>>>> Supervisory board: Patric Fedlmeier (chairman) . Klaus Jäger . Jürgen Schütz
>>>>>
>>>>> This e-mail, including any attached files, contains confidential
>>>>> and/or legally protected information. If you are not the intended
>>>>> recipient or have received this e-mail in error, please inform the
>>>>> sender immediately and delete this e-mail and any attached files
>>>>> promptly. Unauthorized copying, use, or opening of any attached files,
>>>>> as well as unauthorized forwarding of this e-mail, is not permitted.
>>>>
>>>> --
>>>> Tyler Hobbs
>>>> DataStax <http://datastax.com/>
>>
>
> --
> Kurt Greaves
> k...@instaclustr.com
> www.instaclustr.com
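
For anyone who wants to quantify the problem rather than eyeball it, here is a rough sketch that counts the flagged cells in parsed sstable2json output. It assumes the Cassandra 2.x layout seen in the extract quoted above: a list of rows, each with a "key" and a "cells" list, where a flagged cell carries the marker "t" as its fourth element. The function name and the sample data are hypothetical.

```python
import json
from collections import Counter


def count_range_tombstones(rows):
    """Count cells flagged "t" per row key in parsed sstable2json output."""
    counts = Counter()
    for row in rows:
        for cell in row.get("cells", []):
            # Plain cells have three elements (name, value, timestamp);
            # flagged cells carry a marker as the fourth element.
            if len(cell) >= 4 and cell[3] == "t":
                counts[row["key"]] += 1
    return counts


if __name__ == "__main__":
    # Shortened, made-up sample shaped like the extract in this thread.
    sample = json.loads("""
    [{"key": "example_key",
      "cells": [["c1:", "", 1463820040628001],
                ["c1:m:_", "c1:m:!", 1463040069753999, "t", 1463040069],
                ["c1:m:_", "c1:m:!", 1463120708590002, "t", 1463120708],
                ["c1:m:6b", "00", 1463820040628001]]}]
    """)
    for key, n in count_range_tombstones(sample).items():
        print(key, n)
```

To use it on real data, save the sstable2json output to a file and pass the result of `json.load` to the function. Rows with a high count are the ones where the whole map was overwritten repeatedly.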