From that perspective, you could also use a frozen collection, which takes
away the ability to append, but for which overwrites shouldn't generate a
tombstone.
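
A frozen collection (or a JSON text column, as suggested in the quoted reply below) is stored as a single value, so changing one entry means writing the whole value again. A minimal sketch of the client-side merge this implies, assuming the map is kept as JSON text; the function name and the sample keys are hypothetical, and fetching `existing_json` is exactly the read-before-write cost mentioned further down the thread:

```python
import json


def merge_map_as_json(existing_json, updates):
    """Return the JSON text to write back after applying `updates`.

    `existing_json` is the value previously read from the column,
    or None when no row exists yet.
    """
    current = json.loads(existing_json) if existing_json else {}
    current.update(updates)  # new keys win over existing keys
    # sort_keys keeps the serialized form stable for identical content
    return json.dumps(current, sort_keys=True)


# Hypothetical usage: first write, then a later update for another key.
stored = merge_map_as_json(None, {"source_a": "2016-05-21T08:40Z"})
stored = merge_map_as_json(stored, {"source_b": "2016-05-22T10:00Z"})
print(stored)
```

The result is written back as one ordinary column value, which is why an overwrite should not need a range tombstone.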

On Wed, Jun 1, 2016, 5:54 PM Kurt Greaves <k...@instaclustr.com> wrote:

> Is there anything stopping you from using JSON instead of a collection?
>
> On 27 May 2016 at 15:20, Eric Stevens <migh...@gmail.com> wrote:
>
>> If you aren't removing elements from the map, you should instead be able
>> to use an UPDATE statement and append to the map. It will have the same
>> effect as overwriting it, because all the new keys will take precedence over
>> the existing keys. But it'll happen without generating a tombstone first.
>>
>> If you do have to remove elements from the collection during this
>> process, you are either facing tombstones or having to surgically figure
>> out which elements ought to be removed (which also involves tombstones,
>> though at least not range tombstones, so a bit cheaper).
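
The two statement shapes described above can be sketched as follows. This is only an illustration that builds CQL text with `?` bind markers for use as prepared statements; the helper names, the table name `store_catalog_lines`, and the key column `id` are assumptions, while `last_modified_by_source` is the map column visible in the sstable2json extract further down.

```python
def _where_clause(key_columns):
    """Build 'a = ? AND b = ?' for the given primary key columns."""
    return " AND ".join(f"{column} = ?" for column in key_columns)


def append_to_map_cql(table, map_column, key_columns):
    """UPDATE that merges the bound map into the existing map (no overwrite)."""
    return (
        f"UPDATE {table} SET {map_column} = {map_column} + ? "
        f"WHERE {_where_clause(key_columns)}"
    )


def remove_map_key_cql(table, map_column, key_columns):
    """DELETE of a single map element; this writes a cell tombstone."""
    return (
        f"DELETE {map_column}[?] FROM {table} "
        f"WHERE {_where_clause(key_columns)}"
    )


# Hypothetical table and key column names.
print(append_to_map_cql("store_catalog_lines", "last_modified_by_source", ["id"]))
print(remove_map_key_cql("store_catalog_lines", "last_modified_by_source", ["id"]))
```

Binding the full new map to the first statement gives the same visible result as an overwrite, as long as no key has to disappear.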
>>
>> On Fri, May 27, 2016, 5:39 AM Matthias Niehoff <
>> matthias.nieh...@codecentric.de> wrote:
>>
>>> We are processing events in Spark and storing the resulting entries
>>> (containing a map) in Cassandra. The results can be new (no entry for this
>>> key in Cassandra) or an update (there is already an entry with this key in
>>> Cassandra). We use the spark-cassandra-connector to store the data in
>>> Cassandra.
>>>
>>> The connector will always do an insert of the data and will rely on the
>>> upsert capabilities of Cassandra. So every time an event is updated, the
>>> complete map is replaced, with all the problems of tombstones.
>>> It seems we have to implement our own persistence logic in which we check
>>> whether an element already exists and, if so, update the map manually. That
>>> would require a read before write, which would be nasty. Another option would
>>> be not to use a collection but (clustering) columns. Do you have another idea
>>> for doing this?
>>>
>>> (The conclusion of this whole thing for me would be: use upserts, but do
>>> specific updates on collections, as an upsert might replace the whole
>>> collection and generate tombstones.)
>>>
>>> 2016-05-25 17:37 GMT+02:00 Tyler Hobbs <ty...@datastax.com>:
>>>
>>>> If you replace an entire collection, whether it's a map, set, or list,
>>>> a range tombstone will be inserted followed by the new collection.  If you
>>>> only update a single element, no tombstones are generated.
>>>>
>>>> On Wed, May 25, 2016 at 9:48 AM, Matthias Niehoff <
>>>> matthias.nieh...@codecentric.de> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> We have a table with a map field. We do not delete anything in this
>>>>> table, but we do update values, including the map field (most of the
>>>>> time a new value for an existing key, rarely adding new keys). We now
>>>>> encounter a huge number of tombstones for this table.
>>>>>
>>>>> We used sstable2json to take a look into the sstables:
>>>>>
>>>>>
>>>>> {"key": "Betty_StoreCatalogLines:7",
>>>>>  "cells": [["276-1-6MPQ0RI-276110031802001001:","",1463820040628001],
>>>>>            ["276-1-6MPQ0RI-276110031802001001:last_modified","2016-05-21 08:40Z",1463820040628001],
>>>>>            ["276-1-6MPQ0RI-276110031802001001:last_modified_by_source:_","276-1-6MPQ0RI-276110031802001001:last_modified_by_source:!",1463040069753999,"t",1463040069],
>>>>>            ["276-1-6MPQ0RI-276110031802001001:last_modified_by_source:_","276-1-6MPQ0RI-276110031802001001:last_modified_by_source:!",1463120708590002,"t",1463120708],
>>>>>            ["276-1-6MPQ0RI-276110031802001001:last_modified_by_source:_","276-1-6MPQ0RI-276110031802001001:last_modified_by_source:!",1463145700735007,"t",1463145700],
>>>>>            ["276-1-6MPQ0RI-276110031802001001:last_modified_by_source:_","276-1-6MPQ0RI-276110031802001001:last_modified_by_source:!",1463157430862000,"t",1463157430],
>>>>>            ["276-1-6MPQ0RI-276110031802001001:last_modified_by_source:_","276-1-6MPQ0RI-276110031802001001:last_modified_by_source:!",1463164595291002,"t",1463164595],
>>>>>            . . .
>>>>>            ["276-1-6MPQ0RI-276110031802001001:last_modified_by_source:_","276-1-6MPQ0RI-276110031802001001:last_modified_by_source:!",1463820040628000,"t",1463820040],
>>>>>            ["276-1-6MPQ0RI-276110031802001001:last_modified_by_source:62657474795f73746f72655f636174616c6f675f6c696e6573","00000154d265c6b0",1463820040628001],
>>>>>            ["276-1-6MPQ0RI-276110031802001001:payload","{\"payload\":{\"Article Id\":\"276110031802001001\",\"Row Id\":\"1-6MPQ0RI\",\"Article #\":\"31802001001\",\"Quote Item Id\":\"1-6MPWPVC\",\"Country Code\":\"276\"}}",1463820040628001]
>>>>>
>>>>>
>>>>>
>>>>> Looking at the SSTables, it seems like every update of a value in a map
>>>>> breaks down to a delete and an insert in the corresponding SSTable (see all
>>>>> the tombstone flags "t" in the extract of sstable2json above).
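
To quantify this rather than eyeballing the dump, the flagged entries can be counted. A minimal sketch, assuming the Cassandra 2.x sstable2json layout seen in the extract above: the output is a JSON array of rows, each with a "cells" list, and a range tombstone is a cell array whose fourth element is "t". The function name and the shortened sample rows are made up for illustration.

```python
import json


def count_range_tombstones(rows):
    """Count cells flagged "t" (range tombstones) across all dumped rows."""
    count = 0
    for row in rows:
        for cell in row.get("cells", []):
            # Regular cells are [name, value, timestamp]; range tombstones
            # are [start, end, timestamp, "t", local_deletion_time].
            if len(cell) >= 4 and cell[3] == "t":
                count += 1
    return count


# Shortened, synthetic sample in the same shape as the extract.
sample = json.loads("""
[{"key": "k1",
  "cells": [["c:", "", 1],
            ["c:m:_", "c:m:!", 2, "t", 2],
            ["c:m:_", "c:m:!", 3, "t", 3],
            ["c:m:6b", "00", 4]]}]
""")
print(count_range_tombstones(sample))
```

With real data, `rows` would come from `json.load` on the file produced by sstable2json.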
>>>>>
>>>>> We are using Cassandra 2.2.5.
>>>>>
>>>>> Can you confirm this behavior?
>>>>>
>>>>> Thanks!
>>>>> --
>>>>> Matthias Niehoff | IT-Consultant | Agile Software Factory  | Consulting
>>>>> codecentric AG | Zeppelinstr 2 | 76185 Karlsruhe | Deutschland
>>>>> tel: +49 (0) 721.9595-681 | fax: +49 (0) 721.9595-666 | mobil: +49
>>>>> (0) 172.1702676
>>>>> www.codecentric.de | blog.codecentric.de | www.meettheexperts.de |
>>>>> www.more4fi.de
>>>>>
>>>>> Registered office: Solingen | HRB 25917 | Amtsgericht Wuppertal
>>>>> Executive board: Michael Hochgürtel . Mirko Novakovic . Rainer Vehns
>>>>> Supervisory board: Patric Fedlmeier (chairman) . Klaus Jäger . Jürgen
>>>>> Schütz
>>>>>
>>>>> This e-mail, including any attached files, contains confidential
>>>>> and/or legally protected information. If you are not the intended
>>>>> recipient or have received this e-mail in error, please notify the
>>>>> sender immediately and delete this e-mail and any attached files
>>>>> without delay. Unauthorized copying, use, or opening of any attached
>>>>> files, as well as unauthorized forwarding of this e-mail, is not
>>>>> permitted.
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Tyler Hobbs
>>>> DataStax <http://datastax.com/>
>>>>
>>>
>>>
>>>
>>>
>>
>
>
> --
> Kurt Greaves
> k...@instaclustr.com
> www.instaclustr.com
>
