Great explanation, thanks Jeff!

On 7 Mar 2018 17:49, "Javier Pareja" <pareja.jav...@gmail.com> wrote:

> Thank you for your time Jeff, very helpful. I couldn't find anything out
> there about the subject and I suspected that this could be the case.
>
> Regarding the clustering key in this case:
> Back in the RDBMS world, you will always assign a sequential (or as
> sequential as possible) clustering key to a table to minimize fragmentation
> and increase the speed of the insertions. In the Cassandra world, does the
> same apply to the clustering key? For example, is it a good idea to assign
> a UUID to a clustering key, or would a timestamp be a better choice? I am
> thinking that partitions need to keep some sort of binary index for the
> clustering keys, and for relatively large partitions that index can be
> expensive to maintain.
>
> F Javier Pareja
>
> On Wed, Mar 7, 2018 at 5:20 PM, Jeff Jirsa <jji...@gmail.com> wrote:
>
>>
>>
>> On Wed, Mar 7, 2018 at 7:13 AM, Carlos Rolo <r...@pythian.com> wrote:
>>
>>> Hi Jeff,
>>>
>>> Could you expand: "Tables without clustering keys are often deceptively
>>> expensive to compact, as a lot of work (relative to the other cell
>>> boundaries) happens on partition boundaries." This is something I didn't
>>> know, and I'd be very interested to learn more!
>>>
>>>
>>>
>> We do a lot "by partition". We build column indexes by partition. We
>> update the partition index on each partition. We invalidate key cache by
>> partition. They're not super expensive, but they take time, and tables with
>> tiny partitions can actually be slower to compact.
>>
>> There's no magic cutoff where it does/doesn't make sense; my comment is
>> mostly a warning that the edges of the "normal" use cases tend to be less
>> optimized than the common case. A table with a hundred billion records,
>> where the key is numeric and the value is a single byte (let's say you're
>> keeping track of whether or not a specific sensor has ever detected some
>> magic event, and you have 100B sensors), will be close to the worst-case
>> example of this behavior.
>>
>
>
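
The per-partition overhead described in the quoted reply can be pictured
with a toy cost model: compaction cost as a fixed amount of work per
partition plus a small amount of work per cell. This is only a sketch. Both
constants and the function name are invented for illustration and are not
measurements or APIs from Cassandra; only the shape of the result matters.

```python
# Toy model only: both constants are made-up assumptions, not Cassandra measurements.
PER_PARTITION_COST_US = 5.0  # hypothetical fixed work per partition (index entry, key cache, ...)
PER_CELL_COST_US = 0.5       # hypothetical work per cell rewritten


def compaction_cost_us(total_cells: int, cells_per_partition: int) -> float:
    """Estimate compaction cost in microseconds under the toy model above."""
    partitions = -(-total_cells // cells_per_partition)  # ceiling division
    return partitions * PER_PARTITION_COST_US + total_cells * PER_CELL_COST_US


if __name__ == "__main__":
    total = 1_000_000
    # Same number of cells, spread over fewer and fewer partitions.
    for cells_per_partition in (1, 10, 1000):
        print(cells_per_partition, compaction_cost_us(total, cells_per_partition))
```

With one cell per partition the fixed per-partition term dominates the
total, which is the "tiny partitions" case from the thread; as partitions
grow, the same data costs less to process under this model.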
