Re: Tools to manage repairs

Vincent Rischmann Fri, 28 Oct 2016 09:51:16 -0700

Well, I only asked because I wanted to make sure that we're not
doing it wrong, since that's actually how we query stuff: we always
provide a cluster key or a range of cluster keys.
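
To make that access pattern concrete, here is a minimal sketch in plain
Python. It is a toy model, not Cassandra code: the partition is just a
list of rows sorted by cluster key, and `read_slice`, the row shape and
the page size are all made up for illustration. It only shows that a
range read locates the slice boundaries and hands back the matching rows
one page at a time.

```python
from bisect import bisect_left
from typing import Iterator, List, Tuple

Row = Tuple[int, str]  # (cluster key, value); hypothetical row shape


def read_slice(partition: List[Row], start: int, end: int,
               page_size: int) -> Iterator[List[Row]]:
    """Yield rows with start <= cluster key <= end, one page at a time.

    `partition` must be sorted by cluster key. Keys are assumed to be
    integers so that `end + 1` is the first key outside the range.
    """
    # A 1-tuple (k,) sorts just before any row (k, value), so bisecting
    # with it finds the first row whose key is >= k in O(log n).
    lo = bisect_left(partition, (start,))
    hi = bisect_left(partition, (end + 1,))
    for offset in range(lo, hi, page_size):
        yield partition[offset:min(offset + page_size, hi)]


# A toy "big" partition of 100,000 rows keyed 0..99999.
partition = [(i, f"event-{i}") for i in range(100_000)]
pages = list(read_slice(partition, 500, 749, page_size=100))
rows_read = sum(len(page) for page in pages)
print(len(pages), rows_read)  # 3 250
```

In this model only 250 of the 100,000 rows are returned, in three
pages. It says nothing about what compaction or the read path has to do
with the rest of the partition on the server side.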


But yes, I understand that compactions may suffer and/or there may be
hidden bottlenecks because of big partitions, so it's definitely good to
know, and I'll definitely work on reducing partition sizes.
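
As a back-of-the-envelope check on the question quoted below about how
many 2 ms point reads get blocked by 50 ms queries, here is a rough
sketch. It assumes a fixed pool of worker threads that each serve one
request at a time; the pool size of 32 and the rate of 320 slow queries
per second are illustrative numbers, not measurements, and queueing
effects are ignored.

```python
def displaced_point_reads(slow_ms: float, fast_ms: float) -> float:
    """Point reads one thread could have served during one slow query."""
    return slow_ms / fast_ms


def point_read_capacity(threads: int, fast_ms: float,
                        slow_qps: float, slow_ms: float) -> float:
    """Point reads per second left after slow queries take their share.

    Simplified model: `threads` workers, one request per worker at a
    time, no queueing and no other overhead.
    """
    total_thread_ms = threads * 1000.0   # thread-milliseconds per second
    slow_thread_ms = slow_qps * slow_ms  # consumed by the slow queries
    return max(total_thread_ms - slow_thread_ms, 0.0) / fast_ms


print(displaced_point_reads(50, 2))         # 25.0
print(point_read_capacity(32, 2, 0, 50))    # 16000.0
print(point_read_capacity(32, 2, 320, 50))  # 8000.0
```

Under these assumptions every 50 ms query costs the equivalent of 25
point reads, and 320 such queries per second would halve the point-read
capacity of the pool.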

On Fri, Oct 28, 2016, at 06:32 PM, Edward Capriolo wrote:
>
>
> On Fri, Oct 28, 2016 at 11:21 AM, Vincent Rischmann
> <m...@vrischmann.me> wrote:
>> Doesn't paging help with this ? Also if we select a range via the
>> cluster key we're never really selecting the full partition. Or is
>> that wrong ?
>>
>>
>> On Fri, Oct 28, 2016, at 05:00 PM, Edward Capriolo wrote:
>>> Big partitions are an anti-pattern; here is why:
>>>
>>> First, Cassandra is not an analytic datastore. Sure it has some UDFs
>>> and aggregate UDFs, but the true purpose of the data store is to
>>> satisfy point reads. Operations have strict timeouts:
>>>
>>> # How long the coordinator should wait for read operations to
>>> # complete
>>> read_request_timeout_in_ms: 5000
>>>
>>> # How long the coordinator should wait for seq or index scans to
>>> # complete
>>> range_request_timeout_in_ms: 10000
>>>
>>> This means you need to be able to satisfy the operation in 5
>>> seconds. That covers not only the "think time" of one server: if you
>>> are reading at quorum, the operation has to complete and be compared
>>> on 2 or more servers. Beyond these cutoffs are thread pools which
>>> fill up and start dropping requests once full.
>>>
>>> Something has to give, either functionality or physics. Particularly
>>> the physics of aggregating an ever-growing data set across N
>>> replicas in less than 5 seconds. How many 2 ms point reads will be
>>> blocked by 50 ms queries, and so on?
>>>
>>> I do not think the technical limitations of big partitions on disk
>>> are the only hurdle to climb here.
>>>
>>>
>>> On Fri, Oct 28, 2016 at 10:39 AM, Alexander Dejanovski
>>> <a...@thelastpickle.com> wrote:
>>>> Hi Eric,
>>>>
>>>> that would be https://issues.apache.org/jira/browse/CASSANDRA-9754
>>>> by Michael Kjellman and
>>>> https://issues.apache.org/jira/browse/CASSANDRA-11206 by Robert
>>>> Stupp.
>>>> If you haven't seen it yet, Robert's summit talk on big partitions
>>>> is totally worth it :
>>>> Video : https://www.youtube.com/watch?v=N3mGxgnUiRY
>>>> Slides :
>>>> http://www.slideshare.net/DataStax/myths-of-big-partitions-robert-stupp-datastax-cassandra-summit-2016
>>>>
>>>> Cheers,
>>>>
>>>>
>>>> On Fri, Oct 28, 2016 at 4:09 PM Eric Evans
>>>> <john.eric.ev...@gmail.com> wrote:
>>>>> On Thu, Oct 27, 2016 at 4:13 PM, Alexander Dejanovski
>>>>> <a...@thelastpickle.com> wrote:
>>>>> > A few patches are pushing the limits of partition sizes so we
>>>>> > may soon be
>>>>> > more comfortable with big partitions.
>>>>>
>>>>> You don't happen to have Jira links to these handy, do you?
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>>  Eric Evans john.eric.ev...@gmail.com
>>>>>
>>>>
>>>>
>>>> --
>>>> -----------------
>>>> Alexander Dejanovski
>>>> France
>>>> @alexanderdeja
>>>>
>>>> Consultant
>>>> Apache Cassandra Consulting
>>>> http://www.thelastpickle.com[1]
>>>>
>>>>
>>
>
> "Doesn't paging help with this ? Also if we select a range via the
> cluster key we're never really selecting the full partition. Or is
> that wrong ?"
>
> What I am suggesting is that the data store has had this practical
> limitation on partition size since inception. As a result, the
> common use case is not to use it in such a way. For example, the
> compaction manager may not be optimized for these cases, and queries
> running across large partitions may cause more contention, generate
> lots of young-gen garbage, or occupy the slots of the read stage.
>
>
> http://mail-archives.apache.org/mod_mbox/cassandra-user/201602.mbox/%3CCAJjpQyTS2eaCcRBVa=zmm-hcbx5nf4ovc1enw+sffgwvngo...@mail.gmail.com%3E
>
> I think there are possibly some more "little details" to discover. Not
> in a bad way. I just do not think you can hand-wave it away by saying
> that a specific thing someone is working on now, or paging, solves it.
> If it was that easy it would be solved by now :)
>


Links:

  1. http://www.thelastpickle.com/
