Oh, to start with we're going to use 2-10 nodes.

I think we're going to take the original strategy and just use 100
buckets, 0-99, then the timestamp under that.  I think it should be fine
and won't require an ordered partitioner. :)
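
A rough sketch of that layout, in case it helps anyone following along. It
assumes the bucket is picked uniformly at random on each write (the thread
doesn't pin down how the bucket is chosen), and it uses an in-memory dict as
a stand-in for the table. The names `table`, `write`, and `read_in_order`
are made up for illustration and are not part of any Cassandra driver.

```python
import heapq
import random
from collections import defaultdict

NUM_BUCKETS = 100  # buckets 0-99, as described above

# Stand-in for the table: partition key (bucket) -> list of (timestamp, payload).
# In Cassandra the timestamp would be a clustering column under the bucket.
table = defaultdict(list)


def write(timestamp, payload, rng=random):
    """Store one event under a randomly chosen bucket (an assumption)."""
    bucket = rng.randrange(NUM_BUCKETS)
    table[bucket].append((timestamp, payload))
    return bucket


def read_in_order():
    """Read every bucket and merge the rows back into timestamp order."""
    # Sort each partition by timestamp, then do a k-way merge across buckets.
    sorted_partitions = [sorted(rows) for rows in table.values()]
    return list(heapq.merge(*sorted_partitions))
```

Writers that all share the same clock still spread across up to 100
partitions, and a sequential reader pays for it by querying every bucket and
merging the results.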

Thanks!


On Sat, Jun 7, 2014 at 7:38 PM, Colin Clark <co...@clark.ws> wrote:

> With 100 nodes, that ingestion rate is actually quite low and I don't
> think you'd need another column in the partition key.
>
> You seem to be set in your current direction.  Let us know how it works
> out.
>
> --
> Colin
> 320-221-9531
>
>
> On Jun 7, 2014, at 9:18 PM, Kevin Burton <bur...@spinn3r.com> wrote:
>
> What's 'source'? You mean like the URL?
>
> If source is too random it's going to yield too many buckets.
>
> Ingestion rates are fairly high but not insane.  About 4M inserts per
> hour, from 5-10GB…
>
>
> On Sat, Jun 7, 2014 at 7:13 PM, Colin Clark <co...@clark.ws> wrote:
>
>> Not if you add another column to the partition key; source for example.
>>
>> I would really try to stay away from the ordered partitioner if at all
>> possible.
>>
>> What ingestion rates are you expecting, in size and speed?
>>
>> --
>> Colin
>> 320-221-9531
>>
>>
>> On Jun 7, 2014, at 9:05 PM, Kevin Burton <bur...@spinn3r.com> wrote:
>>
>>
>> Thanks for the feedback on this btw... it's helpful.  My notes below.
>>
>> On Sat, Jun 7, 2014 at 5:14 PM, Colin Clark <co...@clark.ws> wrote:
>>
>>> No, you're not; the partition key will get distributed across the
>>> cluster if you're using the random or murmur partitioner.
>>>
>>
>> Yes… I'm aware.  But in practice this is how it will work…
>>
>> If we create bucket b0, that will get hashed to h0…
>>
>> So say I have 50 machines performing writes. They all have the same
>> time thanks to ntpd, so they all compute b0 for the current bucket based on
>> the time.
>>
>> That gets hashed to h0…
>>
>> If h0 is hosted on node0 … then all writes go to node0 for that 1
>> second interval.
>>
>> So all my writes are bottlenecking on one node.  That node is *changing*
>> over time… but they're not being dispatched in parallel over N nodes.  At
>> most, writes will only ever reach 1 node at a time.
>>
>>
>>
>>> You could also ensure distribution by adding another column, like
>>> source. (Add the seconds to the partition key, not the clustering
>>> columns.)
>>>
>>> I can almost guarantee that if you put too much thought into working
>>> against what Cassandra offers out of the box, it will bite you later.
>>>
>>>
>> Sure.. I'm trying to avoid the 'bite you later' issues. More so because
>> I'm sure there are Cassandra gotchas to worry about.  Everything has them.
>>  Just trying to avoid the land mines :-P
>>
>>
>>> In fact, the use case that you're describing may best be served by a
>>> queuing mechanism, and using Cassandra only for the underlying store.
>>>
>>
>> Yes… that's what I'm doing.  We're using Apollo to fan out the queue, but
>> the writes go back into Cassandra and need to be read out sequentially.
>>
>>
>>>
>>> I used this exact same approach in a use case that involved writing over
>>> a million events/second to a cluster with no problems.  Initially, I
>>> thought the ordered partitioner was the way to go too.  And I used
>>> separate processes to aggregate, conflate, and handle distribution to
>>> clients.
>>>
>>
>>
>> Yes. I think using 100 buckets will work for now.  Plus I don't have to
>> change the partitioner on our existing cluster and I'm lazy :)
>>
>>
>>>
>>> Just my two cents, but I also spend the majority of my days helping
>>> people utilize Cassandra correctly, and rescuing those that haven't.
>>>
>>>
>> Definitely appreciate the feedback!  Thanks!
>>
>> --
>>
>> Founder/CEO Spinn3r.com
>> Location: *San Francisco, CA*
>> Skype: *burtonator*
>> blog: http://burtonator.wordpress.com
>> … or check out my Google+ profile
>> <https://plus.google.com/102718274791889610666/posts>
>> <http://spinn3r.com>
>> War is peace. Freedom is slavery. Ignorance is strength. Corporations are
>> people.
>>
>>


