You do not need RAID0 for data. Let C* stripe across the data disks itself (JBOD, via multiple data directories). And CL ANY or ONE may be sufficient for your writes.
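A minimal sketch of the JBOD layout in cassandra.yaml, assuming hypothetical mount points /mnt/disk1 … /mnt/disk3 (the paths are placeholders, not from this thread):

```yaml
# cassandra.yaml — let Cassandra stripe sstables across disks (JBOD)
# instead of RAID0. Paths below are illustrative placeholders.
data_file_directories:
    - /mnt/disk1/cassandra/data
    - /mnt/disk2/cassandra/data
    - /mnt/disk3/cassandra/data

# Keep the commit log on a separate device where possible, as Colin
# suggests further down in the thread.
commitlog_directory: /var/lib/cassandra/commitlog
```

With this layout a single failed disk loses only the sstables on that disk, whereas a failed RAID0 member loses the whole array.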
> On 08.06.2014 at 06:15, Kevin Burton <bur...@spinn3r.com> wrote:
>
> We're using containers for other reasons, not just Cassandra.
>
> Tightly constraining resources means we don't have to worry about Cassandra, the JVM, or Linux doing something silly, using too many resources, and taking down the whole box.
>
>> On Sat, Jun 7, 2014 at 8:25 PM, Colin Clark <co...@clark.ws> wrote:
>> You won't need containers - running one instance of Cassandra in that configuration will hum along quite nicely and will make use of the cores and memory.
>>
>> I'd forget the RAID anyway and just mount the disks separately (JBOD).
>>
>> --
>> Colin
>> 320-221-9531
>>
>>> On Jun 7, 2014, at 10:02 PM, Kevin Burton <bur...@spinn3r.com> wrote:
>>>
>>> Right now I'm just putting everything together as a proof of concept… so just two cheap replicas for now. And it's at 1/10000th of the load.
>>>
>>> If we lose data it's ok :)
>>>
>>> I think our config will be 2-3x 400GB SSDs in RAID0, 3 replicas, 16 cores, and probably 48-64GB of RAM per box.
>>>
>>> Just one datacenter for now…
>>>
>>> We're probably going to be migrating to Linux containers at some point. This way we can have 16GB, one 400GB SSD, and 4 cores for each image. And we can ditch the RAID, which is nice. :)
>>>
>>>> On Sat, Jun 7, 2014 at 7:51 PM, Colin <colpcl...@gmail.com> wrote:
>>>> To have any redundancy in the system, start with at least 3 nodes and a replication factor of 3.
>>>>
>>>> Try to have at least 8 cores, 32 GB of RAM, and separate disks for commit log and data.
>>>>
>>>> Will you be replicating data across data centers?
>>>>
>>>>> On Jun 7, 2014, at 9:40 PM, Kevin Burton <bur...@spinn3r.com> wrote:
>>>>>
>>>>> Oh… to start with we're going to use 2-10 nodes.
>>>>>
>>>>> I think we're going to take the original strategy and just use 100 buckets… 0-99… then the timestamp under that.
>>>>> I think it should be fine and won't require an ordered partitioner. :)
>>>>>
>>>>> Thanks!
>>>>>
>>>>>> On Sat, Jun 7, 2014 at 7:38 PM, Colin Clark <co...@clark.ws> wrote:
>>>>>> With 100 nodes, that ingestion rate is actually quite low and I don't think you'd need another column in the partition key.
>>>>>>
>>>>>> You seem to be set in your current direction. Let us know how it works out.
>>>>>>
>>>>>>> On Jun 7, 2014, at 9:18 PM, Kevin Burton <bur...@spinn3r.com> wrote:
>>>>>>>
>>>>>>> What's 'source'? You mean like the URL?
>>>>>>>
>>>>>>> If source is too random, it's going to yield too many buckets.
>>>>>>>
>>>>>>> Ingestion rates are fairly high but not insane. About 4M inserts per hour… from 5-10GB…
>>>>>>>
>>>>>>>> On Sat, Jun 7, 2014 at 7:13 PM, Colin Clark <co...@clark.ws> wrote:
>>>>>>>> Not if you add another column to the partition key; source, for example.
>>>>>>>>
>>>>>>>> I would really try to stay away from the ordered partitioner if at all possible.
>>>>>>>>
>>>>>>>> What ingestion rates are you expecting, in size and speed?
>>>>>>>>
>>>>>>>>> On Jun 7, 2014, at 9:05 PM, Kevin Burton <bur...@spinn3r.com> wrote:
>>>>>>>>>
>>>>>>>>> Thanks for the feedback on this, btw… it's helpful. My notes below.
>>>>>>>>>
>>>>>>>>>> On Sat, Jun 7, 2014 at 5:14 PM, Colin Clark <co...@clark.ws> wrote:
>>>>>>>>>> No, you're not - the partition key will get distributed across the cluster if you're using random or murmur.
>>>>>>>>>
>>>>>>>>> Yes… I'm aware. But in practice this is how it will work…
>>>>>>>>>
>>>>>>>>> If we create bucket b0, that will get hashed to h0…
>>>>>>>>>
>>>>>>>>> So say I have 50 machines performing writes; they are all on the same time thanks to ntpd, so they all compute b0 for the current bucket based on the time.
>>>>>>>>> That gets hashed to h0…
>>>>>>>>>
>>>>>>>>> If h0 is hosted on node0, then all writes go to node zero for that 1-second interval.
>>>>>>>>>
>>>>>>>>> So all my writes are bottlenecking on one node. That node is *changing* over time… but the writes are not being dispatched in parallel over N nodes. At most, writes will only ever reach one node at a time.
>>>>>>>>>
>>>>>>>>>> You could also ensure distribution by adding another column, like source. (Add the seconds to the partition key, not the clustering columns.)
>>>>>>>>>>
>>>>>>>>>> I can almost guarantee that if you put too much thought into working against what Cassandra offers out of the box, it will bite you later.
>>>>>>>>>
>>>>>>>>> Sure… I'm trying to avoid the 'bite you later' issues, mostly because I'm sure there are Cassandra gotchas to worry about. Everything has them. Just trying to avoid the land mines :-P
>>>>>>>>>
>>>>>>>>>> In fact, the use case that you're describing may best be served by a queuing mechanism, using Cassandra only for the underlying store.
>>>>>>>>>
>>>>>>>>> Yes… that's what I'm doing. We're using Apollo to fan out the queue, but the writes go back into Cassandra and need to be read out sequentially.
>>>>>>>>>
>>>>>>>>>> I used this exact same approach in a use case that involved writing over a million events/second to a cluster with no problems. Initially, I thought the ordered partitioner was the way to go too. And I used separate processes to aggregate, conflate, and handle distribution to clients.
>>>>>>>>>
>>>>>>>>> Yes. I think using 100 buckets will work for now.
>>>>>>>>> Plus I don't have to change the partitioner on our existing cluster, and I'm lazy :)
>>>>>>>>>
>>>>>>>>>> Just my two cents, but I also spend the majority of my days helping people utilize Cassandra correctly, and rescuing those that haven't.
>>>>>>>>>
>>>>>>>>> Definitely appreciate the feedback! Thanks!
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Founder/CEO Spinn3r.com
>>>>>>>>> Location: San Francisco, CA
>>>>>>>>> Skype: burtonator
>>>>>>>>> blog: http://burtonator.wordpress.com
>>>>>>>>> … or check out my Google+ profile
>>>>>>>>>
>>>>>>>>> War is peace. Freedom is slavery. Ignorance is strength. Corporations are people.
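The hotspot Kevin describes, and the 100-bucket fix, can be sketched in Python. This is a toy model, not the real Murmur3 partitioner or driver API: `token`, `owner`, and the node/bucket counts are illustrative stand-ins.

```python
import hashlib

NUM_NODES = 10    # hypothetical cluster size
NUM_BUCKETS = 100  # the 0-99 bucket scheme from the thread

def token(partition_key: str) -> int:
    """Toy stand-in for Murmur3: map a partition key to a token."""
    return int(hashlib.md5(partition_key.encode()).hexdigest(), 16)

def owner(partition_key: str) -> int:
    """Node that owns the token (ignoring replication)."""
    return token(partition_key) % NUM_NODES

# Hotspot: 50 ntp-synced writers all compute the same time bucket for
# a given second, so every write in that second lands on one node.
ts = 1402200000  # some epoch second
nodes_hit = {owner(f"b{ts}") for _ in range(50)}
assert len(nodes_hit) == 1  # one partition key -> one owning node

# Fix: prepend a bucket column (0-99) to the partition key, so the
# same second fans out over up to 100 tokens, hence over many nodes.
nodes_hit = {owner(f"{bucket}:{ts}") for bucket in range(NUM_BUCKETS)}
print(len(nodes_hit))  # several distinct nodes, not just one
```

This is why adding `bucket` to the partition key (rather than the clustering columns) removes the per-second bottleneck: readers then query all 100 buckets for a second and merge, which trades some read fan-out for parallel writes.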