To have any redundancy in the system, start with at least 3 nodes and a replication factor of 3.
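[A minimal sketch of what that looks like in practice, using the Python driver; the keyspace name and contact points are illustrative assumptions, not from this thread. Replication factor 3 is set when the keyspace is created:

    # Sketch: keyspace with replication factor 3 (SimpleStrategy, single
    # data center). Keyspace name and contact points are assumptions.
    from cassandra.cluster import Cluster

    session = Cluster(['10.0.0.1', '10.0.0.2', '10.0.0.3']).connect()
    session.execute("""
        CREATE KEYSPACE IF NOT EXISTS demo_ks
        WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3}
    """)

For replication across data centers, NetworkTopologyStrategy with a per-DC replication factor would replace SimpleStrategy.]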
Try to have at least 8 cores, 32GB of RAM, and separate disks for the commit log and data. Will you be replicating data across data centers?

--
Colin
320-221-9531

> On Jun 7, 2014, at 9:40 PM, Kevin Burton <bur...@spinn3r.com> wrote:
>
> Oh.. To start with we're going to use from 2-10 nodes..
>
> I think we're going to take the original strategy and just use 100 buckets.. 0-99… then the timestamp under that.. I think it should be fine and won't require an ordered partitioner. :)
>
> Thanks!
>
>> On Sat, Jun 7, 2014 at 7:38 PM, Colin Clark <co...@clark.ws> wrote:
>>
>> With 100 nodes, that ingestion rate is actually quite low, and I don't think you'd need another column in the partition key.
>>
>> You seem to be set in your current direction. Let us know how it works out.
>>
>>> On Jun 7, 2014, at 9:18 PM, Kevin Burton <bur...@spinn3r.com> wrote:
>>>
>>> What's 'source'? You mean like the URL?
>>>
>>> If source is too random, it's going to yield too many buckets.
>>>
>>> Ingestion rates are fairly high but not insane. About 4M inserts per hour.. from 5-10GB…
>>>
>>>> On Sat, Jun 7, 2014 at 7:13 PM, Colin Clark <co...@clark.ws> wrote:
>>>>
>>>> Not if you add another column to the partition key; source, for example.
>>>>
>>>> I would really try to stay away from the ordered partitioner if at all possible.
>>>>
>>>> What ingestion rates are you expecting, in size and speed?
>>>>
>>>>> On Jun 7, 2014, at 9:05 PM, Kevin Burton <bur...@spinn3r.com> wrote:
>>>>>
>>>>> Thanks for the feedback on this, btw.. it's helpful. My notes below.
>>>>>
>>>>>> On Sat, Jun 7, 2014 at 5:14 PM, Colin Clark <co...@clark.ws> wrote:
>>>>>>
>>>>>> No, you're not - the partition key will get distributed across the cluster if you're using random or murmur.
>>>>>
>>>>> Yes… I'm aware. But in practice this is how it will work…
>>>>>
>>>>> If we create bucket b0, that will get hashed to h0…
>>>>>
>>>>> So say I have 50 machines performing writes. They all have the same time thanks to ntpd, so they all compute b0 for the current bucket based on the time.
>>>>>
>>>>> That gets hashed to h0…
>>>>>
>>>>> If h0 is hosted on node0… then all writes go to node zero for that 1-second interval.
>>>>>
>>>>> So all my writes are bottlenecking on one node. That node is *changing* over time… but they're not being dispatched in parallel over N nodes. At most, writes will only ever reach 1 node at a time.
>>>>>
>>>>>> You could also ensure that by adding another column, like source, to ensure distribution. (Add the seconds to the partition key, not the clustering columns.)
>>>>>>
>>>>>> I can almost guarantee that if you put too much thought into working against what Cassandra offers out of the box, it will bite you later.
>>>>>
>>>>> Sure.. I'm trying to avoid the 'bite you later' issues. More so because I'm sure there are Cassandra gotchas to worry about. Everything has them. Just trying to avoid the land mines :-P
>>>>>
>>>>>> In fact, the use case that you're describing may best be served by a queuing mechanism, and using Cassandra only for the underlying store.
>>>>>
>>>>> Yes… that's what I'm doing. We're using Apollo to fan out the queue, but the writes go back into Cassandra and need to be read out sequentially.
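[A minimal sketch of the 100-bucket write path discussed above, using the Python driver. The table name events, its columns, and NUM_BUCKETS are illustrative assumptions; the point is that the bucket and the second-resolution timestamp together form the composite partition key, so concurrent writers spread across up to 100 partitions per one-second interval instead of all landing on one node:

    # Sketch of the bucketed schema and write path (names are assumptions).
    import random
    import time

    from cassandra.cluster import Cluster

    NUM_BUCKETS = 100  # buckets 0-99, per the thread

    session = Cluster(['127.0.0.1']).connect('demo_ks')

    # bucket + epoch_second form the partition key; ts is the clustering
    # column that keeps rows within a partition in time order.
    session.execute("""
        CREATE TABLE IF NOT EXISTS events (
            bucket int,
            epoch_second bigint,
            ts timeuuid,
            payload blob,
            PRIMARY KEY ((bucket, epoch_second), ts)
        )
    """)

    def write_event(payload: bytes) -> None:
        bucket = random.randrange(NUM_BUCKETS)  # spread writers over buckets
        epoch_second = int(time.time())         # writers agree via ntpd
        session.execute(
            "INSERT INTO events (bucket, epoch_second, ts, payload) "
            "VALUES (%s, %s, now(), %s)",
            (bucket, epoch_second, payload),
        )

Each one-second interval then hashes to up to 100 token ranges rather than one, which addresses the single-node bottleneck described above.]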
>>>>>
>>>>>> I used this exact same approach in a use case that involved writing over a million events/second to a cluster with no problems. Initially, I thought the ordered partitioner was the way to go too. And I used separate processes to aggregate, conflate, and handle distribution to clients.
>>>>>
>>>>> Yes. I think using 100 buckets will work for now. Plus, I don't have to change the partitioner on our existing cluster, and I'm lazy :)
>>>>>
>>>>>> Just my two cents, but I also spend the majority of my days helping people utilize Cassandra correctly, and rescuing those that haven't.
>>>>>
>>>>> Definitely appreciate the feedback! Thanks!
>
> --
> Founder/CEO Spinn3r.com
> Location: San Francisco, CA
> Skype: burtonator
> blog: http://burtonator.wordpress.com
> … or check out my Google+ profile
>
> War is peace. Freedom is slavery. Ignorance is strength. Corporations are people.
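[To read a given second back out sequentially under the same sketch (same assumed table and column names as above), a reader queries all 100 buckets for that second and merges the results:

    # Sketch: sequential read-back across the 100 buckets for one second.
    from cassandra.cluster import Cluster

    NUM_BUCKETS = 100

    session = Cluster(['127.0.0.1']).connect('demo_ks')

    def read_second(epoch_second: int):
        rows = []
        # One partition per bucket; rows within each partition are already
        # ordered by the clustering column ts.
        for bucket in range(NUM_BUCKETS):
            rows.extend(session.execute(
                "SELECT ts, payload FROM events "
                "WHERE bucket = %s AND epoch_second = %s",
                (bucket, epoch_second),
            ))
        # Comparing uuid.UUID values directly is not chronological for
        # timeuuids, so sort on the v1 UUID's 60-bit timestamp field.
        return sorted(rows, key=lambda r: r.ts.time)

This trades the single ordered scan an ordered partitioner would allow for 100 small partition reads per second of data.]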