On Tue, Jul 10, 2018 at 8:29 AM, Code Wiget <codewige...@gmail.com> wrote:
> Hi,
>
> I have been tasked with picking and setting up a database with the
> following characteristics:
>
> - Ultra-high availability - The real requirement is uptime - our whole
> platform becomes inaccessible without a “read” from the database. We need
> the read to authenticate users. Databases will never be spread across
> multiple networks.

Sooner or later life will happen and you're going to have some unavailability - it may be worth taking the time to make it fail gracefully (cache auth responses, etc.).

> - Reasonably quick access speeds
> - Very low data storage - The data storage is very low - for 10
> million users, we would have around 8GB of storage total.
>
> Having done a bit of research on Cassandra, I think the optimal approach
> for my use-case would be to replicate the data on *ALL* nodes possible,
> but require reads to only have a consistency level of one. So, in the case
> that a node goes down, we can still read/write to other nodes. It is not
> very important that a read be unanimously agreed upon, as long as Cassandra
> is eventually consistent, within around 1s, then there shouldn’t be an
> issue.

Seems like a reasonably good fit, but there's no 1s guarantee - it'll USUALLY happen within milliseconds, but the edge cases don't have a strict guarantee at all. Imagine two hosts in adjacent racks: the link between the two racks goes down, but both hosts are otherwise functional. A query at ONE in either rack would be able to read and write data, but the data would diverge between the two racks for some period of time.

> When I go to set up the database though, I am required to set a
> replication factor to a number - 1,2,3,etc. So I can’t just say “ALL” and
> have it replicate to all nodes.

That option doesn't exist. It's been proposed (and exists in DataStax Enterprise, which is a proprietary fork), but it reportedly causes quite a bit of pain when misused, so people have successfully lobbied against its inclusion in OSS Apache Cassandra.
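A rough way to reason about the "replicate everywhere, read at ONE" plan is to count how many replicas of a row are still alive and compare that with what each consistency level needs. The Python sketch below is a simplified single-datacenter model, not Cassandra's actual code. It assumes each node holds at most one replica of a partition (so at most min(RF, nodes) replicas exist), that the responses required are 1 for ONE, RF // 2 + 1 for QUORUM and RF for ALL, and, pessimistically, that every down node held a replica. The names `Consistency`, `required_acks` and `can_serve` are hypothetical.

```python
from enum import Enum


class Consistency(Enum):
    ONE = "ONE"
    QUORUM = "QUORUM"
    ALL = "ALL"


def required_acks(level: Consistency, rf: int) -> int:
    """Replica responses a request needs, derived from the configured RF."""
    if level is Consistency.ONE:
        return 1
    if level is Consistency.QUORUM:
        return rf // 2 + 1  # majority of the configured replicas
    return rf  # ALL: every configured replica


def can_serve(level: Consistency, rf: int, nodes: int, nodes_down: int) -> bool:
    """Simplified check: are enough replicas of one partition still up?

    Assumes a single datacenter, at most one replica per node (so only
    min(rf, nodes) replicas exist) and, pessimistically, that every down
    node held a replica of this partition.
    """
    placed = min(rf, nodes)
    alive = max(placed - nodes_down, 0)
    return alive >= required_acks(level, rf)


if __name__ == "__main__":
    # Five nodes with a replica on every node (RF = node count), two nodes down.
    print(can_serve(Consistency.ONE, rf=5, nodes=5, nodes_down=2))     # True
    print(can_serve(Consistency.QUORUM, rf=5, nodes=5, nodes_down=2))  # True: 3 of 5 alive
    print(can_serve(Consistency.ALL, rf=5, nodes=5, nodes_down=1))     # False
```

With RF equal to the node count, reads at ONE keep working until the last replica is gone, which is the property the question is after. QUORUM and ALL give up some of that availability in exchange for stronger agreement between replicas.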
You could (assuming some basic Java knowledge) extend NetworkTopologyStrategy to have it accomplish this, but I imagine you don't REALLY want this unless you're frequently auto-scaling nodes in/out of the cluster. You should probably just pick a high RF and you'll be OK with it.

> Right now, I have a 2 node cluster with replication factor 3. Will this
> cause any issues, having a RF > #nodes? Or is there a way to just have it
> copy to *all* nodes?

It's obviously not the intended config, but I don't think it'll cause many problems.

> Is there any way that I can tune Cassandra to be more read-optimized?

Yes - definitely use leveled compaction (LeveledCompactionStrategy) instead of STCS (SizeTieredCompactionStrategy, the default), and definitely take the time to tune the JVM args. The read path generates a lot of short-lived Java objects, so a larger eden will help you (maybe up to 40-50% of max heap size).

> Finally, I have some misgivings about how well Cassandra fits my use-case.
> Please, if anyone has a suggestion as to why or why not it is a good fit, I
> would really appreciate your input! If this could be done with a simple SQL
> database and this is overkill, please let me know.
>
> Thanks for your input!
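The "fail gracefully (cache auth responses, etc.)" suggestion can be sketched as a small wrapper around the credential lookup. Successful reads refresh a local cache, and if a database read fails, the last known answer is served as long as it is recent enough. This is a minimal Python sketch under stated assumptions: `lookup` stands in for whatever driver call actually reads a user's credentials, `DatabaseUnavailable` is a hypothetical exception that call raises when no replica answers, and the one-hour staleness limit is an arbitrary placeholder, not a recommendation.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class DatabaseUnavailable(Exception):
    """Hypothetical error the lookup raises when no replica answers."""


class CachingAuthenticator:
    """Wraps a credential lookup and falls back to recent cached answers.

    `lookup` is a hypothetical callable that returns a user's stored
    credential (or None for an unknown user) and raises DatabaseUnavailable
    when the read fails.
    """

    def __init__(
        self,
        lookup: Callable[[str], Optional[str]],
        max_stale_seconds: float = 3600.0,  # placeholder, not a recommendation
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._lookup = lookup
        self._max_stale = max_stale_seconds
        self._clock = clock
        # user -> (credential from the last successful read, time of that read)
        self._cache: Dict[str, Tuple[Optional[str], float]] = {}

    def fetch_credential(self, user: str) -> Optional[str]:
        try:
            value = self._lookup(user)
        except DatabaseUnavailable:
            cached = self._cache.get(user)
            if cached is not None and self._clock() - cached[1] <= self._max_stale:
                return cached[0]  # degrade gracefully: serve the last known answer
            raise  # nothing recent enough in the cache: surface the outage
        self._cache[user] = (value, self._clock())  # refresh on every successful read
        return value
```

Bounding staleness matters for auth: during an outage a revoked credential stays usable from the cache for up to `max_stale_seconds`, so that window is a security trade-off as much as an availability one.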