Hi, I have been tasked with picking and setting up a database with the following characteristics:
• Ultra-high availability: the real requirement is uptime. Our whole platform becomes inaccessible without a “read” from the database, because we need that read to authenticate users. The databases will never be spread across multiple networks.
• Reasonably quick access speeds.
• Very low data storage: for 10 million users, we would have around 8GB of storage in total.

Having done a bit of research on Cassandra, I think the optimal approach for my use case would be to replicate the data on ALL nodes, but require reads to have a consistency level of only ONE. That way, if a node goes down, we can still read from and write to the other nodes. It is not very important that a read be unanimously agreed upon; as long as Cassandra is eventually consistent within around 1s, there shouldn’t be an issue.

When I go to set up the database, though, I am required to set the replication factor to a number (1, 2, 3, etc.), so I can’t just say “ALL” and have it replicate to all nodes. Right now, I have a 2-node cluster with a replication factor of 3. Will having RF > number of nodes cause any issues? Or is there a way to just have it copy to all nodes? Is there any way I can tune Cassandra to be more read-optimized?

Finally, I have some misgivings about how well Cassandra fits my use case. If anyone has a suggestion as to why it is or isn’t a good fit, I would really appreciate your input! If this could be done with a simple SQL database and Cassandra is overkill, please let me know.

Thanks for your input!
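The consistency-level reasoning in this question can be made concrete with a small model. The Python sketch below is a simplification, not Cassandra's own code: it assumes each replica sits on a distinct node (so a keyspace whose replication factor exceeds the node count ends up with only as many replicas as there are nodes), and it uses the usual required-replica counts for ONE, TWO, THREE, QUORUM (RF // 2 + 1) and ALL (RF). The names `replicas_required` and `tolerated_failures` are hypothetical helpers for illustration.

```python
from enum import Enum
from typing import Optional


class ConsistencyLevel(Enum):
    ONE = "ONE"
    TWO = "TWO"
    THREE = "THREE"
    QUORUM = "QUORUM"
    ALL = "ALL"


def replicas_required(level: ConsistencyLevel, replication_factor: int) -> int:
    """Replicas that must respond for a request at `level` (simplified model)."""
    if level is ConsistencyLevel.QUORUM:
        # Quorum is a strict majority of the configured replication factor.
        return replication_factor // 2 + 1
    if level is ConsistencyLevel.ALL:
        return replication_factor
    fixed = {ConsistencyLevel.ONE: 1, ConsistencyLevel.TWO: 2, ConsistencyLevel.THREE: 3}
    return fixed[level]


def tolerated_failures(
    level: ConsistencyLevel, replication_factor: int, node_count: int
) -> Optional[int]:
    """How many replica nodes can be down while requests at `level` still succeed.

    Assumption: one replica per distinct node, so only
    min(replication_factor, node_count) replicas exist. Returns None when the
    level cannot be satisfied even with every node up.
    """
    replicas = min(replication_factor, node_count)
    spare = replicas - replicas_required(level, replication_factor)
    return spare if spare >= 0 else None


if __name__ == "__main__":
    # The setup described in the question: 2 nodes, replication factor 3.
    for level in ConsistencyLevel:
        print(level.value, tolerated_failures(level, replication_factor=3, node_count=2))
```

Under this model, reads at ONE on the 2-node, RF=3 cluster survive one node failure, while ALL (and THREE) can never be satisfied, because the required count is derived from the configured RF rather than from the replicas that actually exist.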