On Tue, Jul 10, 2018 at 8:29 AM, Code Wiget <codewige...@gmail.com> wrote:

> Hi,
>
> I have been tasked with picking and setting up a database with the
> following characteristics:
>
>    - Ultra-high availability - The real requirement is uptime - our whole
>    platform becomes inaccessible without a “read” from the database. We need
>    the read to authenticate users. Databases will never be spread across
>    multiple networks.
>
>
Sooner or later life will happen and you're going to have some
unavailability - it may be worth taking the time to make the platform fail
gracefully (cache auth responses, etc).

>
>    - Reasonably quick access speeds
>    - Very low data storage - The data storage is very low - for 10
>    million users, we would have around 8GB of storage total.
>
> Having done a bit of research on Cassandra, I think the optimal approach
> for my use-case would be to replicate the data on *ALL* nodes possible,
> but require reads to only have a consistency level of one. So, in the case
> that a node goes down, we can still read/write to other nodes. It is not
> very important that a read be unanimously agreed upon, as long as Cassandra
> is eventually consistent, within around 1s, then there shouldn’t be an
> issue.
>

Seems like a reasonably good fit, but there's no 1-second guarantee -
replication will USUALLY complete within milliseconds, but the edge cases
have no strict bound at all (imagine two hosts in adjacent racks, the link
between the racks goes down, but both are otherwise functional - a query at
consistency ONE in either rack would still be able to read and write data,
but the data would diverge between the two racks for some period of time,
until the partition heals and hinted handoff / repair catch up).
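
To be concrete about the consistency-ONE reads you describe, from cqlsh it's
roughly this (the keyspace/table/column names below are just placeholders for
whatever your auth schema actually looks like):

    -- session-level consistency in cqlsh; drivers let you set this per statement
    CONSISTENCY ONE;

    -- a single-partition read like this only needs one replica to answer
    SELECT password_hash FROM auth.users WHERE username = 'alice';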


>
> When I go to set up the database though, I am required to set a
> replication factor to a number - 1,2,3,etc. So I can’t just say “ALL” and
> have it replicate to all nodes.
>

That option doesn't exist. It's been proposed (and exists in Datastax
Enterprise, which is a proprietary fork), but reportedly causes quite a bit
of pain when misused, so people have successfully lobbied against its
inclusion in OSS Apache Cassandra. You could (assuming some basic Java
knowledge) extend NetworkTopologyStrategy to accomplish this, but I imagine
you don't REALLY want it unless you're frequently auto-scaling nodes in/out
of the cluster. You should probably just pick a high RF and you'll be OK
with it.
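
For example, with a single datacenter and RF=5, the keyspace definition would
look something like this (the keyspace name and the 'datacenter1' DC name are
placeholders - use whatever your snitch actually reports):

    CREATE KEYSPACE auth
      WITH replication = {'class': 'NetworkTopologyStrategy', 'datacenter1': 5};

That gives every read several candidate replicas without needing an
"everywhere" strategy.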


> Right now, I have a 2 node cluster with replication factor 3. Will this
> cause any issues, having a RF > #nodes? Or is there a way to just have it
> copy to *all* nodes?
>

It's obviously not the intended config, but I don't think it'll cause many
problems.
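
If you later add nodes or bump the RF, it's just an ALTER KEYSPACE plus a
repair so the new replicas stream in the existing data (keyspace name is a
placeholder again):

    ALTER KEYSPACE auth
      WITH replication = {'class': 'NetworkTopologyStrategy', 'datacenter1': 5};
    -- followed by: nodetool repair auth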


> Is there any way that I can tune Cassandra to be more read-optimized?
>
>
Yes - definitely use leveled compaction instead of STCS (the default), and
definitely take the time to tune the JVM args - the read path generates a lot
of short-lived Java objects, so a larger eden will help you (maybe up to
40-50% of the max heap size).
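
Switching an existing table to LCS is a one-line ALTER (table name is a
placeholder); the eden/heap settings live in jvm.options / cassandra-env.sh
rather than in CQL:

    ALTER TABLE auth.users
      WITH compaction = {'class': 'LeveledCompactionStrategy'};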


> Finally, I have some misgivings about how well Cassandra fits my use-case.
> Please, if anyone has a suggestion as to why or why not it is a good fit, I
> would really appreciate your input! If this could be done with a simple SQL
> database and this is overkill, please let me know.
>
> Thanks for your input!
>
>
