On Tue, Oct 30, 2012 at 1:00 AM, Alek Paunov <a...@declera.com> wrote:

> On 29.10.2012 11:58, David Barrett wrote:
>
>> Because in practice, as someone actually doing it (as opposed to
>> theorizing about it), it works great.  The MySQL portions of our service
>> are always in a semi-constant state of emergency, while our sqlite
>> portions just hum along.  And given that we're switching to SSDs, I
>> expect they will hum even better.  What problems would you expect me to
>> be seeing that I can happily report I'm not, or what problems have I not
>> yet encountered but will -- at 100GB, or 1TB?
>>
>
> In your previous thread (2012-02), you have mentioned that you are about
> to open-source your replication method based on SQL statement distribution.
> Probably your work would be of interest for a huge number of sites managing
> data volumes around or below your current level, even if you switch to
> PostgreSQL at this point.
>
> IMHO, there might be a future for your replication model, because I think
> that SQLite can more easily (relative to other proven DB technologies, e.g.
> PostgreSQL) be turned into a DB engine for more query languages than SQL
> (thanks to its clever VM design).
>
> Furthermore, AFAIK, PostgreSQL replicates at the WAL distribution level and
> most NoSQL databases at the key distribution level, whereas your method
> seems more bandwidth-efficient.
>

Thanks Alek!  Yes, we're definitely planning on it, just trying to find the
right time.  We don't want to go through the work to open source it only to
be greeted with silence.  Might you be interested in using it in an actual
deployed environment, or just studying it?

As for the size this works up to, I should emphasize that Expensify uses
this for our main database -- and we have over a *million* users on it.
That's not to say a million users is the biggest thing ever, but it's a
lot bigger than most websites (with far more complicated data structures),
and it works great.  Furthermore, we're in the process of upgrading all our
hardware and we feel that alone will get us at *least* an order of
magnitude improvement in capacity -- without any algorithmic changes.  And
we've got plenty of ideas how to improve the basic technology and/or
restructure our database to get even more capacity, should we need it.

The upshot is I don't see a specific reason why it couldn't scale up to a
5M, 10M, or larger service.  And if it starts to break down after that?
Well, that's a problem we should all love to have.

Additionally, I think people get so excited about big data that they
overlook the importance of *available* data.  With this technology,
everything is replicated offsite in realtime, ensuring that service can
continue uninterrupted even when a whole datacenter goes underwater (as is
happening to many datacenters at this very moment in NYC) or falls off the
map (as happens to various AWS zones with surprising regularity).  Our
technology seamlessly fails over when any node (even the master) disappears
(or reappears), without dropping a single transaction -- the web layer
doesn't even know whether it's talking to a master or a slave, or whether
a slave became master mid-transaction.

This total confidence in the data layer is what allows us to sleep soundly
even when servers crash: similar to how Google only fixes broken servers
every quarter, any business in this day and age that stresses out when a
server dies is doing it wrong.  Indeed, I'm writing this from a hotel in
Bangkok because every year we take the whole company overseas for a month
to work from the beach -- something that would be inconceivable to an
organization that puts all its eggs in one datacenter.

As for SQL versus binary replication, it has its pros and cons -- it's
generally (though not always) more bandwidth efficient, but at a higher CPU
cost: slaves need to redo all the work the master did.  But it's
fantastically simple, and I feel a simple design brings the most important
efficiency of all: easy to understand, easy to debug, easy to verify.
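
To make the trade-off concrete, here is a minimal sketch of SQL statement
replication using Python's standard sqlite3 module.  It is an illustration
only, not the actual Expensify implementation: the table, the helper names
(open_node, apply_transaction, replicate) and the in-process "network" are
all assumptions.  The master commits a transaction, then the same statement
text is re-executed on every replica, which is why each replica pays the
CPU cost again.

```python
import sqlite3

# Hypothetical schema, used only for this illustration.
SCHEMA = (
    "CREATE TABLE expenses ("
    "id INTEGER PRIMARY KEY, amount_cents INTEGER NOT NULL)"
)


def open_node():
    """Open an in-memory database standing in for one node."""
    # isolation_level=None puts the connection in autocommit mode,
    # so the explicit BEGIN/COMMIT/ROLLBACK below are in full control.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute(SCHEMA)
    return conn


def apply_transaction(conn, statements):
    """Run a list of (sql, params) pairs as one atomic transaction."""
    try:
        conn.execute("BEGIN")
        for sql, params in statements:
            conn.execute(sql, params)
        conn.execute("COMMIT")
    except sqlite3.Error:
        # Undo any partial work before re-raising the error.
        if conn.in_transaction:
            conn.execute("ROLLBACK")
        raise


def replicate(master, replicas, statements):
    """Commit on the master first, then replay the same SQL on replicas."""
    apply_transaction(master, statements)
    for replica in replicas:
        # Only the statement text and parameters are "shipped" here;
        # each replica redoes the work the master already did.
        apply_transaction(replica, statements)
```

Note that a scheme like this only keeps nodes identical if every statement
is deterministic; something like random() or datetime('now') would need to
be evaluated once and shipped as a literal value.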

As for PostgreSQL, MySQL, or any other database back end -- yes, it's designed
to be a layer above the database.  We're in the midst of making it
optionally backed by a MySQL store, but yes, it should be easy to put
anything behind it.

Finally, that's interesting about using this to replicate non-SQL languages
-- yes, it's definitely language agnostic.  Anything that has the notion of
an atomic transaction with ROLLBACK and COMMIT should work fine with it.
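
As a rough sketch of what "language agnostic" could mean here: the
replication layer only needs begin, execute, commit and rollback from the
backend, and treats each command as opaque text.  Everything below is
hypothetical (the TransactionalStore interface, the toy KeyValueStore and
its SET/DEL command syntax); it is not the real interface of the
technology described above.

```python
from typing import Protocol, Sequence


class TransactionalStore(Protocol):
    """Assumed minimal interface a replicated backend must offer."""

    def begin(self) -> None: ...
    def execute(self, command: str) -> None: ...
    def commit(self) -> None: ...
    def rollback(self) -> None: ...


class KeyValueStore:
    """Toy non-SQL backend: commands are 'SET key value' or 'DEL key'."""

    def __init__(self) -> None:
        self.data: dict = {}
        self._pending = None  # working copy while a transaction is open

    def begin(self) -> None:
        self._pending = dict(self.data)

    def execute(self, command: str) -> None:
        if self._pending is None:
            raise RuntimeError("no open transaction")
        parts = command.split(" ", 2)
        if parts[0] == "SET" and len(parts) == 3:
            self._pending[parts[1]] = parts[2]
        elif parts[0] == "DEL" and len(parts) == 2:
            self._pending.pop(parts[1], None)
        else:
            raise ValueError(f"bad command: {command!r}")

    def commit(self) -> None:
        # Publish the working copy atomically and close the transaction.
        self.data, self._pending = self._pending, None

    def rollback(self) -> None:
        # Discard the working copy; committed data is untouched.
        self._pending = None


def replay(store: TransactionalStore, commands: Sequence[str]) -> None:
    """Apply one replicated transaction; roll back if any command fails."""
    store.begin()
    try:
        for command in commands:
            store.execute(command)
    except Exception:
        store.rollback()
        raise
    store.commit()
```

Replaying the same command list on several KeyValueStore instances leaves
them with identical contents, and a failing command leaves each store as it
was before the transaction began.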

Thanks for the interest!

-david
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users