On 2018/07/30 12:39 AM, Gerlando Falauto wrote:

The question that needs to be answered specifically is: How many data
input sources are there? As in, how many Processes will attempt to write
to the database at the same time? Two processes can obviously NOT write
at the same time, so if a concurrent write happens, one process has to
wait a few milliseconds. This gets compounded as more and more write
sources are added or as write-frequency increases.
When you say "a few milliseconds", how exactly do you measure that?

What is the timeframe where the second process has to wait if the
first one is doing a batch of a thousand writes?

This is very hard to say, hence my other questions in this regard. The time needed depends firstly on how much data needs to be inserted; then on how many inserts are required (if batched); then on how many Indexes you have on the table (each Index needs to be modified for every insert); then on whether the inserts contain only data or also functions that need to be calculated (especially UDFs); then on whether any triggers need to run upon Inserting; and most importantly of all, on how quick the hardware is that has to drink all this data (the hardware IO is the typical bottleneck).

A query doing a single insert of a few bytes with no Indexes, no triggers and no functions will be stupendously fast, whereas any increase in one or more of the above will slow things down. How much exactly is something you need to test; any guesswork will not be useful. What I can say is that if you don't have any Unique Indexes or triggers, the insert speed /should/ not change much with size.
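To make that concrete, here is a minimal sketch using Python's built-in sqlite3 module (the file names, schema and row count are made up purely for illustration) that times the same batch of inserts on a bare table versus the same table carrying two extra indexes:

# Rough sketch: time the same inserts with and without extra indexes.
# File names, schema and row count are arbitrary, for illustration only.
import os, sqlite3, time

def time_inserts(dbfile, extra_indexes):
    if os.path.exists(dbfile):
        os.remove(dbfile)
    con = sqlite3.connect(dbfile)
    con.execute("CREATE TABLE log(ts REAL, source TEXT, payload TEXT)")
    if extra_indexes:
        con.execute("CREATE INDEX idx_ts ON log(ts)")
        con.execute("CREATE INDEX idx_source ON log(source)")
    rows = [(time.time(), "proc1", "x" * 100) for _ in range(50000)]
    start = time.perf_counter()
    with con:                      # one transaction for the whole batch
        con.executemany("INSERT INTO log VALUES (?, ?, ?)", rows)
    con.close()
    return time.perf_counter() - start

print("no extra indexes :", time_inserts("bench_plain.db", False))
print("two extra indexes:", time_inserts("bench_indexed.db", True))

Run it on the actual target hardware; the numbers on a developer machine will tell you very little about the box that has to absorb the real load.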


To get to the point: With the above in mind, do you still feel like you
need to go the sharding route? Could you perhaps quote figures such as
how many bytes would a typical data update be? How many updates per
second, from how many different Processes?  Do you have a maintenance
Window (as in, is there a time of day or on a weekend or such that you
can go a few minutes without logging so one can clean up the old logs,
perhaps Vacuum and re-Analyze?)
In the current use case there's a single process. The way I see it, in
the near future it would probably increase to 3-4 processes,

Why? Are you adding other logging sources, or is this just a feeling that more processes would handle things better in some way?

each doing 10-100 writes per second or so. Each write would be around
1KB-20KB (one single text field, I guess).
100 x 20KB inserts = ~2MB inserted per second per process, x 4 processes = ~8MB per second, issuing around 400 file-lock cycles... Let's assume only 1 Index (apart from the rowid) and no triggers; then this should be easily handled, even on slower hardware. It is significant enough, though, that it should be tested before-hand with real-world operating parameters in place.
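For what it's worth, a quick sketch of such a test, with the figures above plugged in (~20KB of text per row, ~400 single-row transactions per second in total, one extra index; the file name and the WAL pragma are my own assumptions, not something from this thread):

# Rough feasibility sketch for the workload quoted above: ~400 single-row
# transactions per second, each carrying ~20KB of text, with one extra index.
# File name and journal_mode are assumptions made for this sketch.
import os, sqlite3, time

DB = "ingest_test.db"
ROWS_PER_SECOND = 400          # 4 processes x 100 writes/second
PAYLOAD = "x" * 20000          # ~20KB text field

if os.path.exists(DB):
    os.remove(DB)
con = sqlite3.connect(DB)
con.execute("PRAGMA journal_mode=WAL")
con.execute("CREATE TABLE log(ts REAL, payload TEXT)")
con.execute("CREATE INDEX idx_ts ON log(ts)")

start = time.perf_counter()
for _ in range(ROWS_PER_SECOND):
    with con:                  # one transaction (one lock cycle) per insert
        con.execute("INSERT INTO log VALUES (?, ?)", (time.time(), PAYLOAD))
elapsed = time.perf_counter() - start
print("one second's worth of data (%d rows, ~%.1f MB) took %.2fs"
      % (ROWS_PER_SECOND, ROWS_PER_SECOND * 20 / 1024.0, elapsed))

If one second's worth of data takes well under a second on the real hardware, you have headroom; if not, batching (below) becomes essential rather than just nice-to-have.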

I wonder if writing data in batches would be helpful though.
Most certainly. Fewer file-lock operations is better. Plan whatever you do so that logs are collected together and fired into the DB in single transactions at, say, 1-to-5-second intervals. How to decide the interval: How much data-time can you afford to lose if the system crashes? How fast is the data needed downstream by whatever is reading the DB?
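A minimal sketch of that idea, assuming a single in-process writer thread that drains a queue and commits whatever it has collected every couple of seconds (the queue, schema, file name and interval are illustrative assumptions):

# Batching sketch: producers push log rows onto a queue; one writer thread
# flushes them into SQLite in a single transaction every FLUSH_INTERVAL seconds.
# Names, schema and interval are illustrative assumptions.
import queue, sqlite3, threading, time

log_queue = queue.Queue()
FLUSH_INTERVAL = 2.0           # data-time you can afford to lose on a crash

def writer(dbfile="logs.db"):
    con = sqlite3.connect(dbfile)
    con.execute("CREATE TABLE IF NOT EXISTS log(ts REAL, source TEXT, payload TEXT)")
    while True:
        time.sleep(FLUSH_INTERVAL)
        batch = []
        while not log_queue.empty():
            batch.append(log_queue.get_nowait())
        if batch:
            with con:          # one transaction -> one lock cycle per batch
                con.executemany("INSERT INTO log VALUES (?, ?, ?)", batch)

threading.Thread(target=writer, daemon=True).start()

# Producers anywhere in the process then just do:
log_queue.put((time.time(), "proc1", "some log text"))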

But yeah, I guess sharding makes much less sense right now. :-)

Agreed.

Cheers,
Ryan
