Hi Keith,

First of all, let me tell you I cannot thank you enough for going to all
this trouble for me!

It did not occur to me that it would help to run checkpoints manually.
There are a couple of things I don't understand, though.

- After a checkpoint runs to completion (assuming no other process is
reading or writing), what is the value of walpages (i.e. *pnLog)? Is it
the number of pages *originally* in the logfile (i.e. the same as
copypages), or is it the number of pages *remaining* in the logfile (i.e.
zero)? Also, does it refer to the number of *valid* pages or the number of
pages allocated in the file (i.e. file size divided by page size)? (A
minimal way to watch these values from the shell is sketched below.)
- How often are you running checkpoints? Do you expect the checkpoint
frequency to have an impact on the end result?
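
Here's that sketch (the labels below are my own reading of the docs,
mapping the second and third columns returned by the checkpoint pragma to
*pnLog and *pnCkpt):

  pragma journal_mode=wal;
  -- ... perform a few writes ...
  pragma wal_checkpoint(FULL);
  -- prints one row: busy | log | checkpointed
  --   log          -> *pnLog   (frames in the wal file)
  --   checkpointed -> *pnCkpt  (frames copied into the database file)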

Anyway, I'll definitely try the same approach on the current schema (which
is *way* different from your super-optimized version) just to see whether
it is more or less consistent with the trend reported by smartmontools. If
it is, that would make for a much more convenient metric for estimating
performance and comparing implementations.

So really, thanks a lot!
Gerlando


On Wed, Oct 31, 2018 at 4:46 AM Keith Medcalf <kmedc...@dessus.com> wrote:

>
> Based on the block erase rate required to maintain I/O churn using the
> following database schema and assumptions:
>
> - SSD size is 32 GB
> - erase size is 2 MB
> - 3000 erase cycles per block (49,152,000 erase operations total)
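>
> That total is just the erase-block count times the cycle count; a quick
> sanity check in the shell:
>
>   select (32 * 1024 / 2) * 3000;  -- 16384 erase blocks * 3000 cycles = 49152000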
>
> -- data returned from the wal_checkpoint operations
> create table Tracer
> (
>   timestamp integer primary key,
>   status integer not null,
>   walpages integer not null,
>   copypages integer not null
> );
>
> -- Tags table
> create table Tags
> (
>   id integer primary key,
>   source text collate nocase unique
> );
>
> -- History table
> create table History
> (
>   id integer primary key,
>   sid integer not null references Tags,
>   timestamp integer not null,
>   dtype text collate nocase,
>   data,
>   unique (timestamp, sid),
>   unique (sid, timestamp)
> );
>
> -- operating parameters
> pragma cache_size=262133; -- the bigger the better so that page changes do
>                           -- not need to be spilled until commit
> pragma journal_mode=wal;
> pragma wal_autocheckpoint=262144; -- large so that checkpointing is only a
>                                   -- manual operation
>
> with the following operational parameters:
>  - data arrives in "clusters" so that each second can be committed within
>    a transaction
>  - vector data arrives once per second from each of 6 3-axis
>    accelerometers, each carrying an average 4 KB data payload
>  - scalar data arrives randomly (every 15 - 45 seconds, avg 30 seconds)
>    from a random subset of 16 sources; each payload is a double
>  - ID in the History table is (the integer seconds since the unix epoch
>    << 16) + sid (see the sketch after this list)
>  - "old" data deletion is performed at the start of every transaction
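>
> As a concrete sketch of that ID scheme (the sensor id, source name and
> payload below are made up):
>
>   insert into Tags (id, source) values (5, 'accel-5');
>   insert into History (id, sid, timestamp, dtype, data)
>   values ((cast(strftime('%s','now') as integer) << 16) + 5,  -- id = (seconds << 16) + sid
>           5,                                                  -- sid
>           cast(strftime('%s','now') as integer),              -- timestamp (unix seconds)
>           'accel',                                            -- dtype
>           '[0.1, 0.2, 0.3]');                                 -- placeholder payload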
>
> The Tracer table has data written whenever a wal_checkpoint is done.
> walpages is the number of pages in the wal file at checkpoint and copypages
> is the number of pages that were copied from the wal file into the main
> database file.  Basically, every page in the wal file must be overwritten
> (eventually) and every page written to the db file is a page that must be
> erased from the db file.  If you add the two together, you get the number
> of pages that have been written (more or less) and have to be eventually
> erased.  There are 512 4K pages in a 2M erase block, so adding these up and
> dividing by 512 gives you a rough estimate of the number of erases.  Given that
> we know how many erases we have available (based on the device size and an
> estimate of the number of erase operations per erase block) we can estimate
> how long the device will last until an erase fails and the device becomes
> useless.
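>
> A minimal sketch of that calculation against the Tracer table (assuming,
> as above, 4K pages, 2M erase blocks and 49,152,000 erases available in
> total):
>
>   select (sum(walpages) + sum(copypages)) / 512            as erases_used,
>          49152000 - (sum(walpages) + sum(copypages)) / 512 as erases_remaining
>   from Tracer;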
>
> How long do you need the device to last?  Based on the data collected so
> far I can estimate that the device will last quite a long time (a decade or
> more).  Of course, it will take a while for the database to reach
> steady-state, so it remains to be seen whether the page-change set per
> checkpoint grows much as the database gets bigger.
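>
> And a rough extrapolation along those lines (just a sketch; it assumes the
> Tracer timestamps are unix-epoch seconds and that the churn rate observed
> so far stays constant):
>
>   select 49152000.0 * (max(timestamp) - min(timestamp))  -- erase budget * observed seconds
>          / ((sum(walpages) + sum(copypages)) / 512.0)    -- divided by erases consumed so far
>          / 86400.0 / 365.0                               -- converted to years
>          as estimated_lifetime_years
>   from Tracer;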
>
> ---
> The fact that there's a Highway to Hell but only a Stairway to Heaven says
> a lot about anticipated traffic volume.
>
> >-----Original Message-----
> >From: sqlite-users [mailto:sqlite-users-
> >boun...@mailinglists.sqlite.org] On Behalf Of Gerlando Falauto
> >Sent: Tuesday, 30 October, 2018 01:46
> >To: SQLite mailing list
> >Subject: Re: [sqlite] SQLite for datalogging - best practices
> >
> >On Tue, Oct 30, 2018 at 4:58 AM Keith Medcalf <kmedc...@dessus.com> wrote:
> >
> >>
> >> If you don't mind me asking, what sort of data are you collecting?
> >> Are you the master (ie, scanning) or a slave (getting async data
> >> pushed to you)?
> >> Are you "compressing" the returned data (storing only changes
> >> exceeding the deadband) or are you storing every value (or is the
> >> source instrument doing compression)?
> >>
> >> I presume you need to store the TimeStamp, Point, Value and
> >> Confidence.
> >> What is the data rate (# Points and Frequency)?
> >>
> >
> >The bulk of the data consists of streams of AC signals being pushed
> >from a handful of 3-axis accelerometers which are more or less
> >synchronous. The data rate is in the order of a few hundred samples/sec
> >for each sensor.
> >A first software layer handles buffering and passes one-second buffers
> >to a second software layer, which then saves them to the database for
> >later analysis.
> >The database schema currently consists of a single table with roughly
> >the following columns: timestamp, sensorid (string), datatype (string)
> >and a string containing the JSON encoding of those few hundred samples
> >(as a JSON array). So each row takes up about 3-4 KB (~200 samples *
> >~5 bytes/sample * 3 axes + overhead).
> >At a later stage one may want to pack adjacent chunks together into even
> >longer strings (so as to reduce the total number of rows) and/or store
> >the data in a more efficient manner (e.g. in binary or compressed form).
> >I don't particularly like this way of storing logical streams (i.e. time
> >series) in chunks, but I couldn't find any better way.
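> >
> >Roughly, the current layout looks like this (the table name is mine,
> >everything else is as described above):
> >
> >  create table Samples
> >  (
> >    timestamp integer,
> >    sensorid  text,
> >    datatype  text,
> >    data      text    -- JSON array of ~200 samples per axis; ~3-4 KB per row
> >  );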
> >
> >There are also some much less frequent readings (scalar quantities
> >collected every 30 seconds or so) currently stored in the same table.
> >
> >Any suggestion on how to improve this?
_______________________________________________
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
