Based on the block erase rate required to sustain the I/O churn, using the 
following database schema and assumptions:

- SSD size is 32 GB
- erase block size is 2 MB
- 3000 erase cycles per block (32 GB / 2 MB = 16,384 blocks, so 49,152,000 
  erase operations in total)

-- data returned from the wal_checkpoint operations
create table Tracer
(
  timestamp integer primary key,
  status integer not null,
  walpages integer not null,
  copypages integer not null
); 

-- Tags table
create table Tags
(
  id integer primary key,
  source text collate nocase unique
);

-- History table
create table History
(
  id integer primary key,
  sid integer not null references Tags,
  timestamp integer not null,
  dtype text collate nocase,
  data,
  unique (timestamp, sid),
  unique (sid, timestamp)
);

-- operating parameters
pragma cache_size=262133;         -- the bigger the better, so that page
                                  -- changes do not need to be spilled
                                  -- until commit
pragma journal_mode=wal;
pragma wal_autocheckpoint=262144; -- large so that checkpointing is only a
                                  -- manual operation

with the following operational parameters:
 - data arrives in "clusters" so that each second's worth of data can be 
   committed within a single transaction
 - vector data arrives once per second from each of 6 three-axis 
   accelerometers, with an average 4 KB data payload each
 - scalar data arrives randomly (every 15 - 45 seconds, average 30 seconds) 
   from a random subset of 16 sources; each data payload is a double
 - the ID in the History table is (integer seconds since the Unix epoch << 16) 
   + sid (see the sketch after this list)
 - "old" data deletion is performed at the start of every transaction
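A minimal sketch of one such per-second transaction, assuming Unix-second 
timestamps and a hypothetical 30-day retention window (the message only 
specifies the ID formula and that old-data deletion happens at the start of 
each transaction; the parameter names :sid and :payload are mine):

begin immediate;

-- delete "old" data at the start of the transaction
-- (the 30-day retention window is an assumption, not from the message)
delete from History
 where timestamp < cast(strftime('%s','now') as integer) - 30*86400;

-- one insert per accelerometer reading for this second; the id encodes
-- (seconds since the Unix epoch << 16) + sid as described above
insert into History (id, sid, timestamp, dtype, data)
values ((cast(strftime('%s','now') as integer) << 16) + :sid,
        :sid,
        cast(strftime('%s','now') as integer),
        'vector',          -- dtype label is illustrative
        :payload);         -- ~4 KB accelerometer payload bound by the app

commit;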

The Tracer table gets a row written whenever a wal_checkpoint is done.  
walpages is the number of pages in the wal file at checkpoint and copypages is 
the number of pages that were copied from the wal file into the main database 
file.  Basically, every page in the wal file must (eventually) be overwritten, 
and every page written to the db file is a page that must be erased from the 
db file.  If you add the two together, you get (more or less) the number of 
pages that have been written and must eventually be erased.  There are 512 4K 
pages in a 2M erase block, so summing these counts and dividing by 512 gives a 
rough estimate of the number of erases.  Given that we know how many erases we 
have available (based on the device size and an estimate of the number of 
erase operations per erase block), we can estimate how long the device will 
last until an erase fails and the device becomes useless.
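As a back-of-the-envelope query over the Tracer table (assuming its timestamp 
column is in Unix seconds; the 49,152,000 and 512 figures come from the 
assumptions above, and the seconds-per-year constant is mine):

-- rough erase count so far, and projected lifetime at the observed rate
select sum(walpages + copypages) / 512.0                as erase_blocks_used,
       49152000.0 / (sum(walpages + copypages) / 512.0)
           * (max(timestamp) - min(timestamp))
           / (365.25 * 86400.0)                         as estimated_years
  from Tracer;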

How long do you need the device to last?  Based on the data collected so far 
I can estimate that the device will last quite a long time (a decade or more).  
Of course, it will take a while for the database to reach steady-state ... we 
will have to see whether the page-change set per checkpoint grows much as the 
database gets bigger.
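
For reference, the capture step that feeds Tracer might look like the sketch 
below.  PRAGMA wal_checkpoint returns one row (busy, log, checkpointed); I am 
assuming the application reads that row itself and that it maps onto status, 
walpages and copypages as described above (the parameter names are mine):

-- run a manual checkpoint; this returns one row: (busy, log, checkpointed)
pragma wal_checkpoint(PASSIVE);

-- record the returned row for later analysis
insert into Tracer (timestamp, status, walpages, copypages)
values (cast(strftime('%s','now') as integer), :busy, :log, :checkpointed);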

---
The fact that there's a Highway to Hell but only a Stairway to Heaven says a 
lot about anticipated traffic volume.

>-----Original Message-----
>From: sqlite-users [mailto:sqlite-users-
>[email protected]] On Behalf Of Gerlando Falauto
>Sent: Tuesday, 30 October, 2018 01:46
>To: SQLite mailing list
>Subject: Re: [sqlite] SQLite for datalogging - best practices
>
>On Tue, Oct 30, 2018 at 4:58 AM Keith Medcalf <[email protected]>
>wrote:
>
>>
>> If you don't mind me asking, what sort of data are you collecting?
>> Are you the master (ie, scanning) or a slave (getting async data
>> pushed to you).
>> Are you "compressing" the returned data (storing only changes
>> exceeding the deadband) or are you storing every value (or is the
>> source instrument doing compression)?
>>
>> I presume you need to store the TimeStamp, Point, Value and
>> Confidence.
>> What is the data rate (# Points and Frequency)
>>
>
>The bulk of data consists of streams of AC signals being pushed from a
>handful of 3-axis accelerometers which are more or less synchronous.
>Data rate is in the order of a few hundred samples/sec for each sensor.
>A first software layer handles buffering and passes one-second buffers
>to a second software layer which then saves them to the database for
>later analysis.
>The database schema currently consists of a single table with roughly
>the following columns: timestamp, sensorid (string), datatype (string)
>and a string containing the JSON encoding of those few hundred samples
>(as a JSON array).
>So each row takes up about 3-4 KB (~200 samples * ~5 bytes/sample * 3
>axes + overhead).
>At a later stage one may want to pack adjacent chunks together into
>even longer strings (so as to reduce the total number of rows) and/or
>store the data in a more efficient manner (e.g. in binary or
>compressed form).
>I don't particularly like this way of storing logical streams (i.e.
>time series) in chunks but I couldn't find any better way.
>
>There are also some way-less-frequent readings (scalar quantities being
>collected every 30 seconds or so) currently being stored in the same
>table.
>
>Any suggestion on how to improve this?