Hi Simon,

On Mon, Oct 29, 2018 at 1:30 AM Simon Slavin <slav...@bigfraud.org> wrote:

> On 28 Oct 2018, at 11:47pm, Gerlando Falauto <gerlando.fala...@gmail.com>
> wrote:
>
> > On an SSD, I have no clue what's going on -- will writing a 2 MB block
> > one page at a time be much worse than writing an entire block at once?
>
> You do indeed have no clue.  And so do I unless I use a utility to look at
> the low-level formatting of the SSD.  Which involves you doing things at
> the wrong level which, as I point out later, is not going to matter in the
> long run.
>
> Please also remember that an SSD can write a new block even if you change
> just one byte in the middle of an existing file.  That's what
> wear-levelling can mean.
>

I see. Though I guess it would make sense to match SQLite's page size with
the flash page size, wouldn't it?
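
For what it's worth, a minimal sketch of what I mean, using Python's
sqlite3 and a made-up 4 KiB flash page size (the real value would have
to come from the SSD's datasheet):

    import sqlite3

    FLASH_PAGE_SIZE = 4096   # made-up value; depends on the actual SSD

    conn = sqlite3.connect("log.db")
    # Takes effect on a freshly created database, or on an existing one
    # after a VACUUM (and only while it's not in WAL mode).
    conn.execute("PRAGMA page_size = %d" % FLASH_PAGE_SIZE)
    conn.execute("VACUUM")
    print(conn.execute("PRAGMA page_size").fetchone())   # sanity check
    conn.close()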

> >> In terms of deleting rows, if you want the least wear on your SSD, do
> >> the deletes seldomly.  Definitely no more than once a day.
> >
> > Hmmm, I don't understand, could you please elaborate on that?
> > I first tried this kind of approach (DELETE once a day) for simplicity,
> but deleting big amounts of rows at once seemed to potentially take a
> relatively long time and generate a very big WAL file.
> > So my uneducated guess was then that I'd be better served by making room
> (DELETE) just before adding new data (INSERT) so to reuse just deleted
> pages.
>
> With an SSD, if you're minimising wear you want to do the minimum number
> of operations.  SQLite is good at doing the minimum number of write
> operations to take care of lots of changes at once.  Deleting a hundred
> rows individually may require five hundred write operations.  Deleting a
> hundred contiguous rows in one command may require just fifty.  Or even
> twenty.
>
> Of course, the requirement to reduce write operations may be in conflict
> with other requirements: lack of a long lock-out time, or minimised size of
> WAL file.
>

Exactly. And I guess it won't be a trivial task to figure out the best
trade-off.
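
Just to make sure I follow, the contrast you're drawing is roughly the
one below (table and column names are made up, and the row-by-row loop
is only shown for comparison):

    import sqlite3

    conn = sqlite3.connect("log.db")
    conn.execute("CREATE TABLE IF NOT EXISTS log (ts TEXT, source TEXT, msg TEXT)")

    # Row-by-row, one commit each: many small write operations.
    # for rowid in old_rowids:
    #     with conn:
    #         conn.execute("DELETE FROM log WHERE rowid = ?", (rowid,))

    # One statement over a contiguous range, in a single transaction,
    # so SQLite can coalesce the page writes.
    cutoff = "2018-10-01T00:00:00"        # made-up retention boundary
    with conn:
        conn.execute("DELETE FROM log WHERE ts < ?", (cutoff,))
    conn.close()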


> >>> For instance, one crazy idea might be to put the WAL file on a ramdisk
> >>
> >> I understand why you raise that but, as you suspect, it's not going
> >> to help.
> >
> > Reason for that being? The WAL file won't get sync()ed?
>
> The WAL file is not the only file which gets written in that pattern.  So
> does the database file itself.  The additional non-standard weirdness of
> putting the WAL file somewhere special isn't going to make a huge
> difference.
>

Granted. Though I guess minimizing the number of times the WAL gets flushed
would still make sense, wouldn't it? My understanding is that as long as I
don't push PRAGMA synchronous past 1 (NORMAL), I should be OK.
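
Concretely, what I have in mind is something like this (synchronous = 1
is NORMAL, which in WAL mode only syncs at checkpoint time; the
autocheckpoint threshold then trades WAL file size against how often
those syncs happen):

    import sqlite3

    conn = sqlite3.connect("log.db")
    conn.execute("PRAGMA journal_mode = WAL")
    # 1 = NORMAL: in WAL mode, commits don't fsync; only checkpoints do.
    conn.execute("PRAGMA synchronous = NORMAL")
    # Checkpoint (and hence sync) roughly every 1000 WAL pages (the
    # default); raising it means fewer syncs but a bigger WAL file.
    conn.execute("PRAGMA wal_autocheckpoint = 1000")
    conn.close()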


> >> You should be logging by appending each entry to a text file.  Or maybe
> to
> >> a new textfile every day/week/month.  Then, to prune the log just delete
> >> one old textfile each time you start a new one.
> >
> > [...] I'd like to get a rough idea of how much worse it is to use
> > SQLite compared to this approach, in terms of SSD wearout.
> > If it's a 2x factor, I'm probably quite OK with that. If it's a 100x
> > factor, then I have a problem.
>
> Writing a log entry to a text file is incredibly efficient.  It's the
> number of octets in the log entry plus, for the sake of argument, one extra
> octet for a return character.  At most, you're writing to two blocks, plus
> assorted file meta-information.
>

Right, plus I'd have some control over the syncs and could fine-tune how
often they happen.
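
Something along these lines is what I'd try for the plain-text route:
one file per day, with an explicit fsync only every N entries (the file
layout, the names and the jsonlines format are just an example):

    import json
    import os
    from datetime import date, datetime, timezone

    SYNC_EVERY = 100      # made-up figure: fsync once per 100 entries
    _pending = 0

    def append_entry(source, payload, log_dir="logs"):
        """Append one JSON line to today's file, syncing only now and then."""
        global _pending
        os.makedirs(log_dir, exist_ok=True)
        path = os.path.join(log_dir, "%s.jsonl" % date.today().isoformat())
        record = {"ts": datetime.now(timezone.utc).isoformat(),
                  "source": source,
                  "data": payload}
        with open(path, "a") as f:
            f.write(json.dumps(record) + "\n")
            _pending += 1
            if _pending >= SYNC_EVERY:
                f.flush()
                os.fsync(f.fileno())
                _pending = 0

Pruning would then be a single os.remove() on the oldest file, as you
suggested.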



> Writing a field with the same number of octets to a SQLite database, in
> its own transaction, involves (not in this order) writing it to the journal
> file, writing it to the database, adding an entry to all indexes, writing
> row-length data, changing table-length and index-length data, changing the
> database header twice, and probably a bunch of things I've forgotten.  If
> your log database gets long you might write four times the amount of
> octets, to six different locations on disk.  For each log entry.
>
> I don't really know enough about SQLite's ACID writing pattern to give
> accurate numbers but that's a reasonable horseback guess.  Maybe someone
> who does will improve it.
>
> My take-away from thinking about this is again what I wrote before: write
> each log to a text file in a single atomic operation.  That way you know
> exactly what you did, and you perform one operation per log entry.  And
> deleting one log file per day/week/month is a single DELETE & TRIM to get
> rid of numerous log entries.
>

I agree; it makes a lot of sense to do that, at least for comparison.
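
On the SQLite side, I assume the per-entry bookkeeping you describe can
at least be amortized by grouping many log entries into one transaction.
A rough sketch, with a made-up schema:

    import sqlite3

    conn = sqlite3.connect("log.db")
    conn.execute("CREATE TABLE IF NOT EXISTS log (ts TEXT, source TEXT, msg TEXT)")

    def flush_batch(entries):
        """Insert a batch of (ts, source, msg) tuples in one transaction,
        so the journal and header updates are paid once per batch rather
        than once per log entry."""
        with conn:
            conn.executemany("INSERT INTO log VALUES (?, ?, ?)", entries)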

> By all means, when/if it comes time to analyse the log entries, turn them
> into the so-useful database on whatever computer you use to do your
> analysis, meaning you aren't worried about performing lots of operations on
> your production SSD.
>

The point is, analysis will have to be performed on the production
(embedded) system anyway.
The whole idea of using a database was to have a fast and convenient way to
perform random access to data (just according to timestamp and data source,
nothing fancy).
The same thing could certainly be done using text files with some
folder/file partitioning logic, except that it requires an extra layer.
I wonder if there's a way to write a backend for peewee (or any other ORM,
for that matter) that lets you work with CSV or jsonlines files, the way
you would with Apache Drill for instance, but without the extra overhead.
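
Failing that, the poor man's version I can picture (purely a sketch,
reusing the made-up jsonlines layout from above) is to load the relevant
files into an in-memory SQLite database just before querying, so the SSD
itself only ever sees appends:

    import glob
    import json
    import sqlite3

    def load_logs(pattern="logs/*.jsonl"):
        """Load jsonlines log files into an in-memory database for querying."""
        conn = sqlite3.connect(":memory:")
        conn.execute("CREATE TABLE log (ts TEXT, source TEXT, data TEXT)")
        with conn:
            for path in sorted(glob.glob(pattern)):
                with open(path) as f:
                    rows = ((rec["ts"], rec["source"], json.dumps(rec["data"]))
                            for rec in map(json.loads, f))
                    conn.executemany("INSERT INTO log VALUES (?, ?, ?)", rows)
        return conn

    # Random access by timestamp and data source, nothing fancy:
    # conn = load_logs()
    # rows = conn.execute(
    #     "SELECT ts, data FROM log WHERE source = ? AND ts BETWEEN ? AND ?",
    #     ("sensor-1", "2018-10-28T00:00", "2018-10-29T00:00")).fetchall()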

Thanks again for your contributions!

Gerlando