[sqlite] Best way to temporarily store data before processing

Jim Callahan Tue, 14 Apr 2015 12:37:03 -0400

?There are too many unknowns to give a definitive answer to your question.
Any solution you chose is going to create some stress somewhere (in your
"Prepared Tables" database, RAM, the file system and so forth.

Just from my own limited experience. It is very easy while hacking to
CREATE TABLE and DELETE FROM in your main database; the price one pays is
that running .VACUUM to clean up the mess becomes time consuming.

RAM would be fastest, but also the riskiest; any memory leak at all in your
programs handling "gigabytes" of data and you run the risk of having to
reboot the system. RAM files would also be at risk from power
interruptions. If you were sure your data would "only" be 100 megabytes
then you would have a substantial safety margin in RAM.

Using RAM from the outset might fall in the category of "premature
optimization". Usually one wants something that is robust and has the
option of leaving an audit trail -- if that solution is too slow then look
at putting parts of it in RAM. One typically needs an option of audit
trails because there is always screwy data out there -- that sooner or
later shows up and break one's system.

A separate database using ATTACH makes sense, but the question would be
whether to have a template database or to create the files. This is
somewhat a matter of personal preference. I found it was not that hard to
.DUMP the template databases and create an "all code" solution. From my
perspective, I found an "all code" solution to be easier to maintain
because I did not have to rely on what may or may not be in (a version of)
a  template database (and if necessary, I could change the template
databases in the same project code I was working on), but your
circumstances and preferences may be different. This also depends on the
expressiveness of your scripting language. I was using Windows batch
scripts which did not loop very well; so I had to use some ingenuity to do
repetitive batch operations. Python is a more expressive language.

My recollection is that SQLite has a "temp" or "tmp" namespace available
for intermediate tables -- it was on my todo list, but I never got around
to exploring that option.

Jim Callahan
Orlando, FL

On Tue, Apr 14, 2015 at 5:40 AM, Jonathan Moules <J.Moules at hrwallingford.com
> wrote:

> Hi List,
> I'm wondering if anyone can offer me a "best practice" way of doing this.
>
> I'm doing some log analysis using Python/SQLite. Python parses a log file
> and splits the raw data from each line in the log into one of about 40
> tables in an SQLite database (I'll call them Raw Tables).
>
> Once a log file has been processed like this, I run some SQL which takes
> the data from the Raw Tables and aggregates/processes it into about 10
> different "Prepared Tables" which are read with a bunch of Views. The
> aggregation/processing doesn't take long, and the SQL for it is simple.
>
> I'd like to update the Prepared Tables after each log file is read because
> there are thousands of files and I don't want to have to rely on having GB
> of disk space sitting around for temporary Raw Tables.
>
> Once the Prepared Tables have been created, there's no real need to keep
> the data in the Raw Tables.
>
> The Prepared Tables don't have to be in the same database as the Raw
> Tables. I'm happy to use ATTACH.
>
> So my question:
> What's the best way to do this with the minimum overhead?
>
> Options that have come to mind (probably missed a lot):
>             - Some sort of empty template database for the Raw Tables
> which is copied/cloned/overwritten for each file processed.
>             - And/Or use "DELETE FROM Raw_Tables" to truncate it after
> each file (there are no indexes).
>             - And/Or place it into :memory:.
>             - And/Or just CREATE the Raw Tables for each file?
>             - And/Or do it within the Prepared Tables database and use
> "DELETE FROM Raw_Tables". (That file you wouldn't want in :memory: of
> course).
>
>
> Thoughts welcome, thanks for your time,
> Jonathan
>
> ________________________________
>
> HR Wallingford and its subsidiaries uses faxes and emails for confidential
> and legally privileged business communications. They do not of themselves
> create legal commitments. Disclosure to parties other than addressees
> requires our specific consent. We are not liable for unauthorised
> disclosures nor reliance upon them.
> If you have received this message in error please advise us immediately
> and destroy all copies of it.
>
> HR Wallingford Limited
> Howbery Park, Wallingford, Oxfordshire, OX10 8BA, United Kingdom
> Registered in England No. 02562099
>
> ________________________________
> _______________________________________________
> sqlite-users mailing list
> sqlite-users at mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>

[sqlite] Best way to temporarily store data before processing

Reply via email to