First off, if this sort of "code review" style question is inappropriate for this list, please feel free to reply to me directly and tell me to stop, and I'll know to not do this again.
That said, I have a question on the use of SQLite. At the risk of falling into the XY problem, I'll give some details on what i'm doing, and then how I'm doing it. My basic question is if what I'm doing is valid, and if I'm doing anything needlessly wasteful. I have a system monitoring events (and I know I'm being vague on the exact nature of these events, I can't provide too many details, sorry. Please try to accept what I say as given about them). The events have IDs that are generated externally, they're 30-40 ascii characters long, appear random, and known to be unique by external means for a given event. For the purposes of this particular problem, the only thing I care about tracking is when I first saw an event, and the last time I saw it. For better or worse, this ecosystem already thinks about timestamps as the number of minutes since a specific epoch, and is used to treating all time values as an integer in that space, I'm doing the same here. So, I have a RESTful server written in Python, using APSW to create a simple SQLite database: CREATE TABLE IF NOT EXISTS event( event_id TEXT PRIMARY KEY, first_seen INTEGER, last_seen INTEGER ) WITHOUT ROWID; Every time a new event comes in (they might be very out of order), I do a: INSERT INTO event(event_id, first_seen, last_seen) VALUES(?, ?, ?) ON CONFLICT(event_id) DO UPDATE SET first_seen = MIN(excluded.first_seen, hashes.first_seen), last_seen = MAX(excluded.last_seen, hashes.last_seen); To create the record for the event if it's new, or possibly update an existing one with new values. To give a sense of scale, I have around 5 billion events stored right now for the past couple of years in a 250gb database, and I see around 20 million per day, some small percentage of those are new. The important thing I can do for users is pull up reports. The report is roughly a summary of how old events are (when they were first seen, and how long they've been seen for). Outliers are highlighted, as are events that haven't been seen at all. The user will provide around ten thousand event IDs, the majority of them, approaching 99%, will exist in my database. When the user requests a report, I create an in memory database: ATTACH ':memory:' AS mem_db; CREATE TABLE mem_db.valid(event_id TEXT PRIMARY KEY); And populate that table with the events the user is interested in. I'm doing this since I won't get the list of items in one list, it's built up over some minutes. If the system running dies in the middle of a request, it's OK to start over. Then I run: SELECT mem_db.valid.event_id, event.first_seen, event.last_seen FROM mem_db.valid LEFT JOIN event ON event.event_id = mem_db.valid.event_id; And gather up the results and pretty them up for the user. Does all of this seem valid? It works, so I'm OK with it, but I'm far from a SQLite expert, and I want to know if I'm going to be backing myself into a corner or otherwise torturing things that should be done differently. Or, if the answer is: "Don't use SQLite for that", that's fine too, I'll start looking at other options. Thanks for any feedback. _______________________________________________ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users