On February 11, 2011, Teg wrote:
> Hello Thomas,
>
> I download off the net at between 400 Mbps and 1 Gbps and have to pack
> the data away into a SQLite DB. I use an intermediate buffer on disk
> and a background thread to write the data to the DB. Obviously, I
> couldn't keep up if the data was continuous, but the lulls between
> the peak data-rate times give me time to pack it away. Even if I get
> an extended burst of data, it'll eventually make it in. If I were you,
> I'd start by having something service the interface and buffer the
> packet data, and something else pulling the packet data from the buffer
> and feeding it to SQLite. Then you won't be dropping packets.
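[Ed. note: the buffer-and-background-writer pattern Teg describes can be sketched roughly as below. This is a hypothetical illustration, not code from either poster; the table, `STOP` sentinel, and batch cap are all invented for the example.]

```python
import queue
import sqlite3

STOP = object()  # sentinel telling the writer loop to finish


def writer_loop(con, buf):
    """Drain packet records from `buf` and batch them into SQLite.

    Blocks until a record arrives, then opportunistically grabs whatever
    else is already queued, so each transaction (and its fsync) covers a
    whole batch rather than a single row.
    """
    con.execute("CREATE TABLE IF NOT EXISTS packets (ts REAL, length INTEGER)")
    while True:
        item = buf.get()                 # wait for the capture side
        if item is STOP:
            break
        batch = [item]
        while len(batch) < 1000:         # cap batch size to bound latency
            try:
                nxt = buf.get_nowait()
            except queue.Empty:
                break
            if nxt is STOP:
                buf.put(STOP)            # re-queue so the outer loop exits
                break
            batch.append(nxt)
        with con:                        # one transaction per batch
            con.executemany("INSERT INTO packets VALUES (?, ?)", batch)
```

In a real program the capture thread would `buf.put((timestamp, length))` per packet and `writer_loop` would run in its own thread; note that the writer thread should open its own connection, since Python `sqlite3` connections are single-thread by default.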
The program is split into two separate threads: one is dedicated to just capturing packets off the network and storing them in a circular buffer in memory; the second is dedicated to parsing the packets, calculating stats, and occasionally flushing data to storage.

What normally happens is that hosts which haven't been seen for a while get saved to the DB and are removed from the in-memory hash table. That doesn't happen for local hosts, the fake host used to capture traffic totals, or any external hosts that keep showing up. And temporarily I've added some code to auto-save all in-memory hosts to the database, as I'm seeing some instability in the interface with sqlite, causing the process to lose information once in a while.

I've seen numbers on the internet stating that sqlite is super fast, and should be able to handle tens of thousands of inserts or updates in a second or two, even in sync mode. So I'm a bit stumped as to why its performance is so low in my case. One time I saw it take 5 seconds to flush almost 3k rows back to the DB. No hosts are ever actually removed from the database, which has about 120k hosts in it by now, totaling a file size of around 7-8 MB. So it's not a lot of data, and the queries are pretty simple. So I really am stumped.

> T
>
> Friday, February 11, 2011, 6:49:16 PM, you wrote:
>
> TF> Hi, I have a small problem with a program I've developed.
>
> TF> It basically captures packets from a network device on linux and stores
> TF> transfer stats on each host seen. To store the stats long term I decided
> TF> to use sqlite, and when the program saves the stats every few minutes,
> TF> it takes about 4-5 seconds, and if I have the synchronous pragma turned
> TF> off, it takes 1-2 seconds.
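[Ed. note: the usual reason a ~3k-row flush takes seconds is a commit, and hence an fsync, per row; wrapping the whole batch in one transaction typically brings it down to tens of milliseconds. A minimal sketch against the quoted schema — the function name and tuple layouts are invented for illustration:]

```python
import sqlite3


def flush_hosts(con, updates, inserts):
    """Write one batch of host rows in a single transaction.

    `updates` holds (last_on, last_off, rx_bytes, tx_bytes, id) tuples and
    `inserts` holds (address, first_seen, last_on, rx_bytes, tx_bytes)
    tuples, matching the statements from the quoted mail. One transaction
    around the whole batch means one fsync per flush instead of one per row.
    """
    with con:  # BEGIN ... COMMIT around everything
        con.executemany(
            "UPDATE hosts SET last_on = ?, last_off = ?, rx_bytes = ?, "
            "tx_bytes = ? WHERE id = ?",
            updates)
        con.executemany(
            "INSERT INTO hosts (address, first_seen, last_on, rx_bytes, "
            "tx_bytes) VALUES (?, ?, ?, ?, ?)",
            inserts)
```

If the flushes still stall the capture side, `PRAGMA journal_mode=WAL` (available since SQLite 3.7.0) lets the writer commit without blocking readers and usually cheapens each commit further.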
> TF> These are the relevant sql commands:
>
> TF> CREATE TABLE hosts (id INTEGER PRIMARY KEY,
> TF>                     address INTEGER UNIQUE,
> TF>                     first_seen INTEGER DEFAULT CURRENT_DATETIME,
> TF>                     last_on INTEGER DEFAULT CURRENT_DATETIME,
> TF>                     last_off INTEGER,
> TF>                     rx_bytes INTEGER,
> TF>                     tx_bytes INTEGER);
>
> TF> UPDATE hosts SET last_on = ?, last_off = ?, rx_bytes = ?, tx_bytes = ?
> TF> WHERE id = ?
> TF> INSERT INTO hosts ( address, first_seen, last_on, rx_bytes, tx_bytes )
> TF> VALUES ( ?, ?, ?, ?, ? )
>
> TF> The stats saving code will interleave a bunch of UPDATEs and INSERTs
> TF> inside a single transaction, maybe that has something to do with it,
> TF> I'm not sure.
>
> TF> 1s to update them all isn't too bad, but it still can mean I'm
> TF> potentially dropping packets, which I'd really rather not do.
>
> TF> Thanks.

-- 
Thomas Fjellstrom
tfjellst...@strangesoft.net
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
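[Ed. note: two observations on the quoted SQL. First, `CURRENT_DATETIME` is not a recognized SQLite default — the valid keywords are `CURRENT_TIME`, `CURRENT_DATE`, and `CURRENT_TIMESTAMP`. Second, on SQLite 3.24 (2018) and later — well after this thread — the interleaved UPDATE/INSERT pair can be collapsed into a single UPSERT keyed on the UNIQUE `address` column, so one prepared statement covers both cases. A hedged sketch with invented sample values:]

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Same schema as the quoted mail, with the default corrected to the
# valid CURRENT_TIMESTAMP keyword.
con.execute("""CREATE TABLE hosts (
    id INTEGER PRIMARY KEY,
    address INTEGER UNIQUE,
    first_seen INTEGER DEFAULT CURRENT_TIMESTAMP,
    last_on INTEGER,
    last_off INTEGER,
    rx_bytes INTEGER,
    tx_bytes INTEGER)""")

# One statement replaces the interleaved UPDATE/INSERT pair:
# insert a new host, or update the existing row on address conflict.
UPSERT = """
INSERT INTO hosts (address, last_on, last_off, rx_bytes, tx_bytes)
VALUES (?, ?, ?, ?, ?)
ON CONFLICT(address) DO UPDATE SET
    last_on  = excluded.last_on,
    last_off = excluded.last_off,
    rx_bytes = excluded.rx_bytes,
    tx_bytes = excluded.tx_bytes
"""

with con:
    con.execute(UPSERT, (0x0A000001, 100, None, 512, 128))   # new host
    con.execute(UPSERT, (0x0A000001, 200, 210, 1024, 256))   # same host, updated
```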