On February 11, 2011, Teg wrote:
> Hello Thomas,
> 
> I download off the net at between 400 Mbps and 1 Gbps and have to pack
> the data away into a Sqlite DB. I use an intermediate buffer on disk
> and a background thread to write the data to the DB. Obviously, I
> couldn't keep up if the data was continuous but, the lulls in between
> the peak data rate times give me time to pack it away. Even if I get
> an extended burst of data, it'll eventually make it in. If I was you,
> I'd start by having something service the interface and buffer the
> packet data and something else pulling the packet data from the buffer
> and feeding it to Sqlite. Then you won't be dropping packets.

The program is split into two separate threads, one is dedicated to just 
capturing packets off the network, and storing them in a circular buffer in 
memory. The second thread is dedicated to parsing the packets, calculating 
stats, and occasionally flushing data to storage.
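A minimal sketch of that split (illustrative only, not the real program: a bounded queue standing in for the circular buffer, and plain tuples standing in for packets):

```python
import queue
import threading

buf = queue.Queue(maxsize=1024)  # bounded, like the circular buffer
results = []

def capture():
    # stand-in for the capture loop pulling packets off the interface
    for i in range(100):
        buf.put(("packet", i))
    buf.put(None)  # sentinel: capture finished

def parse():
    # stand-in for the parse/stats thread draining the buffer
    while True:
        item = buf.get()
        if item is None:
            break
        results.append(item)  # parse, update stats, flush periodically

t1 = threading.Thread(target=capture)
t2 = threading.Thread(target=parse)
t1.start(); t2.start()
t1.join(); t2.join()
print(len(results))  # 100
```

The point of the bounded buffer is that the capture thread only blocks (or drops) when the parser falls a full buffer behind, rather than on every slow DB write.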

Normally, once a host hasn't been seen for a while, it gets saved to the db 
and removed from the in-memory hash table. That doesn't happen for local 
hosts, the fake host used to capture traffic totals, or any external hosts 
that keep showing up. Temporarily, I've also added some code to auto-save all 
in-memory hosts to the database, as I'm seeing some instability in the 
interface with sqlite, causing the process to lose information once in a 
while.
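The age-out pass could look something like this sketch (all names hypothetical; "pinned" marks the local hosts and the fake totals host that are never evicted):

```python
IDLE_LIMIT = 300  # seconds a host may stay idle before being flushed

hosts = {
    "10.0.0.1": {"last_on": 1000, "pinned": True},   # local host, kept
    "8.8.8.8":  {"last_on": 900,  "pinned": False},  # idle, flushed
    "1.2.3.4":  {"last_on": 1290, "pinned": False},  # recent, kept
}

def sweep(hosts, now, flush):
    # iterate over a copy of the keys so we can delete while sweeping
    for addr in list(hosts):
        h = hosts[addr]
        if not h["pinned"] and now - h["last_on"] > IDLE_LIMIT:
            flush(addr, h)   # save the row to the database...
            del hosts[addr]  # ...then drop it from memory

flushed = []
sweep(hosts, now=1300, flush=lambda a, h: flushed.append(a))
print(sorted(hosts))  # ['1.2.3.4', '10.0.0.1']
print(flushed)        # ['8.8.8.8']
```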

I've seen numbers on the internet stating that sqlite is super fast and 
should be able to handle tens of thousands of inserts or updates in a second 
or two, even in synchronous mode. So I'm a bit stumped as to why its 
performance is so low in my case. One time, I saw it take 5 seconds to flush 
almost 3k rows back to the db.

No hosts are actually ever removed from the database, which has about 120k 
hosts in it by now, adding up to a file size of around 7-8MB. So it's not a 
lot of data, and the queries are pretty simple. So I really am stumped.
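For what it's worth, the interleaved UPDATE/INSERT flush inside a single transaction works fine in isolation. Here's a sketch using Python's stdlib sqlite3 against an in-memory db, with the schema from the quoted message (defaults omitted, addresses as plain integers, all values made up):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE hosts (id INTEGER PRIMARY KEY,
                address INTEGER UNIQUE,
                first_seen INTEGER,
                last_on INTEGER,
                last_off INTEGER,
                rx_bytes INTEGER,
                tx_bytes INTEGER)""")
db.execute("INSERT INTO hosts (address, first_seen, last_on, rx_bytes, tx_bytes)"
           " VALUES (?, ?, ?, ?, ?)", (167772161, 0, 0, 0, 0))
db.commit()

# One transaction around the whole flush (the connection context manager
# commits at the end): sqlite pays the sync cost once per transaction,
# not once per statement, so batching the flush this way matters a lot.
with db:
    db.execute("UPDATE hosts SET last_on=?, last_off=?, rx_bytes=?, tx_bytes=?"
               " WHERE id=?", (100, 110, 4096, 512, 1))
    db.execute("INSERT INTO hosts (address, first_seen, last_on, rx_bytes, tx_bytes)"
               " VALUES (?, ?, ?, ?, ?)", (167772162, 100, 100, 10, 20))

rows = db.execute("SELECT address, rx_bytes FROM hosts ORDER BY id").fetchall()
print(rows)  # [(167772161, 4096), (167772162, 10)]
```

Both the UPDATE (keyed on the integer primary key) and the INSERT (hitting the unique index on address) should be cheap per row, which is part of why the 5-second flush surprises me.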

> 
> T
> 
> Friday, February 11, 2011, 6:49:16 PM, you wrote:
> 
> TF> Hi, I have a small problem with a program I've developed.
> 
> TF> It basically captures packets from a network device on linux and stores
> TF> transfer stats on each host seen. To store the stats long term I decided
> TF> to use sqlite, and when the program saves the stats every few minutes, it
> TF> takes about 4-5 seconds, and if I have the synchronous pragma turned off,
> TF> it takes 1-2 seconds.
> 
> TF> These are the relevant sql commands:
> 
> TF> CREATE TABLE hosts (id INTEGER PRIMARY KEY,\
> TF>   address INTEGER UNIQUE, \
> TF>   first_seen INTEGER DEFAULT CURRENT_DATETIME, \
> TF>   last_on INTEGER DEFAULT CURRENT_DATETIME, \
> TF>   last_off INTEGER, \
> TF>   rx_bytes INTEGER, \
> TF>   tx_bytes INTEGER);
> 
> TF> UPDATE hosts SET last_on = ?, last_off = ?, rx_bytes = ?, tx_bytes = ?
> TF>   WHERE id = ?
> TF> INSERT INTO hosts ( address, first_seen, last_on, rx_bytes, tx_bytes )
> TF>   VALUES ( ?, ?, ?, ?, ? )
> 
> TF> The stats saving code will interleave a bunch of UPDATEs and INSERTs,
> TF> inside a single transaction, maybe that has something to do with it,
> TF> I'm not sure.
> 
> TF> 1s to update them all isn't too bad, but it still can mean I'm
> TF> potentially dropping packets, which I'd really rather not do.
> 
> TF> Thanks.


-- 
Thomas Fjellstrom
tfjellst...@strangesoft.net
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users