As it turns out, I can reproduce the failure using a single huge insert. The code that I'm including below compiles under bcc32 from Embarcadero's C++ Builder 2007 and cl from vs2005. Since it's more likely that people have MS compilers available, the compilation line having this source file and the sqlite3 3.7.0 amalgamation files in the same folder is:
cl -EHsc -Fefail.exe main.cpp sqlite3.c You can then invoke fail.exe with a single command line argument of 80000000 like this: Fail.exe 80000000 The source for the executable is listed below. If you're wondering about why the numbers being inserted are more complicated than need be, it's because I just wanted the table and indices to look as much as possible like the actual data that our application stores in sqlite because I had not realized that the failure could be reproduced with simply inserting. Beware that there is no handling of incorrect command line arguments. If you monitor this executable run with perfmon and look at its virtual bytes, you'll see them hit 2GB and then the next time the insert statement is stepped, it fails with an I/O disc error. I understand that this may not be the intended use case of sqlite, but I don't remember reading anything anywhere that forces you limit the size of insert operations, so a priori, there should not be any reason why simply inserting a lot of rows in a single transaction should make the db fail. If you break up the insert into chunks _and_close_the_connection_between_chunks_ then the error does not occur. This is even more bothersome because there is also no documentation saying that there would be a limit on the number of operations that can be performed on a single connection. Should I open a bug for this? Thanks a lot! Victor --------start main.cpp--------- #include <iostream> #include <sstream> #include <string> #include <exception> #include <cstdlib> #include <ctime> #include <map> #include "sqlite3.h" int main( int argc, char* argv[] ) { // boost::lexical_cast where art thou? int nRecords = 4000000; if ( argc > 1 ) { std::stringstream sstr; sstr << argv[1]; sstr >> nRecords; } sqlite3* connection = NULL; sqlite3_stmt* statement = NULL; // Open db std::cout << "Openning db." << std::endl; int rc = sqlite3_open( "./test.db", &connection ); if ( rc ) { std::string errorMessage( sqlite3_errmsg( connection ) ); std::runtime_error ex( errorMessage ); sqlite3_close( connection ); connection = NULL; std::cerr << errorMessage; throw ex; } else { int theTimeout = 5000; sqlite3_exec( connection, "PRAGMA page_size = 4096; PRAGMA foreign_keys = 1; PRAGMA cache_size = 20000; PRAGMA journal_mode=WAL;", NULL, NULL, NULL ); sqlite3_busy_timeout( connection, theTimeout ); } // Schema std::cout << "Creating schema." << std::endl; sqlite3_exec( connection, "BEGIN IMMEDIATE TRANSACTION;;", NULL, NULL, NULL ); sqlite3_exec( connection, "CREATE TABLE IF NOT EXISTS TEST_DATA( ID INTEGER PRIMARY KEY AUTOINCREMENT, GROUPID INTEGER, USERID INTEGER, CONTEXTID INTEGER, TSTAMP INTEGER, SOURCE STRING, TYPE STRING, SERVERID INTEGER, NSIG INTEGER, NNOI INTEGER, LON FLOAT, LAT FLOAT, CONF FLOAT );", NULL, NULL, NULL ); sqlite3_exec( connection, "CREATE INDEX IF NOT EXISTS IDX_TEST_DATA_ID ON TEST_DATA( ID );", NULL, NULL, NULL ); sqlite3_exec( connection, "CREATE INDEX IF NOT EXISTS IDX_TEST_DATA_USERID_GROUPID_TSTAMP ON TEST_DATA( USERID, GROUPID, TSTAMP);", NULL, NULL, NULL ); sqlite3_exec( connection, "CREATE INDEX IF NOT EXISTS IDX_TEST_DATA_GROUPID_TSTAMP ON TEST_DATA( GROUPID, TSTAMP );", NULL, NULL, NULL ); sqlite3_exec( connection, "CREATE INDEX IF NOT EXISTS IDX_TEST_DATA_CONTEXTID_TSTAMP ON TEST_DATA( CONTEXTID, TSTAMP );", NULL, NULL, NULL ); sqlite3_exec( connection, "CREATE INDEX IF NOT EXISTS IDX_TEST_DATA_LAT_LON ON TEST_DATA( LAT, LON );", NULL, NULL, NULL ); sqlite3_exec( connection, "COMMIT TRANSACTION;", NULL, NULL, NULL ); std::string insertRecordStatementStr( "INSERT INTO TEST_DATA ( GROUPID, USERID, CONTEXTID, TSTAMP, SOURCE, TYPE, SERVERID, NSIG, NNOI, LON, LAT, CONF ) VALUES ( NULL, ?1, ?2, ?3, ?4, ?5, ?6, ?7, ?8, NULL, NULL, NULL );" ); std::cout << "Preparing statement db." << std::endl; rc = sqlite3_prepare_v2( connection, insertRecordStatementStr.c_str(), -1, &statement, 0 ); if ( rc != SQLITE_OK ) { std::string errorMessage( sqlite3_errmsg( connection ) ); std::runtime_error ex( errorMessage ); sqlite3_finalize( statement ); sqlite3_close( connection ); statement = NULL; connection = NULL; std::cerr << errorMessage; throw ex; } std::map< int, std::pair< int, int > > userId2ContextIdCount; const char* sourceStr = "test_failure"; const char* types[] = { "type_01_1234567890_1234567890", "type_02_1234567890_1234567890", "type_03_1234567890_1234567890", "type_04_1234567890_1234567890", "type_05_1234567890_1234567890", "type_06_1234567890_1234567890", "type_07_1234567890_1234567890", "type_08_1234567890_1234567890", "type_09_1234567890_1234567890", "type_10_1234567890_1234567890" }; std::cout << "Inserting into db." << std::endl; sqlite3_exec( connection, "BEGIN IMMEDIATE TRANSACTION;", NULL, NULL, NULL ); for ( int recIdx = 0; recIdx <nRecords; ++recIdx ) { int userId = std::rand() % 25000; int contextId = 0; std::map< int, std::pair< int, int > >::iterator ui2ciIter = userId2ContextIdCount.find( userId ); if ( ui2ciIter != userId2ContextIdCount.end() ) { if ( ui2ciIter->second.second > ( 50 + ( std::rand() % 1000 ) ) ) { contextId = ui2ciIter->second.first + 1; ui2ciIter->second = std::make_pair< int, int >( contextId, 1 ); } else { ui2ciIter->second.second += 1; } } else { contextId = 1; userId2ContextIdCount[ userId ] = std::make_pair< int, int >( contextId, 1 ); } __int64 tstamp = std::time( NULL ); tstamp *= 1000; tstamp += ( std::rand() % 1000 ); tstamp += ( ( std::rand() % 900000 ) - 450000 ); int typeIdx = std::rand() % 10; int serverId = ( recIdx + std::rand() % 1000 ) % 1000; int nSig = std::rand() % 10; int nNoi = nSig; sqlite3_bind_int( statement, 1, userId ); sqlite3_bind_int( statement, 2, contextId ); sqlite3_bind_int64( statement, 3, tstamp ); sqlite3_bind_text( statement, 4, sourceStr, 0, NULL ); sqlite3_bind_text( statement, 5, types[ typeIdx ], 0, NULL ) ; sqlite3_bind_int( statement, 6, serverId ); sqlite3_bind_int( statement, 7, nSig ); sqlite3_bind_int( statement, 8, nNoi ); rc = sqlite3_step( statement ); if ( rc != SQLITE_DONE ) { std::string errorMessage( sqlite3_errmsg( connection ) ); std::runtime_error ex( errorMessage ); sqlite3_exec( connection, "ROLLBACK;", NULL, NULL, NULL ); sqlite3_finalize( statement ); sqlite3_close( connection ); statement = NULL; connection = NULL; std::cerr << errorMessage; throw ex; } sqlite3_reset( statement ); } sqlite3_exec( connection, "COMMIT TRANSACTION;", NULL, NULL, NULL ); sqlite3_finalize( statement ); sqlite3_close( connection ); return 0; } ----------end main.cpp-------------- -----Original Message----- From: sqlite-users-boun...@sqlite.org [mailto:sqlite-users-boun...@sqlite.org] On Behalf Of Victor Morales-Duarte Sent: Wednesday, August 04, 2010 2:19 PM To: sqlite-users@sqlite.org Subject: [sqlite] Process memory space exhausted in 3.7.0 Hello, The windows desktop application that I maintain uses sqlite for some of its storage. The data volume that we must handle has increased dramatically over the past 6 months and as it was to be expected update performance has degraded accordingly. Because of that, I was very quick to jump onto 3.7 when I read that WAL could be selected with it. The speed improvements when updating data are indeed very noticeable when running the application on my laptop's drive (3x faster) although not so much when running on a fast SSD connected to it via ESATA (only about 20% extra speed); I guess that the different ratio of improvement was to be expected given the access characteristics of each. Overall, I have to say that I believe that WAL was a great addition. Unfortunately, I've encountered what could potentially be considered a big problem. When I run a very large update the process space for the application seems to be exhausted. The way that it manifested itself at first was that there would be a disc I/O error, but because I test the application while running perfmon.exe on win xp sp3 to monitor the IO read bytes/sec, IO write bytes/sec, processor time and virtual bytes I noticed that the virtual bytes were at the 2GB max process space limit when the disc I/O error occurred. In order to rule out the possibility that I was doing something wrong, I decided to test a similar update using the sqlite3.exe CLI. During the update, what the application will do is it will iterate over all the records in a table in a specific order assigning a pair of integers to two columns (both initially null) of each record, based on domain specific rules; accordingly, the test with the CLI is the opposite operation; I take a db file that is about 1.5 GB in size, with over 3.7 million records in the table that needs to be updated and then I proceed to assign null to one of the columns for all records. After some time of working, the virtual bytes (as reported by perfmon) hit the max process space and the disk I/O error is reported. At that time, the wal file is over 5.5 GB in size and the shm file is over 10MB in size. My initial guess is that there is a problem memory mapping files. I wish that I could make the db available for testing but the data contained in it cannot be disclosed due to an NDA and the schema is proprietary information of my employer. First I need to finish a workaround for this (it seems that by closing and reopening the db connection, the situation improves somewhat) and then I will write a small piece of code that will create a dummy database large enough that the error can be reproduced in it so that I can post it in a reply to this email. Thank you!!! Victor _______________________________________________ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users _______________________________________________ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users