As it turns out, I can reproduce the failure using a single huge insert.
The code that I'm including below compiles under bcc32 from
Embarcadero's C++ Builder 2007 and under cl from Visual Studio 2005.
Since it's more likely that people have the MS compiler available, with
this source file and the SQLite 3.7.0 amalgamation files in the same
folder, the compilation line is:

cl -EHsc -Fefail.exe main.cpp sqlite3.c

You can then invoke fail.exe with a single command-line argument of
80000000, like this:

fail.exe 80000000

The source for the executable is listed below. If you're wondering why
the numbers being inserted are more complicated than they need to be,
it's because I wanted the table and indices to look as much as possible
like the actual data that our application stores in SQLite; at the time
I had not realized that the failure could be reproduced by simply
inserting. Beware that there is no handling of incorrect command-line
arguments.

If you monitor this executable with perfmon and watch its virtual
bytes, you'll see them hit 2 GB, and the next time the insert statement
is stepped it fails with a disk I/O error. I understand that this may
not be the intended use case of SQLite, but I don't remember reading
anything anywhere that forces you to limit the size of insert
operations, so a priori there should be no reason why simply inserting
a lot of rows in a single transaction should make the db fail. If you
break up the insert into chunks
_and_close_the_connection_between_chunks_, then the error does not
occur. This is even more bothersome because there is also no
documentation saying that there is a limit on the number of operations
that can be performed on a single connection.

Should I open a bug for this?

Thanks a lot!

Victor


--------start main.cpp---------
#include <iostream>
#include <sstream>
#include <string>
#include <stdexcept>   // std::runtime_error
#include <cstdlib>
#include <ctime>
#include <map>

#include "sqlite3.h"

int main( int argc, char* argv[] )
{
   // boost::lexical_cast where art thou?
   int nRecords = 4000000;
   if ( argc > 1 )
   {
      std::stringstream sstr;
      sstr << argv[1];
      sstr >> nRecords;
   }

   sqlite3* connection = NULL;
   sqlite3_stmt* statement = NULL;
   
   // Open db
   std::cout << "Opening db." << std::endl;
   int rc = sqlite3_open( "./test.db", &connection );
   if ( rc )
   {
      std::string errorMessage( sqlite3_errmsg( connection ) );
      std::runtime_error ex( errorMessage );
      sqlite3_close( connection );
      connection = NULL;
      std::cerr << errorMessage;
      throw ex;
   }
   else
   {
      int theTimeout = 5000;
      sqlite3_exec( connection,
                    "PRAGMA page_size = 4096; PRAGMA foreign_keys = 1; "
                    "PRAGMA cache_size = 20000; PRAGMA journal_mode=WAL;",
                    NULL, NULL, NULL );
      sqlite3_busy_timeout( connection, theTimeout );
   }

   // Schema
   std::cout << "Creating schema." << std::endl;
   sqlite3_exec( connection, "BEGIN IMMEDIATE TRANSACTION;",
                 NULL, NULL, NULL );
   sqlite3_exec( connection,
                 "CREATE TABLE IF NOT EXISTS TEST_DATA( "
                 "ID INTEGER PRIMARY KEY AUTOINCREMENT, GROUPID INTEGER, "
                 "USERID INTEGER, CONTEXTID INTEGER, TSTAMP INTEGER, "
                 "SOURCE STRING, TYPE STRING, SERVERID INTEGER, "
                 "NSIG INTEGER, NNOI INTEGER, LON FLOAT, LAT FLOAT, "
                 "CONF FLOAT );",
                 NULL, NULL, NULL );
   sqlite3_exec( connection,
                 "CREATE INDEX IF NOT EXISTS IDX_TEST_DATA_ID "
                 "ON TEST_DATA( ID );",
                 NULL, NULL, NULL );
   sqlite3_exec( connection,
                 "CREATE INDEX IF NOT EXISTS "
                 "IDX_TEST_DATA_USERID_GROUPID_TSTAMP "
                 "ON TEST_DATA( USERID, GROUPID, TSTAMP );",
                 NULL, NULL, NULL );
   sqlite3_exec( connection,
                 "CREATE INDEX IF NOT EXISTS IDX_TEST_DATA_GROUPID_TSTAMP "
                 "ON TEST_DATA( GROUPID, TSTAMP );",
                 NULL, NULL, NULL );
   sqlite3_exec( connection,
                 "CREATE INDEX IF NOT EXISTS IDX_TEST_DATA_CONTEXTID_TSTAMP "
                 "ON TEST_DATA( CONTEXTID, TSTAMP );",
                 NULL, NULL, NULL );
   sqlite3_exec( connection,
                 "CREATE INDEX IF NOT EXISTS IDX_TEST_DATA_LAT_LON "
                 "ON TEST_DATA( LAT, LON );",
                 NULL, NULL, NULL );
   sqlite3_exec( connection, "COMMIT TRANSACTION;", NULL, NULL, NULL );
   
   std::string insertRecordStatementStr(
      "INSERT INTO TEST_DATA ( GROUPID, USERID, CONTEXTID, TSTAMP, "
      "SOURCE, TYPE, SERVERID, NSIG, NNOI, LON, LAT, CONF ) "
      "VALUES ( NULL, ?1, ?2, ?3, ?4, ?5, ?6, ?7, ?8, NULL, NULL, NULL );" );

   std::cout << "Preparing statement." << std::endl;
   rc = sqlite3_prepare_v2( connection,
insertRecordStatementStr.c_str(), -1, &statement, 0 );
   if ( rc != SQLITE_OK )
   {
      std::string errorMessage( sqlite3_errmsg( connection ) );
      std::runtime_error ex( errorMessage );
      sqlite3_finalize( statement );
      sqlite3_close( connection );
      statement = NULL;
      connection = NULL;
      std::cerr << errorMessage;
      throw ex;
   }

   std::map< int, std::pair< int, int > > userId2ContextIdCount;
   const char* sourceStr = "test_failure";
   const char* types[] = { "type_01_1234567890_1234567890", 
                           "type_02_1234567890_1234567890", 
                           "type_03_1234567890_1234567890",
                           "type_04_1234567890_1234567890",
                           "type_05_1234567890_1234567890",
                           "type_06_1234567890_1234567890",
                           "type_07_1234567890_1234567890",
                           "type_08_1234567890_1234567890",
                           "type_09_1234567890_1234567890",
                           "type_10_1234567890_1234567890" };
   
   std::cout << "Inserting into db." << std::endl;
   sqlite3_exec( connection, "BEGIN IMMEDIATE TRANSACTION;", NULL, NULL,
NULL );
   for ( int recIdx = 0; recIdx < nRecords; ++recIdx )
   {
      int userId = std::rand() % 25000;
      int contextId = 0;
      std::map< int, std::pair< int, int > >::iterator
         ui2ciIter = userId2ContextIdCount.find( userId );
      if ( ui2ciIter != userId2ContextIdCount.end() )
      {
         if ( ui2ciIter->second.second > ( 50 + ( std::rand() % 1000 ) ) )
         {
            contextId = ui2ciIter->second.first + 1;
            ui2ciIter->second = std::make_pair( contextId, 1 );
         }
         else
         {
            ui2ciIter->second.second += 1;
         }
      }
      else
      {
         contextId = 1;
         userId2ContextIdCount[ userId ] = std::make_pair( contextId, 1 );
      }
      
      sqlite3_int64 tstamp = std::time( NULL );  // portable 64-bit int
      tstamp *= 1000;
      tstamp += ( std::rand() % 1000 );
      tstamp += ( ( std::rand() % 900000 ) - 450000 );
      
      int typeIdx = std::rand() % 10;
      int serverId = ( recIdx + std::rand() % 1000 ) % 1000;
      int nSig = std::rand() % 10;
      int nNoi = nSig;
      
      sqlite3_bind_int( statement, 1, userId    ); 
      sqlite3_bind_int( statement, 2, contextId ); 

      sqlite3_bind_int64( statement, 3, tstamp ); 
      
      // Pass -1 so sqlite computes the string length; 0 would bind an
      // empty string.  SQLITE_STATIC: the buffers outlive the statement.
      sqlite3_bind_text( statement, 4, sourceStr,        -1, SQLITE_STATIC );
      sqlite3_bind_text( statement, 5, types[ typeIdx ], -1, SQLITE_STATIC );

      sqlite3_bind_int( statement, 6, serverId  ); 
      sqlite3_bind_int( statement, 7, nSig      ); 
      sqlite3_bind_int( statement, 8, nNoi      ); 

      rc = sqlite3_step( statement );
      if ( rc != SQLITE_DONE )
      {
         std::string errorMessage( sqlite3_errmsg( connection ) );
         std::runtime_error ex( errorMessage );
         sqlite3_exec( connection, "ROLLBACK;", NULL, NULL, NULL );
         sqlite3_finalize( statement );
         sqlite3_close( connection );
         statement = NULL;
         connection = NULL;
         std::cerr << errorMessage;
         throw ex;
      }
      sqlite3_reset( statement );
   }
   sqlite3_exec( connection, "COMMIT TRANSACTION;", NULL, NULL, NULL );
   
   sqlite3_finalize( statement );
   sqlite3_close( connection );
   return 0;
}
----------end main.cpp--------------

-----Original Message-----
From: sqlite-users-boun...@sqlite.org
[mailto:sqlite-users-boun...@sqlite.org] On Behalf Of Victor
Morales-Duarte
Sent: Wednesday, August 04, 2010 2:19 PM
To: sqlite-users@sqlite.org
Subject: [sqlite] Process memory space exhausted in 3.7.0

Hello,

 

The Windows desktop application that I maintain uses SQLite for some of
its storage. The data volume that we must handle has increased
dramatically over the past 6 months, and, as was to be expected, update
performance has degraded accordingly. Because of that, I was very quick
to jump onto 3.7 when I read that WAL could be selected with it. The
speed improvements when updating data are indeed very noticeable when
running the application on my laptop's drive (3x faster), although not
so much when running on a fast SSD connected via eSATA (only about 20%
extra speed); I guess the different ratio of improvement was to be
expected given the access characteristics of each. Overall, I have to
say that I believe that WAL was a great addition.

 

Unfortunately, I've encountered what could potentially be considered a
big problem. When I run a very large update, the process space for the
application seems to be exhausted. The way it manifested itself at
first was a disk I/O error, but because I test the application while
running perfmon.exe on Win XP SP3 to monitor IO read bytes/sec, IO
write bytes/sec, processor time, and virtual bytes, I noticed that the
virtual bytes were at the 2 GB max process space limit when the disk
I/O error occurred.

 

In order to rule out the possibility that I was doing something wrong,
I decided to test a similar update using the sqlite3.exe CLI. During
the update, the application iterates over all the records in a table in
a specific order, assigning a pair of integers to two columns (both
initially null) of each record based on domain-specific rules;
accordingly, the test with the CLI is the opposite operation: I take a
db file that is about 1.5 GB in size, with over 3.7 million records in
the table that needs to be updated, and I proceed to assign null to one
of the columns for all records. After some time of working, the virtual
bytes (as reported by perfmon) hit the max process space and the disk
I/O error is reported. At that point, the wal file is over 5.5 GB in
size and the shm file is over 10 MB in size.

 

My initial guess is that there is a problem memory-mapping files.

 

I wish that I could make the db available for testing, but the data
contained in it cannot be disclosed due to an NDA, and the schema is
proprietary information of my employer. First I need to finish a
workaround for this (it seems that by closing and reopening the db
connection, the situation improves somewhat), and then I will write a
small piece of code that creates a dummy database large enough that the
error can be reproduced in it, so that I can post it in a reply to this
email.

 

Thank you!!!

 

Victor

_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users