Hi sqlite experts,

I am currently incorporating the sqlite online backup API (version 3.8.0.2) 
into our product. I meet a tricky problem, where the sqlite3_backup_step hangs 
at acquiring the BTree’s lock of the destination file.

Specifically, I am backing up in-memory database to NFS disk file. The 
in-memory database is manipulated in the main thread (T1) while the back-up 
runs in another thread (T2). There are only 2 threads in the process and there 
is no other process existing in our product environment.

The stack trace I got in the T2 when the hanging happens is as following:
#0  0x0000003de740d654 in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x0000003de7408f4a in _L_lock_1034 () from /lib64/libpthread.so.0
#2  0x0000003de7408e0c in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x00002aaaaacc20f6 in pthreadMutexEnter (p=0x23084300) at sqlite3.c:18299
#4  0x00002aaaaacc1f31 in sqlite3_mutex_enter (p=0x23084300) at sqlite3.c:17812
#5  0x00002aaaaacd9e4f in lockBtreeMutex (p=0x2aaad8001040) at sqlite3.c:49323
#6  0x00002aaaaacd9f49 in sqlite3BtreeEnter (p=0x2aaad8001040) at 
sqlite3.c:49410
#7  0x00002aaaaacdd970 in sqlite3BtreeBeginTrans (p=0x2aaad8001040, wrflag=2) 
at sqlite3.c:52240
#8  0x00002aaaaace71b7 in sqlite3_backup_step (p=0x2aaad8000fc0, nPage=50) at 
sqlite3.c:58451
#9  0x00002aaace885b37 in ResDB::backup (this=0x2314c3f0, numBlocks=10, 
numPagesPerBlock=50, blockIntervalInMillisec=50) at dbEngine.cpp:659
#10 0x00002aaace977373 in ResDBBackupManager::process (this=0x23030880) at 
dbBackupManager.cpp:53
#11 0x00002aaace976e3a in DBBackupEventHandler::event (this=0x23040870, 
event=0x2aaad8000990) at dbBackupEventHandler.cpp:35

The sqlite3_backup_step hangs at the very first call and it cannot go any 
further so that nothing is actually back-uped and the file on-disk is still 
empty, which disappoints me very much.
-rw-rw-r-- 1 yhzhan cadence3    0 2016-07-19 06:59 Interactive.3.rdb

Take a look at the code of sqlite3_backup_step. It looks there are 2 locks for 
both source and destination database. One is for the database itself and the 
other is for the tables (Btree). The flow hangs at sqlite3BtreeBeginTrans

SQLITE_API int sqlite3_backup_step(sqlite3_backup *p, int nPage){
  sqlite3_mutex_enter(p->pSrcDb->mutex);
  sqlite3BtreeEnter(p->pSrc);
  if( p->pDestDb ){
    sqlite3_mutex_enter(p->pDestDb->mutex);
  }
  ……
    /* Lock the destination database, if it is not locked already. */
    if( SQLITE_OK==rc && p->bDestLocked==0
     && SQLITE_OK==(rc = sqlite3BtreeBeginTrans(p->pDest, 2))
    ){
      p->bDestLocked = 1;
      sqlite3BtreeGetMeta(p->pDest, BTREE_SCHEMA_VERSION, &p->iDestSchema);
    }

Also tried the latest version of sqlite 3.13.0.0, the problem remains.

My back-upcode is quite simple, almost the duplicate of online backup api 
example https://www.sqlite.org/backup.html

bool ResDB::backup(int numBlocks, int numPagesPerBlock, int 
blockIntervalInMillisec)
{
    if (!m_sqlBackupDestObj) {
        int rc = sqlite3_open(m_fileNameOnNFS.c_str(), &m_sqlBackupDestObj);
        Q_ASSERT(rc == SQLITE_OK);
    }

    if (!m_backupHandle) {
        m_backupHandle = sqlite3_backup_init(m_sqlBackupDestObj, "main", 
m_sqlObj, "main");
        Q_ASSERT(m_backupHandle);
    }

    bool finished = false;
    int i = 0;
    while (!finished && i < numBlocks) {
        while (true) {
            int rc = sqlite3_backup_step(m_backupHandle, numPagesPerBlock);
            if (rc == SQLITE_OK) {
                sqlite3_sleep(blockIntervalInMillisec);
                ++i;
                break;
            } else if (rc == SQLITE_DONE) {
                finished = true;
                break;
            } else if (rc == SQLITE_LOCKED) {
                sqlite3_sleep(blockIntervalInMillisec);
                continue;
            } else if (rc == SQLITE_BUSY) {
                sqlite3_sleep(blockIntervalInMillisec);
                continue;
            } else {
                finished = true;
                Q_ASSERT(false);
                break;
            }
        }
    }

    if (finished) {
        sqlite3_backup_finish(m_backupHandle);
        m_backupHandle = NULL;
        sqlite3_close(m_sqlBackupDestObj);
        m_sqlBackupDestObj = NULL;
    }

    return finished;
}

So I am wondering what factor will result in lockBtreeMutex hang? I am not sure 
whether there is any misuse of the online backup api. Currently I am clueless. 
Do you have any suggestions?
_______________________________________________
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to