I've been working on cleaning up some assertion failures in 2.8.15
which occur when sqlite fails a write in the middle of a transaction,
due to the disk being full.  My testing methodology is to put the
database on a small tmpfs filesystem, run transactions continuously
against it, and consume and release the free space at random intervals.
When my program tips over, I analyze the crash.
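
A minimal sketch of the consume/release half of that setup, for anyone who
wants to reproduce it.  The function names, the filler-file approach and the
byte limit are assumptions made for illustration; they are not taken from the
actual test harness.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write up to `limit` bytes of filler into `path`.
** Stops early when the filesystem is full (ENOSPC) or on any other error.
** Returns the number of bytes written, or -1 if the file cannot be created. */
static long consume_space(const char *path, long limit){
  char buf[4096] = {0};
  long total = 0;
  int fd = open(path, O_WRONLY|O_CREAT|O_TRUNC, 0600);
  if( fd<0 ) return -1;
  while( total<limit ){
    long want = limit-total;
    ssize_t n;
    if( want>(long)sizeof(buf) ) want = (long)sizeof(buf);
    n = write(fd, buf, (size_t)want);
    if( n<0 && errno==EINTR ) continue;  /* interrupted, try again */
    if( n<=0 ) break;                    /* disk full or another error */
    total += n;
  }
  close(fd);
  return total;
}

/* Hypothetical helper: give the space back by removing the filler file.
** Returns 0 on success, -1 on failure (same as unlink()). */
static int release_space(const char *path){
  return unlink(path);
}
```

A driver would call consume_space() on a file inside the tmpfs mount with a
large limit, sleep for a random interval, call release_space(), and repeat
while the transactions run in another process.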

There are two asserts which seem to have simple fixes:

Assertion failed: pPg->nRef==0 || pPg->pgno==1, file src/pager.c, line 570
Assertion failed: sqlitepager_iswriteable(pPage), file src/btree.c, line 642

These are caused by a bogus assertion in pager_playback_one_page() and a
missing return code check in balance(), respectively.  In both
cases, the sqlite3 branch already has the same changes, and the same
logic seems to apply for 2.8.15.

With those fixed, I'm now working on a problem in fileBtreeDelete().  The code
is:

   2688 static int fileBtreeDelete(BtCursor *pCur){
*A*2689   MemPage *pPage = pCur->pPage;
...
   2695   assert( pPage->isInit );
...
*B*2710   if( checkReadLocks(pCur) ){
   2711     return SQLITE_LOCKED; /* The table pCur points to has a read lock */
   2712   }
   2713   rc = sqlitepager_write(pPage);
   2714   if( rc ) return rc;
   2715   pCell = pPage->apCell[pCur->idx];
   2716   pgnoChild = SWAB32(pBt, pCell->h.leftChild);
*C*2717   clearCell(pBt, pCell);
   2718   if( pgnoChild ){
...
   2726     BtCursor leafCur;
   2727     Cell *pNext;
   2728     int szNext;
   2729     int notUsed;
   2730     getTempCursor(pCur, &leafCur);
   2731     rc = fileBtreeNext(&leafCur, &notUsed);
   2732     if( rc!=SQLITE_OK ){
   2733       if( rc!=SQLITE_NOMEM ) rc = SQLITE_CORRUPT;
   2734       return rc;
   2735     }
   2736     rc = sqlitepager_write(leafCur.pPage);
   2737     if( rc ) return rc;
*D*2738     dropCell(pBt, pPage, pCur->idx, cellSize(pBt, pCell));

At *C*, the return code of clearCell() is not checked.  Once that is fixed,
I get an assertion failure at *D*:

Assertion failed: idx>=0 && idx<pPage->nCell, file src/btree.c, line 2036

The stack trace is:

...
fedfafd8 libc.so.1`_assert+0x64()
fedfb238 dropCell+0xcc()
fedfb298 fileBtreeDelete+0x254()
fedfb320 sqliteVdbeExec+0x3f88()
fedfba70 sqlite_step+0x6c()
fedfbad0 sqlite_exec+0xb0()

The page tripping the assertion has 'isInit' and 'nCell' == 0.

I believe this is because *B* can change pCur->pPage (via moveToRoot()),
and release the old page, making *A*'s assignment stale.  I'm not familiar
enough with how the pager works to know the right way to fix this (or even if
I'm on the right track).  The two approaches that come to mind are:

1.  holding the page at the top of the function, and releasing it as we leave.
2.  reloading pCur->pPage after *B*.
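
To make the suspected failure mode and option 2 concrete, here is a small
stand-alone model.  Everything in it is hypothetical: Page, Cursor,
page_unref() and reposition() are simplified stand-ins, not the real SQLite
structures or pager calls, and "recycling" a page is modelled by zeroing it.
It only illustrates why a pointer cached before a call that moves the cursor
can end up looking at a page with isInit==0 and nCell==0.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-ins; these are NOT the real SQLite types. */
typedef struct Page {
  int nRef;    /* number of references held on the page */
  int isInit;  /* nonzero once the page has been initialized */
  int nCell;   /* number of cells on the page */
} Page;

typedef struct Cursor {
  Page *pPage; /* page the cursor currently points at */
} Cursor;

/* Drop one reference.  A page with no references left is recycled,
** which this model represents by zeroing it. */
static void page_unref(Page *p){
  assert( p->nRef>0 );
  if( --p->nRef==0 ) memset(p, 0, sizeof(*p));
}

/* Models a call that repositions the cursor as a side effect:
** the old page is released and the cursor now points at pRoot. */
static void reposition(Cursor *pCur, Page *pRoot){
  page_unref(pCur->pPage);
  pRoot->nRef++;
  pCur->pPage = pRoot;
}

/* Suspected bug pattern: cache the page pointer, then make the call.
** Returns nCell of the page used, or -1 if that page is uninitialized. */
static int cells_with_stale_pointer(Cursor *pCur, Page *pRoot){
  Page *pPage = pCur->pPage;      /* cached before the call */
  reposition(pCur, pRoot);
  return pPage->isInit ? pPage->nCell : -1;
}

/* Option 2: read pCur->pPage only after the call that may move it. */
static int cells_after_reload(Cursor *pCur, Page *pRoot){
  Page *pPage;
  reposition(pCur, pRoot);
  pPage = pCur->pPage;            /* reloaded after the call */
  return pPage->isInit ? pPage->nCell : -1;
}
```

In this model the stale version reports an uninitialized page, while the
reloading version sees the root page's cell count.  Whether reloading alone is
sufficient in the real fileBtreeDelete() depends on whether pCur->idx is still
meaningful once the cursor has moved, which the model does not address.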

I understand that sqlite3 is where the major development effort is
now, but I'd appreciate any insight you can give on the appropriate
fix for this.

I've attached a patch against 2.8.15 which covers the two asserts at the
top, plus *C* from above.

Thanks for your time,
- jonathan
