I've been working on cleaning up some assertion failures in 2.8.15 which occur when sqlite fails a write in the middle of a transaction because the disk is full. My testing methodology is to put the database in a small tmpfs filesystem, run transactions continuously against it, and consume and release the free space at random intervals. When my program tips over, I analyze the crash.
There are two asserts which seem to have simple fixes:

    Assertion failed: pPg->nRef==0 || pPg->pgno==1, file src/pager.c, line 570
    Assertion failed: sqlitepager_iswriteable(pPage), file src/btree.c, line 642

These are caused by a bogus assertion in pager_playback_one_page() and a missing return code check in balance(), respectively. In both cases, the sqlite3 branch already has the same changes, and the same logic seems to apply to 2.8.15.

With those fixed, I'm now working on a problem in fileBtreeDelete(). The code is:

        2688 static int fileBtreeDelete(BtCursor *pCur){
    *A* 2689   MemPage *pPage = pCur->pPage;
        ...
        2695   assert( pPage->isInit );
        ...
    *B* 2710   if( checkReadLocks(pCur) ){
        2711     return SQLITE_LOCKED; /* The table pCur points to has a read lock */
        2712   }
        2713   rc = sqlitepager_write(pPage);
        2714   if( rc ) return rc;
        2715   pCell = pPage->apCell[pCur->idx];
        2716   pgnoChild = SWAB32(pBt, pCell->h.leftChild);
    *C* 2717   clearCell(pBt, pCell);
        2718   if( pgnoChild ){
        ...
        2726     BtCursor leafCur;
        2727     Cell *pNext;
        2728     int szNext;
        2729     int notUsed;
        2730     getTempCursor(pCur, &leafCur);
        2731     rc = fileBtreeNext(&leafCur, &notUsed);
        2732     if( rc!=SQLITE_OK ){
        2733       if( rc!=SQLITE_NOMEM ) rc = SQLITE_CORRUPT;
        2734       return rc;
        2735     }
        2736     rc = sqlitepager_write(leafCur.pPage);
        2737     if( rc ) return rc;
    *D* 2738     dropCell(pBt, pPage, pCur->idx, cellSize(pBt, pCell));

*C* is a missing check of a return code. Once that is fixed, I get an assertion at *D*:

    Assertion failed: idx>=0 && idx<pPage->nCell, file src/btree.c, line 2036

with a stack trace of:

    ...
    fedfafd8 libc.so.1`_assert+0x64()
    fedfb238 dropCell+0xcc()
    fedfb298 fileBtreeDelete+0x254()
    fedfb320 sqliteVdbeExec+0x3f88()
    fedfba70 sqlite_step+0x6c()
    fedfbad0 sqlite_exec+0xb0()

The page tripping the assertion has 'isInit' and 'nCell' == 0. I believe this is because *B* can change pCur->pPage (via moveToRoot()) and release the old page, making *A*'s assignment stale.
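To make the suspected failure mode concrete, here is a toy model in C. It is only a sketch: ToyPage, ToyCursor, toy_unref() and toy_move_to_root() are made-up stand-ins, not the real MemPage, BtCursor or pager calls, and it assumes my guess is right that the cursor's page can be swapped out and the old page released between *A* and *D*.

```c
#include <assert.h>

/* Toy stand-ins for MemPage and BtCursor; NOT the real SQLite structures. */
typedef struct ToyPage {
  int nRef;    /* number of references currently held on the page */
  int isInit;  /* nonzero once the page content has been parsed */
  int nCell;   /* number of cells on the page */
} ToyPage;

typedef struct ToyCursor {
  ToyPage *pPage;  /* page the cursor currently points at */
  int idx;         /* index of the current cell on that page */
} ToyCursor;

/* Drop one reference. When the count reaches zero the (hypothetical) pager
** may recycle the page, modelled here by wiping its parsed state. */
void toy_unref(ToyPage *p){
  assert( p->nRef>0 );
  if( --p->nRef==0 ){
    p->isInit = 0;
    p->nCell = 0;
  }
}

/* Hypothetical analogue of a helper that repositions a cursor: it takes a
** reference on pRoot and releases the page the cursor used to point at. */
void toy_move_to_root(ToyCursor *pCur, ToyPage *pRoot){
  pRoot->nRef++;
  toy_unref(pCur->pPage);
  pCur->pPage = pRoot;
}

/* Returns 1 if a pointer cached before the reposition is stale afterwards. */
int toy_cached_page_is_stale(void){
  ToyPage leaf = {1, 1, 4};
  ToyPage root = {1, 1, 2};
  ToyCursor cur = {&leaf, 3};
  ToyPage *pPage = cur.pPage;      /* like *A*: cache the page pointer */
  toy_move_to_root(&cur, &root);   /* like *B*: the cursor gets moved */
  /* like *D*: the cached pointer no longer matches the cursor, and the
  ** page it names has been reset to isInit==0 and nCell==0 */
  return pPage!=cur.pPage && pPage->isInit==0 && pPage->nCell==0;
}
```

In this model toy_cached_page_is_stale() returns 1, which matches the symptom above: the page handed to dropCell() has 'isInit' and 'nCell' == 0. Whether the real checkReadLocks() can actually do this to pCur is exactly the part I am unsure about.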
I'm not familiar enough with how the pager works to know the right way to fix this (or even if I'm on the right track). The two approaches that come to mind are:

  1. holding the page at the top of the function, and releasing it as we leave.
  2. reloading pCur->pPage after *B*.

I understand that sqlite3 is where the major development effort is now, but I'd appreciate any insight you can give on the appropriate fix for this. I've attached a patch against 2.8.15 which covers the two asserts at the top, plus *C* from above.

Thanks for your time,
- jonathan
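For reference, the two approaches listed above can be sketched on a toy model. This is an illustration under assumptions, not a proposed patch: ToyPage, ToyCursor and every toy_*() function are hypothetical stand-ins for MemPage, BtCursor and the pager's reference counting, and toy_reposition() plays the role I suspect *B* has.

```c
#include <assert.h>

/* Toy stand-ins; NOT the real SQLite structures or pager API. */
typedef struct ToyPage { int nRef; int isInit; int nCell; } ToyPage;
typedef struct ToyCursor { ToyPage *pPage; int idx; } ToyCursor;

void toy_ref(ToyPage *p){ p->nRef++; }

/* Releasing the last reference lets the (hypothetical) pager recycle the
** page, modelled here by wiping its parsed state. */
void toy_unref(ToyPage *p){
  assert( p->nRef>0 );
  if( --p->nRef==0 ){ p->isInit = 0; p->nCell = 0; }
}

/* Stand-in for whatever moves the cursor and drops its old page. */
void toy_reposition(ToyCursor *pCur, ToyPage *pNew){
  toy_ref(pNew);
  toy_unref(pCur->pPage);
  pCur->pPage = pNew;
}

/* Approach 1: hold our own reference for the duration of the function.
** Returns the cell count seen through the cached pointer. */
int toy_delete_holding_ref(ToyCursor *pCur, ToyPage *pOther){
  ToyPage *pPage = pCur->pPage;
  int nCell;
  toy_ref(pPage);                /* pin the page before anything can move */
  toy_reposition(pCur, pOther);  /* cursor moves; our pin keeps pPage alive */
  nCell = pPage->nCell;          /* still the original, initialized page */
  toy_unref(pPage);              /* release as we leave */
  return nCell;
}

/* Approach 2: re-read the cursor's page after the call that may move it.
** Returns the cell count of whatever page the cursor points at now. */
int toy_delete_reloading(ToyCursor *pCur, ToyPage *pOther){
  ToyPage *pPage = pCur->pPage;
  toy_reposition(pCur, pOther);
  pPage = pCur->pPage;           /* refresh the cached pointer */
  return pPage->nCell;
}
```

The two are not equivalent. With a held reference the rest of the function keeps working on the original page, which the cursor may no longer point at. With a reload it works on the cursor's new page, so pCur->idx would also have to be valid for that page.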