I am debugging database corruption. After collecting several corrupted 
databases, I found that all of them were corrupted by null data being written 
into them.
So I decided to add a check to the source code that dumps the call stack, in 
order to find out who corrupts the database.


Here is the code I added to the SQLite source.


int sqlite3CheckNullData(const unsigned char* data, const int length)
{
  const size_t* s = (const size_t*)data;
  const unsigned char* d = (const unsigned char*)data;
  int n = length/sizeof(size_t);
  int i;
  for (i = 0; i < n; i++) {
    if (s[i]!=0) {
      return 0;
    }
  }
  for (i = i*sizeof(size_t); i < length; i++) {
    if (d[i]!=0) {
      return 0;
    }
  }
  return 1;
}
static int unixWrite(
 sqlite3_file *id,
 const void *pBuf,
 int amt,
 sqlite3_int64 offset
){
 unixFile *pFile = (unixFile*)id;
 if (amt>0 && sqlite3CheckNullData(pBuf, amt)) {
  /* offset is a sqlite3_int64, so it is formatted with %lld rather than %d */
  SQLITE_KNOWN_ERROR(SQLITE_CORRUPT, "writing null data into %s from %lld length %d",
                     unixGetFilename(pFile->zPath), offset, amt);
 }
...
}
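
As an aside, the cast to size_t* above assumes the buffer is suitably aligned. 
A minimal sketch of the same check without that assumption is shown here; the 
name is_all_zero is hypothetical and not part of SQLite. It relies on the fact 
that if the first byte is zero and every byte equals the byte after it, then 
every byte is zero.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of SQLite.
 * Returns 1 if all `length` bytes at `data` are zero, 0 otherwise.
 * An empty buffer is reported as all zero, like the original check. */
int is_all_zero(const unsigned char *data, size_t length)
{
  if (length == 0) {
    return 1;
  }
  /* First byte is zero, and each byte equals its successor:
   * memcmp only reads, so the overlapping ranges are fine. */
  return data[0] == 0 && memcmp(data, data + 1, length - 1) == 0;
}
```

In unixWrite this could stand in for sqlite3CheckNullData(pBuf, amt), keeping 
the amt>0 guard as it is.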

The code is simple. [sqlite3CheckNullData] checks whether the data is all 
null, and the macro [SQLITE_KNOWN_ERROR], which is defined as [sqlite3_log], 
reports the error outside SQLite. Outside SQLite, I dump the call stacks of all 
threads, and I got this:

0x195774000 + 113628   objc_msgSend (in libobjc.dylib) + 28
0x1000f8000 + 7781724   _ZL9LogSQLitePviPKc,WCDataBase.mm,line 81
0x1000f8000 + 2836888   sqlite3_vlog,printf.c,line 1023
0x1000f8000 + 2778664   sqlite3KnownError,main.c,line 3192
0x1000f8000 + 2554560   unixWrite,os_unix.c,line 3335
0x1000f8000 + 2821984   sqlite3WalCheckpoint,wal.c,line 1798
0x1000f8000 + 2819864   sqlite3WalClose,wal.c,line 1914
0x1000f8000 + 2529964   sqlite3PagerClose,pager.c,line 3995
0x1000f8000 + 2574152   sqlite3BtreeClose,btree.c,line 2516
0x1000f8000 + 2774444   sqlite3LeaveMutexAndCloseZombie,main.c,line 10834297741736
0x1000f8000 + 2774220   sqlite3Close,main.c,line 1026


This is the only thread operating on the database. The call stacks of all the 
other threads are unrelated.
You can see that SQLite is checkpointing, and that is when the null data is 
written into the database file. That is why my database gets corrupted. I have 
no idea how this happens, even after checking the source code.


Here are some of my conclusions:
1. The null data check also covers writes to the WAL file, but there is no 
report of null data being written to the WAL.
2. Some rogue file descriptor may be writing the null data into the WAL file. 
But I have several databases with the same problem. It is unlikely that a rogue 
writer would write null data only into WAL files and not into other database 
files or ordinary files.
3. I guess it could be a problem in the operating system. I work on iOS, but I 
have no further ideas there.
4. It can happen under normal conditions, but it happens much more easily when 
free disk space is low. I have no further ideas about this either.


So, these are my questions:
1. Does anyone have any idea about this?
2. What can I do to resolve this type of corruption?


Note that if a page of sqlite_master has been overwritten with null data, the 
[.dump] shell command will not work to repair the database.
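
To see which pages of an already corrupted file were hit, one option is to scan 
the file page by page. The following is only a sketch under some assumptions: 
the function name count_zero_pages is hypothetical, the page size is read from 
the 2-byte big-endian field at offset 16 of the database header (where the 
value 1 means 65536), and the scan fails if page 1 itself was overwritten 
because the header is then unreadable. An all-zero page is suspicious but is 
not by itself proof of corruption.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not part of SQLite.
 * Counts the pages of the database file `path` that are entirely zero.
 * Stores the 1-based number of the first such page in *first (0 if none).
 * Returns the count, or -1 if the file or its header cannot be read. */
long count_zero_pages(const char *path, long *first)
{
  unsigned char hdr[100];
  unsigned char *page;
  size_t page_size;
  long pgno = 0, zero = 0;
  FILE *f = fopen(path, "rb");

  if (first) *first = 0;
  if (!f) return -1;
  if (fread(hdr, 1, sizeof hdr, f) != sizeof hdr) { fclose(f); return -1; }

  /* Page size: big-endian 16-bit value at offset 16; 1 means 65536. */
  page_size = ((size_t)hdr[16] << 8) | hdr[17];
  if (page_size == 1) page_size = 65536;
  /* Must be a power of two of at least 512, else the header is damaged. */
  if (page_size < 512 || (page_size & (page_size - 1)) != 0) {
    fclose(f);
    return -1;
  }

  page = malloc(page_size);
  if (!page) { fclose(f); return -1; }

  rewind(f);
  while (fread(page, 1, page_size, f) == page_size) {
    pgno++;
    /* Zero first byte plus every byte equal to its successor => all zero. */
    if (page[0] == 0 && memcmp(page, page + 1, page_size - 1) == 0) {
      zero++;
      if (first && *first == 0) *first = pgno;
    }
  }

  free(page);
  fclose(f);
  return zero;
}
```

Called on a copy of a corrupted file, this gives the number of zeroed pages and 
the first page affected, which can then be compared with the offset and length 
logged from unixWrite.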
