Bug Report
==========
SQlite version - 3.19.3
Platform Linux 64 Bit.

We are seeing crashes in our application which uses SQLite to store JOB
information where JOB is Application specific activities. The crash always
has following stack trace:

#0  0x0000003c000325e5 in raise () from ./libs/lib64/libc.so.6
#1  0x0000003c00033dc5 in abort () from ./libs/lib64/libc.so.6
#2  0x0000003c000704f7 in __libc_message () from ./libs/lib64/libc.so.6
#3  0x0000003c00075f3e in malloc_printerr () from ./libs/lib64/libc.so.6
#4  0x0000003c00078d8d in _int_free () from ./libs/lib64/libc.so.6
#5  0x000000000058eeca in sqlite3MemFree (pPrior=0x7f69989ff400)
    at sqlite3.c:20817
#6  0x000000000058fc87 in sqlite3_free (p=0x7f69989ff400) at sqlite3.c:24674
#7  0x000000000059d603 in pcache1Free (p=0x7f69989ff400) at sqlite3.c:45379
#8  0x000000000059d75c in pcache1FreePage (p=0x7f6998a01498) at
sqlite3.c:45463
#9  0x000000000059db8c in pcache1TruncateUnsafe (pCache=0x7f6b20010af8,
    iLimit=1) at sqlite3.c:45668
#10 0x000000000059e45c in pcache1Truncate (p=0x7f6b20010af8, iLimit=1)
    at sqlite3.c:46138
#11 0x000000000059cd21 in sqlite3PcacheTruncate (pCache=0x7f6b200100d8,
pgno=0)
    at sqlite3.c:44797
#12 0x000000000059cd5c in sqlite3PcacheClear (pCache=0x7f6b200100d8)
    at sqlite3.c:44814
#13 0x000000000059fa98 in pager_reset (pPager=0x7f6b2000ffa8)
    at sqlite3.c:48649
#14 0x00000000005a1228 in pagerBeginReadTransaction (pPager=0x7f6b2000ffa8)
    at sqlite3.c:50140
#15 0x00000000005a3546 in sqlite3PagerSharedLock (pPager=0x7f6b2000ffa8)
    at sqlite3.c:52225
#16 0x00000000005ad308 in lockBtree (pBt=0x7f6b2000fae8) at sqlite3.c:61943
#17 0x00000000005adaf8 in sqlite3BtreeBeginTrans (p=0x7f6b2000fa88,
wrflag=0)
    at sqlite3.c:62308
#18 0x00000000005c9267 in sqlite3VdbeExec (p=0x7f6b200512a8) at
sqlite3.c:81523
#19 0x00000000005c2968 in sqlite3Step (p=0x7f6b200512a8) at sqlite3.c:76699
#20 0x00000000005c2afe in sqlite3_step (pStmt=0x7f6b200512a8)
    at sqlite3.c:76760

We have two handles to database - RO and RW. As per
https://sqlite.org/isolation.html, we use RO handle for reads to avoid
reading partial transactions.
The crash always happen with RO handle while performing query to get active
JOBS.

I did some code analysis and dump analysis and here is my findings.

I followed the following comments and dumped the contents of data
structrues:
** A Page cache line looks like this:
**
**  -------------------------------------------------------------
**  |  database page content   |  PgHdr1  |  MemPage  |  PgHdr  |
**  -------------------------------------------------------------

(gdb) p *pCache
$1 = {pGroup = 0x7f754c00eea8, szPage = 8192, szExtra = 208, szAlloc = 8456,
  bPurgeable = 1, nMin = 10, nMax = 131072, n90pct = 117964, iMaxKey =
47126,
  nRecyclable = 2523, nPage = 2523, nHash = 65536, apHash = 0x7f6f8d4db858,
  pFree = 0x7f755106e750, pBulk = 0x7f755105e018}

(gdb) p *(PgHdr*)0x7f6998a014d0
$41 = {pPage = 0x7f6998a01498, pData = 0x7f69989ff498,
  pExtra = 0x7f6998a01518, pDirty = 0x0, pPager = 0x7f6b2000ffa8,
  pgno = 19529, flags = 1, nRef = 0, pCache = 0x7f6b200100d8,
  pDirtyNext = 0x0, pDirtyPrev = 0x0}


(gdb) p *(MemPage*)0x7f6998a01518
$40 = {isInit = 1 '\001', bBusy = 0 '\000', intKey = 1 '\001',
  intKeyLeaf = 1 '\001', pgno = 19529, leaf = 1 '\001', hdrOffset = 0
'\000',
  childPtrSize = 0 '\000', max1bytePayload = 127 '\177', nOverflow = 0
'\000',
  maxLocal = 8157, minLocal = 1003, cellOffset = 8, nFree = 1938, nCell = 7,
  maskPage = 8191, aiOvfl = {0, 0, 0, 0}, apOvfl = {0x0, 0x0, 0x0, 0x0},
  pBt = 0x7f6b2000fae8, aData = 0x7f69989ff498 "\r",
  aDataEnd = 0x7f6998a01498 "",
  aCellIdx = 0x7f69989ff4a0 "\027'\025*\023Z\021L\016\213\f\272\a\250",
  aDataOfst = 0x7f69989ff498 "\r", pDbPage = 0x7f6998a014d0,
  xCellSize = 0x5aa7f3 <cellSizePtr>,
  xParseCell = 0x5aa4da <btreeParseCellPtr>}


(gdb) p *p
$29 = {page = {pBuf = 0x7f69989ff400, pExtra = 0x7f6998a014d0}, iKey =
19529,
  isPinned = 1 '\001', isBulkLocal = 0 '\000', isAnchor = 0 '\000',
  pNext = 0x0, pCache = 0x7f6b20010af8, pLruNext = 0x0, pLruPrev = 0x0}

The (pBuf - 8 bytes) contains the size but there is no size in it:

0x7f69989ff3f8: 0x01 0x00 0x01 0x01 0x4c 0x4d 0x00 0x00 'LM'
0x7f69989ff400: 0x01 0x00 0x00 0x7f 0x00 0x00 0xdd 0x1f Ý' <========= Crash
address

Also, looking at previous cache, we can see the following:

0x7f69989ff360: 0x65 0x63 0x6f 0x76 0x65 0x72 0x20 0x70 'ecover p'
0x7f69989ff368: 0x69 0x70 0x65 0x20 0x74 0x6f 0x20 0x73 'ipe to s'

0x7f69989ff370: 0x61 0x76 0x65 0x20 0x6a 0x6f 0x62 0x30 'ave job0' +
szextra(208 bytes) will cause page to overflow into 0x7f69989ff3f8

0x7f69989ff378: 0x78 0xd3 0x9f 0x98 0x69 0x7f 0x00 0x00 'xÓ'
0x7f69989ff380: 0xb0 0xf3 0x9f 0x98 0x69 0x7f 0x00 0x00 '°ó'
0x7f69989ff388: 0x4c 0x4d 0x00 0x00 0x00 0x00 0x00 0x00 'LM'
0x7f69989ff390: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 ''
0x7f69989ff398: 0xf8 0x0a 0x01 0x20 0x6b 0x7f 0x00 0x00 'ø
 '
0x7f69989ff3a0: 0xb8 0x5d 0x8f 0x98 0x69 0x7f 0x00 0x00 '¸]'
0x7f69989ff3a8: 0xa8 0xf2 0xba 0x24 0x6b 0x7f 0x00 0x00 '¨òº$'
0x7f69989ff3b0: 0x78 0xf3 0x9f 0x98 0x69 0x7f 0x00 0x00 'xó' ----
0x7f69989ff378 - 0x7f69989fd378 = 8192
0x7f69989ff3b8: 0x78 0xd3 0x9f 0x98 0x69 0x7f 0x00 0x00 'xÓ' ----
0x7f69989fd378 - buf pointer
0x7f69989ff3c0: 0xf8 0xf3 0x9f 0x98 0x69 0x7f 0x00 0x00 'øó' ----
0x7f69989ff3f8
0x7f69989ff3c8: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 ''
0x7f69989ff3d0: 0xa8 0xff 0x00 0x20 0x6b 0x7f 0x00 0x00 '¨ÿ '
0x7f69989ff3d8: 0x4c 0x4d 0x00 0x00 0x01 0x00 0x00 0x00 'LM'
0x7f69989ff3e0: 0xd8 0x00 0x01 0x20 0x6b 0x7f 0x00 0x00 'Ø '
0x7f69989ff3e8: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 ''
0x7f69989ff3f0: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 ''
0x7f69989ff3f8: 0x01 0x00 0x01 0x01 0x4c 0x4d 0x00 0x00 'LM'
0x7f69989ff400: 0x01 0x00 0x00 0x7f 0x00 0x00 0xdd 0x1f Ý' <========= Crash
address
0x7f69989ff408: 0xeb 0x03 0x08 0x00 0xc6 0x02 0x09 0x00 'Æ      '
0x7f69989ff410: 0xff 0x1f 0x00 0x00 0x00 0x00 0x00 0x00 'ÿ'
0x7f69989ff418: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 ''
0x7f69989ff420: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 ''

So 0x7f69989ff400 does not look like a VALID buffer

In function pcacheFetchFinishWithInit(), we set

pPgHdr->pData = pPage->pBuf;

So I took pData pointer and found to be valid as (pData-8 byte) gave me
correct size of allocation:
pData = 0x7f69989ff498

0x7f69989ff490: 0x08 0x21 0x00 0x00 0x00 0x00 0x00 0x00 !' <====== 8456 =
szAlloc
0x7f69989ff498: 0x0d 0x00 0x00 0x00 0x07 0x07 0xa8 0x00 '
0x7f69989ff4a0: 0x17 0x27 0x15 0x2a 0x13 0x5a 0x11 0x4c ''*ZL'

Interesting facts:
1) Whenever core happens, the size of core file is 20 GB
2) From the OS logs, it looks like OS is LOW on resources. So this happens
when system is under stress. I think due to that this crash is not easily
reproducible.
3) We have tried several ways to reproduce but are not able to reproduce.

Any suggestions on reproducing this issue is welcome.
Please let me know if this is known issue (did not find any reference) or
need any further details from the core.

Thanks
Ashish
_______________________________________________
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to