[sqlite] Segfault during FTS index creation from huge data

Dan Kennedy Tue, 19 May 2015 16:40:38 +0700

On 05/19/2015 03:35 PM, Artem wrote:
> Hi!
>
> And what about the result of our conversation?
> Can the developers raise these limits so that SQLite can use all the
> memory the user has?


Hi Artem,

The conclusion was that although the first problem encountered is the 
massive allocation that FTS tries to make, fixing that won't actually 
improve things much. SQLite is hard-coded to limit the size of blobs to 
a bit under 1GiB. So even if we could merge these big doclists in 
memory, we're pretty close to the limit of what can be stored in the 
database anyway.

The 1GiB limit can be raised to 2GiB by recompiling SQLite with 
different options. But any further than that would require a redesign of 
the file-format.
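
A minimal sketch of how one might probe that per-value size limit, assuming 
Python's standard sqlite3 module. The helper name fits_in_one_value is 
hypothetical. It relies on zeroblob(N) being rejected with "string or blob 
too big" when N exceeds the limit the library was built with (1,000,000,000 
bytes by default, and at most 2^31-1 when SQLITE_MAX_LENGTH is raised at 
compile time):

```python
import sqlite3


def fits_in_one_value(conn, nbytes):
    """Return True if SQLite accepts a single blob of nbytes bytes.

    Hypothetical helper for illustration only. zeroblob() is asked for a
    blob of the requested size and any SQLite error (normally "string or
    blob too big") is treated as "does not fit".
    """
    try:
        conn.execute("SELECT length(zeroblob(?))", (nbytes,)).fetchone()
        return True
    except sqlite3.Error:
        return False


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    for size in (1024, 2**31):
        print(size, fits_in_one_value(conn, size))
```

A request for 2^31 bytes is refused by any build, because it exceeds the 
largest value SQLITE_MAX_LENGTH can be set to.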

So to fix this, we would need to change the format FTS4 uses to store 
the index within the database. Some sort of extension to allow it to 
spread the largest doclists across two or more database records. And 
unfortunately I suspect that will remain a low priority for the 
foreseeable future.
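
The general idea of spreading one oversized value across several records 
can be sketched at the application level as follows. This is only an 
illustration under stated assumptions: the table doclist_chunks, the 
helper names, and the chunk size are all hypothetical, and this is not the 
format FTS4 uses or a proposed change to it.

```python
import sqlite3

CHUNK = 1 << 20  # hypothetical chunk size (1 MiB), chosen only for illustration


def store_chunked(conn, term, data, chunk=CHUNK):
    """Split one large value across several rows keyed by (term, seq)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS doclist_chunks("
        "term TEXT, seq INTEGER, data BLOB, PRIMARY KEY(term, seq))"
    )
    rows = [
        (term, i, data[off:off + chunk])
        for i, off in enumerate(range(0, len(data), chunk))
    ]
    conn.executemany("INSERT INTO doclist_chunks VALUES (?, ?, ?)", rows)
    return len(rows)  # number of records the value was spread across


def load_chunked(conn, term):
    """Reassemble the value by concatenating its chunks in seq order."""
    cur = conn.execute(
        "SELECT data FROM doclist_chunks WHERE term = ? ORDER BY seq", (term,)
    )
    return b"".join(row[0] for row in cur)
```

Each stored row stays well below the per-value limit, while the logical 
value can be as large as the number of chunks allows.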

I've tested FTS5 with really large databases, and it seems to work. It's 
not actually released yet though.

Regards,
Dan.





>
>> One, you should remove sqlite-users at sqlite.org from your To list. I keep
>> bouncing email when I reply to you. Not a big deal, just an FYI.
>> Two:
>> On Sun, May 3, 2015 at 2:13 PM, James K. Lowden <jklowden at schemamania.org>
>> wrote:
>>> On Thu, 30 Apr 2015 12:47:57 -0600
>>> Scott Robison <scott at casaderobison.com> wrote:
>>>
>>>> Perhaps you are correct and "sigsegv" is not the literal signal that
>>>> is triggered in this case. I don't care, really. The fact is that an
>>>> apparently valid pointer was returned from a memory allocation
>>>> function yet can result in an invalid access for whatever reason (out
>>>> of memory, in this case). The Linux OOM killer may kill the offending
>>>> process (which is what one would expect, but one would also expect
>>>> malloc to return null, so we already know not to expect the
>>>> expected). Or it may kill some other process which has done nothing
>>>> wrong! Sure, the OS is protecting the two processes' address spaces
>>>> from one another, but it seems to me that if one process can kill
>>>> another process, there is a problem.
>>> I have no argument with you, Scott.  It's neither the first nor last
>>> thing Linux implemented in the name of efficiency that undid what
>>> previously were guarantees.  My only angels-on-the-head-of-a-pin
>>> argument is that the OOM-killer doesn't invalidate malloc-returned
>>> pointers in particular.  It sweeps with a much broader brush, you might
>>> say.   ;-)
>>>
>> Okay, I think I see what you're saying, though there seem to be a number of
>> anecdotes online about people who do get a sigsegv from some simple memory
>> allocation strategies (designed to allocate all available memory). I would
>> not discount the possibility of a sigsegv, but agree that it is probably
>> not supposed to happen given the way optimistic memory allocation is
>> claimed to work.
>
>>> SIGSEGV *is* significant to the OP because it doesn't signify heap
>>> exhaustion.  If that signal was triggered in the heap, it indicates
>>> heap corruption.  If it was triggered in the stack, it suggests the
>>> stack might have been exhausted, perhaps before a pure OOM condition was
>>> reached.
>>>
>> The code I was referring to in earlier posts performed a realloc. I wonder
>> if perhaps there is a corner case in there. Growing a block potentially
>> means moving a block, so if the library was copying from a previously
>> allocated block to a new block, maybe that could result in a segfault?
>> Or maybe another explanation could be that some other library or piece
>> of code replaced the default malloc-family functions.
>
>
>
