On 04/30/2015 07:41 AM, Scott Robison wrote: > > I wrote up some test code today that definitely forces a SQLITE_NOMEM > condition, and there are no leaks (though there are a couple really large > outstanding allocations to force an error without needing to insert > gigabytes of data). > > In thinking about what Dan wrote last night (many short word matches) I > decided that maybe my supposition was wrong and it wasn't the number of > relatively unique words in my old data set. Using the FTS4AUX virtual > table, I recall looking at the info in the DB at the time and seeing a huge > number of words with 1 or very few matches, and assumed that was the > problem in that data set. However, given the random nature of that data > set, and the fact that there are only 26 single letter "words" and 676 > double letter "words" (and so on), I could have easily missed the > relatively few rows of data that had very large numbers of docids / > occurrences in the aux table output. > > My test app goes to the other extreme. It inserts as many rows as possible > consisting of the single letter word "a" 256 times, and in my case, it > fails after 1,052,641 rows were inserted (1,048,576 rows were committed). > > In any case, my memory of the "merging" of things was correct, though the > precise location may not have been. > > Notes: > > 1. Depending on how much memory pressure I put on the system by > pre-allocating even more big chunks of memory, the NOMEM error moves around > a bit. I've seen it happen in a malloc as well. > > 2. The reality is that FTS was designed around certain assumptions, and > these extra large data sets don't fit into those cases. In my case, the > only time I've seen errors was due to synthetic / unrealistic test data. > The exponential growth related to segment directory merges seems to dictate > that eventually, after a lot of inserts / updates, the data structures are > going to get quite large.
Thanks for doing this. I'll look at it properly later on today to see if the results suggest anything we can do. I'm thinking there's another problem though. At some point soon, we run into this: https://www.sqlite.org/limits.html#max_length The default limit is 10^9 bytes. The fully-merged doclist generated for the term "a" in the above is probably around 256MiB in size. So if we could successfully merge it in memory, it could be inserted into the database. However, once you get up over a billion records there might be doclists for common terms that exceed this limit. Dan. > > 3. One possible change that might have an impact for Artem: right now the > FTS_MERGE_COUNT is set to 16. 16 also seems to be hard coded in FTS in a > few places, though I could be mistaken and the instances of "16" as a magic > number could be coincidence. Regardless, I wonder if a different value of > FTS_MERGE_COUNT might tweak the system so that it takes a lot longer for > Artem to encounter the problem. > > Finally, the details: > > In this exact case, the call stack looks as follows at the time of the > SQLITE_NOMEM error code: > > sqlite3MemRealloc, line 17109, sqlite3-1.c > sqlite3Realloc, line 20996, sqlite3-1.c > sqlite3_realloc, line 21022, sqlite3-1.c > sqlite3Fts3SegReaderStep, line 2946, sqlite3-6.c (attempting to realloc to > 135,265,788 bytes) > fts3SegmentMerge, line 3214, sqlite3-6.c > fts3AllocateSegdirIdx, line 1150, sqlite3-6.c > fts3SegmentMerge, line 3199, sqlite3-6.c > fts3AllocateSegdirIdx, line 1150, sqlite3-6.c > fts3SegmentMerge, line 3199, sqlite3-6.c > sqlite3Fts3PendingTermsFlush, line 3252, sqlite3-6.c > fts3PendingTermsDocid, line 878, sqlite3-6.c > sqlite3Fts3UpdateMethod, line 5619, sqlite3-6.c > fts3UpdateMethod, line 21701, sqlite3-5.c > sqlite3VdbeExec, line 24064, sqlite3-3.c > sqlite3Step, line 16367, sqlite3-3.c > sqlite3_step, line 16434, sqlite3-3.c > main, line 68: if (chk(sqlite3_step(stmt)) != SQLITE_DONE) > > Here is my test code (C++, compiled with Visual C++ 2010, though I'm quite > certain that is not the problem): > > #include <iostream> > #include <sstream> > #include <string> > #include <vector> > > #include "sqlite3.h" > > sqlite3* db = nullptr; > > int chk(int errcode) > { > if ((errcode > 0) && (errcode < 100)) > { > int ext_errcode = sqlite3_extended_errcode(db); > const char* p = sqlite3_errmsg(db); > if (!p || !*p) > p = "{missing errmsg}"; > std::ostringstream oss; > oss << ext_errcode << '-' << p; > throw oss.str(); > } > > return errcode; > } > > int main() > { > unsigned long long ull = 0; > > // allocate a bunch of memory to put pressure on malloc > std::vector<char*> allocs; > allocs.push_back(new char[1024*1024*1024]); > allocs.push_back(new char[256*1024*1024]); > //allocs.push_back(new char[128*1024*1024]); > //allocs.push_back(new char[64*1024*1024]); > > try > { > chk(sqlite3_open("test.db", &db)); > chk(sqlite3_exec(db, "BEGIN", nullptr, nullptr, nullptr)); > chk(sqlite3_exec(db, "CREATE VIRTUAL TABLE IF NOT EXISTS data USING > fts4();", nullptr, nullptr, nullptr)); > > sqlite3_stmt* stmt; > chk(sqlite3_prepare_v2(db, "INSERT INTO data VALUES (?1)", -1, &stmt, > nullptr)); > > static const char p[] = > "a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a" > "a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a" > "a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a" > "a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a" > "a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a" > "a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a" > "a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a" > "a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a "; > > chk(sqlite3_bind_text(stmt, 1, p, -1, SQLITE_STATIC)); > > do { > > if ((ull & 0xFFFF) == 0) > { > std::cout << '\r' << ull; > chk(sqlite3_exec(db, "COMMIT", nullptr, nullptr, nullptr)); > chk(sqlite3_exec(db, "BEGIN", nullptr, nullptr, nullptr)); > } > > chk(sqlite3_reset(stmt)); > if (chk(sqlite3_step(stmt)) != SQLITE_DONE) > throw std::string("expected SQLITE_DONE"); > > } while (++ull != 0); > > chk(sqlite3_finalize(stmt)); > > chk(sqlite3_exec(db, "COMMIT", nullptr, nullptr, nullptr)); > > chk(sqlite3_close(db)); > } > catch (const std::string& x) > { > std::cout << std::endl << std::endl; > std::cout << "caught exception: " << x << std::endl; > std::cout << "ull = " << ull << std::endl; > } > > while (!allocs.empty()) > { > delete [] allocs.back(); > allocs.pop_back(); > } > > return 0; > } > > > -- > Scott Robison > _______________________________________________ > sqlite-users mailing list > sqlite-users at mailinglists.sqlite.org > http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users