On 10 Sep 2013, at 21:24, "E.Pasma" <pasm...@concepts.nl> wrote:

> On 10 Sep 2013, at 19:48, Simon Slavin wrote:
> 
>> 
>> On 10 Sep 2013, at 4:15pm, Harmen de Jong - CoachR Group B.V. 
>> <har...@coachr.com> wrote:
>> 
>>> That is something we suspected too. We already ran some tests in which we 
>>> timed all memory allocations executed during the entire operation. In 
>>> total, for the 10000 tables test, this came to somewhere around 25 msec. 
>>> Since that is only a little overhead, and the instructions, as you point 
>>> out, increase linearly, this still does not explain the polynomial 
>>> increase in preparation time.
>> 
>> Polynomial time would be explained if the software is scanning every item in 
>> a list of lists for each table mentioned.  So a guess about what's happening 
>> is that for each DELETE FROM instruction the software has to look up the 
>> entry being deleted, and to check that none of the items in any of the other 
>> tables refers to it.
>> 
>> This makes sense given how FOREIGN KEYs are implemented in SQLite: there is 
>> no master 'check-for-changes' table, but each table which has the foreign 
>> key has to be scanned.  It is irrelevant that there are no entries in any of 
>> these tables: the table (actually, the index) still has to be found so that 
>> SQLite can figure out that there are no entries in it.  Because SQLite 
>> hashes table names, rather than scanning a simple list, it is scanning a 
>> list of lists.  Hence, polynomial time.
>> 
>> I see that E. Pasma has posted that the long scanning time has been fixed in 
>> a later release.  Good.
>> 
>> Simon
> My suppositions that the time was spent in the execute step and that this 
> had been fixed in the new release both appeared to be wrong. So I may be 
> wrong again, but I think I have an explanation now.
> It is as Simon guesses: a list of lists is being scanned. It is, however, 
> not the contents of the tables that is scanned, but the list of foreign 
> key constraints that is loaded into memory when a database is opened. When 
> preparing a DELETE statement, this global list of FKs is scanned to see if 
> the current table is referenced. This is the outer loop. If a referring FK 
> is found, and if it has an ON DELETE clause, an inner loop over the same 
> global list follows, to see if the referring table is itself referenced. 
> In the case that every table has such a constraint, as is the case here, 
> the time becomes n * n. If I am right, this is hard to fix, as it is 
> inherent to the representation of the database schema in memory. A sketch 
> of the shape of these two loops follows below.
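>
> To illustrate, here is a rough sketch of the shape of the scan I mean. The
> names and structures are made up for illustration; this is not the actual
> SQLite source:
>
>   #include <string.h>
>
>   typedef struct Fk Fk;
>   struct Fk {
>     const char *zFrom;   /* child (referencing) table */
>     const char *zTo;     /* parent (referenced) table */
>     int hasOnDelete;     /* constraint carries an ON DELETE action */
>     Fk *pNext;           /* next constraint in the global list */
>   };
>
>   /* Runs while preparing "DELETE FROM zTab". */
>   static void fkScanForDelete(Fk *pAll, const char *zTab){
>     Fk *p, *q;
>     for(p=pAll; p; p=p->pNext){      /* outer loop: who refers to zTab? */
>       if( strcmp(p->zTo, zTab)!=0 ) continue;
>       if( p->hasOnDelete ){
>         /* The cascade will delete rows from p->zFrom, so an inner loop
>         ** over the same global list checks whether that table is itself
>         ** referenced. */
>         for(q=pAll; q; q=q->pNext){
>           if( strcmp(q->zTo, p->zFrom)==0 ){
>             /* ... handle the nested constraint ... */
>           }
>         }
>       }
>     }
>   }
>
> With n tables that each carry such a constraint, both loops run over all n
> entries of the list, which gives the n * n preparation time.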
> 
> This also means that if you leave out the cascading delete from the 
> constraints, the time becomes linear. Actually, that is what I observed 
> before coming up with the above explanation. It was easy to check by 
> extracting the schemas from the test databases and removing ON ...... 
> CASCADE; a small harness that shows the same effect follows below. Thanks 
> for making these databases available.
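>
> For anyone who wants to reproduce the measurement, below is a small
> harness. It is my own test code, not taken from the test databases: it
> creates n tables that all reference t0 with ON DELETE CASCADE and then
> times how long preparing a single DELETE takes. Removing the ON DELETE
> clause from the CREATE TABLE statement should make the same measurement
> scale linearly rather than quadratically.
>
>   #include <stdio.h>
>   #include <stdlib.h>
>   #include <time.h>
>   #include <sqlite3.h>
>
>   int main(int argc, char **argv){
>     int i, n = argc>1 ? atoi(argv[1]) : 10000;  /* number of tables */
>     sqlite3 *db;
>     sqlite3_stmt *pStmt;
>     char zSql[200];
>     clock_t t;
>
>     sqlite3_open(":memory:", &db);
>     sqlite3_exec(db, "PRAGMA foreign_keys=ON;"
>                      "CREATE TABLE t0(id INTEGER PRIMARY KEY);", 0, 0, 0);
>     for(i=1; i<=n; i++){
>       sqlite3_snprintf((int)sizeof(zSql), zSql,
>         "CREATE TABLE t%d(id INTEGER PRIMARY KEY,"
>         " ref REFERENCES t0 ON DELETE CASCADE);", i);
>       sqlite3_exec(db, zSql, 0, 0, 0);
>     }
>
>     t = clock();
>     sqlite3_prepare_v2(db, "DELETE FROM t0", -1, &pStmt, 0);
>     printf("prepare with %d tables: %.3f sec\n",
>            n, (double)(clock()-t)/CLOCKS_PER_SEC);
>
>     sqlite3_finalize(pStmt);
>     sqlite3_close(db);
>     return 0;
>   }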

What you are supposing makes sense. We had actually already been searching 
for two loops that are related to each other, since that would explain the 
polynomial behaviour; thus far, however, we could not find them (we are not 
yet that familiar with the source code of SQLite). Could you point us to the 
code where these inner and outer loops are implemented?

The next challenge would of course be to find a solution. We actually found 
a similar optimization last week that had quite some impact with the many 
tables in the databases we provided to the developers, and it has already 
been added to the fork ;-). Especially since these are quite exceptional use 
cases, there may be a chance to do some optimizations that would seem 
useless for more normal usage of SQLite.

No thanks are needed for making these databases available. We would like to 
thank everyone who helps point us in the right direction in this thread.