On 09/02/2011 09:24 PM, Simon Slavin wrote:

On 2 Sep 2011, at 10:04am, Filip Navara wrote:

The time to create an index on my 266 Mb experimental database is more
than 9 minutes. The database is available at
http://www.emclient.com/temp/mail_index.zip and the command I use to
create the index is

  CREATE INDEX "idx_MailAddresses_address" ON "MailAddresses" ("type",
"address", "parentId");

I had run the shell under profiler

Strangely, on my Mac running the shell tool provided with OS X 10.7.1, SQLite 
3.7.5, there seems to be a problem.  It's still going after more than 2 hours.

Loading your database and running your CREATE INDEX command, the application 
only seems to be using about 1% of one of my CPUs.  I looked to see if it was 
i/o bound instead of CPU bound but it seems only to be reading 320KB/s and my 
computer can handle a lot more than that.  (All above figures from Activity 
Monitor.)

We were just wondering a half hour ago how long this would
take with 3.7.7. Thanks!

Released versions of SQLite build an index by inserting
all values from the indexed column(s) in whatever order
they appear in the table (i.e. unsorted order) into a new
b-tree. This is fine if the index b-tree you are constructing
fits in the cache.

If it doesn't fit in the cache you have a problem. Each
time you go to insert a new entry into the b-tree you have
to find the leaf page that the new entry will be added to.
Since your b-tree doesn't fit in the cache, odds are that
this means reading the page from the file-system. And since
you are inserting in arbitrary order, the page could be
anywhere in the database (or WAL) file. In the worst case,
if your page is not cached in OS memory, you may even have
to shift the disk arm to get at it. Way slow.

The result is that reading data from disk becomes the
bottleneck when writing unsorted values to a b-tree. Hence
your 1% CPU measurement.

The new version uses a merge-sort to sort all the index
entries before it inserts them into the b-tree. This way
it doesn't matter if your b-tree is larger than the cache,
as you are always inserting into the right-most leaf node.
No need to go searching through the file-system/disk for
pages while building the b-tree.

Dan.





_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to