I admit I didn't think (or didn't even read in detail) about technical implementation.

This is an extract from analyzer:
*** Table AE_DATA ****************************************************

Percentage of total database..........  99.89%
Number of entries..................... 1030371
Bytes of storage consumed............. 67846144
Bytes of payload...................... 61186719    90.2%
Average payload per entry............. 59.38
Average unused bytes per entry........ 0.34
Average fanout........................ 752.00
Fragmentation.........................   0.35%
Maximum payload per entry............. 65
Entries that use overflow............. 0            0.0%
Index pages used...................... 11
Primary pages used.................... 8271
Overflow pages used................... 0
Total pages used...................... 8282
Unused bytes on index pages........... 15678       17.4%
Unused bytes on primary pages......... 337429       0.50%
Unused bytes on overflow pages........ 0
Unused bytes on all pages............. 353107       0.52%

So I understand that the 11 index pages are pure btree pages, but the leaves are actually in the ~8000 data pages. And it probably needs to visit (i.e. load) all data pages to count the leaves... Even if there would be some counter in the header of each page, it still needs to load the pages which is bad for IO...

BTW I found this by opening some file over network, which of course made everything worse. For my case (file format) the data is append (write) only, so max(rowid) works equally good. As a note, I actually HAVE the record count stored somewhere else but I had this query in a generic copy routine which was also used for some other small tables. I agree it's some kind of corner case, usually tables have some kind of indices. But in this case I need high speed, indices would bring performance down. Not that I really need, but I have to support specified data rates up to 100k records / second.
And I only access the data sequentially by rowid.

Just for the sake of discussion: I imagine some hacks to the btree to optimize this special case. The btree nodes could store the number of leaves just for the data pages (e.g. 0: unknown, >0 valid number); it would need to propagate up the info just until it reaches a parent in an index page. And it needs to update this info only when a node changes from leaf to having a child.

Thanks for all your time,
Gabriel
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to