I admit I didn't think (or didn't even read in detail) about technical
implementation.
This is an extract from analyzer:
*** Table AE_DATA ****************************************************
Percentage of total database.......... 99.89%
Number of entries..................... 1030371
Bytes of storage consumed............. 67846144
Bytes of payload...................... 61186719 90.2%
Average payload per entry............. 59.38
Average unused bytes per entry........ 0.34
Average fanout........................ 752.00
Fragmentation......................... 0.35%
Maximum payload per entry............. 65
Entries that use overflow............. 0 0.0%
Index pages used...................... 11
Primary pages used.................... 8271
Overflow pages used................... 0
Total pages used...................... 8282
Unused bytes on index pages........... 15678 17.4%
Unused bytes on primary pages......... 337429 0.50%
Unused bytes on overflow pages........ 0
Unused bytes on all pages............. 353107 0.52%
So I understand that the 11 index pages are pure btree pages, but the
leaves are actually in the ~8000 data pages.
And it probably needs to visit (i.e. load) all data pages to count the
leaves...
Even if there would be some counter in the header of each page, it still
needs to load the pages which is bad for IO...
BTW I found this by opening some file over network, which of course made
everything worse.
For my case (file format) the data is append (write) only, so max(rowid)
works equally good.
As a note, I actually HAVE the record count stored somewhere else but I
had this query in a generic copy routine which was also used for some
other small tables.
I agree it's some kind of corner case, usually tables have some kind of
indices. But in this case I need high speed, indices would bring
performance down.
Not that I really need, but I have to support specified data rates up to
100k records / second.
And I only access the data sequentially by rowid.
Just for the sake of discussion: I imagine some hacks to the btree to
optimize this special case.
The btree nodes could store the number of leaves just for the data pages
(e.g. 0: unknown, >0 valid number); it would need to propagate up the
info just until it reaches a parent in an index page. And it needs to
update this info only when a node changes from leaf to having a child.
Thanks for all your time,
Gabriel
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users