Re: [sqlite] select count(*) does not use primary key??

Gabriel Corneanu Fri, 07 Jun 2013 00:58:35 -0700

I admit I didn't think (or didn't even read in detail) about technicalimplementation.


This is an extract from analyzer:
*** Table AE_DATA ****************************************************


Percentage of total database..........  99.89%
Number of entries..................... 1030371
Bytes of storage consumed............. 67846144
Bytes of payload...................... 61186719    90.2%
Average payload per entry............. 59.38
Average unused bytes per entry........ 0.34
Average fanout........................ 752.00
Fragmentation.........................   0.35%
Maximum payload per entry............. 65
Entries that use overflow............. 0            0.0%
Index pages used...................... 11
Primary pages used.................... 8271
Overflow pages used................... 0
Total pages used...................... 8282
Unused bytes on index pages........... 15678       17.4%
Unused bytes on primary pages......... 337429       0.50%
Unused bytes on overflow pages........ 0
Unused bytes on all pages............. 353107       0.52%

So I understand that the 11 index pages are pure btree pages, but theleaves are actually in the ~8000 data pages.And it probably needs to visit (i.e. load) all data pages to count theleaves...Even if there would be some counter in the header of each page, it stillneeds to load the pages which is bad for IO...

BTW I found this by opening some file over network, which of course madeeverything worse.For my case (file format) the data is append (write) only, so max(rowid)works equally good.As a note, I actually HAVE the record count stored somewhere else but Ihad this query in a generic copy routine which was also used for someother small tables.I agree it's some kind of corner case, usually tables have some kind ofindices. But in this case I need high speed, indices would bringperformance down.Not that I really need, but I have to support specified data rates up to100k records / second.

And I only access the data sequentially by rowid.

Just for the sake of discussion: I imagine some hacks to the btree tooptimize this special case.The btree nodes could store the number of leaves just for the data pages(e.g. 0: unknown, >0 valid number); it would need to propagate up theinfo just until it reaches a parent in an index page. And it needs toupdate this info only when a node changes from leaf to having a child.


Thanks for all your time,
Gabriel
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Re: [sqlite] select count(*) does not use primary key??

Reply via email to