My apologies for this lengthy email. I hope someone can wade through it and provide some insight.
For many database manipulation needs, SQLite (or SQL in general) is a perfectly adequate solution. I'm having a problem that does not work well in SQL land (or at least I can't find a solution that works performantly). I've started writing something of my own, but really don't want to write (esp maintain in the future) code that already exists and is suitably licensed (LGPL or recognized open source license; GPL and similar hard copyleft licenses are not usable). We have a slow unindexed data source (third party proprietary database) that we read once and store in a SQLite database for cache purposes. The user needs to be able to "see" the population of this list sorted by some field. When the data source is "small" it's not a big deal, the user will never perceive a problem. When it is large, it takes a very long time (potentially hours) to enumerate the source data. It is easy enough to read the source data and insert it into a table. It is easy enough to query the list of data ordered by any field after the data has been completely read. The hard part is knowing the correct insertion point for a newly read record. Should it go into the current window of the data? Does it go *above* the current window? Or is it *below*? To attempt to solve this problem, I wrote code to maintain my own sort order in memory, and worry about optimization later. Unfortunately, we've now reached the point where this simply can't scale much past 10,000 records, and we have one customer complaining about his 250,000. The solution I've come up with is called Order Statistic Tree. Maybe you've heard of it, though it was new to me. Really it's a modification that can be applied to many / most tree structures. Instead of just maintaining the sort order of keys in the tree along with whatever invariants the tree requires, you maintain a count of the number of items in each subtree. In a normal B-Tree you have O(log N) time to find any given record by key value, but finding the Nth element from the beginning of the list is an O(N) operation. An Order Statistic B-Tree allows O(log N) for finding by key and for selecting the Nth element from. So my question (finally) is: Does anyone here already have or know of some type of Order Statistic code? I'm pretty sure SQLite doesn't have that, though my understanding could be wrong. I appreciate all responses. Believe me, I really do need to keep track of an order of rows as a table is being populated to keep it in sync with an MVC list view. It doesn't have to be 100% optimal, but it needs to be significantly faster than either of the following (assume appropriate unique indexes to avoid indeterminate order, which I can control): SELECT * FROM SomeTable ORDER BY SomeColumn OFFSET 249999 LIMIT 1 SELECT COUNT(*) FROM SomeTable WHERE SomeColumn = value ORDER BY SomeColumn As I understand it, both of these have to execute the full query and calculate the result set in order to do return a result. I need something that gives me log N access (or better) to elements by key and numeric index. Again, my apologies for the length of this. I can write it myself, but if I don't have to, that would be awesome. -- Scott Robison _______________________________________________ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users