Re: [sqlite] Efficient ways to maintaining order rank / row_number() in a rather large set ?

Wade, William Tue, 09 Jan 2018 05:40:31 -0800

If you were going to do this entirely in memory (perhaps in C, or some similar 
language), you would likely use some tree structure where each node keeps track 
of the number of descendants (direct and indirect) of that node. That allows 
the operations you describe to occur in O(log(N)) time. Single-record 
insert/delete/update has the same time complexity.


It is likely that for your tree, every node would have the same structure 
(struct in C), or else every internal node would have one structure, and every 
leaf node would have another structure.

Now given a bunch of objects with the same structure, you can easily store them 
in a relational database, rather than in memory, and perform similar operations 
on them. A collection of struct instances turns into a table, and a C pointer 
turns into a row-id (or similar). This isn't entirely free, of course. In C we 
think of a pointer dereference as occurring in constant time, while in a 
database, a key lookup is typically log(N) time, but still, your log(N) 
in-memory solution becomes a log-squared(N) database solution, and that is 
usually fast enough.

Of course you lose some of the database "convenience. You're essentially 
implementing trees which are close to those that already "free" in sqlite. 
Likewise, some simple SQL queries turn into something more complex (since you 
need to maintain your tree). At least you still get the ACID benefits.

If I google "counting tree in sqlite" I see some hits that, perhaps, already do 
this kind of thing.

Regards,
Bill

-----Original Message-----
From: Eric Grange [mailto:zar...@gmail.com]
Sent: Tuesday, January 9, 2018 3:51
To: General Discussion of SQLite Database <sqlite-users@mailinglists.sqlite.org>
Subject: [sqlite] Efficient ways to maintaining order rank / row_number() in a 
rather large set ?

Hi,

I have a problem where I have a large set of (key, value) which I want to sort 
by value, and then store in a table having (rank, key, value) fields, so that 
for a given key I can quickly find the rank, or for a given rank range, I can 
quickly list the keys & values.

Since there is no ROW_NUMBER() function, but there is an autoincrement feature, 
and the rank are numbered 1, 2, 3 etc. the strategy I have been using is to 
create ranked table like

   CREATE RANKED (
      RANK INTEGER PRIMARY KEY AUTOINCREMENT,
      KEY INTEGER,
      VALUE FLOAT
   )

(+ an index for the key)

and then I fill that table with something like

   INSERT INTO RANKED
      SELECT key, value
      FROM ...something rather complex and big...
      ORDER BY value desc

This works well enough, but as the amount of values to be ranked increases, 
this feels wasteful to delete everything and then re-insert everything just to 
adjust the RANK column, also I am running into memory issues as the ORDER BY 
requires a temporary b-tree which runs into the gigabyte range in some 
instances.

I have ways to maintain the KEY and VALUES individually and incrementally, but 
approaches I have tried to maintain the RANK with UPDATE queries ran much 
slower than deleting and recreating everything, though this could just be bad 
implementations from my part.

Are there any other strategies I could use that could update just the RANK 
field and mitigate the temporary B-tree size?

Eric


**************************************************************************************
This e-mail and any attachments thereto may contain confidential information 
and/or information protected by intellectual property rights for the exclusive 
attention of the intended addressees named above. If you have received this 
transmission in error, please immediately notify the sender by return e-mail 
and delete this message and its attachments. Unauthorized use, copying or 
further full or partial distribution of this e-mail or its contents is 
prohibited.
**************************************************************************************
_______________________________________________
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

Re: [sqlite] Efficient ways to maintaining order rank / row_number() in a rather large set ?

Reply via email to