On Sun, 16 Apr 2006, [EMAIL PROTECTED] wrote:

[EMAIL PROTECTED] writes:

[...]
I cannot think of a more efficient way to do this if you
require each output to have equal probability.  If you do
not need each string to have exactly the same probability,
however, you could do this:

   SELECT string FROM strings
   WHERE rowid>=random() % (SELECT max(rowid) FROM strings)
   LIMIT 1;

If the ROWID values (your INTEGER PRIMARY KEY) are always auto-generated,
i.e. if you never set them explicitely and don't care about their actual
value, then the only time you'll have holes in the ROWID sequence is due to
deletes.  You can solve that by adding an ON DELETE trigger that replaces
MAX(ROWID) with old.ROWID.  This keeps the ROWID sequence sequential so a
simple "WHERE ROWID=random() % (SELECT MAX(ROWID) FROM strings)" clause will
have equal probability.

Hello,

that trigger is a nice idea to prevent holes. Especially as I'm mostly doing SELECTs on the table, occasional INSERTs and very few DELETEs it doesn't matter too much if the trigger incurs a little performance hit for DELETEs.

Note that you do NOT want to use AUTOINCREMENT in creating your table in this
case, because you want new rows to have the next available ROWID value, vs the
next never-used ROWID value.

Of course.

[...]
I'm not sure of the efficiency of MAX(ROWID) in the trigger.  If you have lots
of deletes, it's possible that you can get more speed out of the trigger by
selecting the first row in descending order:
[...]

So far I have "only" about 94000 rows in the table and I cannot see any speed difference whatsoever between SELECTing the maximum rowid or SELECTing the first row id in descending order.

cu,
Thomas

Reply via email to