On Sat, May 17, 2014 at 9:41 PM, Scott Robison <sc...@casaderobison.com>wrote:

> On May 17, 2014 8:40 PM, "Simon Slavin" <slav...@bigfraud.org> wrote:
> > Just want to check: this is what it sounds like, right ?  Your user has
> a window open where they are scrolling through the table, ordered by
> SomeColumn.  You need to know whether the window needs to change to reflect
> the new row.  Is that right ?
>
> In a nutshell. Further it needs to work in an environment where I control
> the data model but not the view or controller, so when a new record comes
> in, I need to know which row it is per current sort order so the model can
> advertise the new row to the view.
>
> > > SELECT COUNT(*) FROM SomeTable WHERE SomeColumn = value ORDER BY
> SomeColumn
> >
> > The second SELECT doesn't need the ORDER BY.  It will have no effect.
>
> Sorry, typo. I meant <=, fingers and brain out of sync.
>
> > How many columns does a table have ?  Roughly.  If it's not a lot all
> you need to is create one index for each column that might be chosen as
> 'SomeColumn'.  Or have you already done this and the speed your complaining
> about is the speed with the index
>
> Yes, speed for large collections is inadequate.
>
> > One technique I used to use in the old days (long before SQL) was to
> keep a second table which contained just the rows that were shown on the
> display.  So you'd have your full database on backing store and then
> another table with just 20 rows.  To see whether your new row will effect
> the display, you just compare it with the first and last entries in the
> display table, which is very small so it's fast to do things with.
>
> That would by far make the most sense. Maybe I can invert control between
> the view & model. Thanks for the thought.
>
Just in case anyone is interested: I got pulled off *this* project for a
higher priority project for a few days. Those days gave me time to think.
Instead of using the in memory vector for my main data (6 strings, a 64 bit
int, and a few bits for flags) which *could* get very large for some "known
worst case scenario", I now use a temp table as a disk based vector. The
only index for that table is an integer primary key so I can quickly pull
things into ram as needed.

The indexes are in memory because I do need to know the relative insertion
point of each row for each supported sort order which I can't easily get
from SQLite. Those are small enough that I can keep them in ram even for
our "current worst case" scenario (which is about 250,000 messages in a
single folder; some people don't believe in organizing their mailboxes).
The string based sort columns just have enough info to retrieve the string
in question from the database plus the first X bytes of the utf8 string
rep. Only if those first bytes are identical between two strings do I need
to go to disk, which makes the implementation fast enough (which it wasn't
when I had to keep fetching strings for every comparison during an insert).

I still like the idea of the order statistic tree, but getting away from
the problem for a few days helped me come back to it "refreshed" as it were
and come up with a good enough solution that fit better into the current
codebase.

Thanks for the ideas!

-- 
Scott Robison
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to