CREATE TABLE Correlations (origin, destination, weight);
INSERT INTO Correlations VALUES ('p', 'q', 1);
INSERT INTO Correlations VALUES ('q', 'a', 0.5);
INSERT INTO Correlations VALUES ('q', 'b', 0.5);
INSERT INTO Correlations VALUES ('q', 'c', 0.5);

-- Extend each pair of adjoining links (a1.destination = a2.origin) into a
-- new derived link, skipping self-links and pairs that already exist.
INSERT INTO Correlations
        SELECT a1.origin, a2.destination, a1.weight + a2.weight
        FROM Correlations AS a1, Correlations AS a2
        WHERE a1.destination = a2.origin
        AND a1.origin != a2.origin
        AND a2.destination != a1.origin
        AND a1.origin||'|'||a2.destination
        NOT IN (SELECT origin||'|'||destination FROM Correlations);

SELECT * FROM Correlations WHERE origin='p';
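
With the sample data above, that last SELECT should come back with the
original p-q row plus the three derived links, each weighted 1 + 0.5; in
the shell's default list mode the output looks roughly like:

    p|q|1
    p|a|1.5
    p|b|1.5
    p|c|1.5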

----

To change the weight of a particular set, you'll really want to do a
series of DELETES, then a new INSERT before the [final] iterative
INSERT.
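
For example (just a sketch of that sequence; the q-b edge and the new
weight 0.75 are made-up values for illustration):

    -- drop the links derived from the old weights
    DELETE FROM Correlations WHERE origin='p' AND destination!='q';
    -- replace the edge whose weight changed
    DELETE FROM Correlations WHERE origin='q' AND destination='b';
    INSERT INTO Correlations VALUES ('q', 'b', 0.75);
    -- ...and then repeat the iterative INSERT above to rebuild the
    -- derived links.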

DRH: Any chance of getting a trigger that can repeat an INSERT operation
as its action until the INSERT produces no rows?
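
Lacking that, a workable pattern (a sketch; the looping has to be driven
from the host program, for instance by checking the change counter in the
C API, sqlite_changes() in 2.x, after each pass) is simply to re-issue the
same statement until it stops inserting rows:

    -- Re-run until the change counter reports 0 rows inserted;
    -- each pass links nodes at least one hop further out.
    INSERT INTO Correlations
            SELECT a1.origin, a2.destination, a1.weight + a2.weight
            FROM Correlations AS a1, Correlations AS a2
            WHERE a1.destination = a2.origin
            AND a1.origin != a2.origin
            AND a2.destination != a1.origin
            AND a1.origin||'|'||a2.destination
            NOT IN (SELECT origin||'|'||destination FROM Correlations);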




On Fri, 2003-10-24 at 18:24, Michael Grigoriev wrote:
> Hi,
> 
> I am working on IMMS [http://www.luminal.org/wiki/index.php/IMMS/IMMS], an
> adaptive playlist plugin for XMMS. It uses SQLite as the backend to store
> all the metadata.
> 
> The algorithms IMMS uses for song correlation are in essence graph
> algorithms. The graph currently is stored in the database as an adjacency
> matrix (3 columns: origin, destination, link weight).
> 
> The most resource intensive algorithm has to do the following.
> Given a graph of form
>    
>          A
>         /
>     P--Q--B
>         \
>          C
> 
> And points P and Q, it tries to create links between P and the outer graph
> of Q. So in the example above it would create/update links P--A, P--B, and
> P--C.
> 
> This would be easy to do, except the weight of the newly created links depends
> on the weight of the current link between Q and node in question, as well as
> the weight of the link from the node to P.
> 
> So right now, I do something like (this is a simplified example):
> 
>     (I have to create a new table here, otherwise the table will be locked,
>      and I will not be able to run updates from the callback)
> 
>     CREATE TEMP TABLE 'Temp' AS
>         SELECT * FROM 'Correlations' WHERE origin = 'Q' AND destination != 'P';
>     SELECT * FROM 'Temp';
> 
> And then when the callback gets called it gets 3 parameters, let's call them
> ORIGIN, DESTINATION and WEIGHT, and I have to do something horrible like:
> 
>     SELECT count(weight) FROM 'Correlations'
>         WHERE origin = ORIGIN AND destination = DESTINATION;
> 
>     and then if it returns 0, I insert the new value
> 
>     INSERT INTO 'Correlations' ('origin', 'destination', 'weight')
>                 VALUES (ORIGIN, DESTINATION, WEIGHT / MAX_WEIGHT);
> 
>             (where MAX_WEIGHT is just some constant)
>     
>     and if the select query returned something other than 0, I add to the
>         existing link
> 
>     UPDATE 'Correlations' SET weight = weight + WEIGHT / MAX_WEIGHT
>         WHERE origin = ORIGIN AND destination = DESTINATION;
> 
> 
> Needless to say, this is very slow.
> 
> I was wondering if anybody has any ideas on how I can optimize this.
> My knowledge of SQL is not too in depth, so I might be missing something.
> 
> The main problem (I think) is the way the adjacency matrix is stored.
> Because there are two nodes, I cannot really use a UNIQUE constraint to be
> able to simply do something like INSERT OR REPLACE. And even if I, say,
> concatenated origin and destination (like make "2", "5" into "2|5" or
> something), and enforced uniqueness on that, I don't know how to make a
> single query that will add to the link or insert it if it does not exist....
> 
> So I do not see a way to compress this to a single update statement (which
> would be significantly more efficient because there would be no need to call
> the callback, or even create the temp table), even if I was willing to
> sacrifice some functionality to do that. For example I could maybe live
> without using the weight of the links between Q and its external nodes, and
> just always update links between P and the nodes by a constant value....
> But because of the problem above, I don't see a good way to do even that.
> 
> Has anyone done any graph-algorithms-in-an-sql database type stuff? Any
> ideas for a better way to store the graph in SQLite to make the updates
> faster and/or the single query update possible?
> 
> Any suggestions would be greatly appreciated. Thanks in advance.
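
On the add-to-the-link-or-create-it question above: one way to keep it all
inside the engine (a sketch only; MAX_WEIGHT is written as the literal 10,
'P' and 'Q' stand in for the real node values, and it leans on correlated
subqueries, so check that your SQLite version accepts them) is an UPDATE
for the links that already exist followed by an INSERT ... SELECT for the
ones that don't:

    -- bump the links from P that already exist
    UPDATE Correlations
       SET weight = weight +
           (SELECT weight FROM Correlations AS c
             WHERE c.origin = 'Q'
               AND c.destination = Correlations.destination) / 10
     WHERE origin = 'P'
       AND destination IN (SELECT destination FROM Correlations
                            WHERE origin = 'Q' AND destination != 'P');

    -- create the links from P that are still missing
    INSERT INTO Correlations
        SELECT 'P', destination, weight / 10
          FROM Correlations
         WHERE origin = 'Q'
           AND destination != 'P'
           AND destination NOT IN (SELECT destination FROM Correlations
                                    WHERE origin = 'P');

Wrapping the pair in a single transaction keeps it atomic and fast.  Also,
a two-column constraint like UNIQUE(origin, destination) is accepted, so
INSERT OR REPLACE is an option whenever replacing, rather than
accumulating, the weight is enough.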

