I meant of course the following -- "result" is really ef in the where clause ...

with t(rowid, ef)
  as (
      select distinct rowid,
                      expensive_function(?99, vdata)
        from some_table
     ),
     u(rowid, vdist, oid, flags)
  as (
      select some_table.rowid,
             ef as vdist,
             ?1 as oid,
             flags | (case when ef < ?2 then ?3 else 0 end)
        from some_table
        join t 
          on t.rowid == some_table.rowid
       where (flags & ?3) == 0
         and (oid is null and ef < 0.6)
          or ef < vdist
     )
update some_table
   set (vdist, oid, flags) = (select vdist, oid, flags
                                from u
                               where u.rowid = some_table.rowid)
  where some_table.rowid in (select rowid from u)


example, which materializes u, but requires more than one SQL statement, but 
still only requires you provide parameters to one statement:

create temporary table u(rowid integer primary key, vdist, oid, flags);
insert into temp.u
with t(rowid, ef)
  as (
      select distinct rowid,
                      expensive_function(?99, vdata)
        from some_table
     ),
     u(rowid, vdist, oid, flags)
  as (
      select t.rowid,
             ef as vdist,
             ?1 as oid,
             flags | (case when ef < ?2 then ?3 else 0 end)
        from some_table
        join t 
          on t.rowid == some_table.rowid
       where (flags & ?3) == 0
         and (oid is null and ef < 0.6)
          or ef < vdist
     )
select rowid, vdist, oid, flags
  from u;
update some_table
   set (vdist, oid, flags) = (select vdist, oid, flags
                                from temp.u
                               where u.rowid = some_table.rowid)
  where some_table.rowid in (select rowid from temp.u)
drop table temp.u;

-- 
The fact that there's a Highway to Hell but only a Stairway to Heaven says a 
lot about anticipated traffic volume.

>-----Original Message-----
>From: Keith Medcalf <kmedc...@dessus.com>
>Sent: Friday, 8 November, 2019 13:21
>To: 'SQLite mailing list' <sqlite-users@mailinglists.sqlite.org>
>Subject: RE: [sqlite] SQLITE_DETERMINISTIC and custom function
>optimization
>
>
>SQLITE_DETERMINISTIC does not mean that the function is only called once
>for each unique set of arguments, only that when called with a unique set
>of arguments that it returns the same result.  This means that if it is a
>constant it can be factored out of being called more than once.  In your
>case, the parameters are not constant so it must be called once per row.
>
>The optimizer is prone to calculating things more often than it needs to,
>and is difficult to force to "materialize" things.  Since your expensive
>function needs to be calculated for every row of the table anyway, it
>would be better to just create a table that has it calculated once, then
>compute the updates table, then perform the update, then get rid of the
>extra tables.  You could do this in one statement if anyone could figure
>out how to get the optimizer to "materialize" table u in the following
>query (table t is materialized by the distinct, which does nothing other
>than force t to be materialized -- u is not so easy).
>
>with t(rowid, ef)
>  as (
>      select distinct rowid,
>                      expensive_function(?99, vdata)
>        from some_table
>     ),
>     u(rowid, vdist, oid, flags)
>  as (
>      select some_table.rowid,
>             ef as vdist,
>             ?1 as oid,
>             flags | (case when ef < ?2 then ?3 else 0 end)
>        from some_table
>        join t
>          on t.rowid == some_table.rowid
>       where (flags & ?3) == 0
>         and (oid is null and ef < 0.6)
>          or result < vdist
>     )
>update some_table
>   set (vdist, oid, flags) = (select vdist, oid, flags
>                                from u
>                               where u.rowid = some_table.rowid)
>  where some_table.rowid in (select rowid from u)
>
>
>example, which materializes u, but requires more than one SQL statement,
>but still only requires you provide parameters to one statement:
>
>create temporary table u(rowid integer primary key, vdist, oid, flags);
>insert into temp.u
>with t(rowid, ef)
>  as (
>      select distinct rowid,
>                      expensive_function(?99, vdata)
>        from some_table
>     ),
>     u(rowid, vdist, oid, flags)
>  as (
>      select t.rowid,
>             ef as vdist,
>             ?1 as oid,
>             flags | (case when ef < ?2 then ?3 else 0 end)
>        from some_table
>        join t
>          on t.rowid == some_table.rowid
>       where (flags & ?3) == 0
>         and (oid is null and ef < 0.6)
>          or result < vdist
>     )
>select rowid, vdist, oid, flags
>  from u;
>update some_table
>   set (vdist, oid, flags) = (select vdist, oid, flags
>                                from temp.u
>                               where u.rowid = some_table.rowid)
>  where some_table.rowid in (select rowid from temp.u)
>drop table temp.u;
>
>This will calculate expensive_function only once per row in some_table.
>It assumes and requires that "some_table" is a rowid table.
>
>--
>The fact that there's a Highway to Hell but only a Stairway to Heaven
>says a lot about anticipated traffic volume.
>
>>-----Original Message-----
>>From: sqlite-users <sqlite-users-boun...@mailinglists.sqlite.org> On
>>Behalf Of Mario M. Westphal
>>Sent: Friday, 8 November, 2019 12:08
>>To: sqlite-users@mailinglists.sqlite.org
>>Subject: [sqlite] SQLITE_DETERMINISTIC and custom function optimization
>>
>>Hi all,
>>
>>
>>
>>I have a table with matrices stored as blobs. Each matrix has about 800
>>bytes.
>>
>>This table has between 20,000 and 500,000 rows.
>>
>>
>>
>>I use a custom function "EXPENSIVE_FUNCTION" which performs a
>calculation
>>using a matrix supplied via sqlite3_bind_pointer() as ?99 and the matrix
>>in
>>the vdata column.
>>
>>I create the function using the SQLITE_DETERMINISTIC flag. My hope was
>>that
>>the EXPENSIVE_FUNCTION is called only once per row. But that's not the
>>case.
>>
>>
>>
>>The query looks like this:
>>
>>
>>
>>UPDATE some_table
>>
>>
>>
>>SET
>>
>>vdist = EXPENSIVE_FUNCTION(?99,vdata),
>>
>>oid = ?1,
>>
>>flags = flags | (CASE WHEN EXPENSIVE_FUNCTION(?99,vdata) < ?2 THEN ?3
>>ELSE 0
>>END)
>>
>>
>>
>>WHERE
>>
>>(flags & ?3) = 0  AND
>>
>>(oid IS NULL AND EXPENSIVE_FUNCTION(?99,vdata) < 0.6) OR
>>
>>(EXPENSIVE_FUNCTION(?99,vdata) < vdist)
>>
>>
>>
>>The EXPENSIVE_FUNCTION function is referred multiple times in the update
>>statement. But it always returns the same result (for any given row).
>>
>>
>>
>>My stats report that SQLite calls EXPENSIVE_FUNCTION 91,806 times for a
>>table with 45,775 rows.
>>
>>256 rows are modified. This takes (only) 3.3 seconds.
>>
>>
>>
>>The profiler tells me that sqlite3VdbeExec() spends 47% in
>>vdbeMemFromBtreeResize and 36% in EXPENSIVE_FUNCTION.
>>
>>
>>
>>Can I change something so SQLite calls EXPENSIVE_FUNCTION only once per
>>row?
>>
>>
>>
>>Thanks in advance.
>>
>>
>>
>>-- Mario
>>
>>_______________________________________________
>>sqlite-users mailing list
>>sqlite-users@mailinglists.sqlite.org
>>http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users



_______________________________________________
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to