Am 29.10.2013 13:19, schrieb Dominique Devienne:

So, after those functions are in place - where's the problem with:

select * from table where InMySmallUnsortedArrayExists(some_column)
select * from table where InMyLargerSortedArrayExists(some_column)
select * from table where InMyHashListExists(some_column)
...
etc. for trees or whatever you want to use to speed-up the exists-check.

Or more generically with an additional Param:
select * from table where InList(@ListTypeEnmValue, some_column)


First off, when you use functions like this, you basically rule
out index use, even if some_column is indexed. That's not good.
<< WHERE col IN list>> OTOH, might use an index.

Might - yes, and *if* an index is used for the In-Checks, then
you're perhaps "wasting it" - or it could be the wrong index
which is choosen by the query-optimizer.

The better index (in case you use Tmp-Tables) is not the index
on col of the "real table", but the index on the Tmp-Table-Col.

However, in any case we have an exists-check to perform here,
(for every sqlite3_step) and the only question is, can your own
function perform this check faster than the built-in mechanism
of SQLite (when SQLite performs at its best, using an index which
was defined on the comparelist).

In my tests SQLite cannot outperform a well-implemented
"sorting Dictionary-Class", no matter if an index is in use
or not.

So, I would not dismiss the usage of UDFs in your special case
that fast - the UDF-implementation is dead-easy, more flexible
and with the Dictionary I was using, about 50% faster than
SQLite with pre-indexed Tmp-Tables (about factor 3.5 faster
than what you're currently using with the dynamic List-Joins).


Second, as I mentioned, it's the UI that's SQL-driven. You can have N lists
or tables or combos in various dialogs, all looking at the same underlying
(virtual) table but each will have its own selection, so it's N different
lists that need to be used, where N is not fixed (some dialogs can be
popped up several times too, different instances of the *same* dialog). So
creating a function or a temp table for each just isn't great either, and
forces to name what is inherently "anonymous" IMHO.


For exactly this "anonymous case" I've already proposed:
select * from table where InList(@ListTypeEnmValue, some_column)

Meaning, that you only need this single function instead of the
"more specialized ones" - at the "cost" of setting one additional
parameter - and reacting to that param within your UDF-callback.

You can precompile such kind of statement and use binding-calls,
to replace the @ListTypeEnmValue Param-Slot with an Integer-
value (or even a Pointer-Value) of your choice.

Below is my complete Testcode (sorry, no C-code - but I think you
will get the idea - and will deduce that the CommandObjects are
simply encapsulating the SQLite-Binding-Calls).

Here's the timing-values this Demo puts out (100000 values, split
into two lists: [1 to 40000] and the other one [40001 to 100000]

UDF and Dictionary           103msec     40000     60000
Tmp-Tbls without Index       301msec     40000     60000
Tmp-Tbls with Index          151msec     40000     60000
Joined comma-sep-Lists       358msec     40000     60000

VB6-code:

Option Explicit

Implements IFunction

Enum enmDictType
  DictTypeNone
  Dict40K
  Dict60K
  '...
  DictTypeMax
End Enum

Private Cnn As cConnection, DictArr(DictTypeMax) As cSortedDictionary

Private Sub Form_Click()
Dim i As Long, Res1&, Res2&, Arr1$(1 To 40000), Arr2$(40001 To 100000)
Cls

  Set Cnn = New_c.Connection(, DBCreateInMemory) 'create a new InMem-DB
      Cnn.AddUserDefinedFunction Me 'add the Implementer of the new func
      Cnn.Execute "Create Table T(Col Integer)"

  With Cnn.CreateCommand("Insert Into T Values(?)")
       Cnn.BeginTrans
          For i = 1 To 100000 'add values into T (fast, per Binding-API)
            .SetInt32 1, i: .Execute
          Next
       Cnn.CommitTrans
  End With

  'two Temp-Tables (one for 40K, the other for 60K records)
  Cnn.Execute "Create Temp Table Tmp1(Col Integer)"
  Cnn.Execute "Insert Into Tmp1 Select * From T Where Col<=40000"
  Cnn.Execute "Create Temp Table Tmp2(Col Integer)"
  Cnn.Execute "Insert Into Tmp2 Select * From T Where Col >40000"

  'same preparations for our alternatives to the Tmp-Tables
  Set DictArr(Dict40K) = New cSortedDictionary
  Set DictArr(Dict60K) = New cSortedDictionary
  For i = 1 To 40000
    DictArr(Dict40K).Add i
    Arr1(i) = i
  Next
  For i = 40001 To 100000
    DictArr(Dict60K).Add i
    Arr2(i) = i
  Next

  'use a static precompiled query with a UDF (just different Params)
  New_c.Timing True
    Res1 = DoQuery(Dict40K)
    Res2 = DoQuery(Dict60K)
  Print "UDF and Dictionary", New_c.Timing, Res1, Res2

  New_c.Timing True
    Res1 = GetCount("Select Count(*) from T Where Col in Tmp1")
    Res2 = GetCount("Select Count(*) from T Where Col in Tmp2")
  Print "Tmp-Tbls without Index", New_c.Timing, Res1, Res2

  Cnn.Execute "Create Index idx_Tmp1_Col On Tmp1(Col)"
  Cnn.Execute "Create Index idx_Tmp2_Col On Tmp2(Col)"
  New_c.Timing True
    Res1 = GetCount("Select Count(*) from T Where Col in Tmp1")
    Res2 = GetCount("Select Count(*) from T Where Col in Tmp2")
  Print "Tmp-Tbls with Index", New_c.Timing, Res1, Res2

  Dim CSL1 As String: CSL1 = "(" & Join$(Arr1, ",") & ")"
  Dim CSL2 As String: CSL2 = "(" & Join$(Arr2, ",") & ")"
  New_c.Timing True
    Res1 = GetCount("Select Count(*) from T Where Col in " & CSL1)
    Res2 = GetCount("Select Count(*) from T Where Col in " & CSL2)
  Print "Joined comma-sep-Lists", New_c.Timing, Res1, Res2
End Sub

Private Function GetCount(SQL As String) As Long
  GetCount = Cnn.OpenRecordset(SQL)(0)
End Function

Private Function DoQuery(DictType As enmDictType) As Long
Static Query As cSelectCommand
  If Query Is Nothing Then Set Query = Cnn.CreateSelectCommand( _
             "Select Count(*) from T Where InDict(@DictType, Col)")
  With Query
    .SetInt32 !DictType, DictType
    DoQuery = .Execute()(0).Value
  End With
End Function

'Implementation of the IFunction-Interface, to support InDict(P1, P2)
Property Get IFunction_DefinedNames() As String
  IFunction_DefinedNames = "InDict"
End Property

Sub IFunction_Callback(ByVal ZeroBasedNameIndex As Long, _
                       ByVal ParamCount As Long, UDF As cUDFMethods)

Dim DictType As enmDictType: DictType = UDF.GetInt32(1)
Dim TheValue As Long:        TheValue = UDF.GetInt32(2)

  UDF.SetResultInt32 DictArr(DictType).Exists(TheValue)
End Sub

Olaf

_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to