Am 29.10.2013 13:19, schrieb Dominique Devienne:
So, after those functions are in place - where's the problem with:
select * from table where InMySmallUnsortedArrayExists(some_column)
select * from table where InMyLargerSortedArrayExists(some_column)
select * from table where InMyHashListExists(some_column)
...
etc. for trees or whatever you want to use to speed-up the exists-check.
Or more generically with an additional Param:
select * from table where InList(@ListTypeEnmValue, some_column)
First off, when you use functions like this, you basically rule
out index use, even if some_column is indexed. That's not good.
<< WHERE col IN list>> OTOH, might use an index.
Might - yes, and *if* an index is used for the In-Checks, then
you're perhaps "wasting it" - or it could be the wrong index
which is choosen by the query-optimizer.
The better index (in case you use Tmp-Tables) is not the index
on col of the "real table", but the index on the Tmp-Table-Col.
However, in any case we have an exists-check to perform here,
(for every sqlite3_step) and the only question is, can your own
function perform this check faster than the built-in mechanism
of SQLite (when SQLite performs at its best, using an index which
was defined on the comparelist).
In my tests SQLite cannot outperform a well-implemented
"sorting Dictionary-Class", no matter if an index is in use
or not.
So, I would not dismiss the usage of UDFs in your special case
that fast - the UDF-implementation is dead-easy, more flexible
and with the Dictionary I was using, about 50% faster than
SQLite with pre-indexed Tmp-Tables (about factor 3.5 faster
than what you're currently using with the dynamic List-Joins).
Second, as I mentioned, it's the UI that's SQL-driven. You can have N lists
or tables or combos in various dialogs, all looking at the same underlying
(virtual) table but each will have its own selection, so it's N different
lists that need to be used, where N is not fixed (some dialogs can be
popped up several times too, different instances of the *same* dialog). So
creating a function or a temp table for each just isn't great either, and
forces to name what is inherently "anonymous" IMHO.
For exactly this "anonymous case" I've already proposed:
select * from table where InList(@ListTypeEnmValue, some_column)
Meaning, that you only need this single function instead of the
"more specialized ones" - at the "cost" of setting one additional
parameter - and reacting to that param within your UDF-callback.
You can precompile such kind of statement and use binding-calls,
to replace the @ListTypeEnmValue Param-Slot with an Integer-
value (or even a Pointer-Value) of your choice.
Below is my complete Testcode (sorry, no C-code - but I think you
will get the idea - and will deduce that the CommandObjects are
simply encapsulating the SQLite-Binding-Calls).
Here's the timing-values this Demo puts out (100000 values, split
into two lists: [1 to 40000] and the other one [40001 to 100000]
UDF and Dictionary 103msec 40000 60000
Tmp-Tbls without Index 301msec 40000 60000
Tmp-Tbls with Index 151msec 40000 60000
Joined comma-sep-Lists 358msec 40000 60000
VB6-code:
Option Explicit
Implements IFunction
Enum enmDictType
DictTypeNone
Dict40K
Dict60K
'...
DictTypeMax
End Enum
Private Cnn As cConnection, DictArr(DictTypeMax) As cSortedDictionary
Private Sub Form_Click()
Dim i As Long, Res1&, Res2&, Arr1$(1 To 40000), Arr2$(40001 To 100000)
Cls
Set Cnn = New_c.Connection(, DBCreateInMemory) 'create a new InMem-DB
Cnn.AddUserDefinedFunction Me 'add the Implementer of the new func
Cnn.Execute "Create Table T(Col Integer)"
With Cnn.CreateCommand("Insert Into T Values(?)")
Cnn.BeginTrans
For i = 1 To 100000 'add values into T (fast, per Binding-API)
.SetInt32 1, i: .Execute
Next
Cnn.CommitTrans
End With
'two Temp-Tables (one for 40K, the other for 60K records)
Cnn.Execute "Create Temp Table Tmp1(Col Integer)"
Cnn.Execute "Insert Into Tmp1 Select * From T Where Col<=40000"
Cnn.Execute "Create Temp Table Tmp2(Col Integer)"
Cnn.Execute "Insert Into Tmp2 Select * From T Where Col >40000"
'same preparations for our alternatives to the Tmp-Tables
Set DictArr(Dict40K) = New cSortedDictionary
Set DictArr(Dict60K) = New cSortedDictionary
For i = 1 To 40000
DictArr(Dict40K).Add i
Arr1(i) = i
Next
For i = 40001 To 100000
DictArr(Dict60K).Add i
Arr2(i) = i
Next
'use a static precompiled query with a UDF (just different Params)
New_c.Timing True
Res1 = DoQuery(Dict40K)
Res2 = DoQuery(Dict60K)
Print "UDF and Dictionary", New_c.Timing, Res1, Res2
New_c.Timing True
Res1 = GetCount("Select Count(*) from T Where Col in Tmp1")
Res2 = GetCount("Select Count(*) from T Where Col in Tmp2")
Print "Tmp-Tbls without Index", New_c.Timing, Res1, Res2
Cnn.Execute "Create Index idx_Tmp1_Col On Tmp1(Col)"
Cnn.Execute "Create Index idx_Tmp2_Col On Tmp2(Col)"
New_c.Timing True
Res1 = GetCount("Select Count(*) from T Where Col in Tmp1")
Res2 = GetCount("Select Count(*) from T Where Col in Tmp2")
Print "Tmp-Tbls with Index", New_c.Timing, Res1, Res2
Dim CSL1 As String: CSL1 = "(" & Join$(Arr1, ",") & ")"
Dim CSL2 As String: CSL2 = "(" & Join$(Arr2, ",") & ")"
New_c.Timing True
Res1 = GetCount("Select Count(*) from T Where Col in " & CSL1)
Res2 = GetCount("Select Count(*) from T Where Col in " & CSL2)
Print "Joined comma-sep-Lists", New_c.Timing, Res1, Res2
End Sub
Private Function GetCount(SQL As String) As Long
GetCount = Cnn.OpenRecordset(SQL)(0)
End Function
Private Function DoQuery(DictType As enmDictType) As Long
Static Query As cSelectCommand
If Query Is Nothing Then Set Query = Cnn.CreateSelectCommand( _
"Select Count(*) from T Where InDict(@DictType, Col)")
With Query
.SetInt32 !DictType, DictType
DoQuery = .Execute()(0).Value
End With
End Function
'Implementation of the IFunction-Interface, to support InDict(P1, P2)
Property Get IFunction_DefinedNames() As String
IFunction_DefinedNames = "InDict"
End Property
Sub IFunction_Callback(ByVal ZeroBasedNameIndex As Long, _
ByVal ParamCount As Long, UDF As cUDFMethods)
Dim DictType As enmDictType: DictType = UDF.GetInt32(1)
Dim TheValue As Long: TheValue = UDF.GetInt32(2)
UDF.SetResultInt32 DictArr(DictType).Exists(TheValue)
End Sub
Olaf
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users