Hi,

This is my observation for performance when using
sqlite3_create_function() PI.

Database -
1. Contains 7 tables.
2. Only one table is very large i.e. it contains 100k rows or more.
3. Rest of the tables contain 10 to 200 rows.
4. Each table contains between 4 to 8 columns.

When running a query to extract all the information in the database and
populate a simple C++ structure for each row it took around 30 minutes. 
The query involved joining all the 7 tables.

Here is an example analogous to my database -
1. There is a table EmployeeTable that contains the following fields
  EmpID         - INT (PRIMARY KEY)
  Name          - VARCHAR(1024)
  Address       - VARCHAR(1024)
  DepartmentID  - INT
  DesignationID - INT

Lets assume that there are 100k employees, and each employee belongs to
a single Department and has a single Designation ID.

2. There is a DepartmentTable that contains the following fields
  DepartmentID - INT (PRIMARY KEY)
  Name         - VARCHAR(1024)
  DeptHead     - VARCHAR(1024)
  Location     - VARCHAR(1024)

Assume that there are 10 departments only, with each department
containing 10k employees.

3. There is a DesignationTable that contains the following fields
  DesignationID - INT (PRIMARY KEY)
  Designation   - VARCHAR(1024)
  MinSalary     - INT
  MaxSalary     - INT
  Description   - VARCHAR(1024)

Assume that there are 10 different types of designations and there are
10k employees at each designation.

4. Lastly there is a EmpDependentsTable which contains each employees
dependents and has these fields
  EmployeeID    - INT
  DependentName - VARCHAR(1024)

Assuming that some employees will have zero dependents and that some
will have dependents ranging from 1 to N, lets say this table has around
40k rows.

Lets have a simple C++ structure like this to hold an employee
information

struct EmployeeInfo
{
  int ID;
  char * name;
  char * address;
  int departmentId;
  int designationId;
  char * departmentName;
  char * departmentHead;
  char * departmentLocation;
  char * designation;
  int minSalary;
  int maxSalary;
  char * description;
  list<char*> dependents;
};

Suppose you run a query to extract information and populate the above
structure for each employee the query would look like this -

---------------------------------------------------------------
select BaseTable.EmpID, BaseTable.FullName, BaseTable.Address,
BaseTable.DepartmentID, BaseTable.DesignationID, BaseTable.Name,
BaseTable.Location, BaseTable.DeptHead, BaseTable.Designation,
BaseTable.MaxSalary, BaseTable.MinSalary, BaseTable.Description, 

createDependentList(EmployeeTable.EmpID, EmpDepnedentsTable.EmpID,
EmpDependentsTable.DependentName) 

from EmployeeTable, DepartmentTable, DesignationTable,
EmpDependentsTable

where EmployeeTable.DepartmentID = DepartmentTable.DepartmentID
and EmployeeTable.DesignationID = DesignationTable.DesignationId
and EmployeeTable.EmpID = EmployeeDependentsTable.EmpID

as BaseTable;
---------------------------------------------------------------

createDependentList - is defined using the sqlite3_create_function() PI.

This query might not be semantically correct but its just an example to
give you an idea of what i'm pointing to.

This query would take really long to execute - 30 mins.


Since SQLite doesn't support "Linking Table Columns" we could simulate
it -
1. by keeping an ID field in the EmpDependentsTable
2. making sure that for each employee when her information is entered in
the database her dependents are entered consecutively

Suppose for employee "Martina Hingis" we enter her dependents - "A",
"B", and "C" one after the other in the EmpDependentsTable, the table
would appear like this -

EmployeeID  DependentID  DependentName
2           1            A
2           2            B
2           3            C
...

Now in the EmployeeTable we enter Martina's information as - after
adding another column DependentIDs of VARCHAR type in the EmployeeTable

EmpID  Name     Address DepartmentID DesignationID DependentIDs
2      Martina  xxxx    77           88            1,3
...

Now the above query can be re-written as -
---------------------------------------------------------------
select BaseTable.EmpID, BaseTable.FullName, BaseTable.Address,
BaseTable.DepartmentID, BaseTable.DesignationID, BaseTable.Name,
BaseTable.Location, BaseTable.DeptHead, BaseTable.Designation,
BaseTable.MaxSalary, BaseTable.MinSalary, BaseTable.Description,

BaseTable.DependentIDs 

from EmployeeTable, DepartmentTable, DesignationTable,
EmpDependentsTable

where EmployeeTable.DepartmentID = DepartmentTable.DepartmentID
and EmployeeTable.DesignationID = DesignationTable.DesignationId
and EmployeeTable.EmpID = EmployeeDependentsTable.EmpID

as BaseTable;
---------------------------------------------------------------

When the callback function for the above query is invoked and as soon as
the BaseTable.DependentIDs column is passed to it we can run another
query to to extract the information from EmployeeDependentsTable for
this employee. The value passed here would "1,3" - we know the value
before the comma is the start id and the value after it is the end id.


After changing my query to be similar to this i found that the time
taken reduced by a factor of 10.


Although this way does assume that you would enter all dependents once
for each employee and that further additions in EmployeeDependentsTable
for an already existing employee arent allowed - otherwise the query
won't work since the DependentIDs are expected to be continous.

Also you need to keep track of the DependentIDs i.e. you would need to
keep a counter in your code.

This worked for me since my case is similar.  If anyone feels that this
approach has any flaws then do criticise.

 
-- 
Regards,
Ritesh Kapoor

"I can do it right, fast or cheap. Pick any two."


-----------------------------------------------------------------------------
To unsubscribe, send email to [EMAIL PROTECTED]
-----------------------------------------------------------------------------

Reply via email to