Hi All, I need help with speeding up my samples collection which is
processed using C# codes and SQLite. I am doing a random sampling
experiment using sqlite to find the most dominant types of lab. findings.
We have about 26 years of lab experiments data provided in a table such as:
CREATE TABLE [Data] (
[ExperimentNumber] INTEGER NOT NULL,
[ExperimentDate] DATE NOT NULL,
[Type1] INTEGER NOT NULL,
[Type2] INTEGER NOT NULL,
[Type3] INTEGER NOT NULL
)
Actually we have 22 'Type' fields but for simplicity, we assume to have
only 3 'Type' fields. Moreover, the types found in each experiment are
separated as well as combined and sorted in the field of CType in table
Types below.
CREATE TABLE [Types] (
[ExperimentNumber] INTEGER NOT NULL,
[ExperimentDate] DATE NOT NULL,
[CType] TEXT NOT NULL
)
For example, if we have in table Data as:
insert into Data values(12, '1989-02-29', 88, 25, 10);
then we have in table Types precisely as
insert into Types values(12. '1989-02-29', 88);
insert into Types values(12. '1989-02-29', 25);
insert into Types values(12. '1989-02-29', 10);
insert into Types values(12. '1989-02-29', 10|25|);
insert into Types values(12. '1989-02-29', 10|88);
insert into Types values(12. '1989-02-29', 25|88);
insert into Types values(12. '1989-02-29', 10|25|88);
To find the most dominant 'Types', sample experiments are selected randomly
from a set of timeframes which is also defined randomly. Most frequent
'Types' found from the selected samples are then recorded and compared to a
randomly selected experiment number whose experiment date is later than any
experiment date of the selected samples. Matched 'Type' are recorded into a
database table for further analysis. So, I created the following tables:
CREATE TABLE [SampleDetects] (
[ExperimentRanges] TEXT NOT NULL,
[ExperimentNumToMatch] TEXT NOT NULL
)
CREATE TABLE [SampleRanges] (
[Samplerangeid] INTEGER NOT NULL,
[start] INTEGER NOT NULL,
[end] INTEGER NOT NULL
)
CREATE TABLE [TopSample] (
[CType] TEXT NOT NULL,
[frequency] INTEGER NOT NULL,
[Samplerangeid] INTEGER NOT NULL
)
CREATE TABLE [TopSampleMatches] (
[matchedCType] TEXT NOT NULL,
[Samplerangeid] INTEGER NOT NULL,
[Experimentno] INTEGER NOT NULL
)
While the sqlite3.dll wrapper functions are not displayed here, but the
following C# code should be comprehensive :
<CODE>
void Sampling(string database, int samplesize, int MaxTimeframes,
int noofExpToMatch)
{
random = new Random();
Classes.clsSQLite3DLL sqlite3 = new Classes.clsSQLite3DLL();
//sqlite3.dll wrapper
int count = 0; //counter to increment
int minexpno = sqlite3.getMinMaxExpNo(database, "Min");
//Obtain min. experiment no.
int maxexpno = sqlite3.getMinMaxExpNo(database, "Max");
//...and max. experiment no.
int Samplerangeid = sqlite3.getLastSampleRangeID(database);
//get the last samplerangeid in the database
while (count < samplesize)
{
//new cycle initialization
starts---------------------------------------------------------------------------------------
//the whole experiment population is divided into no of
time frames
int nooftimeframes = random.Next(1, MaxTimeframes);
//define the number of timeframes
//each timeframe has an equal interval
//minus 20 is placed to ensure availability of later
experiments to compare with
int interval = Convert.ToInt32((maxexpno - minexpno - 20) /
nooftimeframes);
int[] start = new int[nooftimeframes]; //to store the
lowest experiment no. in each timeframe
int[] end = new int[nooftimeframes]; //to store the highest
experiment no. in each timeframe
int _minexpno = minexpno; //the lowest experiment number
found in the database
int maxnum = 0; //the maximum exepriment number of all the
randomly selected samples
string Expnumbers = string.Empty; //to store the starts
and the ends of experiment numbers of each intervals.
//This string will be inserted into table
SampleDetects to ensure only unique values collected.
int matchcount = 0; //represents the number of experiments
to compare with for incremental purpose
int ExpNo_tomatch = 0; //to store random experiment number
to compare with
string sqlmid = string.Empty;
string sqlhead = string.Empty;
string sqltail = " group by CType order by count(*) desc;";
string sql = string.Empty;
//initialization
ends-----------------------------------------------------------------------------------------------------
//select range of experiment numbers for each timeframe
for (int j = 0; j < nooftimeframes; ++j)
{
start[j] = random.Next(_minexpno, _minexpno + interval);
end[j] = random.Next(start[j] + 1, _minexpno +
interval);
_minexpno = _minexpno + interval + 1;
if (end[j] > maxnum)
maxnum = end[j]; //to store maximum of experiment
no. in the collected ranges
if (j != 0)
{
sqlmid += " or ";
Expnumbers += ", ";
}
sqlmid += "ExperimentNumber between " + start[j] + "
and " + end[j];
Expnumbers = start[j].ToString() + ", " +
end[j].ToString();
}
//to ensure unique value of each sample timeframes
if (!sqlite3.isExpnumbersExist(database, Expnumbers))
{
//insert to table sampleranges
for (int j = 0; j < nooftimeframes; ++j)
{
sql = sql = "insert into SampleRanges values(" +
Samplerangeid + ", " + start[j] + ", " + end[j] + ");";
sqlite3.iExecuteNonQueryInsert(sql, database);
//sqlite3.dll wrapper function
}
++Samplerangeid;
sqlhead = "insert into TopSample select CType,
count(*), " + Samplerangeid + " from Type where ";
//insert to table topsample
sql = sqlhead + sqlmid + sqltail;
sqlite3.iExecuteNonQueryInsert(sql, database);
//further table insertion...
while (matchcount < noofExpToMatch)
{
//get random experiment no.
ExpNo_tomatch = random.Next(maxnum + 1, _maxexpno);
//insert into table sampledetects
sql = "insert into SampleDetects values (" +
Expnumbers + ", " + ExpNo_tomatch.ToString() + ");";
sqlite3.iExecuteNonQueryInsert(sql, database);
//insert into table topsamplematches
sql = "insert into TopSampleMatches select a.CType,
a.Samplerangeid, b.drawnumber from TopSample a join Data b where
a.CType in (b.Type1, b.Type2, b.Type3) and b.ExperimentNumber
= " + ExpNo_tomatch + " and a.Samplerangeid = " + Samplerangeid +
";";
sqlite3.iExecuteNonQueryInsert(sql, database);
++matchcount;
}
++count; //increment count to reach samplesize
}
}
}
</CODE>
If I were to do it this way, it would take me about 30 hours to collect
17000 samples. I believe if I could do it faster using SQLite random number
generator and query instead of C#. Hence, I'd really appreciate any
guidance and help you could give me in transforming the C# code into
SQLite queries to achieve faster random sampling collections. Thank's very
much for your time and any help you could give me.
Cheers, Rick.
_______________________________________________
sqlite-users mailing list
[email protected]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users