Hi All, I need help with speeding up my samples collection which is processed using C# codes and SQLite. I am doing a random sampling experiment using sqlite to find the most dominant types of lab. findings. We have about 26 years of lab experiments data provided in a table such as:
CREATE TABLE [Data] ( [ExperimentNumber] INTEGER NOT NULL, [ExperimentDate] DATE NOT NULL, [Type1] INTEGER NOT NULL, [Type2] INTEGER NOT NULL, [Type3] INTEGER NOT NULL ) Actually we have 22 'Type' fields but for simplicity, we assume to have only 3 'Type' fields. Moreover, the types found in each experiment are separated as well as combined and sorted in the field of CType in table Types below. CREATE TABLE [Types] ( [ExperimentNumber] INTEGER NOT NULL, [ExperimentDate] DATE NOT NULL, [CType] TEXT NOT NULL ) For example, if we have in table Data as: insert into Data values(12, '1989-02-29', 88, 25, 10); then we have in table Types precisely as insert into Types values(12. '1989-02-29', 88); insert into Types values(12. '1989-02-29', 25); insert into Types values(12. '1989-02-29', 10); insert into Types values(12. '1989-02-29', 10|25|); insert into Types values(12. '1989-02-29', 10|88); insert into Types values(12. '1989-02-29', 25|88); insert into Types values(12. '1989-02-29', 10|25|88); To find the most dominant 'Types', sample experiments are selected randomly from a set of timeframes which is also defined randomly. Most frequent 'Types' found from the selected samples are then recorded and compared to a randomly selected experiment number whose experiment date is later than any experiment date of the selected samples. Matched 'Type' are recorded into a database table for further analysis. So, I created the following tables: CREATE TABLE [SampleDetects] ( [ExperimentRanges] TEXT NOT NULL, [ExperimentNumToMatch] TEXT NOT NULL ) CREATE TABLE [SampleRanges] ( [Samplerangeid] INTEGER NOT NULL, [start] INTEGER NOT NULL, [end] INTEGER NOT NULL ) CREATE TABLE [TopSample] ( [CType] TEXT NOT NULL, [frequency] INTEGER NOT NULL, [Samplerangeid] INTEGER NOT NULL ) CREATE TABLE [TopSampleMatches] ( [matchedCType] TEXT NOT NULL, [Samplerangeid] INTEGER NOT NULL, [Experimentno] INTEGER NOT NULL ) While the sqlite3.dll wrapper functions are not displayed here, but the following C# code should be comprehensive : <CODE> void Sampling(string database, int samplesize, int MaxTimeframes, int noofExpToMatch) { random = new Random(); Classes.clsSQLite3DLL sqlite3 = new Classes.clsSQLite3DLL(); //sqlite3.dll wrapper int count = 0; //counter to increment int minexpno = sqlite3.getMinMaxExpNo(database, "Min"); //Obtain min. experiment no. int maxexpno = sqlite3.getMinMaxExpNo(database, "Max"); //...and max. experiment no. int Samplerangeid = sqlite3.getLastSampleRangeID(database); //get the last samplerangeid in the database while (count < samplesize) { //new cycle initialization starts--------------------------------------------------------------------------------------- //the whole experiment population is divided into no of time frames int nooftimeframes = random.Next(1, MaxTimeframes); //define the number of timeframes //each timeframe has an equal interval //minus 20 is placed to ensure availability of later experiments to compare with int interval = Convert.ToInt32((maxexpno - minexpno - 20) / nooftimeframes); int[] start = new int[nooftimeframes]; //to store the lowest experiment no. in each timeframe int[] end = new int[nooftimeframes]; //to store the highest experiment no. in each timeframe int _minexpno = minexpno; //the lowest experiment number found in the database int maxnum = 0; //the maximum exepriment number of all the randomly selected samples string Expnumbers = string.Empty; //to store the starts and the ends of experiment numbers of each intervals. //This string will be inserted into table SampleDetects to ensure only unique values collected. int matchcount = 0; //represents the number of experiments to compare with for incremental purpose int ExpNo_tomatch = 0; //to store random experiment number to compare with string sqlmid = string.Empty; string sqlhead = string.Empty; string sqltail = " group by CType order by count(*) desc;"; string sql = string.Empty; //initialization ends----------------------------------------------------------------------------------------------------- //select range of experiment numbers for each timeframe for (int j = 0; j < nooftimeframes; ++j) { start[j] = random.Next(_minexpno, _minexpno + interval); end[j] = random.Next(start[j] + 1, _minexpno + interval); _minexpno = _minexpno + interval + 1; if (end[j] > maxnum) maxnum = end[j]; //to store maximum of experiment no. in the collected ranges if (j != 0) { sqlmid += " or "; Expnumbers += ", "; } sqlmid += "ExperimentNumber between " + start[j] + " and " + end[j]; Expnumbers = start[j].ToString() + ", " + end[j].ToString(); } //to ensure unique value of each sample timeframes if (!sqlite3.isExpnumbersExist(database, Expnumbers)) { //insert to table sampleranges for (int j = 0; j < nooftimeframes; ++j) { sql = sql = "insert into SampleRanges values(" + Samplerangeid + ", " + start[j] + ", " + end[j] + ");"; sqlite3.iExecuteNonQueryInsert(sql, database); //sqlite3.dll wrapper function } ++Samplerangeid; sqlhead = "insert into TopSample select CType, count(*), " + Samplerangeid + " from Type where "; //insert to table topsample sql = sqlhead + sqlmid + sqltail; sqlite3.iExecuteNonQueryInsert(sql, database); //further table insertion... while (matchcount < noofExpToMatch) { //get random experiment no. ExpNo_tomatch = random.Next(maxnum + 1, _maxexpno); //insert into table sampledetects sql = "insert into SampleDetects values (" + Expnumbers + ", " + ExpNo_tomatch.ToString() + ");"; sqlite3.iExecuteNonQueryInsert(sql, database); //insert into table topsamplematches sql = "insert into TopSampleMatches select a.CType, a.Samplerangeid, b.drawnumber from TopSample a join Data b where a.CType in (b.Type1, b.Type2, b.Type3) and b.ExperimentNumber = " + ExpNo_tomatch + " and a.Samplerangeid = " + Samplerangeid + ";"; sqlite3.iExecuteNonQueryInsert(sql, database); ++matchcount; } ++count; //increment count to reach samplesize } } } </CODE> If I were to do it this way, it would take me about 30 hours to collect 17000 samples. I believe if I could do it faster using SQLite random number generator and query instead of C#. Hence, I'd really appreciate any guidance and help you could give me in transforming the C# code into SQLite queries to achieve faster random sampling collections. Thank's very much for your time and any help you could give me. Cheers, Rick. _______________________________________________ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users