On 29 Apr 2015, at 3:22pm, Jason Vas Dias <jason.vas.dias at gmail.com> wrote:
> My question is simply how to get sqlite3 to use multiple threads to create the > temporary table from a query If you are working on a database on disk (rather than one in memory), and most of your time is spend storing data in the table (rather than calculating the values to be stored) then most of the time spent is spent on waiting for your storage medium and threading is a red herring. > Currently, I have not had enough time to > wait for SQLite to create the temporary table ( I tried for over 2 > hours) How many rows were you trying to store ? You mentioned 2 million. If it's taking SQLite 2 hours to store 2 million rows you're doing something wrong. Threads are the wrong way to fix this problem. Any attempt to split this up into threads will just have all the threads blocking each other as they wait for access to the database. > My question is simply how to get sqlite3 to use multiple threads to create the > temporary table from a query, which is computing minima, averages and maxima > over about 2 million floating point measurements that correspond to a > set of "GROUP BY" selectors, and then back-tracking to count how many > measurements M with the same > selectors fall into one of 4 "quartiles" : > 1: min(M) to Q1: ( min(M) + ( (max(M) - min(M) ) /4 ) > 2: Q1 to Q2: ( min(M) + ( (max(M) - min(M) ) /2 ) ) > 3: Q2 to Q3: ( min(M) + ( (max(M) - min(M) ) * 1.75) ) > 4: Q3 to max(M) > (the goal is eventually to plot these measurements to show the rough > distribution > of measurements between the maximum and minimum along with the average on > the same plot). I suspect you're making a common mistake, and trying to devise one (maybe two) SQL commands which does all of this. SQL is meant for storage and retrieval, not interval maths. Here is a suggested way of proceeding. A procedure which stores your 'M' values and selectors in a table which has an extra column called 'quartile' you don't use yet, and an index on (selector,M). A procedure which reads a list of groups with minima and maxima into an array of triple elements (something like "SELECT min(M),max(M) FROM myTable GROUP BY myCategory"). A procedure which iterates through that array and does calculations to work out where the quartile boundaries are. Then uses an UPDATE command to store the appropriate value in the quartile column (something like "UPDATE myTable SET quartile=CASE WHEN m<q1 THEN "1" WHEN m<q2 THEN "2" WHEN ... END WHERE myCategory=%thisCat"). This does the things SQLite is good at in SQLite, and the things programming languages are good at in your programming language. It contains three different kinds of SQLite command, each of which will execute very quickly using an index. It produces results for each stage which can be easily checked for accuracy while you're debugging. And it produces a TABLE which clearly lays out your counts for later use. Simon.