On 2015-04-08 11:35 AM, Dominique Devienne wrote:
> On Wed, Apr 8, 2015 at 11:16 AM, Jeff Roux <jfroux06 at gmail.com> wrote:
>
>> # For mysql, I use:
>> /etc/init.d/mysql stop; /etc/init.d/mysql start; \
>> time echo 'SELECT SUM(nbBytesDecompOut+nbBytesCompIn) as vol, portLan \
>>       as item FROM flows WHERE timestamp>=1383770600 AND \
>>       timestamp<=1384770600 AND protocol IN (17, 6) GROUP BY portLan \
>>       ORDER BY vol DESC LIMIT 6;' |
>>       mysql testperf
>>
> If you dropped the LIMIT 6 from both, are the results significantly
> different?
>
> I.e. does actually getting the full result set result in closer numbers
> between SQLite and MySQL?
>
> I'm not sure SQLite's implementation of LIMIT prevents the whole result-set
> from being sorted before being truncated to 6 rows. Avoiding LIMIT would be
> one round-about way to find out. --DD

No Dominique, it's not that - there must be another thing wrong with his 
setup. (He might not see my replies because he uses gmail).

We will need an actual DB file from the OP with the 1 mil records to 
test and understand the problem because there is something that isn't 
clear or not being said (for deemed unimportant probably).

Again, here is a script that makes a similar table, populate it with 1 
million rows, adds the correct index and then do the aggregate query on 
all of them. It takes less than 500 milliseconds - faster than MySQL 
sans caching. (Obviously my pragmas/compile options may make a 
difference). Even if there are much more data in the table per row, it 
can't cause an order of magnitude increase in time - and it doesn't seem 
to be the case from the schema the OP posted.

In fact, the entire script, making the table, adding the index, 
populating it with a million rows (with computed values no less) and 
then doing the query plus posting the output - ALL of it takes less than 
4 seconds together:
(Try it yourself)

Important to note here is that the query I adapted to actually include 
the entire dataset (all 1 mil rows) so the Index is of little value. It 
only matters where the WHERE clause refers less rows - in which case the 
time decreases linearly.

   -- 
================================================================================================

DROP TABLE IF EXISTS flows;
CREATE TABLE flows(
   id INTEGER PRIMARY KEY,
   ipLan TEXT,
   ipWan TEXT,
   portLan INT,
   portWan INT,
   protocol INT,
   nbBytesDecompOut INT,
   nbBytesCompIn INT,
   tos INT,
   timestamp INT
);

WITH acc(x,mx8,dx8,mxK,dxK,rK) AS (
     SELECT 0,0,0,0,0,100
   UNION ALL
     SELECT x+1, (x%8), CAST(round(x/8,0) AS INT), (x%1024), 
CAST(round(x/1024,0) AS INT),
            CAST(abs(round(random()/1024000000000,0)) AS INT)
       FROM acc
      WHERE (x<1000000)  -- Testing 1 million rows
)
INSERT INTO flows SELECT
   x, '192.168.1.'||mx8, '55.66.77.'||(dx8%256),  -- ipLan, ipWan
   1000+mx8, 5000+mx8, (x%18),                    -- portlan, portWan, 
protocol
   64+(rk%1024000),                               -- nbBytesDecompOut
   1024+(rk%1024000),                             -- nbBytesDecompIn
   (dx8%3), (1380000000+x)                        -- tos, timestamp
   FROM acc;

CREATE INDEX idxFlowTimeProt ON flows (timestamp, protocol);
SELECT SUM(nbBytesDecompOut + nbBytesCompIn) as vol, portLan as item, 
count(*) as 'No.'
   FROM  flows
  WHERE timestamp>=1373770600 AND timestamp<=1484770600 AND protocol IN 
(17, 6)
  GROUP BY portLan ORDER BY vol DESC
  LIMIT 6;

   -- vol            item
   -- ---            ----
   -- 14020551134    1001
   -- 13964537508    1000
   -- 13940128330    1003
   -- 13938134232    1002
   -- 13930861704    1005
   -- 13913803752    1006

   --    Item Stats:  Item No:           5             Query Size 
(Chars):  232
   --                 Result Columns:    3 Result Rows:         6
   --                 VM Work Steps:     8833543       Rows 
Modified:       0
   --                 Sort Operations:   2 Table-Scan Steps:    0
   --                 Prepare Time:      -- --- --- --- --.----
   --                 Query Run:         0d 00h 00m and 00.408s
   --                 Full Query Time:   0d 00h 00m and 00.408s
   --                 Query Result:      Success.
   -- 
------------------------------------------------------------------------------------------------

   --   Script Stats: Total Script Execution Time:     0d 00h 00m and 
03.742s
   --                 Total Script Query Time:         0d 00h 00m and 
03.620s
   --                 Total Database Rows Changed: 1000001
   --                 Total Virtual-Machine Steps: 76833701
   --                 Last executed Item Index:        5
   --                 Last Script Error:
   -- 
================================================================================================





Reply via email to