[sqlite] Process duplicate field values

R Smith Fri, 19 Feb 2016 11:41:11 +0200


On 2016/02/19 8:00 AM, admin at shuling.net wrote:
> Hi,
>
>   
>
> I create a table as follows:
>
>   
>
> CREATE TABLE MyTable (F1 INTEGER, F2 INTEGER);
>
>   
>
> Then add the following records:
>
>   
>
> INSERT INTO MyTable (F1, F2) Values (1, 2);
>
> INSERT INTO MyTable (F1, F2) Values (1, 3);
>
> INSERT INTO MyTable (F1, F2) Values (2, 4);
>
> INSERT INTO MyTable (F1, F2) Values (2, 5);
>
> INSERT INTO MyTable (F1, F2) Values (3, 6);
>
> INSERT INTO MyTable (F1, F2) Values (3, 7);
>
> INSERT INTO MyTable (F1, F2) Values (4, 2);
>
>   
>
> Now if two records have the same value of F1, then I will define them as
> conflict records.
>
>   
>
> Now I need to perform the following tasks:
>
>   
>
>      1. For all conflict records, get the total count of distinct F1 values.
> In the above sample, record 1, 2, 3, 4, 5, 6 are conflict records, but the
> distinct values are only 1, 2, 3 so the total count should be 3.
>      2. Get the total count of all the conflict records. In the above sample,
> it should be 6.
>      3. Set the F2 value of all the conflict records to 9. Keep all other
> records intact.
>
>   
>
> How to do that? Can task 2 and 3 be implemented in one SQL query to improve
> the performance?


I think you meant to use the word "Duplicate" where you said "Distinct". 
You need the number of Duplicate F1 records, which is 3.

Then you need the total count of records that might be duplicated for 
F1, which is 6 in the above case.

This all is easy and Hick's solution will work.

As to the question of doing 2 and 3 together, how can that ever be? 3 
does an update, and 2 expects a return value. There is no mix and match 
updates and selects (Though Postgres has a great way of doing a bit of 
both) - but apart from some convenience to the query creator, it has no 
real efficiency advantage, which is why it's not really done.

Best efficiency would be a temp table to avoid multiple walking of the 
original table and be very fast for even large datasets - I would advise 
this way perhaps:


------------------------------------------------
   -- Processing SQL in: E:\Documents\SQLiteAutoScript.sql
   -- SQLite version 3.9.2  [ Release: 2015-11-02 ]  on SQLitespeed 
version 2.0.2.4.

   -- Script Items: 10         Parameter Count: 0
   -- 2016-02-19 11:35:58.596  |  [Info]       Script Initialized, 
Started executing...
   -- 
================================================================================================


CREATE TABLE MyTable (F1 INTEGER, F2 INTEGER);

INSERT INTO MyTable (F1, F2) Values
  (1, 2)
,(1, 3)
,(2, 4)
,(2, 5)
,(3, 6)
,(3, 7)
,(4, 2)
;

CREATE TEMP TABLE F1Dups (F1_ID INTEGER PRIMARY KEY, F1_Count INT);

INSERT INTO F1Dups SELECT DISTINCT B.F1, 0
   FROM MyTable AS A
  INNER JOIN MyTable AS B ON B.F1 = A.F1
  WHERE  B.rowid <> A.rowid;

UPDATE F1Dups SET F1_Count = (SELECT COUNT(*) FROM MyTable WHERE 
MyTable.F1=F1Dups.F1_ID);


SELECT COUNT(*) AS Dist_Confl FROM F1Dups;

   --  Dist_Confl
   -- ------------
   --       3

SELECT SUM(F1_Count) AS All_Conf FROM F1Dups;

   --   All_Conf
   -- ------------
   --       6


UPDATE MyTable SET F2=9 WHERE F1 IN (SELECT F1_ID FROM F1Dups);

SELECT * FROM MyTable;

   --  F1 |  F2
   -- --- | ---
   --  1  |  9
   --  1  |  9
   --  2  |  9
   --  2  |  9
   --  3  |  9
   --  3  |  9
   --  4  |  2

DROP TABLE F1Dups;


   --   Script Stats: Total Script Execution Time:     0d 00h 00m and 
00.031s
   --                 Total Script Query Time:         -- --- --- --- 
--.----
   --                 Total Database Rows Changed:     19
   --                 Total Virtual-Machine Steps:     717
   --                 Last executed Item Index:        10
   --                 Last Script Error:
   -- 
------------------------------------------------------------------------------------------------

[sqlite] Process duplicate field values

Reply via email to