[sqlite] Avoid duplicate sets with join table

James K. Lowden Sun, 17 Apr 2016 19:32:35 -0400

On Sat, 16 Apr 2016 01:20:55 +0200
Ketil Froyn <ketil at froyn.name> wrote:


> I have two tables and a join table, in principle like this:
> 
> CREATE TABLE records (id INTEGER PRIMARY KEY, data TEXT);
> CREATE TABLE features (id INTEGER PRIMARY KEY, data TEXT UNIQUE);
> CREATE TABLE records_features (id_r INTEGER, id_f INTEGER, ord
> INTEGER);

(As a style suggestion, consider not reduplicating plurals.  It's a
one-man band, a 3-star general, and a 5-finger discount.  What you have
 is a set of record_features.)  

> Later, if I come across a new record that has exactly features
> "feature1", "feature2" and "feature3" (IDs 20,21 and 22), I don't want
> to add a new record for this. So given a new set of feature IDs, how
> can I best check if there's another record that has these exact
> features before I insert it?
> 
> To be clear, a record with features "feature1", "feature2", "feature4"
> would be ok. So would "feature1", "feature2". Subsets are ok, but not
> exact duplicates.

This is an application of relational division: you want to know if two
sets are equal.  Well, almost.  You really want to know if the "new" set
is a subset of an existing one.  

You're also a little stuck for lack of syntax.  There are two sets in
question: the extant one and the one to be inserted.  You can insert a
set in SQLite, 

        insert into T values (a), (b), (c);

but because that set of rows (a, b, and c) doesn't have a name, you
can't refer to them again in the same query.  So you have to dance a
little jig, something like:

        begin transation
        insert into record features ...
        delete from record features 
        id = new_id
        where exists ( 
                select 1 from record features as rf
                where rf.id_r = record_features.id_r
                and rf.id_f = record_features.id_f
                group by id
                having count(*) = (
                        select count(*) 
                        from record_features where id_r = new_id
                )
        );
        commit transaction

That puts the rows into the database -- where they can be examined --
and deletes them if they turn out to be a duplicate.  You could also
apply the same logic in an insert & update trigger, and probably should
if the rule applies to the *data*, and not just the application's
expectations.  

--jkl

[sqlite] Avoid duplicate sets with join table

Reply via email to