Quoth gc <gc1...@gmail.com>, on 2010-10-16 03:37:15 -0400:
> CREATE TABLE Code_Units    ("code", "unit", PRIMARY KEY ("code"));
> CREATE TABLE Base_Data_1 ("partner", "code", "unit", "value", PRIMARY KEY
> ("partner", "code"));
[...]

I might consider doing something like this (untested):

  CREATE TEMPORARY TABLE Code_Units_staging
      ("code", "unit", PRIMARY KEY ("code", "unit"));
  -- Note the difference in schema for the temporary table.
  INSERT OR IGNORE INTO Code_Units_staging ("code", "unit")
      SELECT DISTINCT "code", "unit" FROM Base_Data_1;
  INSERT OR IGNORE INTO Code_Units_staging ("code", "unit")
      SELECT DISTINCT "code", "unit" FROM Base_Data_2;
  [...]
  INSERT OR IGNORE INTO Code_Units_staging ("code", "unit")
      SELECT DISTINCT "code", "unit" FROM Base_Data_N;
  -- Now the rows with equal code/unit pairs have been merged, but
  -- any with the same code and different units have not, and will
  -- cause an error on this INSERT.
  INSERT INTO Code_Units ("code", "unit")
      SELECT "code", "unit" FROM Code_Units_staging;

(You don't strictly need the DISTINCT in this case; it might be faster
or slower though; I'd try it both ways if feasible.)

This is going to be very slow if errors are frequent, though, since it
won't catch them until you've iterated the entire set of data.  On the
plus side, it does make it easier with some modification to detect
where the discrepancies were (add extra columns for source data set to
the staging table, say).  How frequent are these invalid combinations
going to be, and how frequently does the Code_Units table need to be
updated?

Another approach would be to create a view and then use an INSTEAD OF
trigger on insertion into the view that ignores fully-identical rows,
raises an abort on same-code/different-unit rows, and does the insert
on new-code rows.  Maybe something like (again, untested):

  CREATE VIEW Code_Units_for_insert AS SELECT * FROM Code_Units;
  CREATE TRIGGER slightly_different_insert
      INSTEAD OF INSERT ON Code_Units
      FOR EACH ROW BEGIN
          INSERT OR IGNORE INTO Code_Units ("code", "unit")
              VALUES (NEW."code", NEW."unit");
          SELECT CASE WHEN (SELECT "unit" = NEW."unit" FROM Code_Units WHERE 
"code" = NEW."code")
                      THEN 1 ELSE RAISE(ABORT, "Code associated with multiple 
units")
                 END;
      END;

   ---> Drake Wilson
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to