Re: [sqlite] Merging databases

Ken & Deb Allen Sat, 07 May 2005 08:33:11 -0700

I am not certain that there is such a thing as a "good algorithm" to achieve this, especially as there are some decisions that need to be made.

The first thing that you need to do is to load the text file information into a new set of tables, [newCustomers], [newProducts] and [newOrders].

What is needed is a translation or mapping between the [CustomerID], [ProductID] and [OrderID] values in the new tables to coincide with the values in the existing database. As noted, some customers and products will be the 'same' and some will be new. I would define tables named [MapCustomers], [mapProducts] and [mapOrders] that have as columns [newID] and [actualID], populating the tables with the identifiers from the [NewXxx] tables and an [ActualID] of NULL (or zero). Now we need to proceed with mapping those 'existing' records:

for each record in [newXxx] that exists in [Xxx], update [mapXxx] to set the [actualID] = [Xxx].[XxxID]

Some SQL dialects permit an update from a JOIN, but I am not certain that SQLITE supports that or not (have not tried it yet, and do not have test tables to make it work). The above is most easily implemented in code if the UPDATE/JOIN syntax is not supported. The difficulty is in determining the criteria that determines whether two records are 'the same': if they are identical that is easy; but if they differ only by the value of one or more non-critical fields, determining which is valid should be done first (for example, if two [Customers] records are the same except for the Zip/Postal code, then you need to determine which value is correct and use that) -- this is often referred to as "scrubbing" the data, something I have also done a great deal of over the years.

Once that is done, we need to update the remainder of the mapping records to have an appropriate ID value in the existing record set.

for each record in [mapXxx] where [actualID] IS NULL --> UPDATE [mapXxx] SET [actualID] = ( SELECT 1 + MAX(XxxID) FROM [Xxx] ) WHERE [newID] = thisID;

Now you are ready to merge the records into the database. Using a JOIN between the [newXxx] table and the [mapXxx] table you can create a record with the proper values:

    INSERT INTO [Xxx] (XxxID, ...)
        SELECT B.[actualID], A.[...]
        FROM   [newXxx] A
        JOIN   [mapXxx] B ON A.[XxxID] = B.[newID];

Now you can check the record counts and verify that all went well and drop the 'new' and 'map' tables.

I am not certain that I would refer to this as a 'brute force' method, as this is a pretty standard algorithm to use for merging data that I have used a number of times over the years. If the database supports it, I usually implement this inside a stored procedure, which has cursors to permit the type of "for each" looping that is often required.

-ken

On 7-May-05, at 9:57 AM, [EMAIL PROTECTED] wrote:

Hello,
Imagine you have a simple database with two tables, and a third to join them as a many-to-many relationship. Someone sends me an SQLite database as a file which has the same structure, but the data is different. I want to merge these two databases together keeping all of the relationships intact, without duplicating data.

For example, if the tables were "customers" and "products" and the one in between "orders", there is the possibility that some customers and/or products might be the same, but with different primary keys. The problem is that since the databases were independently created, a simple union will break the relationships since the primary keys will overlap.

I can think of brute force ways to do this, but I was wondering if anyone might have a good algorithm or technique to accomplish this efficiently.
Cheers!
Demitri

Re: [sqlite] Merging databases

Reply via email to