Hi Mark. Thanks for the interest. 

My apologies to everyone for the length of this post. 

First, remember that I am working with paged data. The reason is this: 

Let's say the source database has a million records. Since I cannot use a join 
(they are dissimilar databases that are not connected in any way), I am going 
to have to query both databases separately, yes? If I query the destination 
database for all its keys, so I can use them to find records in the source 
database, then on the first pass, with an empty destination database, I am 
going to end up pulling every single source record! Huge dataset, Revolution 
crashes! 

But let's say I get away with it and Revolution does not crash (yet). Now I 
have to issue an SQL statement to the destination database, something like 
SELECT sw_id FROM devices WHERE sw_id IN (insert huge million-item 
comma-delimited string here). If I don't kill the SQL server, I will probably 
kill Revolution this time with yet another million-record data set! 

You may say I could use WHERE sw_id NOT IN (insert huge million-item 
comma-delimited string here), but that doesn't solve my huge dataset and query 
problem either. If I am paging through the data, that query will return every 
record not in the current page, even records that already exist in both 
databases. 

The solution? Page through the source database a reasonable number of records 
at a time, say 100 or 1,000 or 10,000, depending on the limits of Revolution 
and SQL. Now my second query returns a dataset with only the matching records, 
excluding the records that haven't been added yet. At this point I have one 
dataset with all the records in my page, and another dataset with all the 
records in my destination database that have matching keys, a subset of my 
paged data. 
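
To make that concrete, here is a rough, untested sketch of the paging loop in 
Revolution script. The database IDs tSourceDB and tDestDB, the page size, the 
LIMIT/OFFSET paging (MySQL-style; adjust for your server), and the assumption 
that sw_id is numeric (so the IN list needs no quoting) are all placeholders 
for illustration: 

   put 1000 into tPageSize
   put 0 into tOffset
   repeat forever
      -- fetch one page of keys from the source database
      put "SELECT sw_id FROM devices LIMIT " & tPageSize & \
            " OFFSET " & tOffset into tSQL
      put revDataFromQuery(comma, return, tSourceDB, tSQL) into tPageKeys
      if tPageKeys is empty then exit repeat

      -- ask the destination which of these keys it already has
      put tPageKeys into tKeyList
      replace return with comma in tKeyList
      put "SELECT sw_id FROM devices WHERE sw_id IN (" & \
            tKeyList & ")" into tSQL
      put revDataFromQuery(comma, return, tDestDB, tSQL) into tDestKeys

      -- the "reverse intersect" of tPageKeys and tDestKeys happens here

      add tPageSize to tOffset
   end repeat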

Now I need to find out which keys are in my page dataset but not in my 
destination dataset. Hence my need for a reverse intersect command. My 
strategy of working with one record at a time proved to be cumbersome, 
especially with a remote SQL server where internet lag plays a part. Whether I 
use a temporary table or memory variables to work with the data seems 
irrelevant, except that the SQL database is remote, remember? Lag issues. The 
fewer individual queries I make, the better the performance will be. 
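
Lacking a built-in reverse intersect, one way to do it in memory, again just a 
sketch and untested, is to load the destination keys into an array and walk the 
page keys once. tPageKeys and tDestKeys here are the return-delimited lists 
from the loop above: 

   -- build a lookup array from the destination keys
   repeat for each line tKey in tDestKeys
      put true into tDestArray[tKey]
   end repeat

   -- keep only the page keys the destination doesn't have yet
   repeat for each line tKey in tPageKeys
      if tKey is not among the keys of tDestArray then
         put tKey & return after tMissingKeys
      end if
   end repeat
   if tMissingKeys is not empty then
      delete the last char of tMissingKeys -- trailing return
   end if

   -- tMissingKeys now holds the keys to insert, one batch per page

That keeps it to two queries per page regardless of page size, which is the 
whole point given the lag. 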

Bob
On May 25, 2010, at 3:18 PM, Mark Wieder wrote:

> stephen-
> 
> Tuesday, May 25, 2010, 3:08:33 PM, you wrote:
> 
>> if you want to remove duplicate keys - put the imported data immediately
>> into an array. Duplicates are automatically eliminated.
> 
> I was going to suggest something similar, but then thought this seems
> like a long way to go around for the desired result. Why not take the
> recordsets from the two database queries and use them on the fly to
> populate tables in a new database? Then you can run your sql matching
> statement on the new tables and delete the temporary database when
> you're done.
> 
> -- 
> -Mark Wieder
> mwie...@ahsoftware.net
