Re: [U2] Migration

Robert Houben Fri, 24 Dec 2010 07:50:02 -0800

A read on a primary key is about as efficient as an MV hashed read.  Each has 
their trade-offs. Get the modulo wrong and your MV read can be nasty.  You can 
get a SQL table in trouble, too.  You really can't beat getting all your data 
in one disk head movement, but we were talking about just creating a single 
PICK file for each table, to keep the migration "simple".

There are some excellent articles on how indexes work in SQL Server that you 
may wish to read.  Your understanding below is partially correct in places, but 
indexes do NOT copy all the data.  Just the indexed fields. Note that a PICK 
index has to copy all the item-ids and all the indexed fields, but you don't 
get anything pre-sorted, just hashed. Note that the primary key sorts the 
actual table. All subsequent indexes are actually sorted copies of the indexed 
columns with keys or some other references to the real records. If your query 
only uses fields in the index, you won't ever read the real table's data.

SQL will beat MV every time when you sort fields that are indexed.  For direct 
reads, MV seems to have a slight advantage.  Inserts and updates that affect 
indexed fields are slower in SQL (inserts are painfully slow if you fail to 
size your SQL table well, but try "inserting" millions of records into a file 
with a modulo of 1...)

Here's one short article with some diagrams that show what goes on in clustered 
and non-clustered indexes:
http://technet.microsoft.com/en-us/library/aa964133(SQL.90).aspx

-----Original Message-----
From: u2-users-boun...@listserver.u2ug.org 
[mailto:u2-users-boun...@listserver.u2ug.org] On Behalf Of Mecki Foerthmann
Sent: Friday, December 24, 2010 5:56 AM
To: U2 Users List
Subject: Re: [U2] Migration

I was under the impression that when a relational table is being "indexed" the 
DBMS creates and maintains a sorted copy of the original table for every 
indexed field.
That means for clustered indices tables sorted by every conceivable combination 
need to be maintained, having a huge impact on space requirements and 
performance.
So when you are looking for a row or rows that match the indexed key, in a 
RDBMS an algorithm can be used to locate the row(s) instead of reading in the 
whole table and comparing every single row. So you could for instance go to the 
middle of the (already pre-sorted) table (if you have
1000000 Million rows in your table you read row 500000) and check if the key is 
greater equal or smaller than the value you are looking for and carry on that 
way until you have a match. That way you eliminate half the number of rows you 
need to compare with every read. Of course they probably use much more 
sophisticated algorithms these days.
But regardless what algorithm they may use it has to be slower than the hashing 
algorithm used by mv as long as you have well sized files using sensible item 
ids.
To save space some RDBMS may also have implemented reduced datasets so they may 
just hold the indexed keys in the row instead of duplicating the whole data; in 
which case the primary index somehow needs to be used to retrieve the data in 
the row afterwards.
So as you can see even in a RDBMS there is calculation going on when indices 
are used.
I would actually go so far to say that relational databases don't use real 
indices at all. They just duplicate the dataset sorted in different ways.
But that of course is a matter of how you define what a "real" index is 
supposed to be.

Well, and when you want to "sort" then you just read the already sorted table 
into memory instead of the original - so it can be a lot faster than reading a 
list of item ids from an index file and then reading the items one by one from 
the data file using the hashing algorithm as it is done in the mv-world.
That is also the reason why mv can only use one index at a time and why we 
don't need joins.

On 24/12/2010 04:45, fft2...@aol.com wrote:
> In a message dated 12/23/2010 4:28:38 PM Pacific Standard Time,
> antli...@youngman.org.uk writes:
>
>
>>> SQL uses indexes.  MV uses cross references to item-ids (MV
>>> sometimes
>> supports indexes, but they don't always work as well as in the
>> relational
>> world.)
>> I don't know as that is true ... or are you using the word "index" to
>> mean something completely different to me? I'll agree the
>> implementation of indices can be buggy, but surely that's true of relational 
>> engines too?
>>
> I'm not quite sure I'm confortable with the idea (expressed in the
> prior-prior posting of which I here quote and enquote the reponse)
> that MV uses cross-references to item-ids.
>
> To me a hash table, isn't the same thing as a cross-reference which
> sounds a lot like a secondary key.  Hashing calculates an exact jump
> point at which a group of related records are kept.  They are related
> by having the same hash value.  But the hash value itself isn't looked up, 
> it's a calculation.
>
> I wonder if you can setup a first normal form table in such a way,
> that it maintains a constant sorted order ?  Sorting on the primary
> key, would then be merely display time bound, there is no effort to
> it.  I suppose you could even pick up and lay down the database
> periodically so the sort order matches the actual disk layout.  Pick
> could never do something like that.  There is always going to be
> effort involved in any sorting, even if you're simply traversing the index 
> tree and grabbing the underlying data records.
>
> But back to reality, I don't think SQL works this way anyway.  The
> perception that it sorts much faster is probably related more closely
> to the horsepower behind the scenes.  Pick systems tend to be
> installed on slower systems because they are so efficient and most
> users are cheap with their database, and expensive with their
> graphics.  So they install the SQL type databases on speedy machines.
> _______________________________________________
> U2-Users mailing list
> U2-Users@listserver.u2ug.org
> http://listserver.u2ug.org/mailman/listinfo/u2-users
>
_______________________________________________
U2-Users mailing list
U2-Users@listserver.u2ug.org
http://listserver.u2ug.org/mailman/listinfo/u2-users
_______________________________________________
U2-Users mailing list
U2-Users@listserver.u2ug.org
http://listserver.u2ug.org/mailman/listinfo/u2-users

Re: [U2] Migration

Reply via email to