Re: [sqlalchemy] Using Guid as Primary Key

Christopher Lee Thu, 06 Dec 2012 11:30:53 -0800

I use guid primary keys fairly heavily.  There is a performance impact due
to the size of the key; you just can't fit as much of the index in memory
as you otherwise could.  On the flip side, when you have multiple database
shards, you never need to worry about keys being duplicated, so you can
move data between shards at will.  (Of course you can solve that problem
with integer keys; it is just more complex, and hence more error prone.)


In addition to their size (bloating both the rows and the indexes, causing
fewer rows to fit in memory), there are a few more caveats:

Guids are not necessarily random (it depends on how you generate the guid).
 Like Michael said, that can lead to worst-case performance, since hashing
them naively can lead to many collisions.  Either use mathematically random
guids, or double-check your hash table performance, or both.

Guids are much more random than auto-incrementing integers.  Since
databases tend to cluster on disk around their primary key, multiple
inserts will spread themselves across the disk.  That can be very good for
performance, but it is bad if you tend to query for rows in order.  With
integer keys, when you page in one row, if you query for the next primary
key, that data is likely to be sitting resident in database memory, or at
least on the disk cache.

I used guid primary keys in Microsoft's  SQLServer (around 4-5 years ago),
on a table that had high numbers of inserts and deletes.  The database had
trouble maintaining proper table statistics, which led to the query
optimizer making some poor life choices, resulting in very poor
performance.  I think that is specific to the database and our poor
high-churn architecture, but it is worth noting.  I have not have that
problem in MySQL or PostgreSQL.

None of these problems are insurmountable, of course.  It just pays to stay
on your toes and test your performance regularly.


On Tue, Dec 4, 2012 at 2:28 PM, Michael Bayer <mike...@zzzcomputing.com>wrote:

>
> On Dec 4, 2012, at 4:46 PM, Wolfgang Keller wrote:
>
> >>>> Can I use the Guid as primary key? I am newbie to sql and mysql
> >>>> management.
> >>>
> >>> Using such auto-generated surrogate keys is always a really bad idea
> >>> and the straightest and shortest way to data inconsistency hell
> >>> (especially through duplicates).
> >>
> >> you've seen two guids generate as duplicates ?
> >
> > You missed my point - *exactly*. ;-)
> >
> > The very point of "natural" (as opposed to "surrogate") keys is that any
> > real world "thing" (such as e.g. a person) represented by a database
> > entity must have exactly one *and only one* matching record in the
> > corresponding database table.
> >
> > Auto-generated "GUID"s are exactly what does *not* prevent the
> > generation of several records for one single real-world "thing". By
> > simple multiple manual entry of the same data several times due to
> > operator error. Identifying a "thing" by a natural identifier, i.e. one
> > that is actually unique for each and every "thing" *in the real-world*
> > does prevent such inconsistency.
> >
> > I once came across an article describing one of the probably worst
> > cases of database havoc caused by surrogate keys in computer history -
> > the EMR system of the US veterans health administration. According to
> > this article, they face the problem today that each and every individual
> > veteran has an arbitrary, unknown number of corresponding database
> > records - and there's no way to reconcile this mess due to the sheer
> > volume and various technical issues such as typos etc.
>
> oh, well this is the "natural keys are better" argument.  Unfortunately,
> surrogate integer PKs are necessary in most real-world installations as
> they provide far better performance than a typical natural key, when you
> consider that they are also copied out to all the referencing FK columns
> and are present in many indexes.   Even though SQLA totally supports
> natural keys as well as it can, I don't generally use natural PKs in my
> real-world contracts.  The DBAs I work with won't allow them.
>
> I'd say any system that actually has any human being *manually entering* a
> surrogate PK value as part of the application's user interface is
> intrinsically broken.   Especially a GUID value.
>
> I will often add a UNIQUE constraint to the "natural" PK of the table,
> next to the surrogate PK, to avoid the data duplication issues you refer
> to.    I've used surrogate PKs for decades without data duplication issues.
>
> That said, I did have a bad experience with GUIDs, which were in fact
> "natural" guids that were generated deterministically from other elements
> of the data - which was that the performance suffered terribly,
> specifically because of all that heavy GUID data copied out to all the
> referencing FKs and the associated indexes.
>
>
>
>
> --
> You received this message because you are subscribed to the Google Groups
> "sqlalchemy" group.
> To post to this group, send email to sqlalchemy@googlegroups.com.
> To unsubscribe from this group, send email to
> sqlalchemy+unsubscr...@googlegroups.com.
> For more options, visit this group at
> http://groups.google.com/group/sqlalchemy?hl=en.
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"sqlalchemy" group.
To post to this group, send email to sqlalchemy@googlegroups.com.
To unsubscribe from this group, send email to 
sqlalchemy+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/sqlalchemy?hl=en.

Re: [sqlalchemy] Using Guid as Primary Key

Reply via email to