At 7:59 AM -0500 12/13/07, Tom Briggs wrote:
 > Something I will say about this, for people who don't know, is that
 this columnar thing is strictly an implementation detail.  While

   I think that this is an oversimplification.  That's somewhat like
saying that the way you use a sledge hammer is no different than how you
use a claw hammer, because they're both hammers.  Anyone who tries to
hang a picture with a sledge hammer will be rather unhappy with your
advice.

   Though at some level how the data is stored is indeed an
implementation detail, to take full advantage of the fact that it is
requires re-thinking schema design and in some cases even query design.
See
http://www.fulltablescan.com/index.php?/archives/44-Compression-as-Normalization...-Kinda.html for more info.

If you read further into your own link, including the comments, you'll see my case also being made.

I don't see your claw hammer / sledge hammer argument being applicable here.

The relational model of data is all about letting users defining the logical/possible structure of their data, with a focus on the meaning of the data relative to other data. It is intended to make data manipulation and queries both easy to use and logically rigorous. The schemas that users design, including their so-called normalization traits, are just for reflecting the meaning of their data in a logical manner.

This model is completely removed from how the data is physically stored, eg whether in rows first or in columns first, and the physical store is determined just by the DBMS behind the scenes, and hence is an implementation detail. The DBMS can arrange how it likes in order to satisfy the logical model in a way that performs well. The users do not contort their schemas beyond what is logically clean in order to gain performance; having a clean schema will let the DBMS infer this automatically.

According to some comments, Vertica (a column-store maker) is making the same case that I am, which is just to have a logical clean schema, and performance benefits will automatically follow from that.

I will also note that the highest level of schema normalization, 6th Normal Form, essentially puts every non-key column in its own table.

I will also note that a column-based store essentially works like a heavily indexed row-based store, in which there is an index on every key or every column, and so all searches, which includes those on which joins are performed, can/do look in what is otherwise indexes. This is potentially slower for updates (or maybe not), but can be faster for queries, depending on circumstances.

-- Darren Duncan

-----------------------------------------------------------------------------
To unsubscribe, send email to [EMAIL PROTECTED]
-----------------------------------------------------------------------------

Reply via email to