On Mon, Jan 02, 2012 at 07:01:29AM +0000, Simon Slavin scratched on the wall:
> 
> On 2 Jan 2012, at 5:25am, Durga D wrote:
> 
> > "create table if not exists durtree  (id integer primary key autoincrement,

[...]

> Besides which, almost any schema where you have to number your columns is
> a bad schema.  You should be able to hold the entire schema in your head
> at one time.

  In the sense of being a properly normalized and theoretically ideal
  schema, I would agree, but the real world has a tendency of getting
  in the way of theoretically ideals.

  "Normalized until it hurts, denormalize until it works."

> > I want to make c1 to c70 as unique with default null. But, I could not with
> > above query. I can make c42 to c70 as default null  and c1 to c70 as
> > unique. If I add default null to c41 and c40, it gets failed.
> 
> The above does not make each of the c nodes unique.  It makes each
> combination of c nodes unique: every single c would have to be identical
> in two rows for SQLite to reject it as violating your UNIQUE constraint.

  Exactly?  If he's looking to store a tree that allows direct access
  to specific nodes, the numbered columns are essentially storing a path
  through the tree.  Any single node can have multiple children (or
  itself be a node), so the individual columns don't need to be unique.
  The path as a whole, however, must be unique to represent a single node.

> There's no way to make each node unique in the above schema because a
> node could be in c65 in one row and c66 in another row.

  I'm not following you.  Many types of trees allow leaf nodes at
  different levels.

> A data structure more used for storing trees would look something like this:
> 
> CREATE TABLE IF NOT EXISTS durTreeNodes (
>       treeNumber INTEGER,
>       parent TEXT,
>       nodes TEXT NOT NULL,
>       UNIQUE (treeNumber, node))
> 
> This allows for trees of any height to be held efficiently.  The parent
> value of the root node could be null, or 'root' or something.

  *Held* efficiently, perhaps-- but not accessed.  What you're
  describing is the "Adjacency Model", which is the most basic way to
  store tree-like structures in SQL.  It has some serious drawbacks,
  however.  Standard SQL has no type of recursive or looping features,
  meaning "walking" a tree to find a specific node, even when the full
  path is known, takes multiple queries.  In specific, it takes as many
  queries as the node is deep in the tree.  This makes it extremely
  slow.  Oracle has some syntax short-cuts to deal with this, but they're
  non-standard.

  Your unique constraint is also radically different.  You're forcing
  nodes to be unique across whole trees, while the original schema only
  forces them to be unique if they have the same parent (which is
  usually what most people want).

  These compromises are are part of the reason there are so many different
  models to store tree-based structures in an SQL database (or even in
  many "NoSQL" systems, for that matter).  Each method has specific
  advantages and disadvantages.  Some are compact to store, but slow to
  access, others (like the "nested set") are quick to access, but very
  expensive to modify.

  Using a "path" based approach is definitely a valid approach.  Most of
  the time it is done with a single column, however, using some type of
  delimiter for the tree levels.  For example, have one column with the
  value "/usr/local/bin" rather than {c1="usr", c2="local", c3="bin"}.
  You can then use a UNIQUE constraint on just that one column. 
  Wild-card LIKE matches allow access to sub-trees.  Using a single
  composite path value also eliminates a specific depth limit.

> > objective: I am trying to store tree in a sqlite3 db depth of 70.  I need
> > high performance when accessing any level of the tree.
> 
> The above schema, with its very large number of columns, is not going
> to be very fast.  To retrieve c70 from a row SQLite will need to count
> through 70 columns before it can get to it.

  That's hardly a big deal, especially if the common case is that most
  of the columns are NULL.

> If you want to retrieve an entire level of a tree at once, why not
> store it that way ?

  He's not asking about pulling an entire level, he's saying he wants
  to pull any node, at any level, with a single query.  The Adjacency
  Model (storing a "parent" value) doesn't allow this... the
  application has to "walk" the tree, making one query to get through
  each level of the tree.  A "path" model allows you to jump to any
  node (at a known path) in a single query.
  
> CREATE TABLE IF NOT EXISTS durTreeLevels (
>       treeNumber INTEGER,
>       levelNumber INTEGER,
>       nodes TEXT,
>       UNIQUE (treeNumber, levelNumber))
> 
> In the 'nodes' column you put a list of nodes, separated by commas
> or something.  This also allows trees of any height to be held efficiently.

  Seriously?  That breaks First Normal Form.  Stuffing lists into a
  single database value is a Database 101 no-no.

  Not to mention your constraint only allows one descending node per level.
  That's not a tree, it's a stick.



  As I said, there are many different well understood and well tested
  models for storing trees and other hierarchies in an SQL database.
  I would suggest the OP do some web searches on hierarchies and SQL.
  This is, for the most part, a solved problem.

  Joe Celko also has a whole book on on the subject:
  
http://www.amazon.com/Hierarchies-Smarties-Kaufmann-Management-Systems/dp/0123877334/

  Although much of it is just a rehash and expansion of the material
  covered in his main book (which I would strongly recommend for
  any one doing any semi-serious SQL programming):
  http://www.amazon.com/Joe-Celkos-SQL-Smarties-Fourth/dp/0123820227/

  Overall, using a "path" type model may very well fit your needs, I
  would just suggest using a single composite path value, rather than
  trying to break the path value down into individual components and
  store those each in their own column.

    -j

-- 
Jay A. Kreibich < J A Y  @  K R E I B I.C H >

"Intelligence is like underwear: it is important that you have it,
 but showing it to the wrong people has the tendency to make them
 feel uncomfortable." -- Angela Johnson
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to