On 7/4/2010 4:30 PM, Terje S. wrote:
Hello list,
Hello,
I've been observing the list since I deployed 0.11 for a small team last
year. I think the normalization topic raised by Felix in the "look beyond
0.12" thread is an important area to focus on; I ran into so many issues
with this that I seriously considered abandoning halfway. The problem
manifests as difficulty in changing the core systems: anything *slightly*
outside the feature scope requires a lot of work to complete, even if the
change is trivial in principle.
Agreed up to this point.
Christian replies to Felix:
I'm not sure if I understand what you mean by "denormalized database".
Are you referring to the duplication of data in ticket-related tables, as
discussed in #1890, or to the fact that we don't yet use surrogate
keys except for the new repository table? Or the fact that relationships
between tables rely on "implicit" foreign keys (e.g. ticket -> milestone)?
I have no trouble relating to the original statement; it refers to all of
the above and more: the general state of the data model, if you like.
Well, I actually think that the term "denormalization" isn't a very good
match for any of the above. Those were the only things I could even remotely
relate to the notion, and only #1890 actually touches on the redundancy problem.
We are using the database to store rather complex information, but without
the benefits of constraints (relations), triggers, views, checks or locking
(to say nothing of procedures and other modern features).
Well, for example none of the above has anything to do with
denormalization. The latter term is used when optimizing an existing
schema by introducing a bit of redundancy for performance reasons, or
possibly by designing such a schema upfront with this optimization in
mind. That's why I have trouble relating this term to the Trac
schema: it's not really that we once had a perfectly normalized schema
that we later "denormalized", or even that we designed the schema with
denormalization in mind...
In the words of Joe Celko, Trac is in the magnetic tape mindset, using the
database as if it were a serial I/O device.
And there's probably a reason for that, as none of the constraints,
triggers, checks or advanced locking features, not even something as
simple as a proper date field, can be used today in a reasonably
database-neutral way... Even with the restricted feature set of SQL
we're using, we would have trouble porting to Oracle, for example.
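To make it concrete, this is roughly the kind of declaration I have in mind
(a sketch only, not actual Trac DDL, and the behavior already differs
between the backends we support today):

    -- Sketch: declarative integrity we currently do without, partly because
    -- support varies (e.g. SQLite only enforces foreign keys when
    -- "PRAGMA foreign_keys = ON" is set, and MyISAM ignores them entirely).
    CREATE TABLE ticket (
        id         INTEGER PRIMARY KEY,
        milestone  TEXT REFERENCES milestone(name),  -- real foreign key
        time       TIMESTAMP NOT NULL,               -- proper date/time type
        status     TEXT CHECK (status IN ('new', 'assigned', 'closed'))
    );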
Consider the Association pattern in context:
.- - - - - - - - - - - - - -.
: "Resource Storage System" :
'- - - - - - +- - - - - - - '
:            |
:     .------+------. ----< .-------------.      .------------.
:     | Work Effort | ----< | Association |>-----| Assoc Type |--.
:     '------+------'       '------+------'      '-----^------'  |
:            |                     |                   '---------'
:     .------+------.       .------+-----.
:.....| Effort Type |--.    | Assoc Rule |
      '------^------'  |    '------------'
             '---------'
(If this is garbled on your end, see the link at the bottom)
Effort Type.: Project -> Phase -> Task
Assoc Type..: Precedent, Concurrent, Complement, Substitute
Assoc Rule..: Work Breakdown (parent-child), Peer-to-Peer, Program
(Resource Storage System indicates how the custom fields/EAV might tie in)
Source: The Data Model Resource Book Vol 3 p 178 (ISBN 978-0-470-17845-4)
Work Effort is a unit of work, like a ticket, project, milestone, timebox
or similar entity that represents work with a start..end time (enumerated
in Effort Type). Rows in the Association table connect *any* two work efforts
together with an optionally specified relationship type + rule (enumerated in
the right-hand tables). This allows very flexible aggregation, and we can
support all kinds of evolving/plug-in relationships, breakdown structures
and entity types without changing the model. This is not "normalized" per
se, but the idea is to model the real entities and relationships.
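In SQL terms, the core of it could look something like this; only a sketch
with illustrative names (effort_type, assoc_type and assoc_rule would be
plain lookup tables, omitted here; see the link at the bottom for a fuller
example):

    CREATE TABLE work_effort (
        id           INTEGER PRIMARY KEY,
        effort_type  INTEGER NOT NULL REFERENCES effort_type(id),
        name         TEXT NOT NULL,
        starts       TIMESTAMP,
        ends         TIMESTAMP
    );

    CREATE TABLE association (
        id           INTEGER PRIMARY KEY,
        from_effort  INTEGER NOT NULL REFERENCES work_effort(id),
        to_effort    INTEGER NOT NULL REFERENCES work_effort(id),
        assoc_type   INTEGER REFERENCES assoc_type(id),  -- Precedent, Concurrent, ...
        assoc_rule   INTEGER REFERENCES assoc_rule(id)   -- Work Breakdown, Peer-to-Peer, ...
    );

A ticket blocking another ticket, a task belonging to a milestone, or a
milestone belonging to a project then all become rows in association, not
new columns or tables.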
I suppose this is just for illustration, as the above doesn't really fit
Trac well. For example, the "work effort" central to the above example
is defined as something "that represents work with a start..end time",
which is not how I would define the project, milestone and ticket
resources in Trac. They have such properties, but they are accidental
rather than essential.
Consider the current owner/reporter/cc system and all of the quirks that
accompany it, versus the Work Efforts model coupled with a proven role
pattern: a table to represent specific relationships between units of work
and people. We could suddenly have "Manager", "Developer", "Tester" or
"Observer" roles, or whatever is needed, anywhere in the work structure:
.-------------.       .-------------.       .-----------------.
| Work Effort |-----< | Effort Role |>------| Parties (users) |
'-------------'       '------+------'       '-----------------'
                             |
                      .------+------.
                      | Role Types  |--.
                      '------^------'  |
                             '---------'
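Sketched the same way, with purely illustrative names (role_type again being
a simple lookup table):

    -- One row per (work effort, person, role) combination.
    CREATE TABLE effort_role (
        work_effort  INTEGER NOT NULL REFERENCES work_effort(id),
        party        TEXT    NOT NULL,                           -- username / session sid
        role_type    INTEGER NOT NULL REFERENCES role_type(id),  -- Manager, Developer, ...
        PRIMARY KEY (work_effort, party, role_type)
    );

Owner, reporter and cc then stop being three hard-coded ticket columns and
become just three role types among others.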
We indeed lack a proper "user" resource in the system, though we have
the session tables. We have some improvements to make in that area, most
notably w.r.t. user information providers (#2456) and reworking the
group management (#5648).
The big problem with GenericTrac is that the core of the model is a
*nonexistent* entity (resource), in practice emulating the database system
within itself. We'll end up reinventing relational algebra in middleware,
bit by bit, to compensate for this fact; if you still think that will be
simpler long-term than *using* the established algebra to work with data,
I can only suggest you consider it carefully before starting ;-)
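We already get a small taste of this with custom fields: filtering tickets
on two custom fields (the names below are made up) against the generic
ticket_custom (ticket, name, value) table takes a join per field, roughly:

    SELECT t.id
      FROM ticket t
      JOIN ticket_custom c1 ON c1.ticket = t.id
                           AND c1.name = 'customer' AND c1.value = 'ACME'
      JOIN ticket_custom c2 ON c2.ticket = t.id
                           AND c2.name = 'billable' AND c2.value = '1';

    -- versus, with real columns:
    -- SELECT id FROM ticket WHERE customer = 'ACME' AND billable = '1';

Now imagine every attribute and relationship in the system stored that way.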
Well, the GenericTrac page is more of a scratch pad for new ideas. I
haven't yet found the time to write down the last batch of ideas I had
on the topic before the 0.12 release caught most of my attention, so
this is by no means a finalized plan... As for the non-existent
resource, think of it as a kind of "abstract base class": there's no
resource as such, but most of the existing entities (wiki page, ticket,
milestone, repository, etc.) can be seen as resources; they have a
unique identity, properties, and some of those properties are actually
relations to other resources.
But you nevertheless nailed the problem scope quite well: how much do we
want to use the data modeling features of SQL versus abstracting and
doing some modeling in the middleware. Doing more in middleware can give
us a higher degree of flexibility, while still having simple "building
blocks" in the database, especially if we think about future extensions,
like supporting more complex 1-n relations.
By the way, the trade-off between "specific and generalized styles of
modeling" is nicely summarized in table 1-1 of the first chapter of the
book you cited above (the downloadable chapter from wiley.com,
http://media.wiley.com/product_data/excerpt/50/04701784/0470178450.pdf).
So denormalization is not the whole story (though it is accurate); the core
issue is the general under-utilization of database technology: slowly
reinventing a square wheel by hardcoding things that the database is
*designed* to do. Data anomalies, #1890 being one example among many, occur
mostly because the model was built to correspond to the visual, outward
properties of the system (fields/forms/pages in userland) and not to the
underlying information structure it should represent.
What you're describing here is the fact that the current Trac data model
is following a "specific" style of modeling (still according to the
terminology of that book). Trying to address this problem by going to a
more "generalized" style of modeling is precisely the ambition behind my
GenericTrac approach. How to do it best and how much abstraction is
needed is of course an open question. For example, we perhaps don't want
to push the abstraction to the level of using triples...
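By "triples" I mean the fully generic subject/predicate/object style of
storage; the extreme end of that road would be a single universal table,
something like this sketch:

    CREATE TABLE triple (
        subject    TEXT NOT NULL,   -- e.g. 'ticket:12'
        predicate  TEXT NOT NULL,   -- e.g. 'owner'
        object     TEXT NOT NULL,   -- e.g. 'joe'
        PRIMARY KEY (subject, predicate, object)
    );

where even "ticket 12 is owned by joe" is just another row.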
I would be happy to contribute to modelling as best I can, but I'm not sure
when the team can consider such fundamental philosophical and architectural
changes, if at all (it does imply a more or less complete rewrite)?
I'm certainly interested in seeing some constructive criticism on
GenericTrac. As I've said above, I'll soon rewrite some parts of the
proposals and keep you posted. As Remy said, there won't be a complete
rewrite anyway, but rather small steps in this direction, if that proves
to be useful.
Big thanks to everyone for all the hard work so far, and congratulations
on the .12 release. I enjoy the fruits of your labor, although I keep
wishing for some whipped cream to go with it! ;)
All that said, I hope I did not offend or kill anyone with the lengthy
amount of text in this post. I realize it may be outside the Trac
philosophy or scope, or may simply be beating a dead horse... So be it!
I speak up because I think the issue is critical for future success.
Here is some example SQL for the Association pattern:
http://trac.edgewall.org/wiki/RelationalModel
Thanks indeed for your feedback,
-- Christian