Hello list,

I've been observing the list since I deployed 0.11 for a small team last year.

I think the normalization topic Felix raised in the "look beyond 0.12" 
thread is an important area to focus on; I ran into so many issues with 
this that I seriously considered abandoning halfway. The problem manifests 
as difficulty in changing the core systems: anything even *slightly* 
outside the feature scope requires a lot of work to complete, even if the 
change is trivial in principle.

Christian replies to Felix:
> I'm not sure if I understand what you mean by "denormalized database". 
> Are you referring to the duplication of data in ticket related tables, as 
> discussed in #1890, or to the fact we don't yet use surrogate 
> keys except for the new repository table? Or the fact that relationships 
> between tables rely on "implicit" foreign keys (e.g. ticket -> milestone)?

I have no trouble relating to the original statement: it refers to all of 
the above and more; the general state of the data model, if you like. We 
are using the database to store rather complex information, but without 
the benefits of constraints (relations), triggers, views, checks or 
locking (to say nothing of procedures and other modern features). In the 
words of Joe Celko, Trac is stuck in the magnetic tape mindset, using the 
database as if it were a serial I/O device. Consider the Association 
pattern in context:

     .- - - - - - - - - - - - - -.
  .< : "Resource Storage System" :
  :  '- - -+- - - - - - - - - - -'                           ____
  :        V                                                 V  |
  : .------+------. ----< .-------------.        .-----------+. |
  : | Work Effort | ----< | Association | >------| Assoc Type |-'
  : '------+------'  ____ '-------------' >--.   '------------'
  :        V         V  |                    |   .------------.
  :     .--+---------+. |                    '---| Assoc Rule |
  :.....| Effort Type |-'                        '------------'
        '-------------'

(If this is garbled on your end, see the link at the bottom)

Effort Type.: Project -> Phase -> Task
Assoc Type..: Precedent, Concurrent, Complement, Substitute
Assoc Rule..: Work Breakdown (parent-child), Peer-to-Peer, Program
(Resource Storage System indicates how the custom fields/EAV might tie in)

Source: The Data Model Resource Book Vol 3 p 178 (ISBN 978-0-470-17845-4)

A Work Effort is a unit of work: a ticket, project, milestone, timebox or 
similar entity that represents work with a start..end time (enumerated in 
Effort Type). Rows in the Association table connect *any* two work efforts 
with an optionally specified relationship type + rule (enumerated in the 
right-hand tables). This allows very flexible aggregation, and we can 
support all kinds of evolving/plug-in relationships, breakdown structures 
and entity types without changing the model. This is not "normalized" per 
se, but the idea is to model the real entities and relationships.
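To make this concrete, here is a minimal runnable sketch of the pattern. 
All table and column names here are my own guesses for illustration, not a 
proposal for Trac's actual schema; I use SQLite via Python only so the 
sketch can be executed:

```python
import sqlite3

# Illustrative sketch of the Association pattern; names are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE effort_type (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE           -- Project, Phase, Task, ...
);
CREATE TABLE work_effort (
    id         INTEGER PRIMARY KEY,
    type_id    INTEGER NOT NULL REFERENCES effort_type(id),
    summary    TEXT NOT NULL,
    start_time TEXT,                    -- the start..end window
    end_time   TEXT
);
CREATE TABLE assoc_type (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE           -- Precedent, Concurrent, ...
);
CREATE TABLE assoc_rule (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE           -- Work Breakdown, Peer-to-Peer, Program
);
CREATE TABLE association (
    from_effort INTEGER NOT NULL REFERENCES work_effort(id),
    to_effort   INTEGER NOT NULL REFERENCES work_effort(id),
    type_id     INTEGER REFERENCES assoc_type(id),   -- NULL = unspecified
    rule_id     INTEGER REFERENCES assoc_rule(id),
    PRIMARY KEY (from_effort, to_effort)
);
""")

# Relate any two efforts without touching the schema:
conn.executemany("INSERT INTO effort_type VALUES (?, ?)",
                 [(1, "Project"), (2, "Phase"), (3, "Task")])
conn.execute("INSERT INTO assoc_rule VALUES (1, 'Work Breakdown')")
conn.executemany(
    "INSERT INTO work_effort(id, type_id, summary) VALUES (?, ?, ?)",
    [(1, 1, "Next release"), (2, 3, "Fix #1890")])
conn.execute(
    "INSERT INTO association(from_effort, to_effort, rule_id) VALUES (1, 2, 1)")

children = conn.execute("""
    SELECT c.summary FROM association a
    JOIN work_effort c ON c.id = a.to_effort
    WHERE a.from_effort = 1
""").fetchall()
print(children)  # [('Fix #1890',)]
```

The point is that a new relationship kind becomes a *row* in assoc_type or 
assoc_rule, not a schema change.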

Consider the current owner/reporter/cc system and all of the quirks that 
accompany it, versus the Work Effort model coupled with a proven role 
pattern: a table to represent specific relationships between units of work 
and people. We could suddenly have "Manager", "Developer", "Tester" or 
"Observer" roles, or whatever is needed, anywhere in the work structure:

  .-------------.     .-------------.     .-----------------.
  | Work Effort |---< | Effort Role | >---| Parties (users) |
  '-------------'     '+------------'     '-----------------'
                       V           ____
                       |           V  |
                      .+-----------+. |
                      | Role Types  |-'
                      '-------------'
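A minimal sketch of what such a role table might look like (again, all 
names and sample data are illustrative assumptions, and I use SQLite via 
Python only so it can be run):

```python
import sqlite3

# Illustrative sketch of the Effort Role pattern; names are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE work_effort (id INTEGER PRIMARY KEY, summary TEXT NOT NULL);
CREATE TABLE party       (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE role_type   (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE effort_role (
    effort_id INTEGER NOT NULL REFERENCES work_effort(id),
    party_id  INTEGER NOT NULL REFERENCES party(id),
    role_id   INTEGER NOT NULL REFERENCES role_type(id),
    PRIMARY KEY (effort_id, party_id, role_id)
);
""")
conn.executemany("INSERT INTO role_type VALUES (?, ?)",
                 [(1, "Manager"), (2, "Developer"), (3, "Observer")])
conn.execute("INSERT INTO work_effort VALUES (1, 'Next milestone')")
conn.executemany("INSERT INTO party VALUES (?, ?)",
                 [(1, "cboos"), (2, "terje")])
conn.executemany("INSERT INTO effort_role VALUES (?, ?, ?)",
                 [(1, 1, 2), (1, 2, 3)])

# One join answers "who is involved with this effort, and how?":
rows = conn.execute("""
    SELECT p.name, r.name FROM effort_role er
    JOIN party p     ON p.id = er.party_id
    JOIN role_type r ON r.id = er.role_id
    WHERE er.effort_id = 1 ORDER BY p.name
""").fetchall()
print(rows)  # [('cboos', 'Developer'), ('terje', 'Observer')]
```

Owner, reporter and cc all collapse into rows here, and adding a new role 
is again data rather than schema.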

The big problem with GenericTrac is that the core of the model is a 
*nonexistent* entity (resource), in practice emulating the database system 
within itself. We'll end up reinventing relational algebra in middleware, 
bit by bit, to compensate for this fact; if you still think that will be 
simpler long-term than *using* the established algebra to work with data, 
I can only suggest you consider it carefully before starting ;-)

So denormalization is not the whole story (though the term is accurate): 
the core issue is the general under-utilization of database technology; 
slowly reinventing a square wheel by hardcoding things that the database 
is *designed* to do. Data anomalies, #1890 being one example among many, 
occur mostly because the model was built to correspond to the visual, 
outward properties of the system (fields/forms/pages in userland), and not 
the underlying information structure it should represent.
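As a small illustration of letting the engine do what it is designed for, 
here is a sketch (hypothetical schema, not Trac's) where a foreign key and 
a CHECK constraint reject anomalous data at the gate, with no middleware 
involved:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only on request
conn.executescript("""
CREATE TABLE milestone (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE ticket (
    id           INTEGER PRIMARY KEY,
    summary      TEXT NOT NULL CHECK (summary <> ''),  -- no empty summaries
    milestone_id INTEGER REFERENCES milestone(id)      -- explicit, enforced link
);
""")
conn.execute("INSERT INTO milestone VALUES (1, 'Next')")
conn.execute("INSERT INTO ticket VALUES (1, 'Fix the model', 1)")

# The engine itself rejects a ticket pointing at a nonexistent milestone;
# no application code has to remember to check.
try:
    conn.execute("INSERT INTO ticket VALUES (2, 'Orphan', 99)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)  # True
```

Every consumer of the data gets that guarantee for free, instead of each 
code path reimplementing the check.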

I would be happy to contribute to the modelling as best I can, but I'm not 
sure when the team could consider such fundamental philosophical and 
architectural changes, if at all (it does imply a more or less complete 
rewrite).

Big thanks to everyone for all the hard work so far, and congratulations
on the 0.12 release. I enjoy the fruits of your labor, although I keep 
wishing for some whipped cream to go with it! ;)

All that said, I hope I did not offend (or kill) anyone with the lengthy 
amounts of text in this post. I realize it may be outside the Trac 
philosophy or scope, or simply beating a dead horse... So be it!

I speak up because I think the issue is critical for future success.

Here is some example SQL for the Association pattern:

http://trac.edgewall.org/wiki/RelationalModel

Cheers!

Terje
