On 7/4/2010 4:30 PM, Terje S. wrote:
Hello list,
Hello,
I've been observing the list since I deployed 0.11 for a small team last
year. I think the normalization topic raised by Felix in the "look beyond
0.12" thread is an important area to focus on; I ran into so many issues
with this that I seriously considered abandoning halfway. The problem
manifests as difficulty in changing the core systems: anything *slightly*
outside the feature scope requires a lot of work to complete, even if the
change is trivial in principle.
Agreed up to this point.
Christian replies to Felix:
I'm not sure if I understand what you mean by "denormalized database".
Are you referring to the duplication of data in ticket-related tables, as
discussed in #1890, or to the fact that we don't yet use surrogate
keys except for the new repository table? Or the fact that relationships
between tables rely on "implicit" foreign keys (e.g. ticket -> milestone)?
I have no trouble relating to the original statement; it refers to all of
the above and more: the general state of the data model, if you like.
Well, I actually think that the term "denormalization" isn't a very good
match for any of the above. Those were the only things I could even remotely
relate to the notion, and only #1890 actually touches on the redundancy problem.
We are using the database to store rather complex information, but without
the benefits of constraints (relations), triggers, views, checks or locking
(to say nothing of procedures and other modern features).
Well, for example none of the above has anything to do with
denormalization. The latter term is used when optimizing an existing
schema by introducing a bit of redundancy for performance reasons, or
possibly by designing such a schema upfront with this optimization in
mind. That's why I have trouble relating this term to the Trac
schema: it's not really that we once had a perfectly normalized schema
that we later "denormalized", or even that we designed the schema with
denormalization in mind...
In the words of Joe Celko, Trac is in the magnetic tape mindset, using the
database as if it were a serial I/O device.
And there's probably a reason for that, as none of the constraints,
triggers, checks or advanced locking features, not even something as
simple as a proper date field, can be used today in a reasonably
database-neutral way... Even with the restricted feature set of SQL
we're using, we would have trouble porting to Oracle, for example.
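To make it concrete, this is roughly the kind of declaration I have in mind
(a sketch only, not actual Trac DDL, and the behavior already differs
between the backends we support today):

    -- Sketch: declarative integrity we currently do without, partly because
    -- support varies (e.g. SQLite only enforces foreign keys when
    -- "PRAGMA foreign_keys = ON" is set, and MyISAM ignores them entirely).
    CREATE TABLE ticket (
        id         INTEGER PRIMARY KEY,
        milestone  TEXT REFERENCES milestone(name),  -- real foreign key
        time       TIMESTAMP NOT NULL,               -- proper date/time type
        status     TEXT CHECK (status IN ('new', 'assigned', 'closed'))
    );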
Consider the Association pattern in context:
.- - - - - - - - - - - - - -.
: "Resource Storage System" :
'- - - - - - +- - - - - - - '
:            |
:     .------+------. ----< .-------------.      .------------.
:     | Work Effort | ----< | Association |>-----| Assoc Type |--.
:     '------+------'       '------+------'      '-----^------'  |
:            |                     |                   '---------'
:     .------+------.       .------+-----.
:.....| Effort Type |--.    | Assoc Rule |
      '------^------'  |    '------------'
             '---------'
(If this is garbled on your end, see the link at the bottom)
Effort Type.: Project -> Phase -> Task
Assoc Type..: Precedent, Concurrent, Complement, Substitute
Assoc Rule..: Work Breakdown (parent-child), Peer-to-Peer, Program
(Resource Storage System indicates how the custom fields/EAV might tie in)
Source: The Data Model Resource Book Vol 3 p 178 (ISBN 978-0-470-17845-4)
Work Effort is a unit of work, like a ticket, project, milestone, timebox
or similar entity that represents work with a start..end time (enumerated
in Effort Type). Rows in the Association table connect *any* two work efforts
together with an optionally specified relationship type + rule (enumerated in
the right-hand tables). This allows very flexible aggregation, and we can
support all kinds of evolving/plug-in relationships, breakdown structures
and entity types without changing the model. This is not "normalized" per
se, but the idea is to model the real entities and relationships.
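In SQL terms, the core of it could look something like this; only a sketch
with illustrative names (effort_type, assoc_type and assoc_rule would be
plain lookup tables, omitted here; see the link at the bottom for a fuller
example):

    CREATE TABLE work_effort (
        id           INTEGER PRIMARY KEY,
        effort_type  INTEGER NOT NULL REFERENCES effort_type(id),
        name         TEXT NOT NULL,
        starts       TIMESTAMP,
        ends         TIMESTAMP
    );

    CREATE TABLE association (
        id           INTEGER PRIMARY KEY,
        from_effort  INTEGER NOT NULL REFERENCES work_effort(id),
        to_effort    INTEGER NOT NULL REFERENCES work_effort(id),
        assoc_type   INTEGER REFERENCES assoc_type(id),  -- Precedent, Concurrent, ...
        assoc_rule   INTEGER REFERENCES assoc_rule(id)   -- Work Breakdown, Peer-to-Peer, ...
    );

A ticket blocking another ticket, a task belonging to a milestone, or a
milestone belonging to a project then all become rows in association, not
new columns or tables.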
I suppose this is just for illustration, as the above doesn't really fit
Trac well. For example, the "work effort" central to the above example
is defined as something "that represents work with a start..end time",
which is not how I would define the project, milestone and ticket
resources in Trac. They have such properties, but they are accidental
rather than essential.
Consider the current owner/reporter/cc system and all of the quirks that
accompany it, versus the Work Efforts model coupled with a proven role
pattern: a table to represent specific relationships between units of work
and people. We could suddenly have "Manager", "Developer", "Tester" or
"Observer" roles, or whatever is needed, anywhere in the work structure:
.-------------.       .-------------.       .-----------------.
| Work Effort |-----< | Effort Role |>------| Parties (users) |
'-------------'       '------+------'       '-----------------'
                             |
                      .------+------.
                      | Role Types  |--.
                      '------^------'  |
                             '---------'
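Sketched the same way, with purely illustrative names (role_type again being
a simple lookup table):

    -- One row per (work effort, person, role) combination.
    CREATE TABLE effort_role (
        work_effort  INTEGER NOT NULL REFERENCES work_effort(id),
        party        TEXT    NOT NULL,                           -- username / session sid
        role_type    INTEGER NOT NULL REFERENCES role_type(id),  -- Manager, Developer, ...
        PRIMARY KEY (work_effort, party, role_type)
    );

Owner, reporter and cc then stop being three hard-coded ticket columns and
become just three role types among others.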
We indeed lack a proper "user" resource in the system, though we have
the session tables. We have some improvements to make in that area, most
notably w.r.t. user information providers (#2456) and reworking the
group management (#5648).
The big problem with GenericTrac is that the core of the model is a
*nonexistent* entity (resource), in practice emulating the database system
within itself. We'll end up reinventing relational algebra in middleware,
bit by bit, to compensate for this fact; if you still think that will be
simpler long-term than *using* the established algebra to work with data,
I can only suggest you consider it carefully before starting ;-)
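We already get a small taste of this with custom fields: filtering tickets
on two custom fields (the names below are made up) against the generic
ticket_custom (ticket, name, value) table takes a join per field, roughly:

    SELECT t.id
      FROM ticket t
      JOIN ticket_custom c1 ON c1.ticket = t.id
                           AND c1.name = 'customer' AND c1.value = 'ACME'
      JOIN ticket_custom c2 ON c2.ticket = t.id
                           AND c2.name = 'billable' AND c2.value = '1';

    -- versus, with real columns:
    -- SELECT id FROM ticket WHERE customer = 'ACME' AND billable = '1';

Now imagine every attribute and relationship in the system stored that way.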
Well, the GenericTrac page is more of a scratch pad for new ideas. I
haven't yet found the time to write down the last batch of ideas I had
on the topic before the 0.12 release caught most of my attention, so
this is by no means a finalized plan... As for the non-existent
resource, think of it as a kind of "abstract base class": there's no
resource as such, but most of the existing entities (wiki page, ticket,
milestone, repository, etc.) can be seen as resources; they have a
unique identity, properties, and some of those properties are actually
relations to other resources.
But you nevertheless nailed the problem scope quite well: how much do we
want to use the data modeling features of SQL versus abstracting and
doing some modeling in the middleware. Doing more in middleware can give
us a higher degree of flexibility, while still having simple "building
blocks" in the database, especially if we think about future extensions,
like supporting more complex 1-n relations.
By the way, the trade-off between "specific and generalized styles of
modeling" is nicely summarized in table 1-1 of the first chapter of the
book you cited above (the downloadable chapter from wiley.com,
http://media.wiley.com/product_data/excerpt/50/04701784/0470178450.pdf).
So denormalization is not the whole story (though it is accurate); the core
issue is the general under-utilization of database technology: slowly
reinventing a square wheel by hardcoding things that the database is
*designed* to do. Data anomalies, #1890 being one example among many, occur
mostly because the model was built to correspond to the visual, outward
properties of the system (fields/forms/pages in userland) and not to the
underlying information structure it should represent.
What you're describing here is the fact that the current Trac data model
is following a "specific" style of modeling (still according to the
terminology of that book). Trying to address this problem by going to a
more "generalized" style of modeling is precisely the ambition behind my
GenericTrac approach. How to do it best and how much abstraction is
needed is of course an open question. For example, we perhaps don't want
to push the abstraction to the level of using triples...
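By "triples" I mean the fully generic subject/predicate/object style of
storage; the extreme end of that road would be a single universal table,
something like this sketch:

    CREATE TABLE triple (
        subject    TEXT NOT NULL,   -- e.g. 'ticket:12'
        predicate  TEXT NOT NULL,   -- e.g. 'owner'
        object     TEXT NOT NULL,   -- e.g. 'joe'
        PRIMARY KEY (subject, predicate, object)
    );

where even "ticket 12 is owned by joe" is just another row.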
I would be happy to contribute to modelling as best I can, but I'm not sure
when the team can consider such fundamental philosophical and architectural
changes, if at all (it does imply a more or less complete rewrite)?
I'm certainly interested in seeing some constructive criticism on
GenericTrac. As I've said above, I'll soon rewrite some parts of the
proposals and keep you posted. As Remy said, there won't be a complete
rewrite anyway, but rather small steps in this direction, if that proves
to be useful.
Big thanks to everyone for all the hard work so far, and congratulations
on the .12 release. I enjoy the fruits of your labor, although I keep
wishing for some whipped cream to go with it! ;)
All that said, I hope I did not offend or kill anyone with the lengthy
amount of text in this post. I realize it may be outside the Trac
philosophy or scope, or may simply be beating a dead horse... So be it!
I speak up because I think the issue is critical for future success.
Here is some example SQL for the Association pattern:
http://trac.edgewall.org/wiki/RelationalModel
Thanks indeed for your feedback,
-- Christian