Good morning.

Christian, I am really glad you followed up and did not leave the topic 
dead, as I was beginning to suspect you had. 

Your reply is, as was your last, exactly the type of discussion I hoped
to raise when I first posted. Thank you for staying on topic as it relates
to the proposal itself. I sincerely regret any and all insults communicated
when I got carried away on several occasions.

(I did post a message apologizing unconditionally to you, Steffen, yesterday;
it does not appear to have made it through.)

On Thu, Jul 08, 2010 at 03:44:02PM +0200, Christian Boos wrote:

> "Prior to version 3.6.19, SQLite did not support foreign key constraints"
> (http://www.sqlite.org/cvstrac/wiki?p=ForeignKeyTriggers)

Granted, this is true, but joins, views and triggers are still 
supported, and triggers can emulate the keys and cascades for the 
critical relationships in these older versions. See:

http://www.sqlite.org/cvstrac/wiki?p=ForeignKeyTriggers
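
For instance, something along these lines (the parent/child tables and the
parent_id column are made up for the example, they are not Trac tables):

CREATE TRIGGER fki_child_parent_id
  BEFORE INSERT ON child
    BEGIN
      SELECT RAISE(ABORT, 'insert on child violates foreign key')
      WHERE (SELECT id FROM parent WHERE id = NEW.parent_id) IS NULL;
    END;

CREATE TRIGGER fkd_child_parent_id
  AFTER DELETE ON parent
    BEGIN
      -- emulate ON DELETE CASCADE
      DELETE FROM child WHERE parent_id = OLD.id;
    END;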

> As for views, I don't see an use case yet, but if there's one, there
> wouldn't be problems to use them, except for updates, which is also
> not something consistently supported.

The use case for views in data modelling without update support is 
absolutely present! In this context, you could have Work Efforts at 
the bottom ("ticket" being a row in the effort type table). The view would 
then be named Tickets, and join these systems together so they can be 
reached in a uniform way from the user's perspective (for reporting).

CREATE VIEW Tickets AS 
        SELECT * FROM WorkEfforts
        JOIN [more relevant information]
        WHERE effort_type = [ticket type id];

And this view would then present a collection of parts from the underlying
model, and provide a list of known Ticket "resources" in the work hierarchy.
From the middleware perspective, you use the views to gather data if it
makes sense (many engines cache it), or go directly to the lower levels 
if that is the more efficient approach to whatever operation you are 
programming.

Consider the word "Ticket". It really only makes sense to the
submitter, "here is your ticket number, get in line". From all other 
perspectives after its submission, it represents a unit of work. 

Either it is a feature request that must be considered (then discarded,
or implemented, which is all work), a patch (review/commit/reject it), a 
support request (help the submitter), etc. This entity also, unambiguously, 
is handled by a *WORK*flow system; does that not sound like a hint? ;-)

A Ticket is a *way to look* at a work effort (it is a VIEW of a work effort...)

The word "Ticket" is used only because it corresponds to a logical entity 
in the *front-end* system. This point is important to note:

The Ticket concept belongs in the front-end system and - as layering of 
software dictates - should be translated downwards. You *cannot* model 
this *front-end* concept *directly* in the *storage layer*, as you do now, 
without serious side effects.

Anyway, with this approach you are free to refactor the underlying data 
model at any point in time when you hit a performance problem. You will 
have to change the middleware to go with it, but since all reporting that 
the *end user* does is against the VIEWs, we can implement entirely 
new parallel reporting paradigms while leaving the old ones in, supporting
both, without impacting existing code (yet to be ported), runtime performance 
or locally defined reports. All you have to do is make sure the views 
present the same information after you change the underlying model.
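
As an illustrative sketch (the split tables and column names here are
hypothetical, not part of the proposal): if the underlying storage were
later split in two, the view is simply redefined to present the same
columns as before:

DROP VIEW Tickets;
CREATE VIEW Tickets AS
        SELECT c.id, c.summary, s.status
        FROM WorkEffortCore c                  -- hypothetical split table
        JOIN WorkEffortState s ON s.effort_id = c.id
        WHERE c.effort_type = [ticket type id];
-- Reports written against Tickets keep working unchanged.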

> don't mind a little reinvention and abstracting away from the
> database, if this can help modularity, code reuse and overall
> simplification of the system. I think that trying to map the model

It is exactly these goals++ that will be achieved by normalizing the model, 
as I have argued previously and will strengthen further below.

> (addicted?/clean? didn't get it, really). The question is, can you

If you do it the normalized way, nobody can **EVER** blame your code again,
because it will be the wizards in the database engine who are not doing their 
job well enough, and that can be fixed with a better-performing engine.

If you do it non-relational, the performance problem exists in your code and
must be handled for all eternity. Thus you have given the wizards inside
the database heroin, and they will not recover and help you solve any
problems. You are left to solve them all in your own middleware, while the
workforce within the database is sleeping drugged-down on the sidewalk.
Changing engine will only help you for an irrelevant period of time.

Did that clear it up? ;-)

> come up with a clean "relational" design which improves upon that,
> which is normalized, easy to use, fast, scalable, yet extensible in
> terms of adding/removing custom properties at run-time? Then I'm all
> ears because it's also exactly what I'm looking for.

Not a "perfect" solution, of course, because of the special requirements on
sqlite versions and the complexity of the system. But *signifficant* 
improvements can be made, as I hope this post will show more clearly..

I will supply input on the exact workings of the custom field subsystems
down the line, when a few other key parts of the design have been settled.
Their exact final scope of work is quite unclear to me at the moment.

> if we would fold everything into custom properties, but that would
> most certainly be detrimental to performance, due to the very simple
> key/value model we use there. So we need something a bit smarter,
> but still as flexible.

What you have failed to do is discover the fundamental properties of the
entities you are modelling. Both the ticket and the milestone represent
work units in a hierarchy, as is favoured by some project management
disciplines, even if you *still* think they don't ;-)

> The model you proposed in the RelationalModel page is only *very
> remotely* related to what Trac needs, so maybe you should try to

Keep in mind this statement you made; I think you may be surprised ;-)

This model will help especially in a multi-project scenario. It provides 
*FULL* abstraction for *ALL* kinds of hierarchies (via the Association 
pattern). None of you seem to realize even the *first* implications of this 
model, it appears to me. You can subdivide this hierarchy forever downwards. 
Permissions can be programmed to apply upwards or downwards in the different 
types of hierarchies present.
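
To make the rest of this mail easier to follow, here is a bare-bones sketch
of the four tables I keep referring to. This is not the exact proposal, only
the columns the examples below rely on:

CREATE TABLE WorkEffortTypes (
  id        INTEGER PRIMARY KEY,
  parent_id INTEGER REFERENCES WorkEffortTypes(id),   -- type hierarchy
  name      TEXT NOT NULL
);

CREATE TABLE WorkEfforts (
  id          INTEGER PRIMARY KEY,
  parent_id   INTEGER REFERENCES WorkEfforts(id),     -- work breakdown
  effort_type INTEGER REFERENCES WorkEffortTypes(id),
  name        TEXT,
  start_date  TEXT,
  end_date    TEXT
);

CREATE TABLE WorkEffortAssocTypes (
  id        INTEGER PRIMARY KEY,
  parent_id INTEGER REFERENCES WorkEffortAssocTypes(id),
  name      TEXT NOT NULL
);

CREATE TABLE WorkEffortAssocs (
  from_eff_id INTEGER REFERENCES WorkEfforts(id),
  to_eff_id   INTEGER REFERENCES WorkEfforts(id),
  assoc_type  INTEGER REFERENCES WorkEffortAssocTypes(id),
  assoc_rule  TEXT                                     -- NULL when unused
);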

Notice that you can actually describe a whole Trac environment as a unit of
work with a start and end date (and its milestones/tickets in a hierarchy
below it). As such, this model provides a full, flexible, proven mechanism 
that in the future could support an import of, say, 10 normal environments
into a single relational structure, and still maintain all information 
about where it came from in the first place. New relationships could now be
added, stating that a ticket at one place depends on the completion of a
milestone in what used to be a different environment, or must be 
concurrent with two milestones from two different environments.
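
As a sketch of such a cross-environment dependency (the literal ids are
invented for the example; the column names follow the tables sketched above):

-- Declare the relationship kind once:
INSERT INTO WorkEffortAssocTypes (name) VALUES ('Depends on'); -- id 7, say

-- Ticket 4711 (formerly environment A) depends on milestone 1042
-- (formerly environment B):
INSERT INTO WorkEffortAssocs (from_eff_id, to_eff_id, assoc_type, assoc_rule)
VALUES (4711, 1042, 7, NULL);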

And when you take this even further, consider how two Tracs could exchange
a part of this hierarchy, say Trac "1.0" contacts an old .12 env elsewhere 
on the globe via XMLRPC. It could then create a local work unit to represent
an external ticket/milestone that will be synced periodically. The model
can handle this simply with a new row in the Effort Type to classify the
entity as "External work". You could now use this entity as a dependency
*anywhere* in your own work breakdown, control it with a local testing 
or other workflow etc.

Obviously, a plugin can register a row in the Effort Types, and for example
provide support for an external ticket in another issue tracker altogether:

INSERT INTO WorkEffortTypes (name) VALUES ('External Work'); -- id 1
INSERT INTO WorkEffortTypes (parent_id, name) VALUES (1, 'Trac .11 XMLRPC');
INSERT INTO WorkEffortTypes (parent_id, name) VALUES (1, 'Trac .11 .db');
INSERT INTO WorkEffortTypes (parent_id, name) VALUES (1, 'Trac .12 XMLRPC');
INSERT INTO WorkEffortTypes (parent_id, name) VALUES (1, 'Trac .12 .db');
INSERT INTO WorkEffortTypes (parent_id, name) VALUES (1, 'Bugzilla');
INSERT INTO WorkEffortTypes (parent_id, name) VALUES (1, 'Redmine');

Each of these rows would have a corresponding piece of code, core or plugin,
that takes care of the related business logic. This logic is free to sync in
an n-level hierarchy of work below itself from the external source, and this
is completely transparent to the rest of the system. It would appear as local
work; read-only or read-write, pull/push sync, marked explicitly or not, or 
whatever the code is made and configured to do.
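
For instance, a hypothetical Bugzilla sync plugin could mirror a remote bug
as a local work effort parented under the milestone it blocks (ids invented,
and assuming the type rows above were given sequential ids, so Bugzilla = 6):

INSERT INTO WorkEfforts (parent_id, effort_type, name)
VALUES (1042, 6, 'Remote bug #12345');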

> bridge the gap. For an example use case, explain how you would
> handle the addition of a custom field "version-fixed" for storing a
> reference to the version in which a ticket has been fixed, assuming
> the user don't want to abuse the milestone field for such a thing,
> like everyone else does ;-)

No problem!

A version is *easily* envisioned as a work effort: You start at one 
point in time and finish it later, and it sure is a lot of work in between
(and the exact same thing is true for any other part of the repo, *and* of the
repo itself). So in practice you would need some new rows in Assoc Type to 
declare the existence of such relationships in the system:

INSERT INTO WorkEffortAssocTypes (name) VALUES ('Repository relations'); -- 1
INSERT INTO WorkEffortAssocTypes (parent_id, name) VALUES (1, 'Fixed in');
INSERT INTO WorkEffortAssocTypes (parent_id, name) VALUES (1, 'Planned for');

These rows must be directly handled by a corresponding piece of middleware 
code, core or plugin. Often the child rows fixed/planned are represented
in code as subclasses of a repository relation class, which in turn
is subclassed from a workeffortassoctype class (though not always). Only this 
code knows what the relationships mean in terms of fields and forms to 
display, what to do, etc. It must "manually" check that these relations 
are only created so they point from/to a work effort with a valid type 
(this is the cost of a generic model). In the specific use case you 
asked to have explained, the corresponding rows to check against might be:

INSERT INTO WorkEffortTypes (name) VALUES ('Repositories');             -- id 1
INSERT INTO WorkEffortTypes (parent_id, name) VALUES (1, 'Subversion'); -- id 2
INSERT INTO WorkEffortTypes (parent_id, name) VALUES (2, 'Trunk');      -- id 3
INSERT INTO WorkEffortTypes (parent_id, name) VALUES (2, 'Branch');     -- id 4
INSERT INTO WorkEffortTypes (parent_id, name) VALUES (4, 'Release');    -- id 5
INSERT INTO WorkEffortTypes (parent_id, name) VALUES (5, 'Version');    -- id 6

The relation would then be created by inserting a row into the
WorkEffortAssocs table, pointing from one work unit to another, and
specifying the ID acquired from the previously shown insert as assoc_type
(the "fixed in" id). This code would leave a NULL in the assoc_rule 
column, since it is not relevant for this type of relationship at all.
from_eff_id contains the ID of a work effort declared with an effort_type
as ticket (not shown). to_eff_id contains the ID of a work effort 
declared as an effort_type from the list immediately above.
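
Spelled out as a sketch (the work effort ids are invented; 2 is the
'Fixed in' row from the insert above, assuming sequential ids):

-- Ticket 4711 was fixed in version 6001:
INSERT INTO WorkEffortAssocs (from_eff_id, to_eff_id, assoc_type, assoc_rule)
VALUES (4711, 6001, 2, NULL);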

Q.E.D. ;-)

> Trac, at least not for tickets. At most, there's a date associated
> to the closing time of a ticket, but a ticket can be closed/reopened
> many times... And by the way, all changes to the tickets are
> versioned, something which is also not at all present in your
> simplistic model.

This information belongs in the relationship between a work unit and 
a state entity within the definition of a workflow, or some such, 
slightly depending on circumstances.
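
Purely as a hypothetical sketch (none of these tables are in the proposal):
record every state transition of a work effort with a timestamp, and the
closed/reopened history falls out of the workflow data rather than out of
the work effort itself:

CREATE TABLE WorkEffortStates (
  id   INTEGER PRIMARY KEY,
  name TEXT NOT NULL                  -- e.g. 'new', 'closed', 'reopened'
);

CREATE TABLE WorkEffortStateChanges (
  effort_id  INTEGER REFERENCES WorkEfforts(id),
  state_id   INTEGER REFERENCES WorkEffortStates(id),
  changed_at REAL                     -- e.g. JULIANDAY('NOW')
);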

> The need for versioning of changes should ideally be taken into
> account automatically by the storage system. Maybe that's another
> hint that perhaps a relational database is not the best suited for
> the job. You should really have a look at the
> TighterSubversionIntegration and the WhySQLite pages for prior
> discussion on this topic.

I have made a note to re-read these pages in the near future.

For SQLite, I submit the following proof-of-concept:

CREATE TRIGGER delete_to_revision
  AFTER DELETE ON storage
    BEGIN
      -- 'operation' and 'revised_at' are placeholder names for the two
      -- audit columns receiving the literal values below.
      INSERT INTO revisions
        (storage_id, column1, column2, .., operation, revised_at)
      VALUES
        (OLD.rowid, OLD.column1, OLD.column2, .., 'delete', JULIANDAY('NOW'));
    END;

That will push a row to revision history whenever a row is deleted. It will
work in all engines, giving full revisioning. Its performance implications
have yet to be determined across platforms, but SQLite does support ATTACH,
so the history could live in a revision.db alongside the main .db to reduce
runtime impact from the exclusive lock taken during revisioning. The approach
is acceptable, imho (if it passes criteria). Another optimization is of course
to do the same with custom fields (a custom.db) and support all three together.
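
A sketch of the revision.db variant (file name and maintenance strategy are
assumptions, not part of the proposal). As far as I know SQLite triggers
cannot reference tables in attached databases, so the trigger would keep
writing to a local revisions table and the history would be moved out of
the main .db periodically:

ATTACH DATABASE 'revision.db' AS rev;

-- One-off setup: mirror the revisions layout in the attached file.
CREATE TABLE IF NOT EXISTS rev.revisions AS
  SELECT * FROM main.revisions WHERE 0;

-- Periodic maintenance: move accumulated history out of the main .db.
INSERT INTO rev.revisions SELECT * FROM main.revisions;
DELETE FROM main.revisions;

DETACH DATABASE rev;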

> There's much more to it than just the database, the database is just
> a persistence layer, but not at all the level where we place the
> logic in it. Take for example the permission model: before 0.10,
> permissions were all placed in the database, now starting with 0.11,
> you can have any kind of permission policy implemented by plugins,
> and the old db-based permission policy is just one among them, which
> could even be turned off.

And this type of flexibility is what makes Trac great, and the reason I
decided to deploy it in production ;-) All I want is more of it!

> things before committing to a model. Once again, I don't know yet
> the final form this will take, in the end this will be a simple,
> normalized, scalable model, I hope a solid foundation to build upon,
> so maybe even something that could satisfy your database design
> taste, who knows?

I hope I managed to communicate at least some points about the
proposed model, and that it helps.

Have a nice day.

Terje
