On Sun, Jul 04, 2010 at 09:27:33PM +0200, Remy Blank wrote:

> I'm sure your input would be highly appreciated. None of the current
> developers is a DB expert either (AFAIK, I'm certainly not one, quite
> the opposite), so any help is welcome.
Sounds fun, but to clarify: I'm just a hobbyist too. I have a book
collection on the topic and love to play around when time permits, but
that's about it. You get what you pay for, I guess! ;)

> - Even if a complete migration requires a complete rewrite, is it
> possible to migrate progressively from the current to the new structure?

Tough question. My gut feeling is no, because I think it would add a large
amount of work overall. This change will touch virtually all aspects of the
system. We have to rethink a lot of the elementary units, layers and
interfaces throughout, and continually evolving this in an active branch
will be tough. IMO it deserves a completely new, isolated research
codebase. The new model and core middleware architecture are laid out
first, and at some point the reusable parts of the existing codebase are
ported, based on decisions made along the way. The first alpha is built
from that foundation and marketed as a new product with a (partial) upgrade
path (or we gather enough information to decide on a gradual implementation
in the main code).

> I don't believe in big, monolithic rewrites, as they are a sure way to
> stall the project for a long time. [...]

Without a doubt, this is true. But also consider that the existing system
has been gold-plated for some time. Working with the old stuff, which has a
very different approach and lots of quirks in general, may inhibit freedom
or progress in the new design, because we would have to support a lot of
legacy hacks along the way. It will *not* be an easy task either way, and
it is evident that a complete rewrite will take a long time to reach
release. I think the relative simplicity of the feature set compared to the
codebase is an argument against the dual/integrated approach: much of the
current code, including some plugins, will "rewrite itself" directly on top
of the new model, and doubtless some code can be ported or reused with
minimal changes, like the wiki and other data processors (but I don't think
it makes sense to evolve them together). Ultimately the maintainers must
make the call, of course.

> - I have read a few times that although a normalized database schema
> makes it very flexible and general, it also has a performance impact.
> From a layman point of view, having to join multiple tables has to be
> slower than not having to join tables, hasn't it? Again, I'm a noob in
> DB technology, so don't laugh too much if the question is stupid.

That's a great question, though much more complex than I bargained for ;-)
Your concern is that of premature optimization, the root of all evil.
Performance is measured in many ways, and it's not about raw I/O speed at
all (any database will pale in comparison to the filesystem). When you
express a query to the database, an optimizer figures out the best way to
answer it. It considers tables, indexes, joins, the order of comparisons
and all other aspects of your query and how it applies to the model. It
then makes a plan for which parts of which tables must be loaded from disk
to RAM in what order, and returns the result of the query in the most
efficient way possible. It also has some *very* sophisticated caching
mechanisms to stay fast at runtime, in case you ask similar questions again
soon. This has been tweaked for ages, and to take advantage of it, you must
follow the game rules for relational databases.
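To make that last part concrete, here is a minimal sketch (not Trac code;
the table and column names are invented for illustration) of asking SQLite
(Trac's default backend) how it plans to answer a join, via EXPLAIN QUERY
PLAN. The point is that we only describe *what* we want; the planner picks
the indexes and the join order on its own:

    import sqlite3

    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE ticket (id INTEGER PRIMARY KEY,
                             milestone TEXT, summary TEXT);
        CREATE TABLE milestone (name TEXT PRIMARY KEY, due INTEGER);
        CREATE INDEX idx_ticket_milestone ON ticket(milestone);
    """)

    # Ask the planner how it would answer the question; it decides which
    # index to use and in what order to visit the tables.
    plan = con.execute("""
        EXPLAIN QUERY PLAN
        SELECT t.id, t.summary
          FROM ticket AS t
          JOIN milestone AS m ON m.name = t.milestone
         WHERE m.due < 1280000000
    """).fetchall()
    for row in plan:
        print(row)

Change the question and the plan changes with it; that is the wizardry I am
about to describe below.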
Normalization is sort of "best practices": how to correctly model various
types of information so that it stays intact and can be efficiently
queried, cached and updated without side effects. The point of a normalized
model is that you can ask *any sort* of question about the data you have,
combining as many or as few elements as you need for the job at hand, and
still get the right answer in the most effective way.

IMPORTANT NOTE: The following sections may not be entirely accurate as to
the exact inner workings of a database system, or in any other regard.

So option one is to send a query to the database, asking for the
information you need in a well-defined and well-understood language, SQL.
The (well-tested) optimizer is, technically speaking, a million wizards who
joyously compute answers with all sorts of deep magic and hand them right
back to you with a huge smile. You can easily change the question and ask
other things about the same data, adding or removing elements and criteria,
comparing various aspects, etc. The wizards will whip the answer right up,
in the most effective way possible, as long as you have a normalized model
and ask a sane question.

Option two is to ignore this altogether and pretend the database is a
filesystem without wizards. You tell them to cast an ancient fscanf()-spell
to load a giant bunch of key-value pairs. One wizard casts the spell, then
they all get depressed and go to the basement to smoke a ton of heroin you
have bought for them (with your own time maintaining this model). Ancient
spells still work, so you get your list of key-value pairs, and proceed to
JOIN it *manually* in a *nonstandard* way *at runtime* (by instantiating
objects, appending to lists or some other code that will be very hard to
get right, since you are reinventing SQL). Now you need some code to
enforce relationships between entities, to express how a milestone object
relates to specific ticket objects, and all sorts of other business rules
to handle mastertickets or whatever. By now, in an early prototype of the
system, we are approaching the stage where we can answer very simple
questions about the data using a very complex process, but *not* the other
way around! We have successfully rescued a single brain-damaged wizard from
the heroin hell downstairs, up to the middleware. When you want to ask a
slightly different question later, you are in deep shit, because the other
999999 wizards have all become highly addicted hippies and by now have
taken up ritual murder of cute kittens as their primary hobby, casting an
occasional fscanf() for their dose. The one wizard you've rescued has no
capacity to learn, so you'll spend ages getting just one more of them clean
and re-purposed for work. When you do, he will inevitably be brain-damaged,
and you have to spend eternity checking whether he's doing his job or needs
fixing. First you have to design this system, then implement, test and
maintain it, and at several points later invent various caching layers or
other optimizations that the wizards knew perfectly well before you drugged
them. All this shit happens simply because smoking heroin turns out to have
a detrimental effect on wizards!

Hope That Helps.

It is true, there are special cases where a denormalized model is best, but
that's not an argument for giving all the wizards heroin when you first
meet them.
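To put the two options side by side in code, here is a rough sketch only,
with invented table names rather than Trac's actual schema. Option one is a
single declarative question; option two is the manual, nonstandard join
described above:

    import sqlite3
    from collections import defaultdict

    # Assumes an example.db containing the invented tables used below.
    con = sqlite3.connect("example.db")

    # Option one: let the wizards do the join.
    rows = con.execute("""
        SELECT t.id, t.summary
          FROM ticket AS t
          JOIN milestone AS m ON m.name = t.milestone
         WHERE m.due < 1280000000
    """).fetchall()

    # Option two: load key-value pairs and reinvent the join by hand.
    due = {}
    for name, value in con.execute(
            "SELECT name, value FROM milestone_props WHERE key = 'due'"):
        due[name] = int(value)

    tickets_by_milestone = defaultdict(list)
    for tid, milestone, summary in con.execute(
            "SELECT id, milestone, summary FROM ticket"):
        tickets_by_milestone[milestone].append((tid, summary))

    rows2 = [t for name, ts in tickets_by_milestone.items()
             if due.get(name, 0) < 1280000000
             for t in ts]

The second version is the one brain-damaged wizard you dragged up from the
basement: every slightly different question means writing, debugging and
eventually caching more of it by hand.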
It's much better to give one of them a little bit of cocaine to get past
some particularly mundane part of your model (say, a consolidated Work
Efforts table, or the custom field system, both of which *must* largely be
denormalized and thus handled in code). Without fail, a denormalized part
of the model *requires* that you train and babysit some drugged-down hippie
fucker crawling around in the wastelands of your code. In exchange for
that, you can have flexibility, speed, time to market or other wins.

And so, I think it will be easier to get rid of all the addicted wizards
than to re-integrate them with society in a 12-step program, and
normalization is an argument *for* performance on all levels, not against
it ;-)

As possible evidence for some aspects of all of the above, I submit this
link and a two-beer bet that straight wizards would outperform it:

http://FIXME.edgewall.org/timeline?daysback=30

Raise, anyone? ;-)

Terje
