On Sat, Oct 31, 2009 at 01:13:31AM -0400, mark m scratched on the wall: > O.K. I think I am starting to get the idea. It is just so foreign for me > to organize things this way. A master work history table for all cases > almost seems confusing. It will just take a bit of adjustment for me > to "trust" the database way of doing things. Text files organized in > the way I described has always made it easy for me to figure things > out when there was a problem.
This is not uncommon. If I had to guess, I'd say you're an experienced applications developer, but are somewhat new to databases. You're applying your instincts and experience in designing and laying out runtime-data structures and classes to your database design, but find yourself stumbling over some of the minor conceptual differences in this new environment. The good news is that this is normal, and nearly everyone makes these mistakes. The great news is that I've found much of that experience can be utilized and successfully applied to database design as soon as you wrap your head around a few minor differences. The biggest stumbling block-- and the one you've hit head-on here-- is to think of tables as *instances* of compound data structures. They look a lot like arrays or dynamic lists, so this is an easy mistake. In your case, you need a bunch of records associated with some other table row, so the instinct is to create a brand new table to fit that need. You'll end up with a bunch of tables with the exact same type-signature (column/type/name pattern), but that's how instances of data structures or classes work-- this is exactly what you'd do if that table was some fancy C++ class that offered a dynamic list of elements. It also provides for a clear and direct route to get from a main data record to a collection of related sub-members. Unfortunately, in the database world, that's the wrong concept and, as others have pointed out, this will lead to no end of problems. Here is the first big rule, and it alone will get you pretty far: Tables are data structures and classes. If you would define a data structure or class to fit some specific and well defined purpose, that's a table. The .sql file that defines your tables should map to the .h file in your head. Don't think of tables themselves as multi-element data structures, think of them as the data structures definitions. It just happens that in the database environment, every instance of a particular data structure (i.e. a row) happens to be managed by a global instance pool. In extension of that, the database world has no "contains a." All your compound structures are built from "references a." And the references that bind everything together are not pointers, but ID values. The other big stumbling block is that most of these references are "backwards". If you were building this as an application, you'd have your main records, and each one of those would have a pointer off to some dynamic list or array that held the log records. In the database world, the main record simply exists with some type of unique identifier. Rather than the main record pointing to the associated log records, the log records point back and the main record they're associated with. This tends to make application developers uncomfortable. If you had a main record and wanted a list of all the log records associated with it, you'd want to be able to de-reference some pointer or other association directly to some container item, like an array or list of log records. This is generally what leads to the desire to build multiple tables of the same type, because tables look a lot like instances of a container item. You're thinking of those tables as *instances* of multi-element data structures, rather than a global collection of all sub-instances of that type. That's pretty natural. After all, it would be an odd way to build an application. While many environments have global lists of objects (e.g. static class variables that hold a list of every instance of that particular class), you rarely use those lists for building general associations. In specific, if you were manipulating record X and wanted a list of all the associated log records for X, you would never consider putting a "I belong to data X" field into the log record. This would require scanning the global list of log records to get from a main record to all the associated log records. Not only is that somewhat indirect, it sounds very inefficient. But this is exactly what you do in the database world. The main difference being that this is exactly the kind of thing that databases are very very good at-- both in terms of optimized searches for specific records in a large collection (if properly indexed) as well as powerful set manipulations that allow very complex manipulation of large sets of data as single operations. This second point is actually more important than many people first realize, and that's fine. But as you get better with databases you'll start to think in doing operations to large sets of data items, rather than direct manipulation of discrete items. That comes later, however, and for now it isn't a big limitation to just ignore it. So... to make the jump from application development to database design, I'd first start by more or less thinking things through the same way. If you have years of experience designing data models and class trees, there is no good reason to throw all that experience and understanding out. Use it to lay out your block diagrams and datatypes, but with a few specific changes: 1) Tables are your data structure definitions. If you would organize something by defining a data-structure or class, that's a table in your database. 2) Rather than pointers, you use Foreign Key ID values to bind different data structures (tables) together. 3) Most of the "pointers" or references are backwards. Rather than a master record holding pointers to sub-records, in a database it is the sub-records that have the Foreign Keys that point back to the master record. That will get you pretty far. I could come up with some more points, but that would already be getting into some pretty find details. The only other thing I might add is that there are some pretty standard idioms in database design, like One-to-Many and Many-to-Many relationships, that are worth a quick web-search and doing some quick reading about. Their use and purpose should be fairly obvious to the traditional developer, and provides more reinforcement in bridging your traditional programming experience to the database world. I also realize that if there are any traditional database data modelers out there (that are still reading), I know you're cringing pretty hard right now. At some point, if you want to become very very good at database design, you do need to uncouple your database thinking from your programming language thinking. Personally I think the bigger obstacle is "thinking in sets" but there are also a number of subtle things in the database design where this is true. That said, most of us on this list don't need to get that far-- especially most of us working with SQLite. Many of us are developers that have some nagging data problem that has gotten a bit out of hand, and we found this easy, fast, drop-in database to replace our ugly hacks with text files and out-of-control file formats. I know there are a few people out there doing some pretty heavy-lifting work-- and I'm definitely not saying the folks around here aren't "real" database programmers-- but I'm guessing a fair number of us are SQLite users for secondary reasons (I'd like to hear if anyone thinks I'm totally off on this one). For the majority of those people, a few basic rules and a few standard design idioms is all we need to address our needs and get ou with our "real" work, quite thankful that SQLite is so easy to integrate and makes so many ugly data management issues just disappear. I know that's where I am. Anyways... I've gone on long enough. Good luck with your design. Think a bit, ask good questions, and hopefully we can all see a different point of view and learn to see something new. -j -- Jay A. Kreibich < J A Y @ K R E I B I.C H > "Our opponent is an alien starship packed with atomic bombs. We have a protractor." "I'll go home and see if I can scrounge up a ruler and a piece of string." --from Anathem by Neal Stephenson _______________________________________________ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users