+1 Jay, for explaining. another +1 for explaining so well. yet another +1 for explaining in such detailed.
Heck, if you had written such a long post full of nonsense, you should have gotten +1. Since you wrote such a long post that is so useful and helpful to probably everyone, your post should really be enshrined in the sqlite wiki. On Sat, Oct 31, 2009 at 8:32 PM, Jay A. Kreibich <j...@kreibi.ch> wrote: > > On Sat, Oct 31, 2009 at 01:13:31AM -0400, mark m scratched on the wall: >> O.K. I think I am starting to get the idea. It is just so foreign for me >> to organize things this way. A master work history table for all cases >> almost seems confusing. It will just take a bit of adjustment for me >> to "trust" the database way of doing things. Text files organized in >> the way I described has always made it easy for me to figure things >> out when there was a problem. > > This is not uncommon. If I had to guess, I'd say you're an experienced > applications developer, but are somewhat new to databases. You're > applying your instincts and experience in designing and laying out > runtime-data structures and classes to your database design, but find > yourself stumbling over some of the minor conceptual differences in > this new environment. > > The good news is that this is normal, and nearly everyone makes these > mistakes. The great news is that I've found much of that experience > can be utilized and successfully applied to database design as soon > as you wrap your head around a few minor differences. > > The biggest stumbling block-- and the one you've hit head-on here-- > is to think of tables as *instances* of compound data structures. They > look a lot like arrays or dynamic lists, so this is an easy mistake. > > In your case, you need a bunch of records associated with some other > table row, so the instinct is to create a brand new table to fit that > need. You'll end up with a bunch of tables with the exact same > type-signature (column/type/name pattern), but that's how instances > of data structures or classes work-- this is exactly what you'd do > if that table was some fancy C++ class that offered a dynamic list > of elements. It also provides for a clear and direct route to get > from a main data record to a collection of related sub-members. > > Unfortunately, in the database world, that's the wrong concept and, > as others have pointed out, this will lead to no end of problems. > > Here is the first big rule, and it alone will get you pretty far: > > Tables are data structures and classes. > > If you would define a data structure or class to fit some specific > and well defined purpose, that's a table. The .sql file that > defines your tables should map to the .h file in your head. > > Don't think of tables themselves as multi-element data structures, > think of them as the data structures definitions. It just happens that > in the database environment, every instance of a particular data > structure (i.e. a row) happens to be managed by a global instance pool. > > In extension of that, the database world has no "contains a." All > your compound structures are built from "references a." And the > references that bind everything together are not pointers, but ID > values. > > The other big stumbling block is that most of these references are > "backwards". If you were building this as an application, you'd have > your main records, and each one of those would have a pointer off to > some dynamic list or array that held the log records. In the database > world, the main record simply exists with some type of unique > identifier. Rather than the main record pointing to the associated log > records, the log records point back and the main record they're > associated with. > > This tends to make application developers uncomfortable. If you had > a main record and wanted a list of all the log records associated > with it, you'd want to be able to de-reference some pointer or other > association directly to some container item, like an array or list of > log records. This is generally what leads to the desire to build > multiple tables of the same type, because tables look a lot like > instances of a container item. You're thinking of those tables as > *instances* of multi-element data structures, rather than a global > collection of all sub-instances of that type. > > That's pretty natural. After all, it would be an odd way to build an > application. While many environments have global lists of objects > (e.g. static class variables that hold a list of every instance of > that particular class), you rarely use those lists for building > general associations. In specific, if you were manipulating record X > and wanted a list of all the associated log records for X, you would > never consider putting a "I belong to data X" field into the log > record. This would require scanning the global list of log records > to get from a main record to all the associated log records. Not > only is that somewhat indirect, it sounds very inefficient. > > But this is exactly what you do in the database world. The main > difference being that this is exactly the kind of thing that > databases are very very good at-- both in terms of optimized searches > for specific records in a large collection (if properly indexed) as > well as powerful set manipulations that allow very complex > manipulation of large sets of data as single operations. > > This second point is actually more important than many people first > realize, and that's fine. But as you get better with databases > you'll start to think in doing operations to large sets of data > items, rather than direct manipulation of discrete items. That comes > later, however, and for now it isn't a big limitation to just ignore it. > > > > So... to make the jump from application development to database design, > I'd first start by more or less thinking things through the same way. > If you have years of experience designing data models and class trees, > there is no good reason to throw all that experience and understanding > out. Use it to lay out your block diagrams and datatypes, but with a > few specific changes: > > 1) Tables are your data structure definitions. If you would organize > something by defining a data-structure or class, that's a table in > your database. > > 2) Rather than pointers, you use Foreign Key ID values to bind > different data structures (tables) together. > > 3) Most of the "pointers" or references are backwards. Rather than > a master record holding pointers to sub-records, in a database it is > the sub-records that have the Foreign Keys that point back to the > master record. > > > > That will get you pretty far. I could come up with some more points, > but that would already be getting into some pretty find details. The > only other thing I might add is that there are some pretty standard > idioms in database design, like One-to-Many and Many-to-Many > relationships, that are worth a quick web-search and doing some quick > reading about. Their use and purpose should be fairly obvious to the > traditional developer, and provides more reinforcement in bridging > your traditional programming experience to the database world. > > > > I also realize that if there are any traditional database data > modelers out there (that are still reading), I know you're cringing > pretty hard right now. At some point, if you want to become very > very good at database design, you do need to uncouple your database > thinking from your programming language thinking. Personally I think > the bigger obstacle is "thinking in sets" but there are also a number > of subtle things in the database design where this is true. > > That said, most of us on this list don't need to get that far-- > especially most of us working with SQLite. Many of us are developers > that have some nagging data problem that has gotten a bit out of > hand, and we found this easy, fast, drop-in database to replace our > ugly hacks with text files and out-of-control file formats. I know > there are a few people out there doing some pretty heavy-lifting > work-- and I'm definitely not saying the folks around here aren't > "real" database programmers-- but I'm guessing a fair number of us are > SQLite users for secondary reasons (I'd like to hear if anyone thinks > I'm totally off on this one). For the majority of those people, a > few basic rules and a few standard design idioms is all we need to > address our needs and get ou with our "real" work, quite thankful > that SQLite is so easy to integrate and makes so many ugly data > management issues just disappear. I know that's where I am. > > > Anyways... I've gone on long enough. Good luck with your design. > Think a bit, ask good questions, and hopefully we can all see > a different point of view and learn to see something new. > > -j > > -- > Jay A. Kreibich < J A Y @ K R E I B I.C H > > > "Our opponent is an alien starship packed with atomic bombs. We have > a protractor." "I'll go home and see if I can scrounge up a ruler > and a piece of string." --from Anathem by Neal Stephenson > _______________________________________________ > sqlite-users mailing list > sqlite-users@sqlite.org > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users > -- Puneet Kishor http://www.punkish.org Carbon Model http://carbonmodel.org Charter Member, Open Source Geospatial Foundation http://www.osgeo.org Science Commons Fellow, http://sciencecommons.org/about/whoweare/kishor Nelson Institute, UW-Madison http://www.nelson.wisc.edu ----------------------------------------------------------------------- Assertions are politics; backing up assertions with evidence is science ======================================================================= Sent from Madison, WI, United States _______________________________________________ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users