Re: [sqlite] Re: db design mixing different entities in the same table

Andrew Finkenstadt Thu, 31 May 2007 22:22:22 -0700

On 6/1/07, Andrew Finkenstadt <[EMAIL PROTECTED]> wrote:

On 5/31/07, P Kishor <[EMAIL PROTECTED]> wrote:
>
> On 5/31/07, Andrew Finkenstadt <[EMAIL PROTECTED]> wrote:
> > On 5/31/07, P Kishor <[EMAIL PROTECTED]> wrote:
> > >
> > > I realize that I didn't explain the problem well enough (thus is the
> > > peril of being too close to it for a long time).
> > >
> > > I can have an (org <--> org) relationship or an (org <--> person)
> > > relationship as well as a (person <--> person) relationship. To make
> > > matters worse, a person or an org can have multiple relationships
> > > based a role. Here is an example.
> > >
> > > John is the Director of Forestry Dept.
> > > Jane is the Head of the Economics Dept.
> > > Joe is a Scientist at the Forestry Dept.
> > >
> > > rel1: John (Director: Forestry) has a joint project with Jane (Head:
>
> > > Economics)
> > > rel2: John (Director: Forestry) is the boss of Joe (Scientist:
> Forestry)
> > > rel3: John (Person: null) is a friend of Joe (Person: null)
> > > rel4: John (Person: null) goes to the same church as Jane (Person:
> null)
> > >
> > > and so on.
> > >
> > > Well, I really have three "vectors" per "node" -- person, role,
> > > organization. In each node, either person or organization can be
> null
> > > (obviously, role will never be null because every node or entity has
> a
> > > purpose). Each of these vectors can be defined by a finite number of
> > > meaningful properties -- for example, a person has a name, nickname,
>
> > > address, contact information, and so on. Even the role can have a
> > > level of importance -- a "secretary" at the Forestry department is
> not
> > > as important as the "Director," so the role can have a weight, and
> so
> > > on.
> > >
> > > This is the problem that I have to model. I am trying to devise a
> > > storage structure for something like this in SQLite.
> > >
> > >
> >
> >
> > Been there, done that.  About 15 years ago.
>
> oh, how I hate you ;-)
>
> > We called the model CAPPR:  Company, Address, Person, Phone, and
> Roles.
> ..
> >
> > So bottom line, you're going to need your ENTITY tables, and your
> > RELATIONSHIP tables, and your ROLE tables.  You may also need "lookup
> > tables" to help enforce data cleanliness by reducing spelling errors
> or
> > similar issues.
> ..
>
> So, you are telling me that you did it, but I really want to know
> "how" to do it. I am sure this is not a unique problem I am trying to
> solve. I just am getting all knotted up trying to figure out a sane db
> structure to model this within so I can put data in and take data out
> without major contortions and yet have it be flexible and scalable for
> different kinds of relationships.
>
> So, the real CAPPR will be if you give me a few db design guidelines
> here.
>
> ;-)
>
> --
> Puneet Kishor http://punkish.eidesis.org/
> Nelson Inst. for Env. Studies, UW-Madison http://www.nelson.wisc.edu/
> Open Source Geospatial Foundation http://www.osgeo.org/education/
>

The database was rather simple in pictures, but complex by the time you
added everything in to help the database help your program stay out of
trouble.

We chose to identify a person by various sub-fields, and since we were not
internationalization-aware, we didn't have to have our innocence sullied by
the different positions of family names, given names, surnames, and
alternate names based on who you were in relationship to.  (Japanese and
asian language names, germanic, russian, spanish, etc.)  So our sub-fields
were honorific (Sir, Don, The Honorable), title (Mr. Mrs. Dr. Reverend,
etc), first name, middle name, last name, and something we could never
decide on a name for:  Post-name (the III, Junior, etc), and the degreed
position ( Ph.D., M.S.C.A, etc).

Data entry of this name could be by-field (preferred) or free text, which
the software then parsed into the separate fields for correction, and easy
searching later.  A little bit of human effort here helped increase the
utility of their contact management activities later.

We identified address the same way the post office does: street number,
pre-direction, street name, street type, post-direction, suite/apartment
qualififier, suite/apartment number, city, state, zip, zip+4.  The software
used address correction software certified for bulk mailing purposes with
the USPS.

Company name was almost the simplest.  There was a freetext data entry
field, and then the actual database field.  Abbreviations in the freetext
were expanded to the full name in the database field.

Phone numbers had country codes, area codes, telephone numbers,
extensions, passcodes, and special dialing instructions.

Roles were multi-level.  Roles were identified by a unique ID number, and
then split into appropriateness (which relationship it could describe
[person-company, company-address, person-address, person-phone,
company-phone, address-phone), the directionality description (left to
right, right to left), and then into categories.

Next, since there were 4 entity tables, there were 6 relationship tables
whose foreign keys were the corresponding entries in the entity tables & the
role table, plus a free-text field for notes or comments.

Each column whose content was restricted to a list of standardized values
(the 50 states + canadian provinces, armed force theatres, territories, etc;
the list of titles or post-fix names; the directionality of streets, etc)
had a lookup table against which the column was validated.  We found that
these lookup tables would feed directly into the user interface as a pop-up
list.  Each lookup table had both a key entry, and a display value, and an
on-screen order so that more frequently used options were near the top of
the list.

Each and every table contained a set of columns describing who entered the
data, when it was entered, and when it was last modified (and by whom).

Each and every table had a journal table recording changes to the table by
recording previous-value tuples.

And last, but extremely important for debugging and auditability, there
was a transaction table which recorded the LOGICAL operations performed
against the databases.  You could, in theory, replay the transaction table
against an empty database and your data would be re-created correctly, step
by step.  We did, in practice, find bugs by replaying reported problems and
going "ah ha!  They really did want to enter in a 25-letter last name."

So, while I haven't actually included SQL statements that build the
database, the description above should be sufficient to do so.

Hope that helps.



The user interface for these tables used the "card" metaphor.  You could
display a "COMPANY CARD" and it would display the company name, and then the
people & their addresses and phone numbers, the company address, and the
company phone numbers.  Selecting a person you could "pivot" the data,
switching to the "PERSON CARD", which would show the person name, and then
the companies they were associated to, with company addresses and phone
numbrs, and then the person's addresses and phone numbers.

Similarly, you could enter in a phone number, and the software would find
all people or companies associated with the phone number;  valuable when
CALLER ID was still new, and CALLER ID WITH NAME was a dream in someone's
eyes.

Re: [sqlite] Re: db design mixing different entities in the same table

Reply via email to