Re: [libreoffice-users] Currently Using Spreadsheet for Personal Project - Thinking About Database

Paul Mon, 11 Aug 2014 15:28:09 -0700

Hi Joel,

Some comments on your diagram, in the order I thought of them, not of
importance:

1) I tend to use integers for primary keys. Smallints only go into the
thousands, and most of the stuff I have worked on has needed to be able
to scale well beyond that. Even things that seem small to start with
have a habit of growing unexpectedly, and you don't want to have to
change your database design down the road for something small like
that.

Planning ahead really is the name of the game with databases, as
half the reason we use them to begin with is to handle things that
have grown beyond our simple spreadsheets, or that are expected to
become unwieldy in the future. The whole point of a good design is
that it should handle this sort of thing. And the extra storage space
really shouldn't be an issue. For example, a quick estimate
suggests that using integer instead of smallint for keys in the diagram
you gave will only use an extra 1 mb of storage space if you go all the
way up to the limit of the smallint range, and beyond that it is
academic as you would be *required* to use int instead of smallint.

But for a home project you're unlikely to exceed that number, so this
is fine. It's just worth being aware of this when designing it, so that
you know the choices you have made, and why you have made them.

I would make them unsigned smallints, though. Not really a good idea to
use negative numbers for keys, and so making them signed just wastes
half their range.

2) I wouldn't put a Read flag in the Book table. You have a Status
table with a ReadStatus, so this would be duplication of data.

3) Same with the SeriesComplete flag in the Series table. You can get
that information by doing a query that would check if all the books in
the series are in the read status.

It's usually much better practice to generate this sort of information
as you need it from the relevant pieces of information in the database
(assuming you have all the required pieces in the database), than to
store this information a second time, which is just duplication. Even
if the space wasted isn't important, it just leads to the possibility
of the data being inconsistant.

The only time one stores such things is when the actual calculation to
work this out from the pieces of data takes so long that it is actually
necessary to store calculated results to save time in the application.

4) Same with AuthorID in the Series table. Each book will have an
author, so you don't need a series author. That way if the books in the
series are written by different authors everything still works. This
way, who would the series author be, and how would that be any
different from the authors of the individual books?

Note that you can still show this information in the frontend when
viewing the series. You can make a query to check if the authors of all
the books in the series are the same, and if so show that author, but
if they are different just show "various" or something.

Also, when entering all the books in a series, you can still have the
frontend ask you for the series information first, including author,
and if you fill it in then it will prepopulate the author field when
adding each book.

Of course, maybe you actually do have a use for a series author as
separate to the authors of the individual books, so maybe this does
make sense to you, it just doesn't seem right to me.

5) Some people want author names to be a single field, which is simpler
for some things, and avoids hassles with authors with multiple names,
and other people want them to be a name and surname field, which does
mean you can do things like order by surname or order by first name. And
avoids the problem of you sometimes entering the name surname first and
sometimes first name first.

With the name split into parts it is easier to check if the author
exists by both name and surname when entering new authors, so even if
you mistype one of them the author will still be found if they are
already in the database, but this can still be done if the name is all
in one field, it is just harder. You can also still sort the author
list by surname and by first name, it is just harder if both are in one
field.

The choice is up to you.

6) I would make the ReadStatus slightly larger, say text(30).
If you use varchar there is no wasted space if some of the entries have
less text, so you could even make this varchar(100) or something. Or
give it a short code and a longer description. Something like "could not
finish" is already too long for 15 characters. Sooner or later you're
going to want to add a Status that is longer than the 15...

7) The Status table is incorrect. The way you have it you could have
multiple statuses for a book, and each one would be a hand typed status
description. One time you might type "reading", and another time
you might mistype "raeding". Instead, you should change the BookID to a
StatusID in the Status table, and add a StatusID to the Book table. The
PageNum field would also need to move to the Book table.

This way you would put all the possible statuses into the Status table,
and each book would have a link to that table, showing which of those
possible statuses the book currently is in.

Uh...

I think that's all for now. :)

Otherwise it looks good. You're almost there.

Paul

On Sun, 10 Aug 2014 18:48:21 -0700
Joel Madero <jmadero....@gmail.com> wrote:

> I decided to do the smart thing and diagram it out - the diagram
> isn't perfect (looks wise) but I hope that I can get a +1 before
> actually making the databases.
> 
> Note: I decided to put rankings with date read table, this way I can 
> have different rankings for the same book (for each instance that I
> read it, some books seem better - or worse - the second time
> around) :)
> 
> Thoughts appreciated, a million thanks for all the advice given so
> far.
> 
> Link: 
> https://drive.google.com/file/d/0B2kdRhc960qdZzJZenR1Qno2LWM/edit?usp=sharing
> 
> 
> Best,
> Joel
> 
> 
> 
> On 08/10/2014 01:35 PM, Paul wrote:
> >
> >
> > On Sun, 10 Aug 2014 21:30:18 +0100
> > Mark Bourne <libreoffice-ml.mbou...@spamgourmet.com> wrote:
> >
> >> It looks like each book should only have one of the "ReadStatus"
> >> flags set, so I'd make that an enum field on the BookInformation
> >> table, with possible values of "Not Read", "Reading" and "Read".
> >> You can set the default value for the field to "Not Read" so a new
> >> record will be set to that status if no value is specified for
> >> that field.
> > That's essentially the same idea as the status tables I was
> > suggesting. In my experience we've always used status tables, so I
> > would suggest those. Partly because I'm not familiar with database
> > support for enums. How well supported and widely supported is that
> > by the most common databases?
> >
> >> "NumberOfBooksByAuthor" and "ReadBooksByAuthor" are not needed on
> >> the "AuthorInfo" table - you can get those by querying the
> >> database. I may have the syntax slightly wrong here, but along the
> >> lines of: SELECT `ai`.`AuthorID`, `ai`.`AuthorName`,
> >> COUNT(`bi`.`BookID`) from `AuthorInfo` `ai` LEFT JOIN
> >> `BookInformation` `bi` ON `bi`.`AuthorID` = `ai`.`AuthorID` GROUP
> >> BY `ai`.`AuthorID` should give the number of books by each author.
> >> Add: AND `bi`.`ReadStatus` = "Read"
> >> to the ON condition and you can get the number of read books by
> >> each author.
> >>
> >> That's the kind of thing a database enables you to do much more
> >> easily than with a spreadsheet ;o)
> >>
> >> Mark.
> >>
> >>
> >> Joel Madero wrote:
> >>> Hi All,
> >>>
> >>> So I went back to planning stage. Link to what I think might work
> >>> - hoping to get the planning stage done today so I can start
> >>> actually putting together the db - I have 3 days off so now's a
> >>> good time for me to get the basic structure together :) Thanks in
> >>> advance!
> >>>
> >>> https://drive.google.com/file/d/0B2kdRhc960qdbGJIQ1M3NWtrdmc/edit?usp=sharing
> >>>
> >>>
> >>> Best,
> >>> Joel
> >
> 
> 

-- 
To unsubscribe e-mail to: users+unsubscr...@global.libreoffice.org
Problems? http://www.libreoffice.org/get-help/mailing-lists/how-to-unsubscribe/
Posting guidelines + more: http://wiki.documentfoundation.org/Netiquette
List archive: http://listarchives.libreoffice.org/global/users/
All messages sent to this list will be publicly archived and cannot be deleted

Re: [libreoffice-users] Currently Using Spreadsheet for Personal Project - Thinking About Database

Reply via email to