Michael Adams wrote:
So I had a situation where I needed to massage some data that I had
procured as text files. Do some statistics and draw some graphs, that
sort of thing. It turned out to be over 62000 data sets. But DataPilot
was the perfect tool to do what I needed.
Oh no it wasn't ;) It was the tool you were familiar with. A database is
the best tool for this (snuffle - nuthins perfect).
I beg to differ. Given the structure of the data and what I wanted to
accomplish, the DataPilot was the perfect tool. You could have used my
problem as a textbook example of "How to Use a DataPilot". The *only*
aspect of this that suggested the use of a database was the quantity of
data.
Furthermore, once I had set up the Pivot table, I could just select the
appropriate columns and immediately produce a graph. Using a database I
would have had to export a results table into a graphing program like
gnuplot.
Lastly, the fact that I was under time pressure makes my familiarity
with the tool a *very* relevant factor.
I probably _could_ have worked it out by creating a database, but it
would have taken me a whole lot longer because I would have spent
considerable time just screwing around figuring out how to do it.
But the experience gained pays for itself.
How? I haven't had to do that kind of analysis since then, so I would
still be waiting for that payoff.
Now if it had been 72K instead of 62K rows I would have had no choice.
Next time it might be.
Assuming there's a next time. It was for a term paper -- one time event.
You can create dictionary definitions of "spreadsheet" and "database
app" that make them into completely different animals, but in real-life
applications there is considerable overlap.
Not when you get into relational databases - real world stuff. A
spreadsheet will handle single-table stuff, sort of, but it won't do it
as well as a database.
Think about this example - people and their email addresses.
You have a person, you have an email address, simple.
But then you have the same person that wants to contact you from work
and home. One person - Two email addresses.
He tells his wife and she signs up using the same home email address,
and her work address. Two people - three email addresses.
Their daughter signs up as well at home. Three people - three email
addresses.
With a flat database (spreadsheet) you have five entries.
dad - home
dad - dads work
mum - home
mum - mums work
grl - home
With a relational database you have two tables with three entries each,
plus the cross-reference info.
dad \|/ dads work
mum -|- home
grl /|\ mums work
Sorts and lookups are then done much faster as there is less to sort
through. Sort by person or sort by email. Look up the person, then list
their emails.
Really simple example but just enough to explain my point (I hope).
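The two-tables-plus-crossreference layout described above can be sketched
with Python's built-in sqlite3 module (table and column names here are
just illustrative, not from the original post):

```python
import sqlite3

# In-memory database; the schema mirrors the example above:
# a people table, an emails table, and a cross-reference
# ("junction") table holding the five relationships.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE emails (id INTEGER PRIMARY KEY, address TEXT);
    CREATE TABLE person_email (person_id INTEGER, email_id INTEGER);
""")
con.executemany("INSERT INTO people VALUES (?, ?)",
                [(1, "dad"), (2, "mum"), (3, "grl")])
con.executemany("INSERT INTO emails VALUES (?, ?)",
                [(1, "home"), (2, "dads work"), (3, "mums work")])
# Three people, three addresses, five links between them.
con.executemany("INSERT INTO person_email VALUES (?, ?)",
                [(1, 1), (1, 2), (2, 1), (2, 3), (3, 1)])

# "Look up the person, then list their emails":
rows = con.execute("""
    SELECT e.address
    FROM people p
    JOIN person_email pe ON pe.person_id = p.id
    JOIN emails e        ON e.id = pe.email_id
    WHERE p.name = 'dad'
    ORDER BY e.address
""").fetchall()
print([r[0] for r in rows])  # → ['dads work', 'home']
```

The junction table is what the flat spreadsheet layout can't express
without repeating names and addresses on every row.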
I totally understand the concept of relational databases (and I'll see
your First Normal Form and raise you to Third). The issue in my mind
isn't so much the data itself or how much of it you have as it is what
you intend to do with it. RDBs are the thing for storing, retrieving,
and updating information. Spreadsheets are good for calculating and
analyzing information. I was doing the latter on a single, static set
of data.
But if I were trying to track customer information, or financial
transactions, or inventory, or anything else that is dynamic, or any
time you need to break out the info into sub-tables (where you get into
joins and unions) like you described, then of course you would use an
RDBMS.
The point is to use the proper tool for the job. What I didn't mention
in the first post (because I didn't think it relevant) was that I also
used a couple of simple sed expressions to filter and pre-process the
data. Why? I know how to use sed. I know how to use Calc. I know
database theory but I have almost zero experience at it. And gnuplot I
know about, but haven't messed with. Easy decision.
--
Rod
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]