Michael Adams wrote:

So I had a situation where I needed to massage some data that I had procured as text files. Do some statistics and draw some graphs, that sort of thing. It turned out to be over 62000 data sets. But DataPilot
was the perfect tool to do what I needed.

Oh no it wasn't ;) It was the tool you were familiar with. A database is
the best tool for this (snuffle - nuthins perfect).

I beg to differ. Given the structure of the data and what I wanted to accomplish, the DataPilot was the perfect tool. You could have used my problem as a textbook example of "How to Use a DataPilot". The *only* aspect of this that suggested the use of a database was the quantity of data.

Furthermore, once I had set up the pivot table, I could just select the appropriate columns and immediately produce a graph. With a database, I would have had to export a results table into a graphing program like gnuplot.

Lastly, the fact that I was under time pressure makes my familiarity with the tool a *very* relevant factor.



I probably _could_ have worked it out by creating a database, but it would have taken me a whole lot longer because I would have spent considerable time just screwing around figuring out how to do it.

But the experience gained pays for itself.

How? I haven't had to do that kind of analysis since then, so I would still be waiting for that payoff.


Now if it had been 72K instead of 62K rows I would have had no choice.

Next time it might be.

Assuming there's a next time. It was for a term paper -- one time event.


You can create dictionary definitions of "spreadsheet" and "database app" that make them into completely different animals, but in real
life applications there is a considerable overlap.


Not when you get into relational databases - real world stuff. A
spreadsheet will handle single-table stuff, sort of. It won't do it
as well as a database.

Think about this example - people and their email addresses.
You have a person, you have an email address, simple.
But then you have the same person that wants to contact you from work
and home. One person - Two email addresses.
He tells his wife and she signs up using the same home email address,
and her work address. Two people - three email addresses.
Their daughter signs up as well at home. Three people - three email
addresses.
With a flat database (spreadsheet) you have five entries.
dad - home
dad - dads work
mum - home
mum - mums work
grl - home

With a relational database you have two tables with three rows in each,
plus the cross-reference info.

dad \|/ dads work
mum -|- home
grl /|\ mums work

Sorts and lookups are then done much faster as there are fewer entries to
sort through. Sort by person or sort by email. Look up the person, then list
their emails.

Really simple example but just enough to explain my point (I hope).
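
For anyone who wants to try the people-and-emails layout, here is a minimal sketch using Python's built-in sqlite3 module. The table names, column names, helper functions, and the example.org addresses are all made up for illustration; nothing here comes from a real system. It stores the three people and three addresses once each, and keeps the five pairings in a separate cross-reference table.

```python
import sqlite3

# Hypothetical schema and sample addresses, invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
    CREATE TABLE email  (id INTEGER PRIMARY KEY, address TEXT UNIQUE);
    -- Cross-reference table: one row per (person, address) pairing.
    CREATE TABLE person_email (
        person_id INTEGER REFERENCES person(id),
        email_id  INTEGER REFERENCES email(id),
        PRIMARY KEY (person_id, email_id)
    );
""")

# The five "flat" entries from the spreadsheet version of the example.
pairs = [
    ("dad", "home@example.org"),
    ("dad", "dad@work.example.org"),
    ("mum", "home@example.org"),
    ("mum", "mum@work.example.org"),
    ("grl", "home@example.org"),
]

for name, address in pairs:
    # Each person and each address is stored once; repeats are ignored.
    conn.execute("INSERT OR IGNORE INTO person (name) VALUES (?)", (name,))
    conn.execute("INSERT OR IGNORE INTO email (address) VALUES (?)", (address,))
    # Record the pairing by looking up both ids.
    conn.execute(
        """INSERT INTO person_email
           SELECT p.id, e.id FROM person p, email e
           WHERE p.name = ? AND e.address = ?""",
        (name, address),
    )


def emails_for(name):
    """Look up the person, then list their emails."""
    rows = conn.execute(
        """SELECT e.address FROM email e
           JOIN person_email pe ON pe.email_id = e.id
           JOIN person p ON p.id = pe.person_id
           WHERE p.name = ? ORDER BY e.address""",
        (name,),
    )
    return [row[0] for row in rows]


def people_for(address):
    """Go the other way: list everyone who uses a given address."""
    rows = conn.execute(
        """SELECT p.name FROM person p
           JOIN person_email pe ON pe.person_id = p.id
           JOIN email e ON e.id = pe.email_id
           WHERE e.address = ? ORDER BY p.name""",
        (address,),
    )
    return [row[0] for row in rows]


def count(table):
    """Row count for one of the three tables above."""
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
```

Calling emails_for("dad") returns both of dad's addresses, and people_for("home@example.org") returns all three family members. Note that the cross-reference table still holds five rows, one per pairing; what the relational layout saves is repeating the names and addresses themselves.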


I totally understand the concept of relational databases (and I'll see your First Normal Form and raise you to Third). The issue in my mind isn't so much the data itself or how much of it you have as it is what you intend to do with it. RDBs are the thing for storing, retrieving, and updating information. Spreadsheets are good for calculating and analyzing information. I was doing the latter on a single, static set of data.

But if I were trying to track customer information, financial transactions, inventory, or anything else that is dynamic, or any time the info needs to be broken out into sub-tables (where you get into joins and unions) like you described, then of course I would use an RDBMS.

The point is to use the proper tool for the job. What I didn't mention in the first post (because I didn't think it relevant) was that I also used a couple of simple sed expressions to filter and pre-process the data. Why? I know how to use sed. I know how to use Calc. I know database theory but I have almost zero hands-on experience with it. And gnuplot I know about, but haven't messed with. Easy decision.

--

Rod
