When I switched to Mac OS X (from Windows) earlier this year, I saved all of my personal email in an Excel spreadsheet. (It was a ridiculous process: I had to convert it from Outlook Express to Outlook to Access to Excel.) In this huge spreadsheet, each line is a single email message, with columns for "To", "From", "Subject", "Body", etc.

I now want to get the email into a MySQL database using mysqlimport. Mysqlimport can import comma- or tab-separated files, which Excel can generate. My problem arises because the body of each email has newlines after each paragraph (\r on Mac OS X), which is precisely the character that is used to separate records (messages).

Here's an example of what an Excel-generated, tab-separated file looks like: (It has the header and two records.)

To \t From \t Subject \t Body \r
[EMAIL PROTECTED] \t [EMAIL PROTECTED] \t Hello! \t Hi Richard, \r
How are you doing?] \r
I just wanted to let you know... \r
... \r
[EMAIL PROTECTED] \t [EMAIL PROTECTED] \t Thanks \t Mr. Miller, \r
Thanks for switching to Mac OS X earlier this year. \r
Sincerely, \r
Steve \r
.... \r


Because the message column has line breaks in it, when I import it into MySQL, the body only reads "Hi Richard," or "Mr. Miller," and excludes the rest of the message.

If I knew what the newline character was in Excel, I would replace it with some dummy character that I could expand later. If not, it looks like once I export to a tab-separated file, it's too late to distinguish between newlines in the message body and newlines that end each record.

Any suggestions?

Richard Miller


____________________
BYU Unix Users Group http://uug.byu.edu/ ___________________________________________________________________
List Info: http://uug.byu.edu/cgi-bin/mailman/listinfo/uug-list

Reply via email to