Re: [PATCHES] [HACKERS] multiline CSV fields

2004-12-03 Thread Bruce Momjian
Patch applied. Thanks. --- Andrew Dunstan wrote: > > > I wrote: > > > > > If it bothers you that much. I'd make a flag, cleared at the start of > > each COPY, and then where we test for CR or LF in CopyAttributeOutCSV,

Re: [HACKERS] multiline CSV fields

2004-12-02 Thread Andrew Dunstan
Tom Lane wrote:
Andrew Dunstan <[EMAIL PROTECTED]> writes:
+ if (!embedded_line_warning && (c == '\n' || c == '\r') )
+ {
+     embedded_line_warning = true;
+     elog(WARNING,
+          "CSV fields with embedded linefeed or carriage return "
+          "characters might not be able to be reimport

Re: [HACKERS] multiline CSV fields

2004-12-02 Thread Tom Lane
Andrew Dunstan <[EMAIL PROTECTED]> writes:
> + if (!embedded_line_warning && (c == '\n' || c == '\r') )
> + {
> +     embedded_line_warning = true;
> +     elog(WARNING,
> +          "CSV fields with embedded linefeed or c

Re: [PATCHES] [HACKERS] multiline CSV fields

2004-12-02 Thread Andrew Dunstan
I wrote: If it bothers you that much. I'd make a flag, cleared at the start of each COPY, and then where we test for CR or LF in CopyAttributeOutCSV, if the flag is not set then set it and issue the warning. I didn't realise until Bruce told me just now that I was on the hook for this. I guess
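The warn-once flag described above can be sketched roughly as follows. This is a standalone illustration, not the applied patch: the struct CopyOutSketch, the functions copy_out_begin and copy_out_check_field, the counter, and the placeholder warning text are all hypothetical names made up for the sketch, and fprintf stands in for elog(WARNING, ...).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical per-COPY state; copy.c itself keeps its state differently. */
typedef struct CopyOutSketch
{
    bool embedded_line_warning; /* already warned during this COPY? */
    int  warnings_emitted;      /* counter, only to make the sketch observable */
} CopyOutSketch;

/* Clear the flag at the start of each COPY. */
static void
copy_out_begin(CopyOutSketch *st)
{
    assert(st != NULL);
    st->embedded_line_warning = false;
    st->warnings_emitted = 0;
}

/*
 * Scan one attribute value the way a CSV output routine would.
 * The first CR or LF seen in the whole COPY sets the flag and warns;
 * later ones are silent.
 */
static void
copy_out_check_field(CopyOutSketch *st, const char *field)
{
    assert(st != NULL && field != NULL);

    for (const char *p = field; *p != '\0'; p++)
    {
        char c = *p;

        if (!st->embedded_line_warning && (c == '\n' || c == '\r'))
        {
            st->embedded_line_warning = true;
            st->warnings_emitted++;
            /* placeholder text, not the message from the patch */
            fprintf(stderr, "WARNING: field contains an embedded CR or LF\n");
        }
    }
}
```

Keeping the flag in per-COPY state rather than in a static variable is a choice made for the sketch; either way the point is that the flag is reset once per COPY, so a table with many multiline fields produces a single warning.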

Re: [HACKERS] multiline CSV fields

2004-11-30 Thread Bruce Momjian
Andrew Dunstan wrote: > > > Bruce Momjian wrote: > > >I am wondering if one good solution would be to pre-process the input > >stream in copy.c to convert newline to \n and carriage return to \r and > >double data backslashes and tell copy.c to interpret those like it does > >for normal text COP

Re: [HACKERS] multiline CSV fields

2004-11-30 Thread Andrew Dunstan
Bruce Momjian wrote: I am wondering if one good solution would be to pre-process the input stream in copy.c to convert newline to \n and carriage return to \r and double data backslashes and tell copy.c to interpret those like it does for normal text COPY files. That way, the changes to copy.c mi
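The conversion Bruce proposes (newline to \n, carriage return to \r, data backslashes doubled) might look something like the sketch below when applied to a single field value. The function name escape_field_for_text_copy and its buffer-based interface are hypothetical, chosen for illustration; the sketch assumes the field has already been isolated and says nothing about how that would be wired into copy.c.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Rewrite src into dst so that it contains no raw line-end bytes:
 *   LF        -> backslash, 'n'
 *   CR        -> backslash, 'r'
 *   backslash -> backslash, backslash
 * Returns the number of bytes written (not counting the terminating NUL),
 * or (size_t) -1 if dst is too small.
 */
static size_t
escape_field_for_text_copy(const char *src, char *dst, size_t dstsize)
{
    size_t n = 0;

    assert(src != NULL && dst != NULL);

    for (; *src != '\0'; src++)
    {
        char second;

        switch (*src)
        {
            case '\n': second = 'n';  break;
            case '\r': second = 'r';  break;
            case '\\': second = '\\'; break;
            default:   second = '\0'; break;
        }

        if (second != '\0')
        {
            /* need room for two bytes plus the final NUL */
            if (n + 2 >= dstsize)
                return (size_t) -1;
            dst[n++] = '\\';
            dst[n++] = second;
        }
        else
        {
            /* need room for one byte plus the final NUL */
            if (n + 1 >= dstsize)
                return (size_t) -1;
            dst[n++] = *src;
        }
    }

    if (n >= dstsize)
        return (size_t) -1;
    dst[n] = '\0';
    return n;
}
```

After this step a line-end byte can only ever mean end of record, which is what would let the existing text-mode line splitting be reused.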

Re: [HACKERS] multiline CSV fields

2004-11-30 Thread Bruce Momjian
Andrew Dunstan wrote: > > > Greg Stark wrote: > > >Personally I find the current CSV support inadequate. It seems pointless to > >support CSV if it can't load data exported from Excel, which seems like the > >main use case. > > > > > > OK, I'm starting to get mildly annoyed now. We have iden

Re: [HACKERS] multiline CSV fields

2004-11-30 Thread Greg Stark
Andrew Dunstan <[EMAIL PROTECTED]> writes: > FWIW, I don't make a habit of using multiline fields in my spreadsheets - and > some users I have spoken to aren't even aware that you can have them at all. Unfortunately I don't get a choice. I offer a field on the web site where users can upload an

Re: [HACKERS] multiline CSV fields

2004-11-30 Thread Andrew Dunstan
[EMAIL PROTECTED] wrote: I am normally more of a lurker on these lists, but I thought you had better know that when we developed CSV import/export for an application at my last company we discovered that Excel can't always even read the CSV that _it_ has output! (With embedded newlines a part

Re: [HACKERS] multiline CSV fields

2004-11-30 Thread Ben . Young
> > > Greg Stark wrote: > > >Personally I find the current CSV support inadequate. It seems pointless to > >support CSV if it can't load data exported from Excel, which seems like the > >main use case. > > > > > > OK, I'm starting to get mildly annoyed now. We have identified one > failure

Re: [HACKERS] multiline CSV fields

2004-11-30 Thread Andrew Dunstan
Greg Stark wrote: Personally I find the current CSV support inadequate. It seems pointless to support CSV if it can't load data exported from Excel, which seems like the main use case. OK, I'm starting to get mildly annoyed now. We have identified one failure case connected with multiline fie

Re: [HACKERS] multiline CSV fields

2004-11-30 Thread Kris Jurka
On Tue, 30 Nov 2004, Greg Stark wrote: > > Andrew Dunstan <[EMAIL PROTECTED]> writes: > > > The advantage of having it in COPY is that it can be done serverside > > direct from the file system. For massive bulk loads that might be a > > plus, although I don't know what the protocol+socket over

Re: [HACKERS] multiline CSV fields

2004-11-29 Thread Greg Stark
Andrew Dunstan <[EMAIL PROTECTED]> writes: > The advantage of having it in COPY is that it can be done serverside direct > from the file system. For massive bulk loads that might be a plus, although I > don't know what the protocol+socket overhead is. Actually even if you use client-side COPY i

Re: [HACKERS] multiline CSV fields

2004-11-29 Thread Andrew Dunstan
Tom Lane wrote: Kris Jurka <[EMAIL PROTECTED]> writes: Endlessly extending the COPY command doesn't seem like a winning proposition to me and I think if we aren't comfortable telling every user to write a script to pre/post-process the data we should instead provide a bulk loader/unloader th

Re: [HACKERS] multiline CSV fields

2004-11-29 Thread Bruce Momjian
Tom Lane wrote: > Kris Jurka <[EMAIL PROTECTED]> writes: > > Endlessly extending the COPY command doesn't seem like a winning > > proposition to me and I think if we aren't comfortable telling every user > > to write a script to pre/post-process the data we should instead provide a > > bulk load

Re: [HACKERS] multiline CSV fields

2004-11-29 Thread Tom Lane
Kris Jurka <[EMAIL PROTECTED]> writes: > Endlessly extending the COPY command doesn't seem like a winning > proposition to me and I think if we aren't comfortable telling every user > to write a script to pre/post-process the data we should instead provide a > bulk loader/unloader that transform

Re: [HACKERS] multiline CSV fields

2004-11-29 Thread Bruce Momjian
Kris Jurka wrote: > > > On Mon, 29 Nov 2004, Andrew Dunstan wrote: > > > Longer term I'd like to be able to have a command parameter that > > specifies certain fields as multiline and for those relax the line end > > matching restriction (and for others forbid multiline altogether). That > >

Re: [HACKERS] multiline CSV fields

2004-11-29 Thread Kris Jurka
On Mon, 29 Nov 2004, Andrew Dunstan wrote: > Longer term I'd like to be able to have a command parameter that > specifies certain fields as multiline and for those relax the line end > matching restriction (and for others forbid multiline altogether). That > would be a TODO for 8.1 though, al

Re: [HACKERS] multiline CSV fields

2004-11-29 Thread Tom Lane
Bruce Momjian <[EMAIL PROTECTED]> writes: > Also, can you explain why we can't read across a newline to the next > quote? Is it a problem with the way our code is structured or is it a > logical problem? It's a structural issue in the sense that we separate the act of dividing the input into rows

Re: [HACKERS] multiline CSV fields

2004-11-29 Thread Andrew Dunstan
Bruce Momjian wrote: Also, can you explain why we can't read across a newline to the next quote? Is it a problem with the way our code is structured or is it a logical problem? Someone mentioned multibyte encodings but I don't understand how that applies here. In a CSV file, each line is a re

Re: [HACKERS] multiline CSV fields

2004-11-29 Thread Andrew Dunstan
Bruce Momjian wrote: Andrew Dunstan wrote: OK, then should we disallow dumping out data in CSV format that we can't load? Seems like the least we should do for 8.0. As Tom rightly points out, having data make the round trip was not the goal of the exercise. Excel, for example, has no

Re: [HACKERS] multiline CSV fields

2004-11-29 Thread Bruce Momjian
Bruce Momjian wrote: > Andrew Dunstan wrote: > > >OK, then should we disallow dumping out data in CSV format that we can't > > >load? Seems like the least we should do for 8.0. > > > > > > > > > > > > > As Tom rightly points out, having data make the round trip was not the > > goal of the exer

Re: [HACKERS] multiline CSV fields

2004-11-29 Thread Bruce Momjian
Andrew Dunstan wrote: > >OK, then should we disallow dumping out data in CSV format that we can't > >load? Seems like the least we should do for 8.0. > > > > > > > > As Tom rightly points out, having data make the round trip was not the > goal of the exercise. Excel, for example, has no troubl

Re: [HACKERS] multiline CSV fields

2004-11-29 Thread Andrew Dunstan
Bruce Momjian wrote: Tom Lane wrote: Bruce Momjian <[EMAIL PROTECTED]> writes: Tom Lane wrote: Which we do not have, because pg_dump doesn't use CSV. I do not think this is a must-fix, especially not if the proposed fix introduces inconsistencies elsewhere. Sure, pg_dump d

Re: [HACKERS] multiline CSV fields

2004-11-29 Thread Bruce Momjian
Tom Lane wrote: > Bruce Momjian <[EMAIL PROTECTED]> writes: > > Tom Lane wrote: > >> Which we do not have, because pg_dump doesn't use CSV. I do not think > >> this is a must-fix, especially not if the proposed fix introduces > >> inconsistencies elsewhere. > > > Sure, pg_dump doesn't use it but

Re: [HACKERS] multiline CSV fields

2004-11-29 Thread Tom Lane
Bruce Momjian <[EMAIL PROTECTED]> writes: > Tom Lane wrote: >> Which we do not have, because pg_dump doesn't use CSV. I do not think >> this is a must-fix, especially not if the proposed fix introduces >> inconsistencies elsewhere. > Sure, pg_dump doesn't use it but COPY should be able to load an

Re: [HACKERS] multiline CSV fields

2004-11-28 Thread Andrew Dunstan
Bruce Momjian said: > Tom Lane wrote: >> Bruce Momjian <[EMAIL PROTECTED]> writes: >> > OK, what solutions do we have for this? Not being able to load >> > dumped data is a serious bug. >> >> Which we do not have, because pg_dump doesn't use CSV. I do not think >> this is a must-fix, especially n

Re: [HACKERS] multiline CSV fields

2004-11-28 Thread Bruce Momjian
Tom Lane wrote: > Bruce Momjian <[EMAIL PROTECTED]> writes: > > OK, what solutions do we have for this? Not being able to load dumped > > data is a serious bug. > > Which we do not have, because pg_dump doesn't use CSV. I do not think > this is a must-fix, especially not if the proposed fix intr

Re: [HACKERS] multiline CSV fields

2004-11-28 Thread Tom Lane
Bruce Momjian <[EMAIL PROTECTED]> writes: > OK, what solutions do we have for this? Not being able to load dumped > data is a serious bug. Which we do not have, because pg_dump doesn't use CSV. I do not think this is a must-fix, especially not if the proposed fix introduces inconsistencies elsew

Re: [HACKERS] multiline CSV fields

2004-11-28 Thread Bruce Momjian
OK, what solutions do we have for this? Not being able to load dumped data is a serious bug. I have added this to the open items list: * fix COPY CSV with \r,\n in data My feeling is that if we are in a quoted string we just process whatever characters we find, even passing through an

Re: [HACKERS] multiline CSV fields

2004-11-12 Thread Patrick B Kelly
On Nov 12, 2004, at 12:20 AM, Tom Lane wrote: Patrick B Kelly <[EMAIL PROTECTED]> writes: I may not be explaining myself well or I may fundamentally misunderstand how copy works. Well, you're definitely ignoring the character-set-conversion issue. I was not trying to ignore the character set and en

Re: [HACKERS] multiline CSV fields

2004-11-12 Thread Andrew Dunstan
This example should fail on data line 2 or 3 on any platform, regardless of the platform's line-end convention, although I haven't tested on Windows.
cheers
andrew
[EMAIL PROTECTED] inst]$ bin/psql -e -f csverr.sql ; od -c /tmp/csverrtest.csv
create table csverrtest (a int, b text, c int);
CRE

Re: [HACKERS] multiline CSV fields

2004-11-11 Thread Tom Lane
Patrick B Kelly <[EMAIL PROTECTED]> writes: > I may not be explaining myself well or I may fundamentally > misunderstand how copy works. Well, you're definitely ignoring the character-set-conversion issue. regards, tom lane

Re: [HACKERS] multiline CSV fields

2004-11-11 Thread Bruce Momjian
Can I see an example of such a failure line? --- Andrew Dunstan wrote: > > Darcy Buskermolen has drawn my attention to unfortunate behaviour of > COPY CSV with fields containing embedded line end chars if the embedded > s

Re: [HACKERS] multiline CSV fields

2004-11-11 Thread Patrick B Kelly
On Nov 11, 2004, at 10:07 PM, Andrew Dunstan wrote: Patrick B Kelly wrote: My suggestion is to simply have CopyReadLine recognize these two states (in-field and out-of-field) and execute the current logic only while in the second state. It would not be too hard but as you mentioned it is non-t

Re: [HACKERS] multiline CSV fields

2004-11-11 Thread Andrew Dunstan
Patrick B Kelly wrote: My suggestion is to simply have CopyReadLine recognize these two states (in-field and out-of-field) and execute the current logic only while in the second state. It would not be too hard but as you mentioned it is non-trivial. We don't know what state we expect the end
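For readers following the in-field/out-of-field idea, a minimal sketch of such a two-state scanner is below. It only counts logical records in a buffer. The function name count_csv_records is hypothetical, and the sketch assumes the quote character is '"', that a quote inside a quoted field is written as two quotes, and that the input is in an encoding where the quote, CR and LF bytes never occur inside a multibyte character. That last assumption is exactly the character-set-conversion concern raised elsewhere in this thread, so this is an illustration of the state machine, not a drop-in for CopyReadLine.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Count logical CSV records in buf[0..len).
 * CR, LF or CRLF ends a record only while we are outside a quoted field;
 * inside quotes, line-end bytes are treated as ordinary data.
 */
static size_t
count_csv_records(const char *buf, size_t len)
{
    bool   in_quotes = false;  /* the "in-field" state */
    bool   have_data = false;  /* bytes seen since the last record end? */
    size_t records = 0;

    assert(buf != NULL || len == 0);

    for (size_t i = 0; i < len; i++)
    {
        char c = buf[i];

        if (c == '"')
        {
            /* a doubled quote toggles twice, leaving the state unchanged */
            in_quotes = !in_quotes;
            have_data = true;
        }
        else if (!in_quotes && (c == '\n' || c == '\r'))
        {
            /* treat CRLF as a single terminator */
            if (c == '\r' && i + 1 < len && buf[i + 1] == '\n')
                i++;
            records++;
            have_data = false;
        }
        else
            have_data = true;
    }

    /* a final record without a trailing line end still counts */
    if (have_data)
        records++;

    return records;
}
```

Note that the scanner accepts any mix of CR, LF and CRLF, both inside and outside quotes, so it sidesteps the mismatch between embedded and file line ends that started the thread; an empty line is counted as an empty record.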

Re: [HACKERS] multiline CSV fields

2004-11-11 Thread Patrick B Kelly
On Nov 11, 2004, at 6:16 PM, Tom Lane wrote: Patrick B Kelly <[EMAIL PROTECTED]> writes: What about just coding a FSM into backend/commands/copy.c:CopyReadLine() that does not process any flavor of NL characters when it is inside of a data field? CopyReadLine has no business tracking that. One re

Re: [HACKERS] multiline CSV fields

2004-11-11 Thread Tom Lane
Patrick B Kelly <[EMAIL PROTECTED]> writes: > What about just coding a FSM into > backend/commands/copy.c:CopyReadLine() that does not process any flavor > of NL characters when it is inside of a data field? CopyReadLine has no business tracking that. One reason why not is that it is dealing wi

Re: [HACKERS] multiline CSV fields

2004-11-11 Thread Andrew Dunstan
Patrick B Kelly wrote: What about just coding a FSM into backend/commands/copy.c:CopyReadLine() that does not process any flavor of NL characters when it is inside of a data field? It would be a major change - the routine doesn't read data a field at a time, and has no idea if we are even in

Re: [HACKERS] multiline CSV fields

2004-11-11 Thread Patrick B Kelly
On Nov 11, 2004, at 2:56 PM, Andrew Dunstan wrote: Tom Lane wrote: Andrew Dunstan <[EMAIL PROTECTED]> writes: Patrick B Kelly wrote: Actually, when I try to export a sheet with multi-line cells from excel, it tells me that this feature is incompatible with the CSV format and will not include the

Re: [HACKERS] multiline CSV fields

2004-11-11 Thread David Fetter
On Thu, Nov 11, 2004 at 03:38:16PM -0500, Greg Stark wrote: > > Tom Lane <[EMAIL PROTECTED]> writes: > > > I would vote in favor of removing the current code that attempts > > to support unquoted newlines, and waiting to see if there are > > complaints. > > Uhm. *raises hand* > > I agree with y

Re: [HACKERS] multiline CSV fields

2004-11-11 Thread Greg Stark
Tom Lane <[EMAIL PROTECTED]> writes: > I would vote in favor of removing the current code that attempts to > support unquoted newlines, and waiting to see if there are complaints. Uhm. *raises hand* I agree with your argument but one way or another I have to load these CSVs I'm given. And like

Re: [HACKERS] multiline CSV fields

2004-11-11 Thread Andrew Dunstan
Tom Lane wrote: Andrew Dunstan <[EMAIL PROTECTED]> writes: Patrick B Kelly wrote: Actually, when I try to export a sheet with multi-line cells from excel, it tells me that this feature is incompatible with the CSV format and will not include them in the CSV file. It probably de

Re: [HACKERS] multiline CSV fields

2004-11-11 Thread Tom Lane
Andrew Dunstan <[EMAIL PROTECTED]> writes: > Patrick B Kelly wrote: >> Actually, when I try to export a sheet with multi-line cells from >> excel, it tells me that this feature is incompatible with the CSV >> format and will not include them in the CSV file. > It probably depends on the version.

Re: [HACKERS] multiline CSV fields

2004-11-11 Thread Andrew Dunstan
Patrick B Kelly wrote: On Nov 10, 2004, at 6:10 PM, Andrew Dunstan wrote: The last really isn't an option, because the whole point of CSVs is to play with other programs, and my understanding is that those that understand multiline fields (e.g. Excel) expect them not to be escaped, and do not p

Re: [HACKERS] multiline CSV fields

2004-11-11 Thread Patrick B Kelly
On Nov 10, 2004, at 6:10 PM, Andrew Dunstan wrote: The last really isn't an option, because the whole point of CSVs is to play with other programs, and my understanding is that those that understand multiline fields (e.g. Excel) expect them not to be escaped, and do not produce them escaped. Ac

[HACKERS] multiline CSV fields

2004-11-10 Thread Andrew Dunstan
Darcy Buskermolen has drawn my attention to unfortunate behaviour of COPY CSV with fields containing embedded line end chars if the embedded sequence isn't the same as those of the file containing the CSV data. In that case we error out when reading the data in. This means there are cases where