Alon Goldshuv wrote:
I've been following this thread, and I'm a little confused. Could you
possibly clarify what you mean, by providing a couple of lines of input
as it would be formatted with escape processing turned off - containing
a text field with an embedded newline and tab and a null
Bruce,
Point taken.
Now, you can argue that
a different escape should be possible, or that some other escape syntax
could be used, but the existing mechanism is clearly 100% reliable when
used properly and not broken.
I think that having an option for another escape syntax (such as using
Alon Goldshuv wrote:
Bruce,
Point taken.
Now, you can argue that
a different escape should be possible, or that some other escape syntax
could be used, but the existing mechanism is clearly 100% reliable when
used properly and not broken.
I think that having an option for another
Bruce,
Is there a good source of multi-byte copy data test cases? What is
currently done to test the trans-coding support? (where client and server
encodings are different)
I notice that the regression data in the CVS version of postgres does not
seem to include cases other than the ASCII data,
Luke Lonergan wrote:
Bruce,
Is there a good source of multi-byte copy data test cases? What is
currently done to test the trans-coding support? (where client and server
encodings are different)
I notice that the regression data in the CVS version of postgres does not
seem to include
Bruce,
We have two and three-byte encodings, so 16-bit seems like it wouldn't
work. I am not aware of any specs except the C code itself.
Ok - no problem.
How about test data and cases? I see the SQL encoding examples used in
src/test/regress/sql for testing encoding in SQL, but are there
Luke Lonergan wrote:
Bruce,
We have two and three-byte encodings, so 16-bit seems like it wouldn't
work. I am not aware of any specs except the C code itself.
Ok - no problem.
How about test data and cases? I see the SQL encoding examples used in
src/test/regress/sql for testing
Bruce,
I have no idea what you are talking about. Again, give me facts about
what we currently don't do and what you want to do.
Currently:
- No statement of multi-byte control character format
- No tests to define or prove works flawlessly or identify when something
breaks the current
Alon Goldshuv wrote:
I think that the basic issue is that there are some database users that would
like to take their data and put it into the database without pre-processing
it [...]
The only responsibility of these users is to explicitly escape any delimiter
or 0x0A (LF) characters that
On K, 2005-06-01 at 18:05 -0400, Bruce Momjian wrote:
What we could do is to do no-WAL automatically for empty tables (like
when a database is first loaded),
You forget that some databases use WAL for PITR / replication and doing
it automatically there would surely mess up their replica.
How
On K, 2005-06-01 at 11:31 -0400, Tom Lane wrote:
Hannu Krosing [EMAIL PROTECTED] writes:
I think this should be a decision done when creating a table, just like
TEMP tables. So you always know if a certain table is or is not
safe/replicated/recoverable.
This has also the advantage of
Escape processing would proceed as before, but the semantics would change to
allow the use of different characters as the escape character, in addition
to the special characters for delimiter and newline.
If you mean syntax to specify escape and delimiter (and newline ?),
that is a great
Andreas,
Escape processing would proceed as before, but the semantics would change to
allow the use of different characters as the escape character, in addition
to the special characters for delimiter and newline.
If you mean syntax to specify escape and delimiter (and newline ?),
that is
Oliver,
Haven't you just replaced one preprocessing step with another, then?
Generally not. The most common problem with the current choice of escape
character is that there are *lots* of data load scenarios with backslash in
the text strings. The extra preprocessing to escape them is
On Thu, Jun 02, 2005 at 07:33:13AM -0700, Luke Lonergan wrote:
Oliver,
Haven't you just replaced one preprocessing step with another, then?
Generally not. The most common problem with the current choice of escape
character is that there are *lots* of data load scenarios with backslash in
Steve,
I can only think of one where it's common. Windows filenames.
Nearly all weblog data then.
But if
you're going to support arbitrary data in a load then whatever escape
character you choose will appear sometimes.
If we allow an 8-bit character set in the text file, then yes, any
2) A modified command syntax for introducing a direct single row error
handling. By direct I mean - a row that if rejected from within the COPY
command context does not throw an error and rollsback the whole transaction.
Instead the error is caught and recorded elsewhere, maybe in some error
Luke Lonergan [EMAIL PROTECTED] writes:
In the data warehousing industry, data conversion and manipulation is
normally kept distinct from data loading.
It's a bit strange to call this conversion or manipulation. One way or another
you have to escape whatever your delimiters are. How would
Personally I don't think that it is a good idea to do that.
People will tend to corrupt their systems because they want speed
(sometimes without thinking about the consequences).
I can only think of one scenario where nologging would actually make
sense: Many people use session tables to keep
On Tue, 2005-05-31 at 22:47 -0400, Tom Lane wrote:
Simon Riggs [EMAIL PROTECTED] writes:
Recent test results have shown a substantial performance improvement
(+25%) if WAL logging is disabled for large COPY statements.
How much of that is left after we fix the 64-bit-CRC issue?
Well, I
Neil Conway [EMAIL PROTECTED] writes:
On Wed, 2005-06-01 at 00:40 -0400, Alvaro Herrera wrote:
This doesn't work for COPY, but maybe for CREATE TABLE AS we could log
the fact that the command was executed, so the replayer could execute
the same command again.
Of course, this
There are some other arguments in favour of a LOAD command Alon?
We already have LOAD, so you'll have to choose something else :)
Chris
---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
Greg Stark [EMAIL PROTECTED] writes:
For CREATE TABLE AS in the non-PITR case you don't really need to WAL log the
records at all. If it fails in the middle you just drop the table. When it
completes you do a checkpoint before acknowledging the COMMIT.
I think this is already done for
On K, 2005-06-01 at 09:16 +0100, Simon Riggs wrote:
On Tue, 2005-05-31 at 22:47 -0400, Tom Lane wrote:
Simon Riggs [EMAIL PROTECTED] writes:
Recent test results have shown a substantial performance improvement
(+25%) if WAL logging is disabled for large COPY statements.
How much of
On Wed, 2005-06-01 at 16:34 +0800, Christopher Kings-Lynne wrote:
There are some other arguments in favour of a LOAD command Alon?
We already have LOAD, so you'll have to choose something else :)
Its annoying, I grant you. :-)
LOAD 'library' would still need to be the default.
LOAD
On Wed, 2005-06-01 at 04:44 -0400, Greg Stark wrote:
Greg Stark [EMAIL PROTECTED] writes:
For CREATE TABLE AS in the non-PITR case you don't really need to WAL log
the
records at all. If it fails in the middle you just drop the table. When it
completes you do a checkpoint before
On K, 2005-06-01 at 00:01 +0100, Simon Riggs wrote:
Recent test results have shown a substantial performance improvement
(+25%) if WAL logging is disabled for large COPY statements. This is to
be expected, though has a price attached: losing the ability to crash
recover data loaded in this
Simon Riggs wrote:
On Wed, 2005-06-01 at 04:44 -0400, Greg Stark wrote:
Greg Stark [EMAIL PROTECTED] writes:
For CREATE TABLE AS in the non-PITR case you don't really need to WAL log
the
records at all. If it fails in the middle you just drop the table. When it
completes you do
Greg Stark [EMAIL PROTECTED] writes:
Sorry to followup to my own message, but it occurs to me that COPY could be
made to automatically do this for the case of an empty destination table too.
Not unless you are proposing to change COPY to acquire a lock strong
enough to lock out other writers to
Simon Riggs [EMAIL PROTECTED] writes:
On Wed, 2005-06-01 at 16:34 +0800, Christopher Kings-Lynne wrote:
There are some other arguments in favour of a LOAD command Alon?
We already have LOAD, so you'll have to choose something else :)
Its annoying, I grant you. :-)
LOAD 'library' would
On 01 Jun 2005 04:44:24 -0400, Greg Stark wrote:
Greg Stark writes:
For CREATE TABLE AS in the non-PITR case you don't really need to WAL log the
records at all. If it fails in the middle you just drop the table. When it
completes you do a checkpoint before acknowledging the COMMIT.
I
Tom Lane wrote:
Greg Stark [EMAIL PROTECTED] writes:
Sorry to followup to my own message, but it occurs to me that COPY could be
made to automatically do this for the case of an empty destination table
too.
Not unless you are proposing to change COPY to acquire a lock strong
enough to
Jochem van Dieten wrote:
On 01 Jun 2005 04:44:24 -0400, Greg Stark wrote:
Greg Stark writes:
For CREATE TABLE AS in the non-PITR case you don't really need to WAL log
the
records at all. If it fails in the middle you just drop the table. When it
completes you do a checkpoint before
Bruce Momjian pgman@candle.pha.pa.us writes:
Tom Lane wrote:
Not unless you are proposing to change COPY to acquire a lock strong
enough to lock out other writers to the table for the duration ...
Well, if the table is initally empty, what harm is there in locking the
table?
You cannot
On 6/1/05, Hans-Jürgen Schönig [EMAIL PROTECTED] wrote:
Personally I don't think that it is a good idea to do that.
People will tend to corrupt their systems because they want speed
(sometimes without thinking about the consequences).
I can only think of one scenario where nologging would
On Wed, 2005-06-01 at 10:18 -0400, Tom Lane wrote:
Bruce Momjian pgman@candle.pha.pa.us writes:
Tom Lane wrote:
Not unless you are proposing to change COPY to acquire a lock strong
enough to lock out other writers to the table for the duration ...
Well, if the table is initally empty,
Hannu Krosing [EMAIL PROTECTED] writes:
I think this should be a decision done when creating a table, just like
TEMP tables. So you always know if a certain table is or is not
safe/replicated/recoverable.
This has also the advantage of requiring no changes to actual COPY and
INSERT commands.
Bruce Momjian pgman@candle.pha.pa.us writes:
Jochem van Dieten wrote:
Why only on an empty table? What is the problem with bypassing WAL on
any table as long as all files of that table are fsync'ed before
commit?
Because adding rows to a table might modify existing pages, and if the
COPY
Simon Riggs [EMAIL PROTECTED] writes:
4. Optionally, we set a flag on the table showing the whole table is
frozen. Anybody writing to this table subsequently will spoil this flag.
If the flag is set, all forms of VACUUM will return success immediately
without performing a scan (since it is
On 6/1/05, Bruce Momjian wrote:
Jochem van Dieten wrote:
Why only on an empty table? What is the problem with bypassing WAL on
any table as long as all files of that table are fsync'ed before
commit?
Because adding rows to a table might modify existing pages, and if the
COPY fails, you
On Wed, 2005-06-01 at 11:31 -0400, Tom Lane wrote:
Hannu Krosing [EMAIL PROTECTED] writes:
I think this should be a decision done when creating a table, just like
TEMP tables. So you always know if a certain table is or is not
safe/replicated/recoverable.
This has also the advantage of
On Wed, Jun 01, 2005 at 06:00:28PM +0100, Simon Riggs wrote:
On Wed, 2005-06-01 at 11:31 -0400, Tom Lane wrote:
Perhaps it could work to use an ALTER TABLE command to flip the state.
But I'm not really seeing the point compared to treating it as a COPY
option. I do not believe that anyone
Alvaro Herrera [EMAIL PROTECTED] writes:
- it is an option to COPY and CREATE TABLE AS, not GUC, not ALTER TABLE
AFAICS it could just happen automatically for CREATE TABLE AS; there's
no need for an option there, any more than there is for CREATE INDEX.
The only reason it needs to be an
I have been working on improving the COPY command performance and as a
result also came up with other thoughts that may possibly be better off
implemented in a new command (i.e LOAD DATA) rather than adding them to the
existing COPY.
The improvements I made were in the COPY parsing logic -
On Wed, 2005-06-01 at 11:55 -0400, Tom Lane wrote:
Simon Riggs [EMAIL PROTECTED] writes:
4. Optionally, we set a flag on the table showing the whole table is
frozen. Anybody writing to this table subsequently will spoil this flag.
If the flag is set, all forms of VACUUM will return success
On Wed, Jun 01, 2005 at 06:55:46PM +0100, Simon Riggs wrote:
On Wed, 2005-06-01 at 11:55 -0400, Tom Lane wrote:
Simon Riggs [EMAIL PROTECTED] writes:
4. Optionally, we set a flag on the table showing the whole table is
frozen. Anybody writing to this table subsequently will spoil this
Simon Riggs [EMAIL PROTECTED] writes:
If the server crashes, we replay WAL. If we see a load start message, we
truncate the relation and note that a load has started. If there is WAL
data for the tuples, we replay it. If WAL replay ends without the load
transaction having successfully
Alvaro Herrera [EMAIL PROTECTED] writes:
On Wed, Jun 01, 2005 at 06:55:46PM +0100, Simon Riggs wrote:
We're holding the table lock and will continue to do so until end of
transaction. No transaction with an earlier id will ever see the data we
load because of the lock.
Suppose you load half
On Wed, 2005-06-01 at 14:24 -0400, Tom Lane wrote:
Alvaro Herrera [EMAIL PROTECTED] writes:
On Wed, Jun 01, 2005 at 06:55:46PM +0100, Simon Riggs wrote:
We're holding the table lock and will continue to do so until end of
transaction. No transaction with an earlier id will ever see the data
On Wed, 2005-06-01 at 14:14 -0400, Tom Lane wrote:
So what I'm thinking is we need no special WAL entries for this. What
we need is just an operating mode of COPY in which it doesn't WAL-log
its inserts, but instead fsyncs before completion, much like index build
does. For safety it must do
Jochem van Dieten wrote:
On 6/1/05, Bruce Momjian wrote:
Jochem van Dieten wrote:
Why only on an empty table? What is the problem with bypassing WAL on
any table as long as all files of that table are fsync'ed before
commit?
Because adding rows to a table might modify existing
Alon Goldshuv wrote:
9) allow for Simon's WAL bypass.
I have surely missed some problems that hide behind the idea, but these
points make me believe that LOAD DATA is a good idea.
The community is unlikely to add a new LOAD DATA command that does
_almost_ everything COPY does. We are much
Tom Lane wrote:
Bruce Momjian pgman@candle.pha.pa.us writes:
Tom Lane wrote:
Not unless you are proposing to change COPY to acquire a lock strong
enough to lock out other writers to the table for the duration ...
Well, if the table is initally empty, what harm is there in locking the
Bruce,
The problem with a new command is that it becomes unclear when you
should use COPY and when LOAD DATA, and it confuses users, and has
maintenance overhead. If Bizgres wants a new command name, go for it,
but it is unlikely that the community release is going to go in that
direction,
Bruce Momjian pgman@candle.pha.pa.us writes:
One idea would be to look at the table file size first. If it has zero
blocks, lock the table and if it still has zero blocks, do the no-WAL
copy.
I think that's a bad idea. It would make the behavior unpredictable
--- sometimes a COPY will take
Luke Lonergan [EMAIL PROTECTED] writes:
One of the reasons to consider a LOAD DATA command is that we can isolate
the need for performance improvements and special syntax from the concerns
of preserving the legacy behavior of COPY for use as the primary mechanism
for DUMP and RESTORE.
... and
On Wed, 2005-06-01 at 10:35 -0700, Alon Goldshuv wrote:
I have been working on improving the COPY command performance
Around 40% for 15 column (mixed types) table.
Around 90% for 1 column table.
Thats very cool.
2) A modified command syntax for introducing a direct single row error
On Wed, 2005-06-01 at 18:32 -0400, Tom Lane wrote:
Bruce Momjian pgman@candle.pha.pa.us writes:
One idea would be to look at the table file size first. If it has zero
blocks, lock the table and if it still has zero blocks, do the no-WAL
copy.
I think that's a bad idea. It would make
Luke Lonergan wrote:
Bruce,
The problem with a new command is that it becomes unclear when you
should use COPY and when LOAD DATA, and it confuses users, and has
maintenance overhead. If Bizgres wants a new command name, go for it,
but it is unlikely that the community release is going
Tom Lane wrote:
Bruce Momjian pgman@candle.pha.pa.us writes:
One idea would be to look at the table file size first. If it has zero
blocks, lock the table and if it still has zero blocks, do the no-WAL
copy.
I think that's a bad idea. It would make the behavior unpredictable
---
Simon Riggs wrote:
I'm not clear from all of those options whether we still need a LOAD
command, based upon other issues/comments raised on this thread.
However, there are some other arguments for why it might be a good idea
to have a LOAD DATA command separate from COPY. Certainly long term
Simon Riggs wrote:
On Wed, 2005-06-01 at 10:35 -0700, Alon Goldshuv wrote:
2) A modified command syntax for introducing a direct single row error
handling. By direct I mean - a row that if rejected from within the COPY
5) allow an ERRORLIMIT to allow control of aborting a load after a
Bruce,
The patch is not there to show that something is broken is it there to
show how things could be done in another way, which may or may not be more
desireable.
COPY works as designed. The idea that some guy we have never heard of
is going to appear and rewrite COPY's processing and tell
Tom,
... and instead, define some new behavior that will soon be considered
broken legacy code itself?
I'll not argue further about whether to have a separate LOAD command.
That's not as important as fixing the performance issues in the data load
path in PostgreSQL to me.
However, I find it
Alon Goldshuv wrote:
Bruce,
The patch is not there to show that something is broken is it there to
show how things could be done in another way, which may or may not be more
desireable.
Sure, we are always looking for ways to improve things.
COPY works as designed. The idea that some
Luke Lonergan wrote:
Tom,
... and instead, define some new behavior that will soon be considered
broken legacy code itself?
I'll not argue further about whether to have a separate LOAD command.
That's not as important as fixing the performance issues in the data load
path in
Bruce,
Yep, you have to double backslahses coming in as data so we can use
backslash for marking null, delimiters, etc. I see no way around that,
and no one since Berkeley has come up with one either. If you have an
idea, we would _love_ to hear it.
Apropos to the previous thread on escape
Yep, we would _love_ those improvements.
Coming soon, probably from the guy you've never heard of :-)
I am confused why you are confused. :-)
Uh, how do you do the escapes if you don't double the escape character
on input so you can distinguish a literal escape from one use to mark
Luke Lonergan wrote:
I propose an extended syntax to COPY with a change in semantics to remove
the default of WITH ESCAPE '\'.
Er, doesn't this break existing database dumps?
-O
---(end of broadcast)---
TIP 3: if posting/reading through
Luke Lonergan wrote:
Yep, we would _love_ those improvements.
Coming soon, probably from the guy you've never heard of :-)
LOL
I am confused why you are confused. :-)
Uh, how do you do the escapes if you don't double the escape character
on input so you can distinguish
Oliver Jowett wrote:
Luke Lonergan wrote:
I propose an extended syntax to COPY with a change in semantics to remove
the default of WITH ESCAPE '\'.
Er, doesn't this break existing database dumps?
Right, we will not change the default.
--
Bruce Momjian|
Oliver,
On 6/1/05 7:13 PM, Oliver Jowett [EMAIL PROTECTED] wrote:
Luke Lonergan wrote:
I propose an extended syntax to COPY with a change in semantics to remove
the default of WITH ESCAPE '\'.
Er, doesn't this break existing database dumps?
Yes, one of the previously stated reasons to
How do you distinguis an escape-delimiter used to escape a delimiter in
the data from a literal escape-delimiter in the data being loaded --- it
seems impossible to do.
The usual approach is fine - a pair of escapes is equivalent to a literal
escape.
Or are you saying there would be no
On Wed, Jun 01, 2005 at 07:35:33PM -0700, Luke Lonergan wrote:
I propose an extended syntax to COPY with a change in semantics to remove
the default of WITH ESCAPE '\'.
Er, doesn't this break existing database dumps?
Yes, one of the previously stated reasons to create another command
On Wed, Jun 01, 2005 at 10:35:30AM -0700, Alon Goldshuv wrote:
2) A modified command syntax for introducing a direct single row error
handling. By direct I mean - a row that if rejected from within the COPY
command context does not throw an error and rollsback the whole transaction.
Instead
Steve,
I've been following this thread, and I'm a little confused. Could you
possibly clarify what you mean, by providing a couple of lines of
input as it would be formatted with escape processing turned off -
containing a text field with an embedded newline and tab and a null field.
Using
/2005 10:47 PM
To: pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] NOLOGGING option, or ?
On Wed, Jun 01, 2005 at 07:35:33PM -0700, Luke Lonergan wrote:
I propose an extended syntax to COPY with a change in semantics to remove
the default of WITH ESCAPE '\'.
Er, doesn't this break
-hackers@postgresql.org
Subject: Re: [HACKERS] NOLOGGING option, or ?
On Wed, Jun 01, 2005 at 07:35:33PM -0700, Luke Lonergan wrote:
I propose an extended syntax to COPY with a change in semantics to remove
the default of WITH ESCAPE '\'.
Er, doesn't this break existing database dumps
Steve,
Oops. Example below should have read differently:
Sample with 2 identical rows (with binary representations depicted between
):
Blahblah0xaablahblah0x09blahblah0x00blahblah0xaaBlahblah0xaablahbl
ah0x09blahblah0x00blahblah0xaa
On Thu, Jun 02, 2005 at 12:30:01AM -0400, Alon Goldshuv wrote:
before that, let me just clarify that the performance improvements in
the patch have nothing to do with the escaping mechanizm. Escapes
could change. The performance gain in due to a buffered processing
with minimal line/attribute
Recent test results have shown a substantial performance improvement
(+25%) if WAL logging is disabled for large COPY statements. This is to
be expected, though has a price attached: losing the ability to crash
recover data loaded in this manner.
There are two parts to this proposal. First, when
Simon Riggs [EMAIL PROTECTED] writes:
Recent test results have shown a substantial performance improvement
(+25%) if WAL logging is disabled for large COPY statements.
How much of that is left after we fix the 64-bit-CRC issue?
Now, I would like to discuss adding an enable_logging USERSET
On Tue, May 31, 2005 at 10:47:30PM -0400, Tom Lane wrote:
Simon Riggs [EMAIL PROTECTED] writes:
Recent test results have shown a substantial performance improvement
(+25%) if WAL logging is disabled for large COPY statements.
BTW, I'm sure you are the last one who needs to be reminded that
On Wed, 2005-06-01 at 00:40 -0400, Alvaro Herrera wrote:
This doesn't work for COPY, but maybe for CREATE TABLE AS we could log
the fact that the command was executed, so the replayer could execute
the same command again.
Of course, this handwaving doesn't explain how the system in recovery
84 matches
Mail list logo