Re: [HACKERS] invalidly encoded strings

2007-09-18 Andrew Dunstan
Tom Lane wrote: Andrew Dunstan [EMAIL PROTECTED] writes: Tom Lane wrote: What I think we'd need to have a complete solution is convert(text, name) returns bytea -- convert from DB encoding to arbitrary encoding convert(bytea, name, name) returns bytea -- convert between any two

Re: [HACKERS] invalidly encoded strings

2007-09-18 Hannu Krosing
On a fine day, Tue, 2007-09-18 at 08:08, Andrew Dunstan wrote: Tom Lane wrote: Andrew Dunstan [EMAIL PROTECTED] writes: Tom Lane wrote: What I think we'd need to have a complete solution is convert(text, name) returns bytea -- convert from DB encoding to arbitrary

Re: [HACKERS] invalidly encoded strings

2007-09-18 Andrew Dunstan
Hannu Krosing wrote: On a fine day, Tue, 2007-09-18 at 08:08, Andrew Dunstan wrote: Tom Lane wrote: Andrew Dunstan [EMAIL PROTECTED] writes: Tom Lane wrote: What I think we'd need to have a complete solution is convert(text, name) returns bytea --

Re: [HACKERS] invalidly encoded strings

2007-09-18 Tom Lane
Andrew Dunstan [EMAIL PROTECTED] writes: What's bothering me here though is that in the two argument forms, if the first argument is text the second argument is the destination encoding, but if the first argument is a bytea the second argument is the source encoding. That strikes me as

Re: [HACKERS] invalidly encoded strings

2007-09-18 Andrew Dunstan
Tom Lane wrote: Anyway, on the strength of that, these functions are definitely best named to stay away from the spec syntax, so +1 for your proposal above. OK, I have committed this and the other functional changes that should close the encoding holes. Catalog version

Re: [HACKERS] invalidly encoded strings

2007-09-18 Gregory Stark
Andrew Dunstan [EMAIL PROTECTED] writes: Tom Lane wrote: No. We have a function overloading system, we should use it. In general I agree with you. What's bothering me here though is that in the two argument forms, if the first argument is text the second argument is the destination

Re: [HACKERS] invalidly encoded strings

2007-09-16 Andrew Dunstan
Tom Lane wrote: What I think we'd need to have a complete solution is convert(text, name) returns bytea -- convert from DB encoding to arbitrary encoding convert(bytea, name, name) returns bytea -- convert between any two encodings convert(bytea, name) returns text
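The three signatures quoted above can be modelled with Python's codecs, using `bytes` for bytea and `str` for text in the database encoding. This is a hypothetical sketch, not PostgreSQL code: `DB_ENCODING` is an assumed setting, and the names `convert_to`, `convert` and `convert_from` are chosen to match the functions PostgreSQL 8.3 ended up shipping, while the bodies are plain codec calls. What it illustrates is that only the bytea-to-text direction can create a text value, so that is the one place that must validate.

```python
# Hypothetical model of the proposed conversion functions.
# bytes plays the role of bytea; str plays the role of text in the database encoding.
DB_ENCODING = "utf-8"  # assumed database encoding for this sketch


def convert_to(text: str, dest_encoding: str) -> bytes:
    """text -> bytea: encode a database string into an arbitrary encoding."""
    return text.encode(dest_encoding)


def convert(data: bytes, src_encoding: str, dest_encoding: str) -> bytes:
    """bytea -> bytea: re-encode between any two encodings.

    Decoding is strict, so bytes that are invalid in src_encoding raise
    UnicodeDecodeError instead of being passed through.
    """
    return data.decode(src_encoding).encode(dest_encoding)


def convert_from(data: bytes, src_encoding: str) -> str:
    """bytea -> text: the only path that produces text, so it must validate."""
    text = data.decode(src_encoding)  # rejects bytes invalid in the source encoding
    text.encode(DB_ENCODING)          # rejects characters the database encoding lacks
    return text
```

Strict decoding is the design choice that matters here: decoding with `errors="replace"` would silently turn bad input into valid-looking text instead of rejecting it.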

Re: [HACKERS] invalidly encoded strings

2007-09-16 Tom Lane
Andrew Dunstan [EMAIL PROTECTED] writes: Tom Lane wrote: What I think we'd need to have a complete solution is convert(text, name) returns bytea -- convert from DB encoding to arbitrary encoding convert(bytea, name, name) returns bytea -- convert between any two encodings

Re: [HACKERS] invalidly encoded strings

2007-09-14 Andrew Dunstan
Tom Lane wrote: I think really the technically cleanest solution would be to make convert() return bytea instead of text; then we'd not have to put restrictions on what encoding or locale it's working inside of. However, it's not clear to me whether there are valid usages that that would

Re: [HACKERS] invalidly encoded strings

2007-09-14 Tom Lane
Andrew Dunstan [EMAIL PROTECTED] writes: Are you wanting this done for 8.3? If so, by whom? :-) [ shrug... ] I'm not the one who's worried about closing all the holes leading to encoding problems. regards, tom lane

Re: [HACKERS] invalidly encoded strings

2007-09-14 Andrew Dunstan
Tom Lane wrote: Andrew Dunstan [EMAIL PROTECTED] writes: Are you wanting this done for 8.3? If so, by whom? :-) [ shrug... ] I'm not the one who's worried about closing all the holes leading to encoding problems. I can certainly have a go at it. Are

Re: [HACKERS] invalidly encoded strings

2007-09-14 Tom Lane
Andrew Dunstan [EMAIL PROTECTED] writes: I can certainly have a go at it. Are we still talking about Oct 1 for a possible beta? Yeah, there's still a little time left --- HOT will take at least a few more days. regards, tom lane

Re: [HACKERS] invalidly encoded strings

2007-09-11 Martijn van Oosterhout
On Tue, Sep 11, 2007 at 11:27:50AM +0900, Tatsuo Ishii wrote: SELECT * FROM japanese_table ORDER BY convert(japanese_text using utf8_to_euc_jp); Without using convert(), he will get random order of data. This is because Kanji characters are in random order in UTF-8, while Kanji characters
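Tatsuo's use case relies on EUC-JP byte order differing from Unicode code point order for kanji. A small Python sketch of the two orderings follows; Python's `euc_jp` codec stands in for the server-side conversion (an assumption), and comparing the encoded keys bytewise mirrors sorting on a bytea result, assuming bytea values compare byte by byte.

```python
# Sketch: ordering Japanese text by its EUC-JP byte image versus by code point.
# Python's "euc_jp" codec is assumed to match the server's EUC_JP conversion here.


def euc_jp_key(s: str) -> bytes:
    """Sort key: the EUC-JP encoding of s, compared bytewise."""
    return s.encode("euc_jp")


words = ["一", "亜"]  # U+4E00 and U+4E9C

by_code_point = sorted(words)              # Python compares str by code point
by_euc_jp = sorted(words, key=euc_jp_key)  # follows JIS X 0208 order for these kanji
```

一 (U+4E00) sorts first by code point, but 亜 sorts first in EUC-JP because JIS X 0208 places it at the start of the level-1 kanji, which are arranged by reading.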

Re: [HACKERS] invalidly encoded strings

2007-09-11 Jeff Davis
On Tue, 2007-09-11 at 14:50 +0900, Tatsuo Ishii wrote: On Tue, 2007-09-11 at 12:29 +0900, Tatsuo Ishii wrote: Please show me concrete examples how I could introduce a vulnerability using this kind of convert() usage. Try the sequence below. Then, try to dump and then reload the

Re: [HACKERS] invalidly encoded strings

2007-09-11 db
Try the sequence below. Then, try to dump and then reload the database. When you try to reload it, you will get an error: ERROR: invalid byte sequence for encoding UTF8: 0xbd I know this could be a problem (like chr() with invalid byte pattern). And that's enough of a problem already. We

Re: [HACKERS] invalidly encoded strings

2007-09-11 Tatsuo Ishii
On Tue, 2007-09-11 at 14:50 +0900, Tatsuo Ishii wrote: On Tue, 2007-09-11 at 12:29 +0900, Tatsuo Ishii wrote: Please show me concrete examples how I could introduce a vulnerability using this kind of convert() usage. Try the sequence below. Then, try to dump and then reload

Re: [HACKERS] invalidly encoded strings

2007-09-11 Albe Laurenz
Andrew Dunstan wrote: Instead of the code point, I'd prefer the actual encoding of the character as argument to chr() and return value of ascii(). And frankly, I don't know how to do it sanely anyway. A character encoding has a fixed byte pattern, but a given byte pattern doesn't have a

Re: [HACKERS] invalidly encoded strings

2007-09-11 Jeff Davis
On Mon, 2007-09-10 at 23:20 -0400, Tom Lane wrote: The reason we have a problem here is that we've been choosing convenience over safety in encoding-related issues. I wonder if we must stoop to having a strict_encoding_checks GUC variable to satisfy everyone. That would be satisfactory to

Re: [HACKERS] invalidly encoded strings

2007-09-11 Tom Lane
Jeff Davis [EMAIL PROTECTED] writes: On Mon, 2007-09-10 at 23:20 -0400, Tom Lane wrote: It might work the way you are expecting if the database uses SQL_ASCII encoding and C locale --- and I'd be fine with allowing convert() only when the database encoding is SQL_ASCII. I prefer this option.

Re: [HACKERS] invalidly encoded strings

2007-09-11 Jeff Davis
On Tue, 2007-09-11 at 14:48 -0400, Tom Lane wrote: Jeff Davis [EMAIL PROTECTED] writes: On Mon, 2007-09-10 at 23:20 -0400, Tom Lane wrote: It might work the way you are expecting if the database uses SQL_ASCII encoding and C locale --- and I'd be fine with allowing convert() only when the

Re: [HACKERS] invalidly encoded strings

2007-09-11 Alvaro Herrera
Tom Lane wrote: Jeff Davis [EMAIL PROTECTED] writes: On Mon, 2007-09-10 at 23:20 -0400, Tom Lane wrote: It might work the way you are expecting if the database uses SQL_ASCII encoding and C locale --- and I'd be fine with allowing convert() only when the database encoding is SQL_ASCII.

Re: [HACKERS] invalidly encoded strings

2007-09-11 Tom Lane
Alvaro Herrera [EMAIL PROTECTED] writes: Tom Lane wrote: I think really the technically cleanest solution would be to make convert() return bytea instead of text; then we'd not have to put restrictions on what encoding or locale it's working inside of. However, it's not clear to me whether

Re: [HACKERS] invalidly encoded strings

2007-09-11 Tom Lane
Tatsuo Ishii [EMAIL PROTECTED] writes: If we make convert() operate on bytea and return bytea, as Tom suggested, would that solve your use case? The problem is, the above use case is just one of what I can think of. Another use case is, something like this: SELECT

Re: [HACKERS] invalidly encoded strings

2007-09-11 Tatsuo Ishii
However ISTM we would also need something like length(bytea, name) returns int -- counts the number of characters assuming that the bytea is in -- the given encoding Hmm, I wonder if counting chars is consistent regardless of the encoding the string is in. To me it sounds
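On the question of whether a character count depends on the encoding: for text that every encoding involved can represent, the count is a property of the characters, and only the byte length changes. The sketch below is hypothetical; `length_in` is an illustrative stand-in for the proposed length(bytea, name), implemented with a strict Python decode.

```python
def length_in(data: bytes, encoding: str) -> int:
    """Hypothetical length(bytea, name): characters in data, assuming it is in `encoding`.

    The decode is strict, so bytes that are invalid in `encoding` raise an error
    rather than producing a misleading count.
    """
    return len(data.decode(encoding))


text = "日本語"
utf8_bytes = text.encode("utf-8")   # 3 bytes per kanji
euc_bytes = text.encode("euc_jp")   # 2 bytes per kanji
```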

Re: [HACKERS] invalidly encoded strings

2007-09-10 Albe Laurenz
Tom Lane wrote: . for chr() under UTF8, it seems to be generally agreed that the argument should represent the codepoint and the function should return the correspondingly encoded character. If so, possibly the argument should be a bigint to accommodate the full range of possible code points.
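Under the code-point interpretation, chr() in a UTF8 database has to turn a number into its UTF-8 byte sequence. The following is a minimal sketch of that mapping; the function name `chr_utf8` and its error messages are hypothetical. It also rejects 0, in line with the later agreement in this thread that chr(0) should raise an error, and refuses surrogates and values above U+10FFFF, which have no valid UTF-8 encoding.

```python
def chr_utf8(code_point: int) -> bytes:
    """Sketch of chr() under a UTF8 database: code point in, UTF-8 bytes out."""
    if code_point == 0:
        raise ValueError("null character not permitted")  # illustrative message
    if code_point < 0 or code_point > 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
        raise ValueError(f"requested character not valid: {code_point}")
    if code_point < 0x80:  # 0xxxxxxx
        return bytes([code_point])
    if code_point < 0x800:  # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (code_point >> 6),
                      0x80 | (code_point & 0x3F)])
    if code_point < 0x10000:  # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (code_point >> 12),
                      0x80 | ((code_point >> 6) & 0x3F),
                      0x80 | (code_point & 0x3F)])
    # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (code_point >> 18),
                  0x80 | ((code_point >> 12) & 0x3F),
                  0x80 | ((code_point >> 6) & 0x3F),
                  0x80 | (code_point & 0x3F)])
```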

Re: [HACKERS] invalidly encoded strings

2007-09-10 Gregory Stark
Tom Lane [EMAIL PROTECTED] writes: Jeff Davis [EMAIL PROTECTED] writes: I think the concern is when they use only one slash, like: E'\377\000\377'::bytea which, as I mentioned before, is not correct anyway. Wait, why would this be wrong? How would you enter the three-byte bytea consisting of

Re: [HACKERS] invalidly encoded strings

2007-09-10 db
I think the concern is when they use only one slash, like: E'\377\000\377'::bytea which, as I mentioned before, is not correct anyway. Wait, why would this be wrong? How would you enter the three-byte bytea consisting of those three bytes described above? Either as E'\\377\\000\\377'

Re: [HACKERS] invalidly encoded strings

2007-09-10 Andrew Dunstan
Tom Lane wrote: Andrew Dunstan [EMAIL PROTECTED] writes: Tom Lane wrote: In the short run it might be best to do it in scan.l after all. I have not come up with a way of doing that and handling the bytea case. AFAICS we have no realistic choice other than to

Re: [HACKERS] invalidly encoded strings

2007-09-10 Andrew Dunstan
Albe Laurenz wrote: I'd like to repeat my suggestion for chr() and ascii(). Instead of the code point, I'd prefer the actual encoding of the character as argument to chr() and return value of ascii(). [snip] Of course, if it is generally perceived that the code point is more useful

Re: [HACKERS] invalidly encoded strings

2007-09-10 Tom Lane
Andrew Dunstan [EMAIL PROTECTED] writes: Perhaps we're talking at cross purposes. The problem with doing encoding validation in scan.l is that it lacks context. Null bytes are only the tip of the bytea iceberg, since any arbitrary sequence of bytes can be valid for a bytea. If you think

Re: [HACKERS] invalidly encoded strings

2007-09-10 Tom Lane
Andrew Dunstan [EMAIL PROTECTED] writes: The reason we are prepared to make an exception for Unicode is precisely because the code point maps to an encoding pattern independently of architecture, ISTM. Right --- there is a well-defined standard for the numerical value of each character in

Re: [HACKERS] invalidly encoded strings

2007-09-10 Tom Lane
Andrew Dunstan [EMAIL PROTECTED] writes: Tom Lane wrote: Those should be checked already --- if not, the right fix is still to fix it there, not in per-datatype code. I think we are OK though, eg see need_transcoding logic in copy.c. Well, a little experimentation shows that we currently

Re: [HACKERS] invalidly encoded strings

2007-09-10 Andrew Dunstan
Tom Lane wrote: Andrew Dunstan [EMAIL PROTECTED] writes: Perhaps we're talking at cross purposes. The problem with doing encoding validation in scan.l is that it lacks context. Null bytes are only the tip of the bytea iceberg, since any arbitrary sequence of bytes can be valid

Re: [HACKERS] invalidly encoded strings

2007-09-10 Tatsuo Ishii
Andrew Dunstan [EMAIL PROTECTED] writes: The reason we are prepared to make an exception for Unicode is precisely because the code point maps to an encoding pattern independently of architecture, ISTM. Right --- there is a well-defined standard for the numerical value of each

Re: [HACKERS] invalidly encoded strings

2007-09-10 Tom Lane
BTW, I'm sure this was discussed but I forgot the conclusion: should chr(0) throw an error? If we're trying to get rid of embedded-null problems, seems it must. regards, tom lane

Re: [HACKERS] invalidly encoded strings

2007-09-10 Martijn van Oosterhout
On Tue, Sep 11, 2007 at 12:30:51AM +0900, Tatsuo Ishii wrote: Why do you think that employing the Unicode code point as the chr() argument could avoid endianness issues? Are you going to represent Unicode code point as UCS-4? Then you have to specify the endianness anyway. (see the UCS-4

Re: [HACKERS] invalidly encoded strings

2007-09-10 Andrew Dunstan
Tatsuo Ishii wrote: I don't understand whole discussion. Why do you think that employing the Unicode code point as the chr() argument could avoid endianness issues? Are you going to represent Unicode code point as UCS-4? Then you have to specify the endianness anyway. (see the UCS-4

Re: [HACKERS] invalidly encoded strings

2007-09-10 Andrew Dunstan
Tom Lane wrote: BTW, I'm sure this was discussed but I forgot the conclusion: should chr(0) throw an error? If we're trying to get rid of embedded-null problems, seems it must. I think it should, yes. cheers andrew

Re: [HACKERS] invalidly encoded strings

2007-09-10 Tom Lane
Andrew Dunstan [EMAIL PROTECTED] writes: Tom Lane wrote: BTW, I'm sure this was discussed but I forgot the conclusion: should chr(0) throw an error? I think it should, yes. OK. Looking back, there was also some mention of changing chr's argument to bigint, but I'd counsel against doing

Re: [HACKERS] invalidly encoded strings

2007-09-10 Andrew Dunstan
Tom Lane wrote: OK. Looking back, there was also some mention of changing chr's argument to bigint, but I'd counsel against doing that. We should not need it since we only support 4-byte UTF8, hence code points only up to 21 bits (and indeed even 6-byte UTF8 can only have 31-bit code points,
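Tom's bit counts follow from the UTF-8 layout: in an n-byte sequence the lead byte spends n+1 bits on the length marker, leaving 7 - n payload bits, and each continuation byte carries 6 payload bits after its 10 prefix. A quick check in Python (`payload_bits` is an illustrative helper) confirms that even the old 6-byte form fits in a 4-byte signed integer, so an int4 argument suffices for chr().

```python
INT4_MAX = 2**31 - 1  # largest value of PostgreSQL's integer (int4) type


def payload_bits(n_bytes: int) -> int:
    """Code-point bits carried by an n-byte UTF-8 sequence (2 <= n_bytes <= 6)."""
    lead_payload = 7 - n_bytes               # lead byte: n_bytes ones, a zero, then payload
    return lead_payload + 6 * (n_bytes - 1)  # each continuation byte is 10xxxxxx


for n in (2, 3, 4, 6):
    bits = payload_bits(n)
    print(f"{n}-byte UTF-8: {bits} bits, max value {2**bits - 1:#x}")
```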

Re: [HACKERS] invalidly encoded strings

2007-09-10 Martijn van Oosterhout
On Mon, Sep 10, 2007 at 11:48:29AM -0400, Tom Lane wrote: BTW, I'm sure this was discussed but I forgot the conclusion: should chr(0) throw an error? If we're trying to get rid of embedded-null problems, seems it must. It is pointed out on Wikipedia that Java sometimes uses the byte pair C0 80
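The Java byte pair mentioned here is an overlong encoding: C0 80 decodes arithmetically to U+0000, but standard UTF-8 requires the shortest form, so a strict validator must reject it. A short Python illustration follows; `decode_two_byte` is a hypothetical helper that deliberately skips the shortest-form check.

```python
def decode_two_byte(lead: int, cont: int) -> int:
    """Naive 110xxxxx 10xxxxxx decode with no shortest-form (overlong) check."""
    return ((lead & 0x1F) << 6) | (cont & 0x3F)


java_null = b"\xc0\x80"
naive_value = decode_two_byte(java_null[0], java_null[1])  # U+0000 by arithmetic alone

try:
    java_null.decode("utf-8")  # Python's strict decoder enforces the shortest form
    strict_accepts = True
except UnicodeDecodeError:
    strict_accepts = False
```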

Re: [HACKERS] invalidly encoded strings

2007-09-10 Tatsuo Ishii
Tatsuo Ishii wrote: I don't understand whole discussion. Why do you think that employing the Unicode code point as the chr() argument could avoid endianness issues? Are you going to represent Unicode code point as UCS-4? Then you have to specify the endianness anyway. (see the

Re: [HACKERS] invalidly encoded strings

2007-09-10 Tatsuo Ishii
Tatsuo Ishii wrote: I don't understand whole discussion. Why do you think that employing the Unicode code point as the chr() argument could avoid endianness issues? Are you going to represent Unicode code point as UCS-4? Then you have to specify the endianness anyway. (see the

Re: [HACKERS] invalidly encoded strings

2007-09-10 Tom Lane
Tatsuo Ishii [EMAIL PROTECTED] writes: If you regard the unicode code point as simply a number, why not regard the multibyte characters as a number too? Because there's a standard specifying the Unicode code points *as numbers*. The mapping from those numbers to UTF8 strings (and other

Re: [HACKERS] invalidly encoded strings

2007-09-10 Andrew Dunstan
Tatsuo Ishii wrote: If you regard the unicode code point as simply a number, why not regard the multibyte characters as a number too? I mean, since 0xC2A9 = 49833, select chr(49833) should work fine no? No. The number corresponding to a given byte pattern depends on the endianness of
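The distinction argued here can be checked directly: the UTF-8 byte pattern C2 A9 is 49833 only when read as a big-endian integer, whereas the copyright sign's Unicode code point, 169 (U+00A9), is fixed by the standard regardless of byte order. Illustrated with Python's `int.from_bytes`:

```python
copyright_sign = "\u00a9"
pattern = copyright_sign.encode("utf-8")  # UTF-8 byte pattern C2 A9

as_big_endian = int.from_bytes(pattern, "big")        # the reading behind chr(49833)
as_little_endian = int.from_bytes(pattern, "little")  # same bytes, other byte order
code_point = ord(copyright_sign)                       # the number Unicode assigns

print(pattern.hex(), as_big_endian, as_little_endian, code_point)
```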

Re: [HACKERS] invalidly encoded strings

2007-09-10 Tatsuo Ishii
Tatsuo Ishii [EMAIL PROTECTED] writes: If you regard the unicode code point as simply a number, why not regard the multibyte characters as a number too? Because there's a standard specifying the Unicode code points *as numbers*. The mapping from those numbers to UTF8 strings (and other

Re: [HACKERS] invalidly encoded strings

2007-09-10 Jeff Davis
On Tue, 2007-09-11 at 11:27 +0900, Tatsuo Ishii wrote: BTW, it strikes me that there is another hole that we need to plug in this area, and that's the convert() function. Being able to create a value of type text that is not in the database encoding is simply broken. Perhaps we could

Re: [HACKERS] invalidly encoded strings

2007-09-10 Tatsuo Ishii
On Tue, 2007-09-11 at 11:27 +0900, Tatsuo Ishii wrote: BTW, it strikes me that there is another hole that we need to plug in this area, and that's the convert() function. Being able to create a value of type text that is not in the database encoding is simply broken. Perhaps we

Re: [HACKERS] invalidly encoded strings

2007-09-10 Andrew Dunstan
Tatsuo Ishii wrote: BTW, it strikes me that there is another hole that we need to plug in this area, and that's the convert() function. Being able to create a value of type text that is not in the database encoding is simply broken. Perhaps we could make it work on bytea instead (providing a

Re: [HACKERS] invalidly encoded strings

2007-09-10 Andrew Dunstan
Jeff Davis wrote: On Tue, 2007-09-11 at 11:27 +0900, Tatsuo Ishii wrote: BTW, it strikes me that there is another hole that we need to plug in this area, and that's the convert() function. Being able to create a value of type text that is not in the database encoding is simply broken.

Re: [HACKERS] invalidly encoded strings

2007-09-10 Tom Lane
Andrew Dunstan [EMAIL PROTECTED] writes: I'm not sure we are going to be able to catch every path by which invalid data can get into the database in one release. I suspect we might need two or three goes at this. (I'm just wondering if the routines that return cstrings are a possible

Re: [HACKERS] invalidly encoded strings

2007-09-10 Tom Lane
Tatsuo Ishii [EMAIL PROTECTED] writes: BTW, it strikes me that there is another hole that we need to plug in this area, and that's the convert() function. Being able to create a value of type text that is not in the database encoding is simply broken. Perhaps we could make it work on bytea

Re: [HACKERS] invalidly encoded strings

2007-09-10 Tatsuo Ishii
Tatsuo Ishii [EMAIL PROTECTED] writes: BTW, it strikes me that there is another hole that we need to plug in this area, and that's the convert() function. Being able to create a value of type text that is not in the database encoding is simply broken. Perhaps we could make it work on

Re: [HACKERS] invalidly encoded strings

2007-09-10 Jeff Davis
On Tue, 2007-09-11 at 11:53 +0900, Tatsuo Ishii wrote: Isn't the collation a locale issue, not an encoding issue? Is there a ja_JP.UTF-8 that defines the proper order? I don't think it helps. The point is, he needs different language's collation, while PostgreSQL allows only one

Re: [HACKERS] invalidly encoded strings

2007-09-10 Jeff Davis
On Tue, 2007-09-11 at 12:29 +0900, Tatsuo Ishii wrote: Please show me concrete examples how I could introduce a vulnerability using this kind of convert() usage. Try the sequence below. Then, try to dump and then reload the database. When you try to reload it, you will get an error: ERROR:

Re: [HACKERS] invalidly encoded strings

2007-09-10 Tatsuo Ishii
On Tue, 2007-09-11 at 12:29 +0900, Tatsuo Ishii wrote: Please show me concrete examples how I could introduce a vulnerability using this kind of convert() usage. Try the sequence below. Then, try to dump and then reload the database. When you try to reload it, you will get an error:

Re: [HACKERS] invalidly encoded strings

2007-09-09 Martijn van Oosterhout
On Sun, Sep 09, 2007 at 12:02:28AM -0400, Andrew Dunstan wrote: . what do we need to do to make the verification code more efficient? I think we need to address the correctness issue first, but doing so should certainly make us want to improve the verification code. For example, I'm

Re: [HACKERS] invalidly encoded strings

2007-09-09 Andrew Dunstan
Martijn van Oosterhout wrote: On Sun, Sep 09, 2007 at 12:02:28AM -0400, Andrew Dunstan wrote: . what do we need to do to make the verification code more efficient? I think we need to address the correctness issue first, but doing so should certainly make us want to improve the

Re: [HACKERS] invalidly encoded strings

2007-09-09 Tom Lane
Andrew Dunstan [EMAIL PROTECTED] writes: I have been looking at fixing the issue of accepting strings that are not valid in the database encoding. It appears from previous discussion that we need to add a call to pg_verifymbstr() to the relevant input routines and ensure that the chr()

Re: [HACKERS] invalidly encoded strings

2007-09-09 Andrew Dunstan
Tom Lane wrote: A possible answer is to add a verifymbstr to the string literal converter anytime it has processed a numeric backslash-escape in the string. Open questions for that are (1) does it have negative effects for bytea, and if so is there any hope of working around it? (2) how can

Re: [HACKERS] invalidly encoded strings

2007-09-09 Andrew Dunstan
Tom Lane wrote: Andrew Dunstan [EMAIL PROTECTED] writes: Is that going to cover data coming in via COPY? and parameters for prepared statements? Those should be checked already --- if not, the right fix is still to fix it there, not in per-datatype code. I think we are OK though,

Re: [HACKERS] invalidly encoded strings

2007-09-09 Jeff Davis
On Sun, 2007-09-09 at 10:51 -0400, Tom Lane wrote: A possible answer is to add a verifymbstr to the string literal converter anytime it has processed a numeric backslash-escape in the string. Open questions for that are (1) does it have negative effects for bytea, and if so is there any hope

Re: [HACKERS] invalidly encoded strings

2007-09-09 Tom Lane
Andrew Dunstan [EMAIL PROTECTED] writes: Well, a little experimentation shows that we currently are not OK: This experiment is inadequately described. What is the type of the column involved? regards, tom lane

Re: [HACKERS] invalidly encoded strings

2007-09-09 Tom Lane
Jeff Davis [EMAIL PROTECTED] writes: Currently, you can pass a bytea literal as either: E'\377\377\377' or E'\\377\\377\\377'. The first strategy (single backslash) is not correct, because if you do E'\377\000\377', the embedded null character counts as the end of the cstring, even though
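The explanation here involves two rounds of unescaping: the E'' string scanner first, then bytea's escape-format input function, which receives a NUL-terminated C string. The sketch below models both stages in Python under simplifying assumptions (only doubled-backslash and octal escapes are handled, and all function names are hypothetical), to show why the single-backslash literal loses everything after the embedded zero byte.

```python
import re

# Simplified escape grammars; the real scanner and bytea input accept more.
_LITERAL_ESCAPE = re.compile(rb"\\(\\|[0-7]{1,3})")  # E'' literal: \\ or \o, \oo, \ooo
_BYTEA_ESCAPE = re.compile(rb"\\(\\|[0-7]{3})")      # bytea input: \\ or exactly \ooo


def _unescape(pattern, body: bytes) -> bytes:
    def repl(m):
        g = m.group(1)
        if g == b"\\":
            return b"\\"
        return bytes([int(g, 8) & 0xFF])  # keep the low byte (assumption for \4xx..\7xx)
    return pattern.sub(repl, body)


def lexer_stage(body: bytes) -> bytes:
    """Stage 1: the E'...' string-literal scanner."""
    return _unescape(_LITERAL_ESCAPE, body)


def as_cstring(value: bytes) -> bytes:
    """The literal reaches the input function as a C string: it ends at the first NUL."""
    return value.split(b"\0", 1)[0]


def bytea_input_stage(cstring: bytes) -> bytes:
    """Stage 2: bytea's escape-format input function."""
    return _unescape(_BYTEA_ESCAPE, cstring)


def bytea_literal(body: bytes) -> bytes:
    return bytea_input_stage(as_cstring(lexer_stage(body)))


single = bytea_literal(rb"\377\000\377")     # E'\377\000\377'::bytea
double = bytea_literal(rb"\\377\\000\\377")  # E'\\377\\000\\377'::bytea
```

With one backslash the scanner itself produces the zero byte, so the C string handed to bytea input stops after the first byte; with two, the scanner passes the octal escapes through and bytea input decodes all three bytes.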

Re: [HACKERS] invalidly encoded strings

2007-09-09 Jeff Davis
On Sun, 2007-09-09 at 17:09 -0400, Tom Lane wrote: Jeff Davis [EMAIL PROTECTED] writes: Currently, you can pass a bytea literal as either: E'\377\377\377' or E'\\377\\377\\377'. The first strategy (single backslash) is not correct, because if you do E'\377\000\377', the embedded null

Re: [HACKERS] invalidly encoded strings

2007-09-09 Andrew Dunstan
Tom Lane wrote: Andrew Dunstan [EMAIL PROTECTED] writes: Well, a little experimentation shows that we currently are not OK: This experiment is inadequately described. What is the type of the column involved? Sorry. It's text. cheers andrew

Re: [HACKERS] invalidly encoded strings

2007-09-09 Tom Lane
Jeff Davis [EMAIL PROTECTED] writes: Would stringTypeDatum() in parse_type.c be a good place to put the pg_verifymbstr()? Probably not, in its current form, since it hasn't got any idea where the char *string came from; moreover it is not in any better position than the typinput function to

Re: [HACKERS] invalidly encoded strings

2007-09-09 Andrew Dunstan
Tom Lane wrote: In the short run it might be best to do it in scan.l after all. I have not come up with a way of doing that and handling the bytea case. If you have I'm all ears. And then I am still worried about COPY. cheers andrew

Re: [HACKERS] invalidly encoded strings

2007-09-09 Jeff Davis
On Sun, 2007-09-09 at 23:22 -0400, Tom Lane wrote: In the short run it might be best to do it in scan.l after all. A few minutes' thought about what it'd take to delay the decisions till later yields a depressingly large number of changes; and we do not have time to be developing

Re: [HACKERS] invalidly encoded strings

2007-09-09 Tom Lane
Andrew Dunstan [EMAIL PROTECTED] writes: Tom Lane wrote: In the short run it might be best to do it in scan.l after all. I have not come up with a way of doing that and handling the bytea case. AFAICS we have no realistic choice other than to reject \0 in SQL literals; to do otherwise
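The short-run approach discussed here, validating in the scanner, can be sketched as follows: unescape the literal, reject a zero-valued octal escape, and run an encoding check only when a numeric escape was processed, on the reasoning that the rest of the query string has already been validated. This is a hypothetical Python model, not scan.l; `scan_e_literal` and the strict decode used as the check are assumptions. The doubled-backslash bytea form passes untouched, as the thread notes.

```python
import re

_ESCAPE = re.compile(rb"\\(\\|[0-7]{1,3})")  # only \\ and octal escapes, for brevity


def scan_e_literal(body: bytes, db_encoding: str = "utf-8") -> bytes:
    """Hypothetical scanner step: unescape, reject a zero byte, verify after numeric escapes."""
    saw_numeric_escape = False

    def repl(m):
        nonlocal saw_numeric_escape
        g = m.group(1)
        if g == b"\\":
            return b"\\"
        value = int(g, 8) & 0xFF
        if value == 0:
            raise ValueError("zero byte not allowed in string literal")
        saw_numeric_escape = True
        return bytes([value])

    result = _ESCAPE.sub(repl, body)
    if saw_numeric_escape:
        result.decode(db_encoding)  # strict: raises UnicodeDecodeError on invalid bytes
    return result
```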

Re: [HACKERS] invalidly encoded strings

2007-09-09 Jeff Davis
On Sun, 2007-09-09 at 23:33 -0400, Andrew Dunstan wrote: Tom Lane wrote: In the short run it might be best to do it in scan.l after all. I have not come up with a way of doing that and handling the bytea case. If you have I'm all ears. And then I am still worried about COPY. If

Re: [HACKERS] invalidly encoded strings

2007-09-09 Tom Lane
Jeff Davis [EMAIL PROTECTED] writes: If it's done in the scanner it should still accept things like: E'\\377\\000\\377'::bytea right? Right, that will work, because the transformed literal is '\377\000\377' (no strange characters there, just what it says) and that has not got any encoding

[HACKERS] invalidly encoded strings

2007-09-08 Andrew Dunstan
I have been looking at fixing the issue of accepting strings that are not valid in the database encoding. It appears from previous discussion that we need to add a call to pg_verifymbstr() to the relevant input routines and ensure that the chr() function returns a valid string. That leaves
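As a rough illustration of what a pg_verifymbstr()-style check has to enforce for UTF8, here is a standalone Python validator. It is not PostgreSQL's C implementation; `first_invalid_utf8` is a hypothetical name, and rejecting NUL reflects the thread's goal of keeping embedded nulls out of text values. A lone 0xbd byte, as in the dump/reload error quoted elsewhere in the thread, is reported at its own offset.

```python
from typing import Optional


def first_invalid_utf8(data: bytes) -> Optional[int]:
    """Return the offset of the first invalid sequence, or None if `data` is valid UTF-8.

    NUL is treated as invalid too, since text values must not contain it.
    """
    i, n = 0, len(data)
    while i < n:
        b = data[i]
        if b == 0:
            return i
        if b < 0x80:
            i += 1
            continue
        # need = continuation bytes required; lo..hi = allowed range of the second byte
        if 0xC2 <= b <= 0xDF:
            need, lo, hi = 1, 0x80, 0xBF
        elif b == 0xE0:
            need, lo, hi = 2, 0xA0, 0xBF  # exclude overlong 3-byte forms
        elif b == 0xED:
            need, lo, hi = 2, 0x80, 0x9F  # exclude UTF-16 surrogates
        elif 0xE1 <= b <= 0xEF:
            need, lo, hi = 2, 0x80, 0xBF
        elif b == 0xF0:
            need, lo, hi = 3, 0x90, 0xBF  # exclude overlong 4-byte forms
        elif 0xF1 <= b <= 0xF3:
            need, lo, hi = 3, 0x80, 0xBF
        elif b == 0xF4:
            need, lo, hi = 3, 0x80, 0x8F  # cap at U+10FFFF
        else:
            return i  # C0, C1, F5..FF, or a stray continuation byte
        if i + need >= n:
            return i  # truncated sequence
        if not lo <= data[i + 1] <= hi:
            return i
        for k in range(2, need + 1):
            if not 0x80 <= data[i + k] <= 0xBF:
                return i
        i += need + 1
    return None
```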