[GENERAL] Best practice for: ERROR: invalid byte sequence for encoding UTF8

2007-08-15 Thread Ivan Zolotukhin
Hello, Imagine a web application that processes text search queries from clients. If one types a text search query in a browser, it sends proper UTF-8 characters, and the application, after all needed processing (escaping, checks, etc.), passes it to the database. But if one modifies the URL of the query

Re: [GENERAL] Best practice for: ERROR: invalid byte sequence for encoding UTF8

2007-08-15 Thread Martijn van Oosterhout
On Wed, Aug 15, 2007 at 03:41:30PM +0400, Ivan Zolotukhin wrote: Hello, Imagine a web application that processes text search queries from clients. If one types a text search query in a browser, it sends proper UTF-8 characters, and the application, after all needed processing (escaping, checks,

Re: [GENERAL] Best practice for: ERROR: invalid byte sequence for encoding UTF8

2007-08-15 Thread Phoenix Kiula
On 15/08/07, Ivan Zolotukhin wrote: Hello, Imagine a web application that processes text search queries from clients. If one types a text search query in a browser, it sends proper UTF-8 characters, and the application, after all needed processing (escaping, checks, etc.), passes

Re: [GENERAL] Best practice for: ERROR: invalid byte sequence for encoding UTF8

2007-08-15 Thread Ivan Zolotukhin
Hello, Well, PostgreSQL is entirely correct; I would have posted this message to the -hackers list otherwise :) The question was rather about how the application should process user input, not about changing how the database reacts to a broken UTF-8 string. But I am 100% sure one should fix the input in this case since

Re: [GENERAL] Best practice for: ERROR: invalid byte sequence for encoding UTF8

2007-08-15 Thread Ivan Zolotukhin
Hello, Actually I tried something like $str = @iconv('UTF-8', 'UTF-8//IGNORE', $str); when preparing the string for the SQL query, and it worked. There's probably a better way in PHP to achieve this: simply change the default values in php.ini for these parameters: mbstring.encoding_translation = On
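
For readers not using PHP, the same "drop the bad bytes before the query reaches PostgreSQL" idea can be sketched in Python. This is only an illustration under stated assumptions, not what anyone in the thread ran: the helper names `is_valid_utf8` and `scrub_utf8` are hypothetical, and the input is assumed to arrive as raw bytes (for example, a percent-decoded URL parameter).

```python
def is_valid_utf8(raw: bytes) -> bool:
    """Return True if the raw bytes form a well-formed UTF-8 string."""
    try:
        raw.decode("utf-8")  # strict decoding raises on any invalid sequence
        return True
    except UnicodeDecodeError:
        return False


def scrub_utf8(raw: bytes, replace: bool = False) -> str:
    """Decode raw bytes as UTF-8, handling invalid sequences.

    By default invalid bytes are silently dropped, which mirrors the
    iconv 'UTF-8//IGNORE' approach. With replace=True each invalid
    sequence becomes U+FFFD instead, so the damage stays visible.
    """
    return raw.decode("utf-8", errors="replace" if replace else "ignore")


if __name__ == "__main__":
    broken = b"abc\xffdef"  # 0xFF can never appear in valid UTF-8
    print(is_valid_utf8(broken))
    print(scrub_utf8(broken))
```

Whether to drop the bytes, replace them, or reject the request outright is an application decision; checking with `is_valid_utf8` first lets the application return an error to the client instead of guessing what was meant.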

Re: [GENERAL] Best practice for: ERROR: invalid byte sequence for encoding UTF8

2007-08-15 Thread Vivek Khera
On Aug 15, 2007, at 7:41 AM, Ivan Zolotukhin wrote: What is the best practice to process such broken strings before passing them to PostgreSQL? Iconv from utf-8 to utf-8 dropping bad characters? This rings of GIGO... if your user enters garbage, how do you know what they wanted? You

Re: [GENERAL] Best practice for: ERROR: invalid byte sequence for encoding UTF8

2007-08-15 Thread Phoenix Kiula
On 15/08/07, Ivan Zolotukhin wrote: Hello, Actually I tried something like $str = @iconv('UTF-8', 'UTF-8//IGNORE', $str); when preparing the string for the SQL query, and it worked. There's probably a better way in PHP to achieve this: simply change the default values in php.ini for these

Re: [GENERAL] Best practice for: ERROR: invalid byte sequence for encoding UTF8

2007-08-15 Thread Scott Marlowe
On 8/15/07, Phoenix Kiula wrote: What, exactly, does that mean? That PostgreSQL should take things in invalid utf-8 format and just store them? Or that PostgreSQL should autoconvert from invalid utf-8 to valid utf-8, guessing the proper codes? Seriously, what do

Re: [GENERAL] Best practice for: ERROR: invalid byte sequence for encoding UTF8

2007-08-15 Thread Ben
On Thu, 16 Aug 2007, Phoenix Kiula wrote: I am not advocating what others should do. But I know what I need my DB to do. If I want it to store data that does not match puritanical standards of textual storage, then it should allow me to... It does allow that: store it as a BLOB, and then

Re: [GENERAL] Best practice for: ERROR: invalid byte sequence for encoding UTF8

2007-08-15 Thread Phoenix Kiula
What, exactly, does that mean? That PostgreSQL should take things in invalid utf-8 format and just store them? Or that PostgreSQL should autoconvert from invalid utf-8 to valid utf-8, guessing the proper codes? Seriously, what do you want pgsql to do with these invalid inputs? PG should

Re: [GENERAL] Best practice for: ERROR: invalid byte sequence for encoding UTF8

2007-08-15 Thread Scott Marlowe
On 8/15/07, Phoenix Kiula wrote: On 15/08/07, Ivan Zolotukhin wrote: Hello, Actually I tried something like $str = @iconv('UTF-8', 'UTF-8//IGNORE', $str); when preparing the string for the SQL query, and it worked. There's probably a better way in PHP to achieve this:

Re: [GENERAL] Best practice for: ERROR: invalid byte sequence for encoding UTF8

2007-08-15 Thread Ben
On Thu, 16 Aug 2007, Phoenix Kiula wrote: 1. Even if it were bytea, would it work with regular SQL operators such as regexp and LIKE? 2. Would tsearch2 work with bytea in the future as long as the stuff in it was text? As far as I know, regexp, [i]like, tsearch2, etc. all require valid text

Re: [GENERAL] Best practice for: ERROR: invalid byte sequence for encoding UTF8

2007-08-15 Thread Phoenix Kiula
On 16/08/07, Phoenix Kiula wrote: On 16/08/07, Ben wrote: On Thu, 16 Aug 2007, Phoenix Kiula wrote: I am not advocating what others should do. But I know what I need my DB to do. If I want it to store data that does not match puritanical standards

Re: [GENERAL] Best practice for: ERROR: invalid byte sequence for encoding UTF8

2007-08-15 Thread Phoenix Kiula
On 16/08/07, Ben wrote: On Thu, 16 Aug 2007, Phoenix Kiula wrote: I am not advocating what others should do. But I know what I need my DB to do. If I want it to store data that does not match puritanical standards of textual storage, then it should allow me to... It

Re: [GENERAL] Best practice for: ERROR: invalid byte sequence for encoding UTF8

2007-08-15 Thread Martijn van Oosterhout
On Thu, Aug 16, 2007 at 01:56:52AM +0800, Phoenix Kiula wrote: This is very useful, thanks. This would be bytea? Quick questions: 1. Even if it were bytea, would it work with regular SQL operators such as regexp and LIKE? bytea is specifically designed for binary data, as such it has all