Re: [GENERAL] Best practice for: ERROR: invalid byte sequence for encoding "UTF8"

2007-08-15 Thread Martijn van Oosterhout
On Thu, Aug 16, 2007 at 01:56:52AM +0800, Phoenix Kiula wrote: > This is very useful, thanks. This would be "bytea"? Quick questions: > > 1. Even if it were bytea, would it work with regular SQL operators > such as regexp and LIKE? bytea is specifically designed for binary data, as such it has al

Re: [GENERAL] Best practice for: ERROR: invalid byte sequence for encoding "UTF8"

2007-08-15 Thread Ben
On Thu, 16 Aug 2007, Phoenix Kiula wrote: 1. Even if it were bytea, would it work with regular SQL operators such as regexp and LIKE? 2. Would tsearch2 work with bytea in the future as long as the stuff in it was text? As far as I know, regexp, [i]like, tsearch2, etc. all require valid text

Re: [GENERAL] Best practice for: ERROR: invalid byte sequence for encoding "UTF8"

2007-08-15 Thread Phoenix Kiula
On 16/08/07, Phoenix Kiula <[EMAIL PROTECTED]> wrote: > On 16/08/07, Ben <[EMAIL PROTECTED]> wrote: > > On Thu, 16 Aug 2007, Phoenix Kiula wrote: > > > > > I am not advocating what others should do. But I know what I need my > > > DB to do. If I want it to store data that does not match puritanical

Re: [GENERAL] Best practice for: ERROR: invalid byte sequence for encoding "UTF8"

2007-08-15 Thread Phoenix Kiula
On 16/08/07, Ben <[EMAIL PROTECTED]> wrote: > On Thu, 16 Aug 2007, Phoenix Kiula wrote: > > > I am not advocating what others should do. But I know what I need my > > DB to do. If I want it to store data that does not match puritanical > > standards of textual storage, then it should allow me to...

Re: [GENERAL] Best practice for: ERROR: invalid byte sequence for encoding "UTF8"

2007-08-15 Thread Ben
On Thu, 16 Aug 2007, Phoenix Kiula wrote: I am not advocating what others should do. But I know what I need my DB to do. If I want it to store data that does not match puritanical standards of textual storage, then it should allow me to... It does allow that: store it as a BLOB, and then treat

Re: [GENERAL] Best practice for: ERROR: invalid byte sequence for encoding "UTF8"

2007-08-15 Thread Scott Marlowe
On 8/15/07, Phoenix Kiula <[EMAIL PROTECTED]> wrote: > > What, exactly, does that mean? > > > > That PostgreSQL should take things in invalid utf-8 format and just store > > them? > > Or that PostgreSQL should autoconvert from invalid utf-8 to valid > > utf-8, guessing the proper codes? > > > > Se

Re: [GENERAL] Best practice for: ERROR: invalid byte sequence for encoding "UTF8"

2007-08-15 Thread Phoenix Kiula
> What, exactly, does that mean? > > That PostgreSQL should take things in invalid utf-8 format and just store > them? > Or that PostgreSQL should autoconvert from invalid utf-8 to valid > utf-8, guessing the proper codes? > > Seriously, what do you want pgsql to do with these invalid inputs? PG

Re: [GENERAL] Best practice for: ERROR: invalid byte sequence for encoding "UTF8"

2007-08-15 Thread Scott Marlowe
On 8/15/07, Phoenix Kiula <[EMAIL PROTECTED]> wrote: > On 15/08/07, Ivan Zolotukhin <[EMAIL PROTECTED]> wrote: > > Hello, > > > > Actually I tried smth like $str = @iconv("UTF-8", "UTF-8//IGNORE", > > $str); when preparing string for SQL query and it worked. There's > > probably a better way in PHP

Re: [GENERAL] Best practice for: ERROR: invalid byte sequence for encoding "UTF8"

2007-08-15 Thread Phoenix Kiula
On 15/08/07, Ivan Zolotukhin <[EMAIL PROTECTED]> wrote: > Hello, > > Actually I tried smth like $str = @iconv("UTF-8", "UTF-8//IGNORE", > $str); when preparing string for SQL query and it worked. There's > probably a better way in PHP to achieve this: simply change default > values in php.ini for t

Re: [GENERAL] Best practice for: ERROR: invalid byte sequence for encoding "UTF8"

2007-08-15 Thread Vivek Khera
On Aug 15, 2007, at 7:41 AM, Ivan Zolotukhin wrote: What is the best practice to process such a broken strings before passing them to PostgreSQL? Iconv from utf-8 to utf-8 dropping bad characters? This rings of GIGO... if your user enters garbage, how do you know what they wanted? You don'

Re: [GENERAL] Best practice for: ERROR: invalid byte sequence for encoding "UTF8"

2007-08-15 Thread Ivan Zolotukhin
Hello, Actually I tried smth like $str = @iconv("UTF-8", "UTF-8//IGNORE", $str); when preparing string for SQL query and it worked. There's probably a better way in PHP to achieve this: simply change default values in php.ini for these parameters: mbstring.encoding_translation = On mbstring.subst

Re: [GENERAL] Best practice for: ERROR: invalid byte sequence for encoding "UTF8"

2007-08-15 Thread Ivan Zolotukhin
Hello, Well, PostgreSQL is correct entirely, I would post this message to the -hackers list otherwise :) The question was rather about application processing of user input not about change of database reaction on broken UTF-8 string. But I am 100% sure one should fix the input in this case since w

Re: [GENERAL] Best practice for: ERROR: invalid byte sequence for encoding "UTF8"

2007-08-15 Thread Phoenix Kiula
On 15/08/07, Ivan Zolotukhin <[EMAIL PROTECTED]> wrote: > Hello, > > Imagine a web application that process text search queries from > clients. If one types a text search query in a browser it then sends > proper UTF-8 characters and application after all needed processing > (escaping, checks, etc)

Re: [GENERAL] Best practice for: ERROR: invalid byte sequence for encoding "UTF8"

2007-08-15 Thread Martijn van Oosterhout
On Wed, Aug 15, 2007 at 03:41:30PM +0400, Ivan Zolotukhin wrote: > Hello, > > Imagine a web application that process text search queries from > clients. If one types a text search query in a browser it then sends > proper UTF-8 characters and application after all needed processing > (escaping, ch

[GENERAL] Best practice for: ERROR: invalid byte sequence for encoding "UTF8"

2007-08-15 Thread Ivan Zolotukhin
Hello, Imagine a web application that process text search queries from clients. If one types a text search query in a browser it then sends proper UTF-8 characters and application after all needed processing (escaping, checks, etc) passes it to database. But if one modifies URL of the query adding