Re: [GENERAL] Best practices for moving UTF8 databases

2009-07-22 Thread Sam Mason
On Wed, Jul 22, 2009 at 05:26:37PM +0800, Phoenix Kiula wrote:
> I tried this. Get an error.
>
> mypg=# select * from interesting WHERE NOT description ~ ( '^('||
> mypg(# $$[\09\0A\0D\x20-\x7E]|$$|| -- ASCII
> mypg(# $$[\xC2-\xDF][\x80-\xBF]|$$|| -- non-overlong 2-

Re: [GENERAL] Best practices for moving UTF8 databases

2009-07-22 Thread Justin Pasher
Phoenix Kiula wrote:

I tried this. Get an error.

mypg=# select * from interesting WHERE NOT description ~ ( '^('||
mypg(# $$[\09\0A\0D\x20-\x7E]|$$|| -- ASCII
mypg(# $$[\xC2-\xDF][\x80-\xBF]|$$|| -- non-overlong 2-byte
mypg(# $$\xE0[\xA0-\xBF][\x80-\xBF]|$$||
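
The query quoted above is cut off after its third alternation. As a rough sketch of the same idea outside the database, the byte-level check can be written in Python. The remaining alternations below follow the widely circulated W3C pattern for well-formed UTF-8 and are not taken from this thread; the function name is hypothetical.

```python
import re

# Byte-level pattern for well-formed UTF-8. The first three alternations
# mirror the quoted query; the rest follow the commonly circulated W3C form
# (an assumption, since the quoted post is truncated).
WELL_FORMED_UTF8 = re.compile(
    rb"""(?:
        [\x09\x0A\x0D\x20-\x7E]              # ASCII: tab, LF, CR, printable
      | [\xC2-\xDF][\x80-\xBF]               # non-overlong 2-byte
      | \xE0[\xA0-\xBF][\x80-\xBF]           # 3-byte, excluding overlongs
      | [\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}    # straight 3-byte
      | \xED[\x80-\x9F][\x80-\xBF]           # 3-byte, excluding surrogates
      | \xF0[\x90-\xBF][\x80-\xBF]{2}        # planes 1-3
      | [\xF1-\xF3][\x80-\xBF]{3}            # planes 4-15
      | \xF4[\x80-\x8F][\x80-\xBF]{2}        # plane 16
    )*""",
    re.VERBOSE,
)


def is_well_formed_utf8(data: bytes) -> bool:
    """Return True if every byte of data belongs to an accepted sequence."""
    return WELL_FORMED_UTF8.fullmatch(data) is not None


if __name__ == "__main__":
    print(is_well_formed_utf8("€ à".encode("utf-8")))  # valid multi-byte text
    print(is_well_formed_utf8(b"abc\x80def"))          # stray continuation byte
```

Note that this pattern is stricter than UTF-8 itself: it also rejects control characters other than tab, LF, and CR, so a row flagged by it is not necessarily invalidly encoded.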

Re: [GENERAL] Best practices for moving UTF8 databases

2009-07-22 Thread Phoenix Kiula
On Tue, Jul 21, 2009 at 6:35 PM, Sam Mason wrote:
> On Tue, Jul 21, 2009 at 09:37:04AM +0200, Daniel Verite wrote:
>> >I'd love to fix them. But if I do a search for
>> >SELECT * FROM xyz WHERE col like '%0x80%'
>> >
>> >it doesn't work. How should I search for these characters?
>>
>> In 8.2, try:

Re: [GENERAL] Best practices for moving UTF8 databases

2009-07-21 Thread Albe Laurenz
Phoenix Kiula wrote:
> > I wonder: why do you spend so much time complaining instead of
> > simply locating the buggy data and fixing them?
>
>
> I'd love to fix them. But if I do a search for
>
> SELECT * FROM xyz WHERE col like '%0x80%'
>
> it doesn't work. How should I search for these chara

Re: [GENERAL] Best practices for moving UTF8 databases

2009-07-21 Thread Sam Mason
On Tue, Jul 21, 2009 at 09:37:04AM +0200, Daniel Verite wrote:
> >I'd love to fix them. But if I do a search for
> >SELECT * FROM xyz WHERE col like '%0x80%'
> >
> >it doesn't work. How should I search for these characters?
>
> In 8.2, try: WHERE strpos(col, E'\x80') > 0
>
> Note that this may fi

Re: [GENERAL] Best practices for moving UTF8 databases

2009-07-21 Thread Daniel Verite
Phoenix Kiula wrote:

I'd love to fix them. But if I do a search for
SELECT * FROM xyz WHERE col like '%0x80%'
it doesn't work. How should I search for these characters?

In 8.2, try: WHERE strpos(col, E'\x80') > 0

Note that this may find valid data as well, because the error you get
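
The post above is cut off before it gives its reason. One general property of UTF-8 that makes a raw search for byte 0x80 match valid rows is that 0x80 is a legal continuation byte inside multi-byte characters. A small Python sketch of that property (the helper names are hypothetical, and this is not necessarily the explanation the truncated post goes on to give):

```python
def contains_byte_0x80(text: str) -> bool:
    """Return True if the UTF-8 encoding of text contains the byte 0x80."""
    return b"\x80" in text.encode("utf-8")


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly under Python's strict UTF-8 decoder."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


if __name__ == "__main__":
    # "À" (U+00C0) encodes as C3 80 and the en dash (U+2013) as E2 80 93,
    # so both are valid text that a search for byte 0x80 would still match.
    for sample in ["plain ascii", "\u00c0", "\u2013"]:
        encoded = sample.encode("utf-8")
        print(encoded.hex(), contains_byte_0x80(sample), is_valid_utf8(encoded))

    # A lone 0x80 with no lead byte is what a strict decoder rejects.
    print(is_valid_utf8(b"abc\x80"))
```

A byte search therefore narrows down candidate rows; a strict decode of each candidate is what separates the broken ones from the valid ones.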

Re: [GENERAL] Best practices for moving UTF8 databases

2009-07-20 Thread Phoenix Kiula
> "0x80" makes me think of the following:
> The data originate from a Windows system, where 0x80 is a Euro
> sign. Somehow these were imported into PostgreSQL without the
> appropriate translation into UTF-8 (how I do not know).
>
> I wonder: why do you spend so much time complaining instead of
> s
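
To illustrate the Windows explanation quoted above: in the Windows-1252 code page, byte 0x80 is the Euro sign, while in UTF-8 the Euro sign is the three bytes E2 82 AC. A minimal Python sketch of re-encoding such a value, assuming the stray bytes really are Windows-1252 (the thread does not confirm this, and the function name is hypothetical):

```python
def repair_cp1252(raw: bytes) -> str:
    """Decode raw as UTF-8, falling back to Windows-1252 if that fails.

    Assumption: anything that is not valid UTF-8 was written as Windows-1252.
    The fallback can still raise UnicodeDecodeError for the few byte values
    that Python's cp1252 codec leaves undefined (for example 0x81).
    """
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode("cp1252")


if __name__ == "__main__":
    broken = b"Price: 10 \x80"          # 0x80 stored without translation
    fixed = repair_cp1252(broken)
    print(fixed)                        # text with a real Euro sign
    print(fixed.encode("utf-8").hex())  # Euro sign now stored as e2 82 ac
```

The fallback treats the whole value as one encoding, so a value that mixes valid UTF-8 sequences with stray Windows-1252 bytes would have its valid sequences mangled; such rows need inspecting by hand.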

Re: [GENERAL] Best practices for moving UTF8 databases

2009-07-20 Thread Martijn van Oosterhout
On Mon, Jul 20, 2009 at 10:32:15AM +0800, Phoenix Kiula wrote:
> Thanks Martin. I tried searching through the archives and could only
> come with something like this:
>
> http://docs.moodle.org/en/UTF-8_PostgreSQL
>
> But this only has the usual iconv stuff suggested.
>
> Could you pls suggest s

Re: [GENERAL] Best practices for moving UTF8 databases

2009-07-20 Thread Albe Laurenz
Phoenix Kiula wrote:
> Really, PG absolutely needs a way to upgrade the database without so
> much data related downtime and all these silly woes. Several competing
> database systems are a cinch to upgrade.

I'd call it data corruption, not a silly woe.
I know that Oracle for example would not ma

Re: [GENERAL] Best practices for moving UTF8 databases

2009-07-19 Thread Phoenix Kiula
On Sun, Jul 19, 2009 at 7:08 PM, Martijn van Oosterhout wrote:
> On Sun, Jul 19, 2009 at 10:16:17AM +0800, Phoenix Kiula wrote:
> Look through the archives, there are scripts that will scan all your
> text fields for UTF-8 problems. If you run them once you can clear out
> all the problems prior t

Re: [GENERAL] Best practices for moving UTF8 databases

2009-07-19 Thread Martijn van Oosterhout
On Sun, Jul 19, 2009 at 10:16:17AM +0800, Phoenix Kiula wrote:
> If so, how can I check for them in my old database, which is 8.2.9?
> I'm now moving first to 8.3 (then to the 84).
>
> Really, PG absolutely needs a way to upgrade the database without so
> much data related downtime and all these s

Re: [GENERAL] Best practices for moving UTF8 databases

2009-07-18 Thread Phoenix Kiula
On Tue, Jul 14, 2009 at 9:52 PM, Alvaro Herrera wrote:
> Andres Freund wrote:
>> On Tuesday 14 July 2009 11:36:57 Jasen Betts wrote:
>
>> > if you do an ascii dump and the dump starts out "SET CLIENT ENCODING
>> > 'UTF8'" or similar but you still get errors.
>> Do you mean that a dump from SQL_ASCI

Re: [GENERAL] Best practices for moving UTF8 databases

2009-07-14 Thread Andres Freund
On Tuesday 14 July 2009 15:52:29 Alvaro Herrera wrote:
> Andres Freund wrote:
> > On Tuesday 14 July 2009 11:36:57 Jasen Betts wrote:
> > > if you do an ascii dump and the dump starts out "SET CLIENT ENCODING
> > > 'UTF8'" or similar but you still get errors.
> >
> > Do you mean that a dump from SQ

Re: [GENERAL] Best practices for moving UTF8 databases

2009-07-14 Thread Alvaro Herrera
Andres Freund wrote:
> On Tuesday 14 July 2009 11:36:57 Jasen Betts wrote:
> > if you do an ascii dump and the dump starts out "SET CLIENT ENCODING
> > 'UTF8'" or similar but you still get errors.
> Do you mean that a dump from SQL_ASCII can yield non-utf8 data? right. But
> According to the OP h

Re: [GENERAL] Best practices for moving UTF8 databases

2009-07-14 Thread Andres Freund
On Tuesday 14 July 2009 11:36:57 Jasen Betts wrote:
> On 2009-07-13, Andres Freund wrote:
> > On Sunday 12 July 2009 13:19:50 Phoenix Kiula wrote:
> >> Hi. I *always* get an error moving my current fully utf-8 database
> >> data into a new DB.
> >>
> >> My server has the version 8.3 with a five ye

Re: [GENERAL] Best practices for moving UTF8 databases

2009-07-14 Thread Jasen Betts
On 2009-07-13, Andres Freund wrote:
> On Sunday 12 July 2009 13:19:50 Phoenix Kiula wrote:
>> Hi. I *always* get an error moving my current fully utf-8 database
>> data into a new DB.
>>
>> My server has the version 8.3 with a five year old DB. Everything, all
>> collation, LC_LOCALE etc are all u

Re: [GENERAL] Best practices for moving UTF8 databases

2009-07-13 Thread Andres Freund
On Sunday 12 July 2009 13:19:50 Phoenix Kiula wrote:
> Hi. I *always* get an error moving my current fully utf-8 database
> data into a new DB.
>
> My server has the version 8.3 with a five year old DB. Everything, all
> collation, LC_LOCALE etc are all utf8.
>
> When I install a new Postgresql 8.4

Re: [GENERAL] Best practices for moving UTF8 databases

2009-07-13 Thread Albe Laurenz
Phoenix Kiula wrote:
> Hi. I *always* get an error moving my current fully utf-8 database
> data into a new DB.
>
> My server has the version 8.3 with a five year old DB. Everything, all
> collation, LC_LOCALE etc are all utf8.
>
> When I install a new Postgresql 8.4 on my home Mac OSX machine (a

[GENERAL] Best practices for moving UTF8 databases

2009-07-12 Thread Phoenix Kiula
Hi. I *always* get an error moving my current fully utf-8 database data into a new DB. My server has the version 8.3 with a five year old DB. Everything, all collation, LC_LOCALE etc are all utf8. When I install a new Postgresql 8.4 on my home Mac OSX machine (after losing some hair) I set everyt