Re: [Slony1-general] strategy to fix utf8 encoding errors

John Sidney-Woollett Mon, 29 May 2006 23:12:08 -0700

Can you provide the regex that identifies invalid UTF-8 strings when the 
data is only expected to contain standard ASCII and Latin (ISO-8859) 
characters?


I need to check that we won't have problems before we migrate to 8.1.x 
using Slony...

Thanks

john

Ian Burrell wrote:
> When we migrated from 7.4 to 8.1, we had problems with bad characters.
> There was a small set of bad characters, usually characters which
> hadn't been translated to UTF-8 but were in the original latin-1 or
> windows-1252 character set.
> 
> Luckily, UTF-8 strings are pretty distinctive.  It is pretty easy to
> write a regex which only matches valid UTF-8 strings.  You could
> either run that against a dump, every column in every table, or
> particular problem columns.  If you have a good idea of what the
> original character set was and what characters you can expect, then
> you can translate them to Unicode.
> 
>  - Ian
> _______________________________________________
> Slony1-general mailing list
> [email protected]
> http://gborg.postgresql.org/mailman/listinfo/slony1-general
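
For reference, a minimal sketch in Python of the kind of check Ian
describes. His actual regex is not in this thread; the pattern below
follows the byte ranges for well-formed UTF-8 given in RFC 3629. The
function names and the windows-1252 fallback are illustrative
assumptions, not anything taken from Slony or PostgreSQL.

```python
import re

# Matches a byte string made up only of well-formed UTF-8 sequences
# (byte ranges per RFC 3629: no overlong forms, no surrogates).
VALID_UTF8 = re.compile(
    rb"""(?:
        [\x00-\x7F]                        # 1 byte: ASCII
      | [\xC2-\xDF][\x80-\xBF]             # 2 bytes
      | \xE0[\xA0-\xBF][\x80-\xBF]         # 3 bytes, overlong forms excluded
      | [\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}  # 3 bytes, general case
      | \xED[\x80-\x9F][\x80-\xBF]         # 3 bytes, surrogates excluded
      | \xF0[\x90-\xBF][\x80-\xBF]{2}      # 4 bytes, planes 1 to 3
      | [\xF1-\xF3][\x80-\xBF]{3}          # 4 bytes, planes 4 to 15
      | \xF4[\x80-\x8F][\x80-\xBF]{2}      # 4 bytes, plane 16
    )*""",
    re.VERBOSE,
)


def is_valid_utf8(data: bytes) -> bool:
    """Return True if every byte of data belongs to valid UTF-8."""
    return VALID_UTF8.fullmatch(data) is not None


def find_invalid_lines(dump: bytes):
    """Yield (line_number, raw_line) for each line that is not valid UTF-8."""
    for number, line in enumerate(dump.split(b"\n"), start=1):
        if not is_valid_utf8(line):
            yield number, line


def repair_line(line: bytes, assumed_encoding: str = "cp1252") -> str:
    """Decode a bad line, ASSUMING it was originally windows-1252."""
    return line.decode(assumed_encoding, errors="replace")


if __name__ == "__main__":
    sample = "café".encode("utf-8") + b"\n" + "café".encode("latin-1")
    for number, line in find_invalid_lines(sample):
        print(number, line, "->", repair_line(line))
```

Run against a plain-text dump read in binary mode, this reports the
lines that would be rejected as invalid UTF-8. The repair step is only
as good as the guess about the original character set, so it is worth
reviewing the flagged lines by hand before rewriting anything.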
