On 5/27/06, Vivek Khera <[EMAIL PROTECTED]> wrote:
> I have a database (rt3) in a postgres 8.0 server which has UNICODE
> encoding. It was replicated to another 8.0 DB just fine for a long
> time. Today I upgraded the replica to 8.1 and when I went to
> replicate it, I got UTF8 encoding failure from one of the tables:
> 'invalid byte sequence for encoding "UTF8": 0xa9'
>
> Aside from playing whack-a-mole and fixing the errors one at a time
> as they are reported by slon, what can I do to make the data UTF8
> safe for the strict checking of Pg 8.1?
>
> And what does one do to figure out what character to replace or do
> you generally just cut the offending character from the row?
>

When we migrated from 7.4 to 8.1, we had problems with bad characters.
There was a small set of them, usually characters which hadn't been
translated to UTF-8 but were still in the original latin-1 or
windows-1252 encoding.

Luckily, UTF-8 strings are pretty distinctive. It is pretty easy to
write a regex which only matches valid UTF-8 strings. You could run
that against a dump, against every column in every table, or just
against particular problem columns.

If you have a good idea of what the original character set was and
what characters you can expect, then you can translate them to
Unicode.

- Ian
_______________________________________________
Slony1-general mailing list
[email protected]
http://gborg.postgresql.org/mailman/listinfo/slony1-general
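
For what it's worth, here is a rough Python sketch of the approach described in the reply: a byte-level regex that only matches well-formed UTF-8, a scan that reports offending lines, and a repair step. It is not the script used in the migration mentioned above. The names (VALID_UTF8, is_valid_utf8, find_bad_lines, repair, the "cp1252_fallback" handler) are made up for illustration, and the repair step assumes the stray bytes are windows-1252, which is only a guess about the original data. Under that assumption the 0xa9 from the error message would come out as a copyright sign.

```python
import codecs
import re

# Byte-level pattern for a run of well-formed UTF-8 sequences
# (byte ranges as laid out in RFC 3629).
VALID_UTF8 = re.compile(
    rb"""(?:
        [\x00-\x7F]                        # ASCII
      | [\xC2-\xDF][\x80-\xBF]             # 2-byte sequence
      | \xE0[\xA0-\xBF][\x80-\xBF]         # 3-byte, excludes overlongs
      | [\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}  # 3-byte, general case
      | \xED[\x80-\x9F][\x80-\xBF]         # 3-byte, excludes surrogates
      | \xF0[\x90-\xBF][\x80-\xBF]{2}      # 4-byte, excludes overlongs
      | [\xF1-\xF3][\x80-\xBF]{3}          # 4-byte, general case
      | \xF4[\x80-\x8F][\x80-\xBF]{2}      # 4-byte, up to U+10FFFF
    )*""",
    re.VERBOSE,
)


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the whole byte string is well-formed UTF-8."""
    return VALID_UTF8.fullmatch(data) is not None


def find_bad_lines(lines):
    """Yield (line_number, raw_bytes) for each line that is not valid UTF-8."""
    for lineno, line in enumerate(lines, start=1):
        if not is_valid_utf8(line):
            yield lineno, line


def _cp1252_fallback(err):
    # Assumption: bytes that are not valid UTF-8 were originally windows-1252.
    # Bytes undefined in windows-1252 become U+FFFD.
    bad = err.object[err.start:err.end]
    return bad.decode("cp1252", errors="replace"), err.end


codecs.register_error("cp1252_fallback", _cp1252_fallback)


def repair(data: bytes) -> str:
    """Decode as UTF-8, reinterpreting only the invalid bytes as windows-1252."""
    return data.decode("utf-8", errors="cp1252_fallback")


if __name__ == "__main__":
    sample = [b"plain ascii\n", b"caf\xc3\xa9\n", b"\xa9 2006\n"]
    for lineno, raw in find_bad_lines(sample):
        print(lineno, raw, "->", repair(raw).rstrip())
```

To check a real dump, open it in binary mode (for example a plain-text pg_dump file, path of your choosing) and pass the file object to find_bad_lines. The repaired text can then be re-encoded with .encode("utf-8") before loading it into 8.1. Mixed rows are handled byte by byte, so valid UTF-8 in the same row is left alone.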
