[Dbix-class] Unicode conversion problems

2010-07-05 Thread Jesse Sheidlower
Summary: I have a MySQL database with data in an unknown character set, or mixture thereof (I thought it was Unicode, but it's not). It displays correctly when used with MySQL commandline tools under certain configurations, but I need to figure out how to convert it to proper Unicode. After

Re: [Dbix-class] Unicode conversion problems

2010-07-05 Thread Matias E. Fernandez
Hello Jesse. Please try the following using the table 'foo' you described earlier: mysql> set names utf8; mysql> insert into foo (author) values('Pérez-Reverte, Arturo Кири́ллица ქართული 汉字 / 漢'); Then try this script; note the attributes, which in this case are very important: use strict;

Re: [Dbix-class] Unicode conversion problems

2010-07-05 Thread Matias E. Fernandez
Hello Jesse I'm pretty sure your data has been UTF-8 encoded twice. Consider this example: use strict; use warnings; use Encode; # $string is UTF-8, but Perl doesn't know my $string = 'Pérez-Reverte, Arturo Кири́ллица ქართული 汉字 / 漢'; # $double_utf8 contains the double UTF-8 encoded string #

Re: [Dbix-class] Unicode conversion problems

2010-07-05 Thread Jesse Sheidlower
On Mon, Jul 05, 2010 at 05:45:11PM +0200, Matias E. Fernandez wrote: Hello Jesse Please try the following using the table 'foo' you described earlier: mysql> set names utf8; mysql> insert into foo (author) values('Pérez-Reverte, Arturo Кири́ллица ქართული 汉字 / 漢'); [my mailer is still

Re: [Dbix-class] Unicode conversion problems

2010-07-05 Thread Jesse Sheidlower
On Mon, Jul 05, 2010 at 05:49:30PM -0400, Jesse Sheidlower wrote: On Mon, Jul 05, 2010 at 05:45:11PM +0200, Matias E. Fernandez wrote: Hello Jesse Please try the following using the table 'foo' you described earlier: mysql> set names utf8; mysql> insert into foo (author)

Re: [Dbix-class] Unicode conversion problems

2010-07-05 Thread Jesse Sheidlower
On Mon, Jul 05, 2010 at 11:02:02PM +0200, Matias E. Fernandez wrote: Hello Jesse I'm pretty sure your data has been UTF-8 encoded twice. Consider this example: use strict; use warnings; use Encode; # $string is UTF-8, but Perl doesn't know my $string = 'Pérez-Reverte, Arturo

Re: [Dbix-class] Unicode conversion problems

2010-07-05 Thread Matias E. Fernandez
Hello Jesse. On 2010-07-05, at 23:56, Jesse Sheidlower wrote: "Sorry, let me revise that slightly: I do get the correct results, but preceded by 'Wide character in print at foo-test2.pl line 22.'" That's perfectly okay; please read perluniintro[1], perlunifaq[2] and the like! If you are printing
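For readers following the thread, here is a minimal sketch of the situation that usually produces "Wide character in print": a decoded character string is printed to a filehandle that has no encoding layer. The sample bytes and variable names are illustrative and not taken from the original script; adding an encoding layer with binmode is one common remedy, not necessarily the one the truncated message goes on to describe.

```perl
use strict;
use warnings;
use Encode qw(decode);

# UTF-8 bytes for the two characters U+6C49 U+5B57 (illustrative sample data)
my $bytes = "\xE6\xB1\x89\xE5\xAD\x97";

# Decode the bytes into a Perl character string (2 characters, not 6 bytes)
my $chars = decode('UTF-8', $bytes);

# Printing $chars to a handle without an encoding layer triggers
# "Wide character in print". Declaring the output encoding avoids it.
binmode(STDOUT, ':encoding(UTF-8)');
print "$chars\n";
```

With the layer in place, Perl encodes the characters on output, so the warning goes away and the terminal receives valid UTF-8.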

Re: [Dbix-class] Unicode conversion problems

2010-07-05 Thread Matias E. Fernandez
Hello Jesse. "Right, that looks correct. But this is latin1, not UTF-8, so..." No, I think I lost you halfway; look at the example carefully: first you have character data encoded as UTF-8 (my $string). You then run that already UTF-8 encoded character data through an ISO-8859-1 to UTF-8
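The double-encoding scenario described above can be sketched as follows. This is an illustration under stated assumptions, not the script from the original message: the single sample character and the variable names are hypothetical, and the repair step assumes every character in the data was double-encoded in exactly this way.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# UTF-8 bytes for U+00E9 (e with acute accent): 0xC3 0xA9
my $utf8_bytes = "\xC3\xA9";

# The mistake: treat the UTF-8 bytes as ISO-8859-1 and convert to UTF-8 again.
# Each original byte becomes its own character and is encoded separately.
my $double = encode('UTF-8', decode('ISO-8859-1', $utf8_bytes));

# The repair: reverse the bogus ISO-8859-1 to UTF-8 conversion.
my $repaired = encode('ISO-8859-1', decode('UTF-8', $double));

# Show the bytes at each stage as hex
for my $pair ([original => $utf8_bytes], [double => $double], [repaired => $repaired]) {
    my ($label, $str) = @$pair;
    printf "%-9s %s\n", $label, join ' ', map { sprintf '%02X', ord } split //, $str;
}
```

If the stored data mixes correctly encoded and double-encoded rows, applying the repair blindly would corrupt the good rows, so it is safer to check each value before converting.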