Summary: I have a MySQL database with data in an unknown character set,
or mixture thereof (I thought it was Unicode, but it's not). It displays
correctly when used with the MySQL command-line tools under certain
configurations, but I need to figure out how to convert it to proper
Unicode. After
Hello Jesse
Please try the following using the table 'foo' you described earlier:
mysql> set names utf8;
mysql> insert into foo (author) values('Pérez-Reverte, Arturo Кири́ллица
ქართული 汉字 / 漢');
Then try this script. Notice the attributes, which in this case are very
important:
use strict;
Hello Jesse
I'm pretty sure your data has been UTF-8 encoded twice. Consider this example:
use strict;
use warnings;
use Encode;
# $string is UTF-8, but Perl doesn't know
my $string = 'Pérez-Reverte, Arturo Кири́ллица ქართული 汉字 / 漢';
# $double_utf8 contains the double UTF-8 encoded string
#
On Mon, Jul 05, 2010 at 05:45:11PM +0200, Matias E. Fernandez wrote:
Hello Jesse
Please try the following using the table 'foo' you described earlier:
mysql> set names utf8;
mysql> insert into foo (author) values('Pérez-Reverte, Arturo Кири́ллица
ქართული 汉字 / 漢');
[my mailer is still
On Mon, Jul 05, 2010 at 05:49:30PM -0400, Jesse Sheidlower wrote:
On Mon, Jul 05, 2010 at 05:45:11PM +0200, Matias E. Fernandez wrote:
Hello Jesse
Please try the following using the table 'foo' you described earlier:
mysql> set names utf8;
mysql> insert into foo (author)
On Mon, Jul 05, 2010 at 11:02:02PM +0200, Matias E. Fernandez wrote:
Hello Jesse
I'm pretty sure your data has been UTF-8 encoded twice. Consider this example:
use strict;
use warnings;
use Encode;
# $string is UTF-8, but Perl doesn't know
my $string = 'Pérez-Reverte, Arturo
Hello Jesse
On 2010-07-05, at 23:56, Jesse Sheidlower wrote:
Sorry, let me revise that slightly: I do get the correct
results, but preceded by Wide character in print at
foo-test2.pl line 22.
That's perfectly okay; please read perluniintro[1], perlunifaq[2] and the like!
If you are printing
Hello Jesse
Right, that looks correct. But this is latin1, not UTF-8,
so...
No, I think I lost you halfway. Look at the example carefully:
First you have character data encoded as UTF-8 (my $string).
You then run that already UTF-8 encoded character data through
an ISO-8859-1 to UTF-8
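
The double-encoding idea described in this thread can be sketched in a few lines of Perl. This is only an illustration, not the script from the original messages: the variable names are made up, the sample is shortened to "Pérez", and it assumes the intermediate character set really was ISO-8859-1 (data that passed through a different 8-bit encoding may not round-trip this way). The octets are written as escapes so the result does not depend on how the source file is saved.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# "Pérez" as UTF-8 octets; Perl sees plain bytes, not characters.
my $utf8_bytes = "P\xC3\xA9rez";

# Double encoding: each octet is misread as an ISO-8859-1 character
# and then encoded to UTF-8 a second time.
my $double = encode('UTF-8', decode('ISO-8859-1', $utf8_bytes));

# Repair: peel off one layer by decoding as UTF-8 and encoding back
# to ISO-8859-1, which restores the original UTF-8 octets.
my $repaired = encode('ISO-8859-1', decode('UTF-8', $double));

# Decode once more to obtain a Perl character string.
my $chars = decode('UTF-8', $repaired);

# Octet counts for the three byte strings, then the character count.
printf "%d %d %d %d\n",
    length $utf8_bytes, length $double, length $repaired, length $chars;
# prints "6 8 6 5"
```

The two octets of "é" grow to four when encoded twice, which is why doubly encoded data shows up as pairs of odd characters; the repaired string is byte-for-byte identical to the original.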