> > And by '?' do you mean '?' or '�' ... there is a big difference.
> What is the difference? I have seen both?
'?' is typically caused by storing the data in MySQL as UTF-8, and then a magical undocumented feature in the PHP MySQL drivers auto-converts it to ISO-8859-1. Thus if you have a character that is not in ISO-8859-1 (e.g. a Korean character), MySQL will convert it to a '?'.

A '�' is generated by the browser when it is trying to render the page as UTF-8 and comes across an invalid byte sequence.

The first thing to understand about character encoding is the overlap between UTF-8 and ISO-8859-1. Below is a sample:

a  - lowercase a (same byte in ISO-8859-1 and UTF-8)
à  - a grave (available in both ISO-8859-1 and UTF-8, but with different byte values)
賜 - Chinese character (not in ISO-8859-1, available in UTF-8)

These days, you should really do everything in UTF-8. There was a lot of talk about PHP not being UTF-8 safe, but it is largely nonsense, and it comes up primarily because developers don't think about other languages. I personally never see the need for functions like substr, and I don't use regular expressions like [a-zA-Z0-9].

One other piece of advice: don't ever use that stupid meta tag to specify the content encoding. It makes no sense to specify the encoding of the content in the content itself. The character encoding should be specified in the HTTP Content-Type header and only in the header.

Regards,
John Campbell
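P.S. For anyone who wants to see both failure modes at the byte level, here is a minimal sketch using the three sample characters above. It is in Python rather than PHP only because the raw bytes are easy to inspect there. The names (`samples`, `encode_hex`, `table`, `lossy`, `misread`) are made up for illustration, and the code only imitates the lossy conversion and the browser's behaviour; it does not reproduce what the MySQL driver or any particular browser does internally.

```python
# The three sample characters, written as escapes so the source file's own
# encoding cannot affect the result.
samples = ["a", "\u00e0", "\u8cdc"]  # a, a grave, Chinese character


def encode_hex(ch, encoding):
    """Return the bytes of ch in the given encoding as hex, or None if
    the character does not exist in that encoding."""
    try:
        return ch.encode(encoding).hex()
    except UnicodeEncodeError:
        return None


# (ISO-8859-1 bytes, UTF-8 bytes) for each sample character.
table = {ch: (encode_hex(ch, "iso-8859-1"), encode_hex(ch, "utf-8"))
         for ch in samples}

# The '?' case: a character with no ISO-8859-1 equivalent is forced into
# ISO-8859-1 and a substitute '?' byte is stored in its place.
lossy = "\u8cdc".encode("iso-8859-1", errors="replace")

# The replacement-character case: ISO-8859-1 bytes are read as if they were
# UTF-8. The lone byte 0xE0 is not a valid UTF-8 sequence, so the decoder
# emits U+FFFD, which is what the browser draws.
misread = "\u00e0".encode("iso-8859-1").decode("utf-8", errors="replace")

for ch, (latin1, utf8) in table.items():
    print(repr(ch), "ISO-8859-1:", latin1, "UTF-8:", utf8)
print("lossy conversion:", lossy)
print("misread as UTF-8:", repr(misread))
```

Note that the '?' is permanent: once the original bytes have been replaced there is nothing left to recover. The U+FFFD case is only a display problem, since the bytes are intact and merely labelled with the wrong encoding.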
_______________________________________________
New York PHP Community Talk Mailing List
http://lists.nyphp.org/mailman/listinfo/talk

NYPHPCon 2006 Presentations Online
http://www.nyphpcon.com

Show Your Participation in New York PHP
http://www.nyphp.org/show_participation.php
