Re: [PHP-DEV] default charset confusion

2012-04-01 Thread Daniel Convissor
Hi Folks: This topic appears to have been quietly tabled. I didn't notice a decision here or a commit. On Mon, Mar 12, 2012 at 01:12:03PM -0700, Rasmus Lerdorf wrote: So maybe a way to tackle this is to use the mbstring internal encoding when it is set as the htmlspecialchars default when
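A rough userland sketch of the idea quoted above: when mbstring is loaded, its internal encoding is handed to htmlspecialchars(); otherwise a fixed fallback is used. The helper name `escape_with_runtime_charset()` and the `UTF-8` fallback are assumptions made for illustration. The thread discusses changing the engine's default, not adding a wrapper like this one.

```php
<?php
// Hypothetical helper (not from the thread): choose the charset for
// htmlspecialchars() from mbstring's internal encoding when available.
function escape_with_runtime_charset($string)
{
    // Assumed fallback when the mbstring extension is not loaded.
    $charset = 'UTF-8';
    if (function_exists('mb_internal_encoding')) {
        // Called without arguments, this returns the current encoding name.
        $charset = mb_internal_encoding();
    }
    return htmlspecialchars($string, ENT_QUOTES, $charset);
}

echo escape_with_runtime_charset('<a href="x">'), "\n";
```

Note that mbstring knows more encodings than htmlspecialchars() accepts, so a real implementation would need to decide what to do when the two lists disagree.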

Re: [PHP-DEV] default charset confusion

2012-03-14 Thread jpauli
On Tue, Mar 13, 2012 at 1:52 AM, Yasuo Ohgaki yohg...@ohgaki.net wrote: 2012/3/13 Rasmus Lerdorf ras...@lerdorf.com: On 03/12/2012 03:05 AM, Yasuo Ohgaki wrote: I thought default_charset became UTF-8, so I was expecting following HTTP header. content-type text/html; charset=UTF-8

Re: [PHP-DEV] default charset confusion

2012-03-14 Thread Michael Stowe
Correct me if I'm wrong, but I believe Zend Multibyte is now enabled by default in PHP 5.4. - Mike On Wed, Mar 14, 2012 at 9:24 AM, Ferenc Kovacs tyr...@gmail.com wrote: I would then propose to make mbstring compile time mandatory. I'm against yet another global ini setting, I find

Re: [PHP-DEV] default charset confusion

2012-03-14 Thread Ferenc Kovacs
On Wed, Mar 14, 2012 at 3:29 PM, Michael Stowe m...@mikestowe.com wrote: Correct me if I'm wrong, but I believe Zend Multibyte is now enabled by default in PHP 5.4. - Mike http://lxr.php.net/opengrok/xref/PHP_5_4/UPGRADING#91 http://lxr.php.net/opengrok/xref/PHP_5_4/Zend/zend.c#108

Re: [PHP-DEV] default charset confusion

2012-03-14 Thread Gustavo Lopes
On Wed, 14 Mar 2012 14:55:17 +0100, jpauli jpa...@php.net wrote: I would then propose to make mbstring compile time mandatory. I'm completely against these kind of lazy solutions. Yes, let's add strong coupling (already starting to smell) to one of the largest extensions and make it

Re: [PHP-DEV] default charset confusion

2012-03-14 Thread jpauli
On Wed, Mar 14, 2012 at 3:37 PM, Gustavo Lopes glo...@nebm.ist.utl.pt wrote: On Wed, 14 Mar 2012 14:55:17 +0100, jpauli jpa...@php.net wrote: I would then propose to make mbstring compile time mandatory. I'm completely against these kind of lazy solutions. Yes, let's add strong coupling

Re: [PHP-DEV] default charset confusion

2012-03-14 Thread Ángel González
On 13/03/12 00:25, Stas Malyshev wrote: Hi! Still, that API is likely wrong: a library function written by someone completely unrelated to the main application shouldn't be echoing anything through the output. And if it's not generating the html, the htmlspecialchars is better done from the

Re: [PHP-DEV] default charset confusion

2012-03-13 Thread Christian Schneider
Am 13.03.2012, 02:34 Uhr, schrieb Rasmus Lerdorf ras...@lerdorf.com: On 03/12/2012 05:52 PM, Yasuo Ohgaki wrote: I always set all parameters for htmlentities/htmlspecialchars, therefore I haven't noticed this was changed from 5.3. They may be migrating from 5.2 or older. (RHEL5 uses 5.1) No,

Re: [PHP-DEV] default charset confusion

2012-03-13 Thread Richard Lynch
On Mon, March 12, 2012 2:44 pm, Rasmus Lerdorf wrote: But you can't necessarily hardcode the encoding if you are writing portable code. That's a bit like hardcoding a timezone. In order to write portable code you need to give people the ability to localize it. If you wanted it portable,

Re: [PHP-DEV] default charset confusion

2012-03-13 Thread Tomas Kuliavas
2012.03.13 16:38 Richard Lynch rašė: I'd have to agree with Stas that everybody should start passing in a variable there, that can be set somewhere in a config, or, perhaps, would DEFAULT to, e... You do realize that suggestions on this thread and original bug reporter failed to make
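The suggestion quoted above (pass the charset as a variable that is set in a config) could look roughly like the sketch below. The constant `APP_CHARSET` and the wrapper `h()` are hypothetical names chosen for illustration; nothing in the thread prescribes them.

```php
<?php
// Hypothetical application-level setting; a real application would load
// this from its config file rather than hard-code it here.
define('APP_CHARSET', 'ISO-8859-1');

// Hypothetical wrapper: the charset is passed explicitly on every call,
// so the result does not depend on the PHP version's built-in default.
function h($string, $charset = APP_CHARSET)
{
    return htmlspecialchars($string, ENT_QUOTES, $charset);
}

echo h('Tom & "Jerry"'), "\n";
```

The cost of this approach, as the thread notes, is that every existing call site has to be changed to go through the wrapper or to pass the variable.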

Re: [PHP-DEV] default charset confusion

2012-03-12 Thread Laruence
On Mon, Mar 12, 2012 at 2:49 PM, Rasmus Lerdorf ras...@lerdorf.com wrote: I caused this situation myself by not explicitly differentiating between the default charset for the internal htmlspecialchars() and htmlentities() functions and the output charset directive ini directive

Re: [PHP-DEV] default charset confusion

2012-03-12 Thread Adam Jon Richardson
On Mon, Mar 12, 2012 at 2:49 AM, Rasmus Lerdorf ras...@lerdorf.com wrote: What we really need is what we added in PHP 6. A runtime encoding ini setting that is distinct from the output charset which we can use here. That would allow people to fix all their legacy code to a specific runtime

Re: [PHP-DEV] default charset confusion

2012-03-12 Thread Stas Malyshev
Hi! What we really need is what we added in PHP 6. A runtime encoding ini setting that is distinct from the output charset which we can use here. That would allow people to fix all their legacy code to a specific runtime encoding with a single ini setting instead of changing thousands of lines

Re: [PHP-DEV] default charset confusion

2012-03-12 Thread Laruence
On Mon, Mar 12, 2012 at 3:10 PM, Stas Malyshev smalys...@sugarcrm.com wrote: Hi! What we really need is what we added in PHP 6. A runtime encoding ini setting that is distinct from the output charset which we can use here. That would allow people to fix all their legacy code to a specific

Re: [PHP-DEV] default charset confusion

2012-03-12 Thread Laruence
On Mon, Mar 12, 2012 at 3:10 PM, Stas Malyshev smalys...@sugarcrm.com wrote: Hi! What we really need is what we added in PHP 6. A runtime encoding ini setting that is distinct from the output charset which we can use here. That would allow people to fix all their legacy code to a specific

Re: [PHP-DEV] default charset confusion

2012-03-12 Thread Rasmus Lerdorf
On 03/12/2012 12:10 AM, Stas Malyshev wrote: Hi! What we really need is what we added in PHP 6. A runtime encoding ini setting that is distinct from the output charset which we can use here. That would allow people to fix all their legacy code to a specific runtime encoding with a single

Re: [PHP-DEV] default charset confusion

2012-03-12 Thread Rasmus Lerdorf
On 03/12/2012 12:41 AM, Rasmus Lerdorf wrote: $string = $string = "<pre><p>$gb2312</p></pre>"; Sorry typo there obviously. Just one $string -Rasmus

Re: [PHP-DEV] default charset confusion

2012-03-12 Thread Stas Malyshev
Hi! Ignoring 5.4 for a second, if you in 5.3 do this: echo htmlspecialchars($string); echo htmlspecialchars($string, NULL, "ISO-8859-1"); echo htmlspecialchars($string, NULL, "UTF-8"); You will see that the first two output the escaped string with the GB2312 bytes intact within it and the UTF-8
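A self-contained way to try the comparison described above is sketched here. The sample bytes (the GB2312 encoding of "你好") are an assumption, since the thread does not show the actual string, and `ENT_COMPAT` is passed explicitly instead of `NULL` so that the flags are the same on every PHP version.

```php
<?php
// "你好" encoded as GB2312 bytes, wrapped in markup. Sample data chosen
// for illustration only; it is not taken from the thread.
$gb2312 = "\xC4\xE3\xBA\xC3";
$string = "<p>$gb2312</p>";

// Single-byte charset: every byte is a valid character, so the GB2312
// bytes pass through untouched and only the markup is escaped.
$latin1 = htmlspecialchars($string, ENT_COMPAT, 'ISO-8859-1');

// UTF-8: the bytes do not form a valid UTF-8 sequence, so the function
// returns an empty string (neither ENT_IGNORE nor ENT_SUBSTITUTE is set).
$utf8 = htmlspecialchars($string, ENT_COMPAT, 'UTF-8');

var_dump($latin1 === "&lt;p&gt;$gb2312&lt;/p&gt;"); // bool(true)
var_dump($utf8 === '');                              // bool(true)
```

This is why a silent change of the default from ISO-8859-1 to UTF-8 matters for sites with non-UTF-8 data: calls that omit the charset argument can start returning empty strings.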

Re: [PHP-DEV] default charset confusion

2012-03-12 Thread Adam Jon Richardson
On Mon, Mar 12, 2012 at 3:52 AM, Stas Malyshev smalys...@sugarcrm.com wrote: Hi! Ignoring 5.4 for a second, if you in 5.3 do this: echo htmlspecialchars($string); echo htmlspecialchars($string, NULL, "ISO-8859-1"); echo htmlspecialchars($string, NULL, "UTF-8"); You will see that the first

Re: [PHP-DEV] default charset confusion

2012-03-12 Thread Rasmus Lerdorf
On 03/12/2012 12:52 AM, Stas Malyshev wrote: Hi! Ignoring 5.4 for a second, if you in 5.3 do this: echo htmlspecialchars($string); echo htmlspecialchars($string, NULL, "ISO-8859-1"); echo htmlspecialchars($string, NULL, "UTF-8"); You will see that the first two output the escaped string with

Re: [PHP-DEV] default charset confusion

2012-03-12 Thread Yasuo Ohgaki
Hi I think following PHP 5.4.0 NEWS entry is misleading. . Changed default value of default_charset php.ini option from ISO-8859-1 to UTF-8. (Rasmus) I thought default_charset became UTF-8, so I was expecting following HTTP header. content-type text/html; charset=UTF-8 However, I got

Re: [PHP-DEV] default charset confusion

2012-03-12 Thread Yasuo Ohgaki
Hi, I think motivation of /* Default is now UTF-8 */ if (charset_hint == NULL) return cs_utf_8; is for better performance and I think it's good for better performance. Alternative of my suggestion is introduce new php.ini entry as Rasmus mentioned. The name may be

Re: [PHP-DEV] default charset confusion

2012-03-12 Thread Laruence
On Mon, Mar 12, 2012 at 6:21 PM, Yasuo Ohgaki yohg...@ohgaki.net wrote: Hi, I think motivation of /* Default is now UTF-8 */ if (charset_hint == NULL) return cs_utf_8; is for better performance and I think it's good for better performance. Alternative of my

Re: [PHP-DEV] default charset confusion

2012-03-12 Thread Rasmus Lerdorf
On 03/12/2012 03:05 AM, Yasuo Ohgaki wrote: Hi I think following PHP 5.4.0 NEWS entry is misleading. . Changed default value of default_charset php.ini option from ISO-8859-1 to UTF-8. (Rasmus) Yes, I have fixed that now. I thought default_charset became UTF-8, so I was

Re: [PHP-DEV] default charset confusion

2012-03-12 Thread Michael Stowe
I think the ini directive, while adding another to the list, may be the most unobtrusive method to address this issue, at least for developers. I definitely agree with Rasmus that this could be one of the bigger headaches in transitioning to 5.4 (for non-UTF8 sites) and unless we can come up with

Re: [PHP-DEV] default charset confusion

2012-03-12 Thread Richard Lynch
On Mon, March 12, 2012 1:49 am, Rasmus Lerdorf wrote: What we really need is what we added in PHP 6. A runtime encoding ini setting that is distinct from the output charset which we can use here. The usual argument against another php.ini setting, other than too many already is the difficulty

Re: [PHP-DEV] default charset confusion

2012-03-12 Thread Rasmus Lerdorf
On 03/12/2012 12:40 PM, Stas Malyshev wrote: Hi! And yes, it may very well be dangerous to use the wrong charset and now that we have better support for GB2312 and other asian charsets in the entities functions in 5.4 it is even more prudent to choose the right one so we should provide some

Re: [PHP-DEV] default charset confusion

2012-03-12 Thread Stas Malyshev
Hi! But you can't necessarily hardcode the encoding if you are writing portable code. That's a bit like hardcoding a timezone. In order to write portable code you need to give people the ability to localize it. No, it's not like timezone at all. I have to support all timezones in a global

Re: [PHP-DEV] default charset confusion

2012-03-12 Thread Rasmus Lerdorf
On 03/12/2012 12:51 PM, Stas Malyshev wrote: Hi! But you can't necessarily hardcode the encoding if you are writing portable code. That's a bit like hardcoding a timezone. In order to write portable code you need to give people the ability to localize it. No, it's not like timezone at

Re: [PHP-DEV] default charset confusion

2012-03-12 Thread Pierre Joye
hi Rasmus, On Mon, Mar 12, 2012 at 9:12 PM, Rasmus Lerdorf ras...@lerdorf.com wrote: If everything was UTF-8 we wouldn't have any of these issues. Unfortunately that isn't the case. The question is what to do with apps that need to deal with non UTF-8 data. Are we going to provide any help

Re: [PHP-DEV] default charset confusion

2012-03-12 Thread Ángel González
On 12/03/12 20:51, Stas Malyshev wrote: Hi! But you can't necessarily hardcode the encoding if you are writing portable code. That's a bit like hardcoding a timezone. In order to write portable code you need to give people the ability to localize it. No, it's not like timezone at all. I

Re: [PHP-DEV] default charset confusion

2012-03-12 Thread Stas Malyshev
Hi! If you are a framework developer, and really want to shield against a bad php.ini setting, you could ini_set() to your preferred charset at the beginning of the request. That assuming the request is completely processed by your framework and you never call any outside code and any
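The ini_set() approach quoted above might be sketched as follows. The snippets here do not settle which ini setting is meant, so `default_charset` is used as a stand-in, and the `UTF-8` value is likewise an assumption. (From PHP 5.6 on, htmlspecialchars() falls back to `default_charset` when no encoding is passed; that was not the case in 5.4.)

```php
<?php
// Assumed preferred charset for the framework; adjust to the application.
$preferred = 'UTF-8';

// ini_set() returns the previous value on success and false on failure,
// so the old setting can be kept and restored later if needed.
$previous = ini_set('default_charset', $preferred);

if ($previous === false) {
    // The setting could not be changed (for example, locked by the host).
    error_log('Could not override default_charset');
}

echo ini_get('default_charset'), "\n";
```

As the reply points out, this only shields code that runs inside the same request after the call; outside code that expects a different setting sees the overridden value too.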

Re: [PHP-DEV] default charset confusion

2012-03-12 Thread Ángel González
On 12/03/12 22:30, Stas Malyshev wrote: Hi! If you are a framework developer, and really want to shield against a bad php.ini setting, you could ini_set() to your preferred charset at the beginning of the request. That assuming the request is completely processed by your framework and you

Re: [PHP-DEV] default charset confusion

2012-03-12 Thread Stas Malyshev
Hi! Still, that API is likely wrong: a library function written by someone completely unrelated to the main application shouldn't be echoing anything through the output. And if it's not generating the html, the htmlspecialchars is better done from the return at the calling application (probably

Re: [PHP-DEV] default charset confusion

2012-03-12 Thread Yasuo Ohgaki
2012/3/13 Rasmus Lerdorf ras...@lerdorf.com: On 03/12/2012 03:05 AM, Yasuo Ohgaki wrote: I thought default_charset became UTF-8, so I was expecting following HTTP header. content-type text/html; charset=UTF-8 However, I got empty charset (missing 'charset=UTF-8'). So I looked up to source
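For readers following the header question above: PHP appends a `charset` parameter to its default Content-Type header only when `default_charset` is non-empty. The sketch below mimics that rule in userland so the two cases can be compared. The function `content_type_header()` is a hypothetical helper, not part of PHP.

```php
<?php
// Hypothetical helper: build the Content-Type line that corresponds to a
// given default_charset value. An empty value means that no charset
// parameter is appended.
function content_type_header($mime, $charset)
{
    if ($charset === '' || $charset === false) {
        return "Content-Type: $mime";
    }
    return "Content-Type: $mime; charset=$charset";
}

// Header for whatever the running configuration says, then the header
// that was expected in the message above.
echo content_type_header('text/html', ini_get('default_charset')), "\n";
echo content_type_header('text/html', 'UTF-8'), "\n";
```

A script that must not depend on php.ini can send the second form itself with header() before any output.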

Re: [PHP-DEV] default charset confusion

2012-03-12 Thread Rasmus Lerdorf
On 03/12/2012 05:52 PM, Yasuo Ohgaki wrote: I always set all parameters for htmlentities/htmlspecialchars, therefore I haven't noticed this was changed from 5.3. They may be migrating from 5.2 or older. (RHEL5 uses 5.1) No, like I showed, moving from 5.3 to 5.4 breaks because the new default