RE: [PHP-DEV] PHP Unicode extension in PHP6

2007-05-24 Thread Tomas Kuliavas
>>> > >>> > Recent versions of PHP5, has a binary string introducer. >>> > >>> > echo strlen(b"\xC4\x85"); >>> >>> I have already said to Stefan. It is not an option. I need backwards >>> compatibility. If older PHP versions fail with E_PARSE errors, I >>> can't use >>> it. >> >> [EMAIL PROTECTED]:

Re: [PHP-DEV] PHP Unicode extension in PHP6

2007-05-24 Thread Brian Moon
Richard Lynch wrote: Get the world to update to 5.2, and all is good... That seems unlikely to happen in the immediate future, afaics, no matter how much we might wish it did. This is true. RHEL 5 just came out in March with PHP 5.1.6. They appear to be on a 3 year release cycle. Just try

Re: [PHP-DEV] PHP Unicode extension in PHP6

2007-05-24 Thread Rasmus Lerdorf
We at one time discussed some sort of default fallback encoding as far as I recall, and if that isn't defined, the default fallback would be the runtime encoding. I think it would ease migration headaches quite a bit if we could make some informed guesses on some of these functions that as

Re: [PHP-DEV] PHP Unicode extension in PHP6

2007-05-24 Thread Andrei Zmievski
urlencode() requires binary because at the time we were discussing it, we could not see what encoding to apply by default, since neither runtime encoding nor output encoding quite fits. Perhaps we could do something like another parameter that specifies the encoding to use, but we won't be

RE: [PHP-DEV] PHP Unicode extension in PHP6

2007-05-24 Thread Richard Lynch
On Tue, May 22, 2007 1:12 pm, Derick Rethans wrote: > On Tue, 22 May 2007, Tomas Kuliavas wrote: > >> > >> > Recent versions of PHP5, has a binary string introducer. >> > >> > echo strlen(b"\xC4\x85"); >> >> I have already said to Stefan. It is not an option. I need backwards >> compatibility. If

RE: [PHP-DEV] PHP Unicode extension in PHP6

2007-05-24 Thread Uwe Schindler
> > What I am thinking is, if unicode_semantics=on, every single time I need > > to call urlencode (or other binary-only functions) with a variable, I > > need to typecast it. Well, if this is necessary 100% of the times, why > > not do this already inside urlencode, and if the string contains bad

RE: [PHP-DEV] PHP Unicode extension in PHP6

2007-05-24 Thread Uwe Schindler
> I just compare urlencode/urldecode with Java, which uses > Unicode by nature. > > http://java.sun.com/javase/6/docs/api/java/net/URLEncoder.html > > The input parameter is a *String* (which is by definition Unicode in > Java). The output is (an ASCII-only) string with the URL-encoded val

Re: [PHP-DEV] PHP Unicode extension in PHP6

2007-05-24 Thread Rasmus Lerdorf
Rangel Reale wrote: > I am testing my 25,000-line PHP5 application on PHP6, just to see what > changes it would require. I changed exactly 5 (yes only 5) lines of > code, and it worked perfectly except for the functions that require > binary string parameters with a (binary) typecast. > > What I

Re: [PHP-DEV] PHP Unicode extension in PHP6

2007-05-24 Thread Rangel Reale
Unicode support is a rather major milestone for PHP. It is one of the biggest changes to the codebase ever. I don't think anybody can seriously argue that decent Unicode support isn't worth the effort and it is somewhat unrealistic to think that this can be done while keeping it perfectly backwa

Re: [PHP-DEV] PHP Unicode extension in PHP6

2007-05-24 Thread Rasmus Lerdorf
Tomas Kuliavas wrote: >>> Hopefully this project will learn something from previous >>> experiences (PHP5 adoption, anyone?) and think of a reasonable >>> backward compatibility policy. >> This is a different story: From what I'm reading Unicode support is for >> many people way more interesti

Re: [PHP-DEV] PHP Unicode extension in PHP6

2007-05-24 Thread Tomas Kuliavas
>> Hopefully this project will learn something from previous >> experiences (PHP5 adoption, anyone?) and think of a reasonable >> backward compatibility policy. > > This is a different story: From what I'm reading Unicode support is for > many people way more interesting than many things intro

Re: [PHP-DEV] PHP Unicode extension in PHP6

2007-05-24 Thread Johannes Schlüter
Hi Christian, On Thu, 2007-05-24 at 05:19 -0400, Cristian Rodriguez wrote: > 2007/5/22, Andrei Zmievski <[EMAIL PROTECTED]>: > > Nobody is breaking your code. > > Yes, it does break code, unicode.semantics is ZEND_INI_SYSTEM, hence > I cannot even turn it off with .htaccess. I think it was once

Re: [PHP-DEV] PHP Unicode extension in PHP6

2007-05-24 Thread Cristian Rodriguez
2007/5/22, Andrei Zmievski <[EMAIL PROTECTED]>: Nobody is breaking your code. Yes, it does break code, unicode.semantics is ZEND_INI_SYSTEM, hence I cannot even turn it off with .htaccess. You are free to use unicode.semantics or turn it off. No, I can't. redistributable applications (most o

Re: [PHP-DEV] PHP Unicode extension in PHP6

2007-05-23 Thread Tomas Kuliavas
>> No problem. I just can't use it. It does not pass even basic "must >> work on >> Debian Stable" test. Option creates parsing errors in older PHP >> versions >> and I can't wrap it with PHP6 check. Such code must be stored in >> separate >> libraries loaded only in PHP6 and issue affects way too m

Re: [PHP-DEV] PHP Unicode extension in PHP6

2007-05-23 Thread Rangel Reale
some #ifdef, but this is php). []s Rangel - Original Message - From: "Andrei Zmievski" <[EMAIL PROTECTED]> To: "Rangel Reale" <[EMAIL PROTECTED]> Cc: Sent: Wednesday, May 23, 2007 4:50 PM Subject: Re: [PHP-DEV] PHP Unicode extension in PHP6 It would

Re: [PHP-DEV] PHP Unicode extension in PHP6

2007-05-23 Thread Andrei Zmievski
No problem. I just can't use it. It does not pass even basic "must work on Debian Stable" test. Option creates parsing errors in older PHP versions and I can't wrap it with PHP6 check. Such code must be stored in separate libraries loaded only in PHP6 and issue affects way too many functions.

Re: [PHP-DEV] PHP Unicode extension in PHP6

2007-05-23 Thread Andrei Zmievski
hannes Schlüter" <[EMAIL PROTECTED]> To: "Rangel Reale" <[EMAIL PROTECTED]> Cc: Sent: Tuesday, May 22, 2007 10:12 PM Subject: Re: [PHP-DEV] PHP Unicode extension in PHP6 Hi Rangel, for PHP 6 the basic string type is "unicode string" and most functions wi

Re: [PHP-DEV] PHP Unicode extension in PHP6

2007-05-23 Thread Rangel Reale
re it really isn't needed... it is hard for me to understand the motives. - Original Message - From: "Johannes Schlüter" <[EMAIL PROTECTED]> To: "Rangel Reale" <[EMAIL PROTECTED]> Cc: Sent: Tuesday, May 22, 2007 10:12 PM Subject: Re: [PHP-DEV] PHP Unicode

Re: [PHP-DEV] PHP Unicode extension in PHP6

2007-05-23 Thread Richard Quadling
Ah! So with this article in php|Architect and PHP6, we should be able to see how things work! Looking forward to it. On 23/05/07, Alexey Zakhlestin <[EMAIL PROTECTED]> wrote: On 5/23/07, Richard Quadling <[EMAIL PROTECTED]> wrote: > > Is PHP6 in a state able to be used? For Windows XP that is?

Re: [PHP-DEV] PHP Unicode extension in PHP6

2007-05-23 Thread Alexey Zakhlestin
On 5/23/07, Richard Quadling <[EMAIL PROTECTED]> wrote: Is PHP6 in a state able to be used? For Windows XP that is? in a "state to be tested" would be more correct :) -- Alexey Zakhlestin http://blog.milkfarmsoft.com/ -- PHP Internals - PHP Runtime Development Mailing List To unsubscribe, vi

Re: [PHP-DEV] PHP Unicode extension in PHP6

2007-05-23 Thread Richard Quadling
On 23/05/07, Steph Fox <[EMAIL PROTECTED]> wrote: Nice article in the May edition of php|arch. Which _might_ make it online today if we're lucky..! Excellent! As a subscriber I'll be reading it avidly. Is PHP6 in a state able to be used? For Windows XP that is? -- - Richard Quadling Zend

Re: [PHP-DEV] PHP Unicode extension in PHP6

2007-05-23 Thread Richard Quadling
On 23/05/07, Johannes Schlüter <[EMAIL PROTECTED]> wrote: Hi Rangel, for PHP 6 the basic string type is "unicode string" and most functions will accept these as primary type. But there are a few exceptions where unicode, for different reasons, makes no sense. There you have to pass a binary stri

Re: [PHP-DEV] PHP Unicode extension in PHP6

2007-05-22 Thread Johannes Schlüter
> gives me a warning. > > In the future will 100% of the functions accept a Unicode string? If so, > then to me this should not be a problem. > > - Original Message - > From: "Tomas Kuliavas" <[EMAIL PROTECTED]> > To: "Jared Williams" &l

Re: [PHP-DEV] PHP Unicode extension in PHP6

2007-05-22 Thread Tomas Kuliavas
>>> works fine here... (with php 5.2) >> >> http://www.php.net/ChangeLog-5.php#5.2.1 >> >> - Added forward support for 'b' prefix in front of string literals. >> (Andrei) >> >> >> PHP 5.2.0 >> Parse error: syntax error, unexpected T_CONSTANT_ENCAPSED_STRING in >> test.php on line 5 >> >> PHP 4.1.2

Re: [PHP-DEV] PHP Unicode extension in PHP6

2007-05-22 Thread Rangel Reale
For my purposes, this is ok. Will it be added to 4.x too? - Original Message - From: "Andrei Zmievski" <[EMAIL PROTECTED]> To: "Tomas Kuliavas" <[EMAIL PROTECTED]> Cc: "Jared Williams" <[EMAIL PROTECTED]>; Sent: Tuesday, May 22,

Re: [PHP-DEV] PHP Unicode extension in PHP6

2007-05-22 Thread Cristian Rodriguez
2007/5/22, Andrei Zmievski <[EMAIL PROTECTED]>: Right, it's available in 5.2.1. What's the problem? The problem is exactly that: it is backward incompatible, and PHP 5 is available on fewer than 20% of the hosts out there, so you can imagine that hosts running 5.2.1 are practically nonexistent.

Re: [PHP-DEV] PHP Unicode extension in PHP6

2007-05-22 Thread Andrei Zmievski
works fine here... (with php 5.2) http://www.php.net/ChangeLog-5.php#5.2.1 - Added forward support for 'b' prefix in front of string literals. (Andrei) PHP 5.2.0 Parse error: syntax error, unexpected T_CONSTANT_ENCAPSED_STRING in test.php on line 5 PHP 4.1.2 Parse error: parse error, ex

Re: [PHP-DEV] PHP Unicode extension in PHP6

2007-05-22 Thread Andrei Zmievski
The 'b' prefix is available in PHP 5.2, exactly for this purpose. It just doesn't do anything. -Andrei On May 22, 2007, at 10:52 AM, Tomas Kuliavas wrote: Recent versions of PHP5, has a binary string introducer. echo strlen(b"\xC4\x85"); I have already said to Stefan. It is not an optio

RE: [PHP-DEV] PHP Unicode extension in PHP6

2007-05-22 Thread Tomas Kuliavas
>> > >> > Recent versions of PHP5, has a binary string introducer. >> > >> > echo strlen(b"\xC4\x85"); >> >> I have already said to Stefan. It is not an option. I need backwards >> compatibility. If older PHP versions fail with E_PARSE errors, I can't >> use it. > > [EMAIL PROTECTED]:~$ php -r 'ech

Re: [PHP-DEV] PHP Unicode extension in PHP6

2007-05-22 Thread Rangel Reale
functions accept a Unicode string? If so, then to me this should not be a problem. - Original Message - From: "Tomas Kuliavas" <[EMAIL PROTECTED]> To: "Jared Williams" <[EMAIL PROTECTED]> Cc: Sent: Tuesday, May 22, 2007 2:52 PM Subject: RE: [PHP-DEV] PH

RE: [PHP-DEV] PHP Unicode extension in PHP6

2007-05-22 Thread Derick Rethans
On Tue, 22 May 2007, Tomas Kuliavas wrote: > > > > Recent versions of PHP5, has a binary string introducer. > > > > echo strlen(b"\xC4\x85"); > > I have already said to Stefan. It is not an option. I need backwards > compatibility. If older PHP versions fail with E_PARSE errors, I can't use > it.

RE: [PHP-DEV] PHP Unicode extension in PHP6

2007-05-22 Thread Tomas Kuliavas
> > Recent versions of PHP5, has a binary string introducer. > > echo strlen(b"\xC4\x85"); I have already said to Stefan. It is not an option. I need backwards compatibility. If older PHP versions fail with E_PARSE errors, I can't use it. I can't maintain two different library versions, because i

RE: [PHP-DEV] PHP Unicode extension in PHP6

2007-05-22 Thread Jared Williams
> -Original Message- > From: Tomas Kuliavas [mailto:[EMAIL PROTECTED] > Sent: 21 May 2007 19:26 > To: Andrei Zmievski > Cc: internals@lists.php.net > Subject: Re: [PHP-DEV] PHP Unicode extension in PHP6 > > >> 0xC4 and 0x85 are hex codes for latin sm

Re: [PHP-DEV] PHP Unicode extension in PHP6

2007-05-21 Thread Andrei Zmievski
They are not documented and I am testing configurations that might break scripts. If I test things and want to make code portable, configuration is not supposed to be rational. I can set option with ini_set(), if I understand what option does and it fixes the issue. http://www.php.net/unicode

Re: [PHP-DEV] PHP Unicode extension in PHP6

2007-05-21 Thread Richard Lynch
On Mon, May 21, 2007 9:57 am, Stefan Walk wrote: > On 21/05/07, Tomas Kuliavas <[EMAIL PROTECTED]> wrote: >> Latin capital letter A with diaeresis is 00C4. Not C4. > > Pay attention in maths, leading zeroes don't change a number. Pay attention to documentation. Leading zeros change a number. O

Re: [PHP-DEV] PHP Unicode extension in PHP6

2007-05-21 Thread Tomas Kuliavas
>> 0xC4 and 0x85 are hex codes for latin small letter a with ogonek in >> utf-8. ą >> >> > var_dump("ą" == "\xC4\x85"); >> echo "ą\n"; >> echo "\xC4\x85"; >> ?> >> >> If script is written in utf-8, I expect bool(true) on var_dump() line. > > var_dump("ą" == b"\xC4\x85"); > > This will give you what

Re: [PHP-DEV] PHP Unicode extension in PHP6

2007-05-21 Thread Andrei Zmievski
On May 19, 2007, at 11:13 PM, Tomas Kuliavas wrote: 0xC4 and 0x85 are hex codes for latin small letter a with ogonek in utf-8. ą If script is written in utf-8, I expect bool(true) on var_dump() line. var_dump("ą" == b"\xC4\x85"); This will give you what you want, if the script is written

Re: [PHP-DEV] PHP Unicode extension in PHP6

2007-05-21 Thread Andrei Zmievski
This is by design. If you prefer to work with actual bytes, use binary strings or literals. In unicode strings \xC4 is actually a codepoint (UTF-16 codepoint) specifying character U+00C4. -Andrei On May 19, 2007, at 8:48 AM, Tomas Kuliavas wrote: strlen("\xC4\x85") = 2. strlen((binary)"\xC
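To make the byte-versus-code-point distinction in this exchange concrete, here is a minimal PHP sketch (an editor's illustration, not code from the thread) that decodes the two bytes C4 85 by hand. It assumes ordinary PHP 5/7/8 byte-string semantics, not PHP 6 with unicode.semantics=on, and `decode_utf8_pair()` is a hypothetical helper, not a PHP built-in.

```php
<?php
// Illustration only: decode one 2-byte UTF-8 sequence by hand.
// decode_utf8_pair() is a hypothetical helper, not a PHP built-in.
function decode_utf8_pair($bytes)
{
    if (strlen($bytes) !== 2) {
        throw new InvalidArgumentException('expected exactly 2 bytes');
    }
    $lead = ord($bytes[0]);
    $cont = ord($bytes[1]);
    // A 2-byte UTF-8 sequence has the bit pattern 110xxxxx 10xxxxxx.
    if (($lead & 0xE0) !== 0xC0 || ($cont & 0xC0) !== 0x80) {
        throw new InvalidArgumentException('not a 2-byte UTF-8 sequence');
    }
    // Keep the 5 payload bits of the lead byte and the 6 of the continuation byte.
    return (($lead & 0x1F) << 6) | ($cont & 0x3F);
}

$s = "\xC4\x85";                           // a byte string under PHP 5/7/8
printf("%d bytes\n", strlen($s));          // 2 bytes
printf("U+%04X\n", decode_utf8_pair($s));  // U+0105 (LATIN SMALL LETTER A WITH OGONEK)
```

Read as bytes, the literal is one UTF-8 character, U+0105 (ą), which is what Tomas expects. Read as Unicode escapes, as Andrei describes for unicode strings, the same `\xC4` and `\x85` name the separate code points U+00C4 and U+0085.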

Re: [PHP-DEV] PHP Unicode extension in PHP6

2007-05-21 Thread Tomas Kuliavas
>> Latin capital letter A with diaeresis is 00C4. Not C4. > > Pay attention in maths, leading zeroes don't change a number. they do, if it is not a number. '00C4' + '0085' = '00C40085' 'C4' + '85' = 'C485' '00C40085' != 'C485' -- Tomas

Re: [PHP-DEV] PHP Unicode extension in PHP6

2007-05-21 Thread Stefan Walk
On 21/05/07, Tomas Kuliavas <[EMAIL PROTECTED]> wrote: Latin capital letter A with diaeresis is 00C4. Not C4. Pay attention in maths, leading zeroes don't change a number. I wrote two 8-bit values. Not two 16-bit ones. Interpreter tries to outsmart me and thinks that I want 00C4, when I write C

Re: [PHP-DEV] PHP Unicode extension in PHP6

2007-05-20 Thread Tomas Kuliavas
> Disclaimer: I don't know much about the way unicode is implemented in > php, I have only used it a bit, but I believe I can clear some things > up here. > >> 0xC4 and 0x85 are hex codes for latin small letter a with ogonek in >> utf-8. ą >> >> > var_dump("ą" == "\xC4\x85"); >> echo "ą\n"; >> echo

Re: [PHP-DEV] PHP Unicode extension in PHP6

2007-05-20 Thread Stefan Walk
Disclaimer: I don't know much about the way unicode is implemented in php, I have only used it a bit, but I believe I can clear some things up here. On 20/05/07, Tomas Kuliavas <[EMAIL PROTECTED]> wrote: 0xC4 and 0x85 are hex codes for latin small letter a with ogonek in utf-8. ą If script is

Re: [PHP-DEV] PHP Unicode extension in PHP6

2007-05-19 Thread Tomas Kuliavas
>> strlen("\xC4\x85") = 2. strlen((binary)"\xC4\x85") = 4. Not good. It is >> one character in utf-8. > > I'm afraid I don't understand you again.. 0xC4 and 0x85 are hex codes for latin small letter a with ogonek in utf-8. ą If script is written in utf-8, I expect bool(true) on var_dump() line.

Re: [PHP-DEV] PHP Unicode extension in PHP6

2007-05-19 Thread Antony Dovgal
On 19.05.2007 19:48, Tomas Kuliavas wrote: Try this, you'll see it's really easy: "; var_dump(strlen(($s))); var_dump(strlen((binary)$s)); ?> http://www.php.net/language.types.type-juggling#language.types.typecasting No (binary). PHP 4.1.2 = parse error in test2.php on line 5. PHP 5.2.0 = Pars

Re: [PHP-DEV] PHP Unicode extension in PHP6

2007-05-19 Thread Tomas Kuliavas
>> SquirrelMail scripts are designed to work with binary strings. They will >> have to deal with emails written in many different character sets. In >> some >> cases scripts must know string length in bytes and not in symbols. If >> PHP >> starts converting email body or message parts, strings won'

Re: [PHP-DEV] PHP Unicode extension in PHP6

2007-05-19 Thread Antony Dovgal
On 19.05.2007 17:59, Tomas Kuliavas wrote: SquirrelMail scripts are designed to work with binary strings. They will have to deal with emails written in many different character sets. In some cases scripts must know string length in bytes and not in symbols. If PHP starts converting email body or

Re: [PHP-DEV] PHP Unicode extension in PHP6

2007-05-19 Thread Tomas Kuliavas
>> Hi, >> >> Could you make unicode.semantics configurable at PHP_INI_ALL level? > > No. > >> Or maybe PHP6 has string functions that are not unicode aware? > > All string functions are supposed to be able to work with both Unicode and > binary strings. > Unicode is just an addition, it doesn't mea

Re: [PHP-DEV] PHP Unicode extension in PHP6

2007-05-19 Thread Antony Dovgal
On 19.05.2007 16:22, Tomas Kuliavas wrote: Hi, Could you make unicode.semantics configurable at PHP_INI_ALL level? No. Or maybe PHP6 has string functions that are not unicode aware? All string functions are supposed to be able to work with both Unicode and binary strings. Unicode is just