[issue10542] Py_UNICODE_NEXT and other macros for surrogates

2012-01-05 Thread Benjamin Peterson
Benjamin Peterson added the comment: Closing now. -- nosy: +benjamin.peterson resolution: -> out of date status: open -> closed

[issue10542] Py_UNICODE_NEXT and other macros for surrogates

2011-09-29 Thread Ezio Melotti
Ezio Melotti added the comment: Py_UNICODE_NEXT has been removed from 3.3 but it's still available and used in 2.7/3.2 (even if it's private). In order to fix #10521 on 2.7/3.2 the _Py_UNICODE_PUT_NEXT macro attached to this patch is required. -- versions: +Python 3.3 -Python 3.2

[issue10542] Py_UNICODE_NEXT and other macros for surrogates

2011-09-29 Thread STINNER Victor
STINNER Victor added the comment: PEP 393 has been accepted and merged into Python 3.3. Python 3.3 doesn't need the Py_UNICODE_NEXT macro anymore. But my macros (unicode_macros.patch) are still useful. -- versions: +Python 3.2 -Python 3.3

[issue10542] Py_UNICODE_NEXT and other macros for surrogates

2011-08-22 Thread Roundup Robot
Roundup Robot added the comment: New changeset 77171f993bf2 by Ezio Melotti in branch 'default': #10542: Add 4 macros to work with surrogates: Py_UNICODE_IS_SURROGATE, Py_UNICODE_IS_HIGH_SURROGATE, Py_UNICODE_IS_LOW_SURROGATE, Py_UNICODE_JOIN_SURROGATES. http://hg.python.org/cpython/rev/77171f

[issue10542] Py_UNICODE_NEXT and other macros for surrogates

2011-08-22 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment: Ezio Melotti wrote: > > Ezio Melotti added the comment: > > The attached patch adds the following 4 public macros to unicodeobject.h: > Py_UNICODE_IS_SURROGATE(ch) > Py_UNICODE_IS_HIGH_SURROGATE(ch) > Py_UNICODE_IS_LOW_SURROGATE(ch) > Py_UNICODE_

[issue10542] Py_UNICODE_NEXT and other macros for surrogates

2011-08-22 Thread Ezio Melotti
Ezio Melotti added the comment: The attached patch adds the following 4 public macros to unicodeobject.h: Py_UNICODE_IS_SURROGATE(ch) Py_UNICODE_IS_HIGH_SURROGATE(ch) Py_UNICODE_IS_LOW_SURROGATE(ch) Py_UNICODE_JOIN_SURROGATES(high, low) and documents them. Since _Py_UNICODE_NEXT is sti
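For readers unfamiliar with UTF-16 surrogates, the four macros named above can be sketched as follows. This is only an illustration under stated assumptions: the `SKETCH_` names are hypothetical, `uint32_t` stands in for `Py_UCS4`, and the bodies are not claimed to match the committed patch. The parenthesized arguments follow the macro-hygiene advice given elsewhere in this thread.

```c
#include <stdint.h>

/* Hypothetical names; a sketch, not the definitions from the patch. */

/* Any UTF-16 surrogate code unit: U+D800..U+DFFF. */
#define SKETCH_IS_SURROGATE(ch)      (0xD800 <= (ch) && (ch) <= 0xDFFF)
/* High (lead) surrogate: U+D800..U+DBFF. */
#define SKETCH_IS_HIGH_SURROGATE(ch) (0xD800 <= (ch) && (ch) <= 0xDBFF)
/* Low (trail) surrogate: U+DC00..U+DFFF. */
#define SKETCH_IS_LOW_SURROGATE(ch)  (0xDC00 <= (ch) && (ch) <= 0xDFFF)

/* Combine a high/low pair into one code point in U+10000..U+10FFFF.
   Each surrogate carries 10 payload bits. */
#define SKETCH_JOIN_SURROGATES(high, low)            \
    (((((uint32_t)(high) & 0x03FF) << 10) |          \
      ((uint32_t)(low) & 0x03FF)) + 0x10000)
```

Note that the range tests evaluate `ch` twice, so an argument with side effects (such as `*p++`) would misbehave; that is one of the usual trade-offs of the macro form.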

[issue10542] Py_UNICODE_NEXT and other macros for surrogates

2011-08-18 Thread Ezio Melotti
Ezio Melotti added the comment: I attached a patch to fix the str.is* methods on #9200 that also includes the macro. Since they are not public there, I don't see a reason to do 2 separate commits on 2.7/3.2 (one for the feature and one for the fix).

[issue10542] Py_UNICODE_NEXT and other macros for surrogates

2011-08-17 Thread STINNER Victor
STINNER Victor added the comment: > OK, so in 2.7/3.2 I'll put them in unicodeobject.c It looks like #9200 only needs Py_UNICODE_NEXT, which can be implemented without the other Py_UNICODE_*SURROGATE* macros.

[issue10542] Py_UNICODE_NEXT and other macros for surrogates

2011-08-17 Thread Martin v . Löwis
Martin v. Löwis added the comment: > Also what about 3.2? Are you saying that we should fix the bug in > 3.2/3.3 only and leave 2.x alone or that you don't want the bug to be > fixed in all the bug-fix releases (i.e. 2.7/3.2)? Notice that the macros themselves don't fix any bugs. As for the bu

[issue10542] Py_UNICODE_NEXT and other macros for surrogates

2011-08-17 Thread Ezio Melotti
Ezio Melotti added the comment: Correct.

[issue10542] Py_UNICODE_NEXT and other macros for surrogates

2011-08-17 Thread Eric V. Smith
Eric V. Smith added the comment: On 8/17/2011 6:30 AM, Ezio Melotti wrote: > OK, so in 2.7/3.2 I'll put them in unicodeobject.c, and in 3.3 I'll move them > in unicodeobject.c. I believe the second file should be unicodeobject.h, correct?

[issue10542] Py_UNICODE_NEXT and other macros for surrogates

2011-08-17 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment: Ezio Melotti wrote: > > Ezio Melotti added the comment: > >> For bug fixes, you can put the macros straight into unicodeobject.c, >> but please leave unicodeobject.h untouched - otherwise people will >> mess around with these macros (even if they are priv

[issue10542] Py_UNICODE_NEXT and other macros for surrogates

2011-08-17 Thread Ezio Melotti
Ezio Melotti added the comment: > For bug fixes, you can put the macros straight into unicodeobject.c, > but please leave unicodeobject.h untouched - otherwise people will > mess around with these macros (even if they are private) and users > will start to wonder about linker errors if they use

[issue10542] Py_UNICODE_NEXT and other macros for surrogates

2011-08-17 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment: Ezio Melotti wrote: > > Ezio Melotti added the comment: > >> Ezio used two different naming schemes in his email. Please always >> use Py_UNICODE_... or _Py_UNICODE (not PyUNICODE_ or _PyUNICODE_). > > Indeed, that was a typo + copy/paste. I meant to sa

[issue10542] Py_UNICODE_NEXT and other macros for surrogates

2011-08-17 Thread Ezio Melotti
Ezio Melotti added the comment: > Ezio used two different naming schemes in his email. Please always > use Py_UNICODE_... or _Py_UNICODE (not PyUNICODE_ or _PyUNICODE_). Indeed, that was a typo + copy/paste. I meant to say Py_UNICODE_* and _Py_UNICODE_*. Sorry about the confusion. > Why wou

[issue10542] Py_UNICODE_NEXT and other macros for surrogates

2011-08-17 Thread STINNER Victor
STINNER Victor added the comment: Ah yes, the correct prefix for functions working on Py_UNICODE characters/strings is "Py_UNICODE", not "PyUNICODE", sorry. >> For Python 2.7 and 3.2, I would prefer to not touch a public header, >> and so add the macros in unicodeobject.c. > > Is there some re

[issue10542] Py_UNICODE_NEXT and other macros for surrogates

2011-08-17 Thread Ezio Melotti
Ezio Melotti added the comment: > For Python 2.7 and 3.2, I would prefer to not touch a public header, > and so add the macros in unicodeobject.c. Is there some reason for this? I think it's better if we have them in the same place rather than renaming and moving them in another file between

[issue10542] Py_UNICODE_NEXT and other macros for surrogates

2011-08-17 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment: STINNER Victor wrote: > > STINNER Victor added the comment: > > On 17/08/2011 07:04, Ezio Melotti wrote: >> As I said in msg142175 I think the Py_UNICODE_IS{HIGH|LOW|}SURROGATE and >> Py_UNICODE_JOIN_SURROGATES can be committed without trailing _ in 3

[issue10542] Py_UNICODE_NEXT and other macros for surrogates

2011-08-17 Thread STINNER Victor
STINNER Victor added the comment: On 17/08/2011 07:04, Ezio Melotti wrote: > As I said in msg142175 I think the Py_UNICODE_IS{HIGH|LOW|}SURROGATE and > Py_UNICODE_JOIN_SURROGATES can be committed without trailing _ in 3.3 and > with trailing _ in 2.7/3.2. They should go in unicodeobject.h

[issue10542] Py_UNICODE_NEXT and other macros for surrogates

2011-08-16 Thread Ezio Melotti
Ezio Melotti added the comment: As I said in msg142175 I think the Py_UNICODE_IS{HIGH|LOW|}SURROGATE and Py_UNICODE_JOIN_SURROGATES can be committed without trailing _ in 3.3 and with trailing _ in 2.7/3.2. They should go in unicodeobject.h and be public in 3.3+. Regarding the name, it would

[issue10542] Py_UNICODE_NEXT and other macros for surrogates

2011-08-16 Thread STINNER Victor
STINNER Victor added the comment: > The code review links point to something weird. That's because I posted a patch for another issue. It's the patch set 5, not the patch set 6 :-) Direct link: http://bugs.python.org/review/10542/patch/3174/9874 > My first impression is that your patch does

[issue10542] Py_UNICODE_NEXT and other macros for surrogates

2011-08-16 Thread Alexander Belopolsky
Alexander Belopolsky added the comment: The code review links point to something weird. Victor, can you upload your patch for review? My first impression is that your patch does not accomplish much beyond replacing some literal expressions with macros. What I wanted to achieve with this is

[issue10542] Py_UNICODE_NEXT and other macros for surrogates

2011-08-16 Thread STINNER Victor
STINNER Victor added the comment: (oops, msg142225 was for issue #12326)

[issue10542] Py_UNICODE_NEXT and other macros for surrogates

2011-08-16 Thread STINNER Victor
Changes by STINNER Victor : -- Removed message: http://bugs.python.org/msg142225

[issue10542] Py_UNICODE_NEXT and other macros for surrogates

2011-08-16 Thread STINNER Victor
Changes by STINNER Victor : Removed file: http://bugs.python.org/file22916/linux3-v2.patch

[issue10542] Py_UNICODE_NEXT and other macros for surrogates

2011-08-16 Thread STINNER Victor
STINNER Victor added the comment: My patch version 2: don't test for a specific major version of an OS, test only its name. My patch now also changes the tests for FreeBSD, NetBSD, OpenBSD, (...), and the _expectations list in regrtest.py. -- Added file: http://bugs.python.org/file22916/l

[issue10542] Py_UNICODE_NEXT and other macros for surrogates

2011-08-16 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment: Marc-Andre Lemburg wrote: > > Marc-Andre Lemburg added the comment: > > STINNER Victor wrote: >> >> STINNER Victor added the comment: >> >> I'm reposting my patch from #12751. I think that it's simpler than >> belopolsky's patch: it doesn't add public m

[issue10542] Py_UNICODE_NEXT and other macros for surrogates

2011-08-16 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment: STINNER Victor wrote: > > STINNER Victor added the comment: > > I'm reposting my patch from #12751. I think that it's simpler than > belopolsky's patch: it doesn't add public macros in unicodeobject.h and doesn't > add the complex Py_UNICODE_NEXT() macro.

[issue10542] Py_UNICODE_NEXT and other macros for surrogates

2011-08-16 Thread STINNER Victor
STINNER Victor added the comment: I'm reposting my patch from #12751. I think that it's simpler than belopolsky's patch: it doesn't add public macros in unicodeobject.h and doesn't add the complex Py_UNICODE_NEXT() macro. My patch only adds private macros in unicodeobject.c to factorize the cod

[issue10542] Py_UNICODE_NEXT and other macros for surrogates

2011-08-16 Thread Tom Christiansen
Tom Christiansen added the comment: Marc-Andre Lemburg wrote on Tue, 16 Aug 2011 12:11:22 -0000: > The reasoning behind e.g. "ISSURROGATE" is that those names originate > from and are consistent with the already existing ISLOWER/ISUPPER/ISTITLE > macros which in return stem from the C APIs

[issue10542] Py_UNICODE_NEXT and other macros for surrogates

2011-08-16 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment: Tom Christiansen wrote: > So keeping your preamble bits, I might have considered doing it > this way if it were me doing it: > > #define _Py_UNICODE_IS_SURROGATE > #define _Py_UNICODE_IS_LEAD_SURROGATE > #define _Py_UNICODE_IS_TRAIL_SURROGATE >

[issue10542] Py_UNICODE_NEXT and other macros for surrogates

2011-08-16 Thread Tom Christiansen
Tom Christiansen added the comment: Ezio Melotti wrote on Tue, 16 Aug 2011 09:23:50 -0000: > All the other macros[0] follow the same convention, e.g. Py_UNICODE_ISLOWER > and Py_UNICODE_TOLOWER. I agree that keeping the words separate makes them > more readable though. > [0]: Inclu

[issue10542] Py_UNICODE_NEXT and other macros for surrogates

2011-08-16 Thread Tom Christiansen
Tom Christiansen added the comment: Antoine Pitrou wrote on Tue, 16 Aug 2011 09:18:46 -0000: >> I think the 4 macros: >> #define _Py_UNICODE_ISSURROGATE >> #define _Py_UNICODE_ISHIGHSURROGATE >> #define _Py_UNICODE_ISLOWSURROGATE >> #define _Py_UNICODE_JOIN_SURROGATES >> are quite stra

[issue10542] Py_UNICODE_NEXT and other macros for surrogates

2011-08-16 Thread Tom Christiansen
Tom Christiansen added the comment: I now see there are lots of good things in the BOM FAQ that have come up lately regarding surrogates and other illegal characters, and about what can go in data streams. I quote a few of these from http://unicode.org/faq/utf_bom.html below: Q: How do I

[issue10542] Py_UNICODE_NEXT and other macros for surrogates

2011-08-16 Thread Tom Christiansen
Tom Christiansen added the comment: >Ezio Melotti added the comment: >I think the 4 macros: > #define _Py_UNICODE_ISSURROGATE > #define _Py_UNICODE_ISHIGHSURROGATE > #define _Py_UNICODE_ISLOWSURROGATE > #define _Py_UNICODE_JOIN_SURROGATES >are quite straightforward and can avoid using the trai

[issue10542] Py_UNICODE_NEXT and other macros for surrogates

2011-08-16 Thread Ezio Melotti
Ezio Melotti added the comment: All the other macros[0] follow the same convention, e.g. Py_UNICODE_ISLOWER and Py_UNICODE_TOLOWER. I agree that keeping the words separate makes them more readable though. [0]: Include/unicodeobject.h:328

[issue10542] Py_UNICODE_NEXT and other macros for surrogates

2011-08-16 Thread Antoine Pitrou
Antoine Pitrou added the comment: > I think the 4 macros: > #define _Py_UNICODE_ISSURROGATE > #define _Py_UNICODE_ISHIGHSURROGATE > #define _Py_UNICODE_ISLOWSURROGATE > #define _Py_UNICODE_JOIN_SURROGATES > are quite straightforward and can avoid using the trailing _. I don't want to bikesh

[issue10542] Py_UNICODE_NEXT and other macros for surrogates

2011-08-16 Thread Ezio Melotti
Ezio Melotti added the comment: I think the 4 macros: #define _Py_UNICODE_ISSURROGATE #define _Py_UNICODE_ISHIGHSURROGATE #define _Py_UNICODE_ISLOWSURROGATE #define _Py_UNICODE_JOIN_SURROGATES are quite straightforward and can avoid using the trailing _. Since I would like to see #9200 fixe

[issue10542] Py_UNICODE_NEXT and other macros for surrogates

2011-08-16 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment: Martin v. Löwis wrote: > > A PEP 393 draft implementation is available at > https://bitbucket.org/t0rsten/pep-393/ (branch pep-393); if this gets into > 3.3, this issue will be outdated: there won't be "narrow" builds of Python > anymore (nor will there

[issue10542] Py_UNICODE_NEXT and other macros for surrogates

2011-08-15 Thread Ezio Melotti
Ezio Melotti added the comment: That's really good news. Some Unicode issues can still be fixed on 2.7 and 3.2 though. FWIW I was planning to look at this and #9200 in the following days and see if I can fix them.

[issue10542] Py_UNICODE_NEXT and other macros for surrogates

2011-08-15 Thread Martin v . Löwis
Martin v. Löwis added the comment: A PEP 393 draft implementation is available at https://bitbucket.org/t0rsten/pep-393/ (branch pep-393); if this gets into 3.3, this issue will be outdated: there won't be "narrow" builds of Python anymore (nor will there be "wide" builds).

[issue10542] Py_UNICODE_NEXT and other macros for surrogates

2011-08-15 Thread Ezio Melotti
Ezio Melotti added the comment: See also #12751. -- nosy: +tchrist

[issue10542] Py_UNICODE_NEXT and other macros for surrogates

2010-12-30 Thread Georg Brandl
Georg Brandl added the comment: > I think the proposal is that fixing this minefield can wait until > Python 3.3 (or even 3.4, or later). That is what I was thinking. (Alex: You might not know that Martin was the main proponent of non-ASCII identifiers, so this assessment should have some weig

[issue10542] Py_UNICODE_NEXT and other macros for surrogates

2010-12-30 Thread Martin v . Löwis
Martin v. Löwis added the comment: > Actually, it looks like PEP 3131 and the Language Reference [1] still > disagree. The latter says: > > identifier ::= id_start id_continue* > > which should probably be > > identifier ::= xid_start xid_continue* > > instead. Interesting. XID_* is be

[issue10542] Py_UNICODE_NEXT and other macros for surrogates

2010-12-29 Thread Martin v . Löwis
Martin v. Löwis added the comment: > Are you serious? This sounds like a py4k idea. Can you give us a > hint on what the new representation will be? I'm thinking about an approach of a variable representation: one, two, or four bytes, depending on the widest character that appears in the stri
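The variable-width idea described above can be illustrated with a short sketch. It assumes the width is chosen from the largest code point in the string, with thresholds at 0x100 and 0x10000; the function name and signature are hypothetical and are not taken from any actual implementation.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: pick the number of bytes per character
   (1, 2, or 4) needed to store every code point in s[0..n-1]
   at a fixed width. An empty string gets the narrowest width. */
static int sketch_char_width(const uint32_t *s, size_t n)
{
    uint32_t maxchar = 0;
    for (size_t i = 0; i < n; i++) {
        if (s[i] > maxchar) {
            maxchar = s[i];
        }
    }
    if (maxchar < 0x100) {
        return 1;   /* every character fits in one byte */
    }
    if (maxchar < 0x10000) {
        return 2;   /* every character is in the BMP */
    }
    return 4;       /* at least one supplementary character */
}
```

With such a representation every character occupies exactly one slot, so surrogate pairs, and the macros discussed in this issue, would no longer be needed for indexing.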

[issue10542] Py_UNICODE_NEXT and other macros for surrogates

2010-12-29 Thread Alexander Belopolsky
Alexander Belopolsky added the comment: On Wed, Dec 29, 2010 at 9:38 PM, Alexander Belopolsky wrote: .. > Given that until recently (r87433) the PEP and the reference manual > disagreed on the definition, Actually, it looks like PEP 3131 and the Language Reference [1] still disagree. The latt

[issue10542] Py_UNICODE_NEXT and other macros for surrogates

2010-12-29 Thread Alexander Belopolsky
Alexander Belopolsky added the comment: On Wed, Dec 29, 2010 at 8:02 PM, Martin v. Löwis wrote: .. > > I plan to propose a complete redesign of the representation of Unicode > strings, which may well make this entire set of changes obsolete. > Are you serious? This sounds like a py4k idea. C

[issue10542] Py_UNICODE_NEXT and other macros for surrogates

2010-12-29 Thread Martin v . Löwis
Martin v. Löwis added the comment: >> Seriously, it can wait 3.3. > > What exactly can wait until 3.3? The presented patch introduces no > user visible changes. It is only a stepping stone to restoring some > sanity in a way supplementary characters are treated by narrow builds. > At the mom

[issue10542] Py_UNICODE_NEXT and other macros for surrogates

2010-12-29 Thread Alexander Belopolsky
Alexander Belopolsky added the comment: On Wed, Dec 29, 2010 at 3:36 PM, STINNER Victor wrote: .. > Use non-ASCII identifiers is exotic. Use non-BMP identifiers is > crazy :-) Hmm, we clearly disagree on what crosses the boundary of the mental norm. IMHO, it is crazy to require users to care

[issue10542] Py_UNICODE_NEXT and other macros for surrogates

2010-12-29 Thread STINNER Victor
STINNER Victor added the comment: On Wednesday 29 December 2010 at 19:26 +0000, Alexander Belopolsky wrote: > Would it look as exotic if presented like this? > > File "", line 1 > 𐌀 = 5 >^ > SyntaxError: invalid character in identifier > (works on a wide build) Use non-ASCII ide

[issue10542] Py_UNICODE_NEXT and other macros for surrogates

2010-12-29 Thread Alexander Belopolsky
Changes by Alexander Belopolsky : Added file: http://bugs.python.org/file20190/issue10542a.diff

[issue10542] Py_UNICODE_NEXT and other macros for surrogates

2010-12-29 Thread Alexander Belopolsky
Alexander Belopolsky added the comment: I should stop using e-mail to reply to bug reports! The mangled example was >>> 𐌀 = 5 File "", line 1 𐌀 = 5 ^ SyntaxError: invalid character in identifier

[issue10542] Py_UNICODE_NEXT and other macros for surrogates

2010-12-29 Thread Alexander Belopolsky
Alexander Belopolsky added the comment: On Wed, Dec 29, 2010 at 11:36 AM, Georg Brandl wrote: .. > That bug already strikes me as quite exotic. > Would it look as exotic if presented like this? File "", line 1 𐌀 = 5 ^ SyntaxError: invalid character in identifier (works on a wide b

[issue10542] Py_UNICODE_NEXT and other macros for surrogates

2010-12-29 Thread Alexander Belopolsky
Alexander Belopolsky added the comment: The example in my previous message should have been: >>> '\U00010000' == '\uD800\uDC00' True

[issue10542] Py_UNICODE_NEXT and other macros for surrogates

2010-12-29 Thread Alexander Belopolsky
Alexander Belopolsky added the comment: On Sat, Nov 27, 2010 at 5:24 PM, Marc-Andre Lemburg wrote: .. > Perhaps we should allow ord() to work on surrogates in > UCS4 builds as well. That would reduce the number of > surprises. > This is an interesting idea, however, having surrogates in UCS4 b

[issue10542] Py_UNICODE_NEXT and other macros for surrogates

2010-12-29 Thread Alexander Belopolsky
Alexander Belopolsky added the comment: On Wed, Dec 29, 2010 at 7:19 AM, Marc-Andre Lemburg wrote: .. > * The macros still need some more attention to enhance their performance. > Although I made your suggested change from '-' to '&', I seriously doubt that this would make any difference on mod

[issue10542] Py_UNICODE_NEXT and other macros for surrogates

2010-12-29 Thread Georg Brandl
Georg Brandl added the comment: That bug already strikes me as quite exotic. You need to at least address Marc-Andre's remarks, and to give an overview of what else you'd like to change as well, and how this could affect semantics. Remember that the next release is already a release candidate

[issue10542] Py_UNICODE_NEXT and other macros for surrogates

2010-12-29 Thread Alexander Belopolsky
Alexander Belopolsky added the comment: On Wed, Dec 29, 2010 at 10:00 AM, Georg Brandl wrote: .. > >> Let's wait for 3.3 with the change. > > Definitely. Does this also mean that the numerous surrogates related bugs should wait until 3.3 as well? (See issues #9200 and #10521.) This patch was

[issue10542] Py_UNICODE_NEXT and other macros for surrogates

2010-12-29 Thread Georg Brandl
Georg Brandl added the comment: > Let's wait for 3.3 with the change. Definitely. -- nosy: +georg.brandl versions: +Python 3.3 -Python 3.2

[issue10542] Py_UNICODE_NEXT and other macros for surrogates

2010-12-29 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment: Alexander Belopolsky wrote: > > Alexander Belopolsky added the comment: > > I am attaching a patch for commit review. I added an underscore prefix to > all new macros. This way I am not introducing new features and we will have > a full release cycle

[issue10542] Py_UNICODE_NEXT and other macros for surrogates

2010-12-28 Thread Alexander Belopolsky
Alexander Belopolsky added the comment: I am attaching a patch for commit review. I added an underscore prefix to all new macros. This way I am not introducing new features and we will have a full release cycle to come up with better names. I would just note that "next" terminology is cons

[issue10542] Py_UNICODE_NEXT and other macros for surrogates

2010-12-28 Thread Alexander Belopolsky
Alexander Belopolsky added the comment: On Sat, Nov 27, 2010 at 5:03 PM, Marc-Andre Lemburg wrote: .. >  * this version should be slightly faster and is also easier to read: > > #define Py_UCS4_READ_CODE_POINT(ptr, end) \ .. >      Py_UNICODE_JOIN_SURROGATES((ptr)++, (ptr)++) : \ .. >   I haven

[issue10542] Py_UNICODE_NEXT and other macros for surrogates

2010-12-16 Thread Alexander Belopolsky
Alexander Belopolsky added the comment: On Fri, Dec 10, 2010 at 6:09 PM, Daniel Stutzbach wrote: .. > The second check for surrogates in Py_UNICODE_PUT_NEXT is necessary, unless > you can prove that > Py_UNICODE_SOME_TRANSFORMATION will never transform characters < 0x10000 into > characters >

[issue10542] Py_UNICODE_NEXT and other macros for surrogates

2010-12-16 Thread Alexander Belopolsky
Changes by Alexander Belopolsky : -- nosy: +doerwalter

[issue10542] Py_UNICODE_NEXT and other macros for surrogates

2010-12-10 Thread Daniel Stutzbach
Daniel Stutzbach added the comment: In bltinmodule.c, it looks like some of the indentation doesn't line up? Bikeshedding aside, it looks good to me. I agree with Eric Smith that the first part of the macro name usually refers to the type of the first argument (or the type the first argument points

[issue10542] Py_UNICODE_NEXT and other macros for surrogates

2010-12-07 Thread Daniel Stutzbach
Daniel Stutzbach added the comment: +1 on the general idea of abstracting out repeated code. I will take a closer look at the details within the next few days.

[issue10542] Py_UNICODE_NEXT and other macros for surrogates

2010-12-07 Thread Alexander Belopolsky
Alexander Belopolsky added the comment: Daniel, While these macros should not affect ABI, I would appreciate your feedback in light of your work on issue 8654. -- nosy: +stutzbach

[issue10542] Py_UNICODE_NEXT and other macros for surrogates

2010-12-03 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment: Alexander Belopolsky wrote: > > Alexander Belopolsky added the comment: > > On Sat, Nov 27, 2010 at 6:38 PM, Raymond Hettinger > wrote: > .. >> I suggest Py_UNICODE_ADVANCE() to avoid false suggestion that the iterator >> protocol is being used. >> > >

[issue10542] Py_UNICODE_NEXT and other macros for surrogates

2010-12-03 Thread Alexander Belopolsky
Alexander Belopolsky added the comment: On Sat, Nov 27, 2010 at 6:38 PM, Raymond Hettinger wrote: .. > I suggest Py_UNICODE_ADVANCE() to avoid false suggestion that the iterator > protocol is being used. > As a data point, ICU defines U16_NEXT() for similar purpose. I also like ICU terminolo

[issue10542] Py_UNICODE_NEXT and other macros for surrogates

2010-11-27 Thread Alexander Belopolsky
Alexander Belopolsky added the comment: I am attaching a patch that defines Py_UNICODE_PUT_NEXT() macro (tentative name) and uses it to fix str.upper method. The implementation of surrogate-aware str.upper shows that NEXT/PUT_NEXT abstractions may lead to somewhat inefficient code for "by co
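As a rough illustration of what a "put next" operation does on a narrow (UTF-16) build, the sketch below writes one code point as either one or two 16-bit units and returns the advanced output pointer. It is written as a function rather than a macro, uses `uint16_t`/`uint32_t` in place of `Py_UNICODE`/`Py_UCS4`, and its name is hypothetical; none of this is taken from the attached patch.

```c
#include <stdint.h>

/* Hypothetical sketch: store code point c at out as UTF-16 and
   return the pointer just past what was written. The caller must
   guarantee room for two units and that c <= 0x10FFFF. */
static uint16_t *sketch_put_next(uint16_t *out, uint32_t c)
{
    if (c > 0xFFFF) {
        /* Supplementary character: split into a surrogate pair. */
        c -= 0x10000;
        *out++ = (uint16_t)(0xD800 | (c >> 10));     /* high surrogate */
        *out++ = (uint16_t)(0xDC00 | (c & 0x03FF));  /* low surrogate */
    }
    else {
        /* BMP character (including a lone surrogate): one unit. */
        *out++ = (uint16_t)c;
    }
    return out;
}
```

A code-point-by-code-point transformation such as an upper-casing loop would read with a "next" operation, transform, and write with this one, which is where the extra surrogate checks mentioned in the comment come from.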

[issue10542] Py_UNICODE_NEXT and other macros for surrogates

2010-11-27 Thread Ezio Melotti
Ezio Melotti added the comment: AFAIU the macro returns lone surrogates as they are, this means that: 1) if the string contains only surrogate pairs, Py_UNICODE_NEXT will iterate on scalar values[0]; 2) if the string contains only lone surrogates, it will iterate on codepoints[1]; 3) if
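The behaviour described here (valid pairs are joined, lone surrogates are returned unchanged) can be sketched as below. This is an assumption-laden illustration, not the macro from the patch: it is a function with a hypothetical name, it takes a pointer-to-pointer instead of modifying a macro argument, and it uses `uint16_t`/`uint32_t` in place of `Py_UNICODE`/`Py_UCS4`.

```c
#include <stdint.h>

/* Hypothetical sketch: read one code point from a UTF-16 buffer and
   advance *p past it. Precondition: *p < end. A high surrogate that
   is followed by a low surrogate yields the joined code point; any
   other unit, including a lone surrogate, is returned as it is. */
static uint32_t sketch_next(const uint16_t **p, const uint16_t *end)
{
    uint32_t ch = *(*p)++;
    if (ch >= 0xD800 && ch <= 0xDBFF && *p < end) {
        uint32_t low = **p;
        if (low >= 0xDC00 && low <= 0xDFFF) {
            (*p)++;  /* consume the low surrogate as well */
            return (((ch & 0x03FF) << 10) | (low & 0x03FF)) + 0x10000;
        }
    }
    return ch;
}
```

On a string that mixes valid pairs and lone surrogates, the loop therefore yields a mixture of scalar values and surrogate code points, which is the case the comment above is concerned with.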

[issue10542] Py_UNICODE_NEXT and other macros for surrogates

2010-11-27 Thread Antoine Pitrou
Antoine Pitrou added the comment: > I suggest Py_UNICODE_ADVANCE() to avoid false suggestion that the > iterator protocol is being used. You can't use the iterator protocol on a non-PyObject, and Py_UNICODE_* (as opposed to PyUnicode_*) suggests the macro operates on a raw array of code points.

[issue10542] Py_UNICODE_NEXT and other macros for surrogates

2010-11-27 Thread Raymond Hettinger
Raymond Hettinger added the comment: I suggest Py_UNICODE_ADVANCE() to avoid false suggestion that the iterator protocol is being used.

[issue10542] Py_UNICODE_NEXT and other macros for surrogates

2010-11-27 Thread Alexander Belopolsky
Alexander Belopolsky added the comment: On Sat, Nov 27, 2010 at 5:41 PM, Ezio Melotti wrote: > > Ezio Melotti added the comment: > >> * the Py_UNICODE_JOIN_SURROGATES() macro should use Py_UCS4 as prefix since >> it returns Py_UCS4 values, i.e. Py_UCS4_JOIN_SURROGATES() >> * same for the Py_U

[issue10542] Py_UNICODE_NEXT and other macros for surrogates

2010-11-27 Thread Ezio Melotti
Ezio Melotti added the comment: > * the Py_UNICODE_JOIN_SURROGATES() macro should use Py_UCS4 as prefix since > it returns Py_UCS4 values, i.e. Py_UCS4_JOIN_SURROGATES() > * same for the Py_UNICODE_NEXT() macro, i.e. Py_UCS4_NEXT() I'm not so familiar with the prefix conventions, but wouldn't

[issue10542] Py_UNICODE_NEXT and other macros for surrogates

2010-11-27 Thread Eric Smith
Eric Smith added the comment: > The idea is that the first part refers to what the macro > returns (Py_UCS4) and the "read" part of the name refers > to moving a pointer across an array (any array of integers). I thought the first part generally meant the type of the first parameter. Although

[issue10542] Py_UNICODE_NEXT and other macros for surrogates

2010-11-27 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment: Alexander Belopolsky wrote: > > Alexander Belopolsky added the comment: > > On Sat, Nov 27, 2010 at 5:03 PM, Marc-Andre Lemburg > wrote: > .. >> * same for the Py_UNICODE_NEXT() macro, i.e. Py_UCS4_NEXT() >> >> * in order to make the macro easier to un

[issue10542] Py_UNICODE_NEXT and other macros for surrogates

2010-11-27 Thread Alexander Belopolsky
Alexander Belopolsky added the comment: On Sat, Nov 27, 2010 at 5:03 PM, Marc-Andre Lemburg wrote: .. >  * same for the Py_UNICODE_NEXT() macro, i.e. Py_UCS4_NEXT() > >  * in order to make the macro easier to understand, please rename it to > Py_UCS4_READ_CODE_POINT(); that's a little more typ

[issue10542] Py_UNICODE_NEXT and other macros for surrogates

2010-11-27 Thread Alexander Belopolsky
Alexander Belopolsky added the comment: On Sat, Nov 27, 2010 at 5:03 PM, Marc-Andre Lemburg wrote: .. [I'll respond to skipped when I update the patch] > In any case, we should clearly document where these macros are used and > warn about the implications of using them in the wrong places. It

[issue10542] Py_UNICODE_NEXT and other macros for surrogates

2010-11-27 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment: I like the idea and thanks for putting work into this. Some comments: * when using macro variables, always put the variables in parens in the expansion; this avoids precedence issues, weird syntax errors, etc. - even if it may not be necessary * a f
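The first review point, wrapping macro parameters in parentheses, can be shown with a small sketch. Both macro names are hypothetical and exist only for this illustration; the point is that an argument containing an operator of lower precedence than `&` (here `|`) is regrouped by the unparenthesized version.

```c
/* Hypothetical macros for illustration only. */

/* Unsafe: the argument is pasted in without parentheses. */
#define SKETCH_LOW_BITS_BAD(ch)  (ch & 0x03FF)

/* Safe: the argument is parenthesized in the expansion. */
#define SKETCH_LOW_BITS_GOOD(ch) ((ch) & 0x03FF)

/* With the argument `0xD800 | 0x0001`:
   BAD  expands to (0xD800 | 0x0001 & 0x03FF), which groups as
        0xD800 | (0x0001 & 0x03FF) because & binds tighter than |.
   GOOD expands to ((0xD800 | 0x0001) & 0x03FF), as intended. */
```

Many compilers warn about the unparenthesized form when it is used this way, but the code still compiles, which is why the convention is worth following even where it looks redundant.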

[issue10542] Py_UNICODE_NEXT and other macros for surrogates

2010-11-27 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment: Raymond Hettinger wrote: > > Raymond Hettinger added the comment: > > Mark, can you opine on this? Yes, I'll have a look later today.

[issue10542] Py_UNICODE_NEXT and other macros for surrogates

2010-11-26 Thread Raymond Hettinger
Raymond Hettinger added the comment: Mark, can you opine on this? -- assignee: belopolsky -> lemburg

[issue10542] Py_UNICODE_NEXT and other macros for surrogates

2010-11-26 Thread Alexander Belopolsky
Alexander Belopolsky added the comment: Raymond, I wonder if you would like to comment on the iterator analogy and/or on adding public names to C API. -- nosy: +rhettinger

[issue10542] Py_UNICODE_NEXT and other macros for surrogates

2010-11-26 Thread Alexander Belopolsky
Alexander Belopolsky added the comment: On Fri, Nov 26, 2010 at 9:22 PM, Eric Smith wrote: .. > But I definitely agree that we should get the abstraction right first and > worry about > the implementation later. I am fairly happy with Py_UNICODE_NEXT() abstraction. Its semantics should be n

[issue10542] Py_UNICODE_NEXT and other macros for surrogates

2010-11-26 Thread Eric Smith
Eric Smith added the comment: The compiler's decision to inline something should not be related to its ability to put variables in a register. But I definitely agree that we should get the abstraction right first and worry about the implementation later.

[issue10542] Py_UNICODE_NEXT and other macros for surrogates

2010-11-26 Thread Alexander Belopolsky
Alexander Belopolsky added the comment: On Fri, Nov 26, 2010 at 8:41 PM, STINNER Victor wrote: .. > I don't like macro having a result and using multiple instructions using the > evil > magic trick (the ","). It's harder to maintain the code and harder to debug > than > a classical function.

[issue10542] Py_UNICODE_NEXT and other macros for surrogates

2010-11-26 Thread Eric Smith
Eric Smith added the comment: The code will basically be: Py_UCS4 fill; parse_format_string(fmt, ..., &fill, ...); /* lots more code */ if (fill_needed) { /* compute how many characters to reserve */ space_needed = Py_UNICODE_NUM_NEEDED(fill) * number_of

[issue10542] Py_UNICODE_NEXT and other macros for surrogates

2010-11-26 Thread STINNER Victor
STINNER Victor added the comment: I don't like macros that have a result and use multiple instructions via the evil magic trick (the ","). Such code is harder to maintain and harder to debug than a classical function. Don't you think that modern compilers are able to inline the code? (If not

[issue10542] Py_UNICODE_NEXT and other macros for surrogates

2010-11-26 Thread Alexander Belopolsky
Alexander Belopolsky added the comment: On Fri, Nov 26, 2010 at 7:45 PM, Eric Smith wrote: .. > For my use I'd really need it to take the result of Py_UNICODE_NEXT. > Something like: > Py_ssize_t > Py_UNICODE_NUM_NEEDED(Py_UCS4 c) > and it would always return 1 or 2. Always 1 for a wide build,
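The helper requested here is small enough to sketch for the narrow-build case. The name below is hypothetical, and `uint32_t`/`size_t` stand in for `Py_UCS4`/`Py_ssize_t`; on a wide build the equivalent would simply return 1 for every input.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch for a narrow (UTF-16) build: how many 16-bit
   units are needed to store code point c. Code points above U+FFFF
   need a surrogate pair, everything else fits in one unit. */
static size_t sketch_num_needed_narrow(uint32_t c)
{
    return c > 0xFFFF ? 2 : 1;
}
```

A formatting routine could multiply this by the number of fill characters to size its output buffer before writing anything, which is the use described in the surrounding comments.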

[issue10542] Py_UNICODE_NEXT and other macros for surrogates

2010-11-26 Thread Eric Smith
Eric Smith added the comment: I'd need access to this without having to build a PyUnicodeObject, for efficiency. But it sounds like it does have the basic functionality I need. For my use I'd really need it to take the result of Py_UNICODE_NEXT. Something like: Py_ssize_t Py_UNICODE_NUM_NEEDE

[issue10542] Py_UNICODE_NEXT and other macros for surrogates

2010-11-26 Thread Alexander Belopolsky
Alexander Belopolsky added the comment: On Fri, Nov 26, 2010 at 7:27 PM, Eric Smith wrote: .. > > In addition to the proposed Py_UNICODE_NEXT and Py_UNICODE_PUT_NEXT, > > str.__format__ would also need a function that tells it how many Py_UNICODEs > are needed to store a given Py_UCS4. Yes, t

[issue10542] Py_UNICODE_NEXT and other macros for surrogates

2010-11-26 Thread Eric Smith
Eric Smith added the comment: In addition to the proposed Py_UNICODE_NEXT and Py_UNICODE_PUT_NEXT, str.__format__ would also need a function that tells it how many Py_UNICODEs are needed to store a given Py_UCS4.

[issue10542] Py_UNICODE_NEXT and other macros for surrogates

2010-11-26 Thread Alexander Belopolsky
Changes by Alexander Belopolsky : -- nosy: +haypo, loewis

[issue10542] Py_UNICODE_NEXT and other macros for surrogates

2010-11-26 Thread Alexander Belopolsky
New submission from Alexander Belopolsky : As discussed in issue 10521 and the sprawling "len(chr(i)) = 2?" thread [1] on python-dev, many functions in the Python library behave differently on narrow and wide builds. While there are unavoidable differences such as the length of strings with non-B