[issue30717] Add unicode grapheme cluster break algorithm

2017-08-07 Thread Guillaume Sanchez
Guillaume Sanchez added the comment: > I don't think unicodedata is the right place I do agree with that. A new module sounds good. Would it be a problem if that module contained very few functions at first? > Can we mark this as having a Provisional API to give us time to

[issue30717] Add unicode grapheme cluster break algorithm

2017-08-03 Thread Guillaume Sanchez
Guillaume Sanchez added the comment: I have a few criticisms of that proto-PEP http://mail.python.org/pipermail/python-dev/2001-July/015938.html In particular, the fact that all those functions return an index prevents any state keeping. That's a problem because: > next_(u, in

[issue30717] Add unicode grapheme cluster break algorithm

2017-08-03 Thread Guillaume Sanchez
Guillaume Sanchez added the comment: Thanks for your consideration. I'm currently fixing what's been asked in the reviews. > But it would be useful to provide also word and sentence iterators. I'll gladly do that as well! > I think emitting a pair (pos, substring) would be more
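
The "(pos, substring)" idea mentioned above can be illustrated with a small wrapper. This is only a sketch: `with_positions` is a hypothetical name, not part of the patch or of `unicodedata`, and it assumes the pieces are consecutive substrings of one string.

```python
def with_positions(pieces):
    """Hypothetical helper: pair each consecutive substring with its
    starting index (in code points) within the original string."""
    pos = 0
    for piece in pieces:
        yield pos, piece
        pos += len(piece)


# Two clusters: 'a' plus a combining mark (2 code points), then 'b'.
pairs = list(with_positions(["a\u20d7", "b"]))
```

Yielding the position alongside the substring lets a caller slice the original string without the iterator having to expose its internal state.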

[issue30717] str.center() is not unicode aware

2017-08-02 Thread Guillaume Sanchez
Guillaume Sanchez added the comment: Hi, Are you guys still interested? I haven't heard from you in a while

[issue30717] str.center() is not unicode aware

2017-07-13 Thread Guillaume Sanchez
Guillaume Sanchez added the comment: Hello Steven! Thanks for your responsiveness! unicodedata.grapheme_cluster_break() takes a Unicode code point as an argument and returns its GraphemeBreakProperty as a string. Possible values are listed here: http://www.unicode.org/reports/tr29/#CR help

[issue12568] Add functions to get the width in columns of a character

2017-07-13 Thread Guillaume Sanchez
Guillaume Sanchez added the comment: Hello, I come from bugs.python.org/issue30717 . I have a pending PR that needs review ( https://github.com/python/cpython/pull/2673 ) adding a function that breaks unicode strings into grapheme clusters (aka what one would intuitively call "a char

[issue30717] str.center() is not unicode aware

2017-07-13 Thread Guillaume Sanchez
Guillaume Sanchez added the comment: Hello, I implemented unicodedata.break_graphemes(), which returns an iterator that yields consecutive graphemes. This is a "test" implementation meant to see what doesn't fit Python's style and design, to discuss naming and implementation detai
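
To give a rough idea of what an iterator over graphemes looks like, here is a minimal sketch. It is not the implementation from the PR and not the full UAX #29 algorithm: it only attaches nonspacing and enclosing marks (general categories Mn and Me) to the preceding character, and ignores Hangul, emoji sequences, CRLF and the other rules. The name `iter_simple_clusters` is hypothetical.

```python
import unicodedata


def iter_simple_clusters(text):
    """Yield each character together with the combining marks after it.

    Simplified approximation, NOT the UAX #29 rules: only general
    categories Mn (nonspacing) and Me (enclosing) extend a cluster.
    """
    cluster = ""
    for ch in text:
        if cluster and unicodedata.category(ch) in ("Mn", "Me"):
            cluster += ch          # mark extends the current cluster
        else:
            if cluster:
                yield cluster      # emit the finished cluster
            cluster = ch           # start a new one
    if cluster:
        yield cluster


# 'a' + U+20D7 COMBINING RIGHT ARROW ABOVE, followed by 'b' and 'c'.
clusters = list(iter_simple_clusters("a\u20d7bc"))
```

Because the function is a generator, it keeps its scanning state between items, which is the property an index-returning API cannot offer.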

[issue30717] str.center() is not unicode aware

2017-07-11 Thread Guillaume Sanchez
Guillaume Sanchez added the comment: Hello to all of you, sorry for the delay. Been busy. I added the base code needed to build the grapheme cluster break algorithm. We now have the GraphemeBreakProperty available via unicodedata.grapheme_cluster_break(). Can you check that the implementation

[issue30717] str.center() is not unicode aware

2017-06-20 Thread Guillaume Sanchez
Guillaume Sanchez added the comment: Thanks for all those interesting cases you brought here! I didn't think of that at all! I'm using the word "grapheme" as per the definition given in UAX TR29 which is *not* language/locale dependent [1]. This annex is very specific and precise a

[issue30717] str.center() is not unicode aware

2017-06-20 Thread Guillaume Sanchez
Guillaume Sanchez added the comment: Obviously, I'm talking about str.center() but all functions needing a count of graphemes are then not totally correct. I can fix that and add the corresponding function, or an iterator over graphemes, or whatever seems right

[issue30717] str.center() is not unicode aware

2017-06-20 Thread Guillaume Sanchez
New submission from Guillaume Sanchez: "a⃑".center(5, ".") produces '..a⃑.' instead of '..a⃑..' The reason is that "a⃑" is composed of two code points (2 UCS4 chars), one 'a' and one combining code point "above arrow". str.center()
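
The behaviour reported above can be reproduced with a short script. This is a sketch under assumptions: U+20D7 (COMBINING RIGHT ARROW ABOVE) stands in for the combining mark in the report, and `center_by_base_chars` is a hypothetical helper that only skips code points with a non-zero canonical combining class, which is much cruder than real grapheme segmentation.

```python
import unicodedata

# 'a' followed by a combining mark. U+20D7 is an assumed stand-in for the
# mark used in the report.
s = "a\u20d7"

# str.center() counts code points, so the string is treated as 2 wide.
padded = s.center(5, ".")


def center_by_base_chars(text, width, fillchar=" "):
    """Hypothetical helper: centre text counting only non-combining
    code points as visible. Odd padding puts the extra fill on the right,
    which may differ from str.center()'s rounding."""
    visible = sum(1 for ch in text if unicodedata.combining(ch) == 0)
    pad = max(width - visible, 0)
    left = pad // 2
    return fillchar * left + text + fillchar * (pad - left)


fixed = center_by_base_chars(s, 5, ".")
```

Here `padded` holds five code points but only four visible characters, while `fixed` holds six code points and renders five characters wide.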