[issue12486] tokenize module should have a unicode API

2018-06-05 Thread Thomas Kluyver
Thomas Kluyver added the comment: Thanks Carol :-)

[issue12486] tokenize module should have a unicode API

2018-06-05 Thread Carol Willing
Change by Carol Willing: resolution: -> fixed; stage: patch review -> resolved; status: open -> closed

[issue12486] tokenize module should have a unicode API

2018-06-05 Thread Carol Willing
Carol Willing added the comment: New changeset c56b17bd8c7a3fd03859822246633d2c9586f8bd by Carol Willing (Thomas Kluyver) in branch 'master': bpo-12486: Document tokenize.generate_tokens() as public API (#6957) https://github.com/python/cpython/commit/c56b17bd8c7a3fd03859822246633d2c9586f8bd

[issue12486] tokenize module should have a unicode API

2018-05-28 Thread Thomas Kluyver
Thomas Kluyver added the comment: The tests on PR #6957 are passing now, if anyone has time to have a look. :-)

[issue12486] tokenize module should have a unicode API

2018-05-18 Thread Thomas Kluyver
Thomas Kluyver added the comment: Thanks - I had forgotten it, just fixed it now.

[issue12486] tokenize module should have a unicode API

2018-05-18 Thread Martin Panter
Martin Panter added the comment: Don’t forget about updating __all__.

[issue12486] tokenize module should have a unicode API

2018-05-18 Thread Thomas Kluyver
Thomas Kluyver added the comment: I agree, it's not a good design, but it's what's already there; I just want to ensure that it won't be removed without a deprecation cycle. My PR makes no changes to behaviour, only to documentation and tests. This and issue 9969 have …

[issue12486] tokenize module should have a unicode API

2018-05-18 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: My concern is that we will have two functions with non-similar names (tokenize() and generate_tokens()) that do virtually the same thing but accept different types of input (bytes or str), and the single function untokenize() that …
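The mismatch Serhiy describes can be seen directly; a minimal sketch using only the documented stdlib API:

```python
import io
import tokenize

SRC = "x = 1\n"

# str input: generate_tokens() takes a readline callable that yields str.
str_tokens = list(tokenize.generate_tokens(io.StringIO(SRC).readline))

# bytes input: tokenize() takes a readline that yields bytes, detects the
# encoding, and prepends an ENCODING token to the stream.
byte_tokens = list(tokenize.tokenize(io.BytesIO(SRC.encode("utf-8")).readline))
```

Apart from that leading ENCODING token, the two streams are the same, which is the duplication being discussed.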

[issue12486] tokenize module should have a unicode API

2018-05-18 Thread Thomas Kluyver
Thomas Kluyver added the comment: I wouldn't say it's a good name, but I think the advantage of documenting an existing name outweighs that. We can start (or continue) using generate_tokens() right away, whereas a new name presumably wouldn't be available until Python …

[issue12486] tokenize module should have a unicode API

2018-05-18 Thread Serhiy Storchaka
Change by Serhiy Storchaka: nosy: +barry, mark.dickinson, michael.foord, trent; versions: +Python 3.8 -Python 3.6

[issue12486] tokenize module should have a unicode API

2018-05-18 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: The old generate_tokens() was renamed to tokenize() in issue719888 because the latter is a better name. Is "generate_tokens" considered a good name now?

[issue12486] tokenize module should have a unicode API

2018-05-17 Thread Thomas Kluyver
Change by Thomas Kluyver: pull_requests: +6616

[issue12486] tokenize module should have a unicode API

2018-05-17 Thread Matthias Bussonnier
Matthias Bussonnier added the comment: > Why not just bless the existing generate_tokens() function as a public API, Yes please, or just make the private `_tokenize` public under another name. The `tokenize.tokenize` method tries to magically detect the encoding, which …

[issue12486] tokenize module should have a unicode API

2018-03-11 Thread Thomas Kluyver
Thomas Kluyver added the comment: > Why not just bless the existing generate_tokens() function as a public API We're actually using generate_tokens() from IPython - we wanted a way to tokenize unicode strings, and although it's undocumented, it's been there for a number …

[issue12486] tokenize module should have a unicode API

2015-10-05 Thread Martin Panter
Martin Panter added the comment: I didn’t notice that this dual untokenize() behaviour already existed. Taking that into account weakens my argument for having separate text and bytes tokenize() functions.
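The dual behaviour Martin refers to, sketched with the stdlib API: untokenize() returns str for a plain token stream, but bytes when the stream begins with the ENCODING token that the bytes-level tokenize() emits.

```python
import io
import tokenize

src = "x = 1\n"

# Stream from the str API carries no ENCODING token -> result is str.
as_str = tokenize.untokenize(
    tokenize.generate_tokens(io.StringIO(src).readline))

# Stream from the bytes API starts with ENCODING -> result is bytes,
# encoded using that token's encoding.
as_bytes = tokenize.untokenize(
    tokenize.tokenize(io.BytesIO(src.encode("utf-8")).readline))
```

So the output type of a single call depends on what happens to be in the input iterable, which is the design point being debated.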

[issue12486] tokenize module should have a unicode API

2015-10-05 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: Thank you for your review, Martin. Here is a rebased patch that addresses Martin's comments. I agree that having untokenize() change its output type depending on the ENCODING token is bad design and we should change this. But that is perhaps another issue.

[issue12486] tokenize module should have a unicode API

2015-10-05 Thread Serhiy Storchaka
Changes by Serhiy Storchaka: Added file: http://bugs.python.org/file40679/tokenize_str_2.diff

[issue12486] tokenize module should have a unicode API

2015-10-04 Thread Martin Panter
Martin Panter added the comment: I agree it would be very useful to be able to tokenize arbitrary text without worrying about encoding tokens. I left some suggestions for the documentation changes. Also some test cases for it would be good. However I wonder if a separate function would be …

[issue12486] tokenize module should have a unicode API

2012-12-28 Thread Meador Inge
Meador Inge added the comment: See also issue9969. nosy: +meador.inge

[issue12486] tokenize module should have a unicode API

2012-10-15 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: The patch to allow tokenize() to accept a string is very simple, only 4 lines. But it requires a lot of documentation changes. Then we can get rid of the undocumented generate_tokens(). Note, the stdlib and tools use only generate_tokens(); none uses tokenize(). Of course, …

[issue12486] tokenize module should have a unicode API

2012-10-13 Thread Eric Snow
Changes by Eric Snow.

[issue12486] tokenize module should have a unicode API

2011-07-09 Thread STINNER Victor
STINNER Victor added the comment: The compiler has a PyCF_SOURCE_IS_UTF8 flag: see the compile() builtin. The parser has a flag to ignore the coding cookie: PyPARSE_IGNORE_COOKIE. Patching tokenize to support Unicode is simple: use PyCF_SOURCE_IS_UTF8 and/or …
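At the Python level the effect Victor mentions is already visible: the compile() builtin accepts already-decoded source as a str, with no coding-cookie detection involved (for a str source, CPython sets PyCF_SOURCE_IS_UTF8 internally). A trivial illustration:

```python
# compile() takes unicode text directly; no encoding detection is needed
# when the source is a str rather than bytes.
code = compile("name = 'héllo'\n", "<unicode-source>", "exec")
ns = {}
exec(code, ns)
```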

[issue12486] tokenize module should have a unicode API

2011-07-09 Thread Eric Snow
Changes by Eric Snow: nosy: +ericsnow

[issue12486] tokenize module should have a unicode API

2011-07-08 Thread Petri Lehtinen
Changes by Petri Lehtinen: nosy: +petri.lehtinen

[issue12486] tokenize module should have a unicode API

2011-07-08 Thread Terry J. Reedy
Terry J. Reedy added the comment: Hmm. Python 3 code is unicode; Python reads program text as Unicode code points. The tokenize module purports to provide a lexical scanner for Python source code, but it seems not to do that. Instead it provides a scanner for Python code …

[issue12486] tokenize module should have a unicode API

2011-07-04 Thread Éric Araujo
Changes by Éric Araujo: nosy: +eric.araujo, haypo; type: -> feature request; versions: +Python 3.3

[issue12486] tokenize module should have a unicode API

2011-07-03 Thread Devin Jeanpierre
New submission from Devin Jeanpierre: tokenize only deals with bytes. Users might want to deal with unicode source (for example, if Python source is embedded into a document with an already-known encoding). The naive approach might be something like: def …
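The submission is truncated in this archive; one plausible shape of the "naive approach" it alludes to (an assumption on my part, not Devin's original code) is to re-encode the decoded text so the bytes-only API can consume it:

```python
import io
import tokenize

def tokenize_text(text):
    # Hypothetical helper: round-trip already-decoded source through UTF-8
    # so the bytes-only tokenize() entry point can be used. Naive because a
    # coding cookie inside `text` that names another encoding would still
    # be honoured, mis-decoding the UTF-8 bytes we just produced.
    readline = io.BytesIO(text.encode("utf-8")).readline
    return list(tokenize.tokenize(readline))

tokens = tokenize_text("x = 1\n")
```

The thread's eventual resolution avoids this round-trip entirely by documenting generate_tokens(), which accepts str-yielding readline callables directly.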