[issue9712] tokenize yield an ERRORTOKEN if the identifier starts with a non-ascii char

2018-03-14 Thread Terry J. Reedy
Terry J. Reedy added the comment: Actually, #1693050 and #12731, about \w, are duplicates.

[issue9712] tokenize yield an ERRORTOKEN if the identifier starts with a non-ascii char

2018-03-14 Thread Terry J. Reedy
Terry J. Reedy added the comment: Joshua opened #24194 as a duplicate of this because he could not reopen this. I am leaving it open as the superseder for this as Serhiy has already added two dependencies there, and because this seems to be a duplicate in turn of #1693050 (which I will close

[issue9712] tokenize yield an ERRORTOKEN if the identifier starts with a non-ascii char

2015-04-11 Thread Joshua Landau
Joshua Landau added the comment: This doesn't seem to be a complete fix; the regex used does not include Other_ID_Start or Other_ID_Continue from https://docs.python.org/3.5/reference/lexical_analysis.html#identifiers Hence tokenize does not accept '℘·'. Credit to modchan from http://stackove
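The gap described in this comment can be checked with a short sketch. It assumes a CPython 3 interpreter; the helper name `token_names` is hypothetical and not part of the tracker discussion. What tokenize reports for this input differs between Python versions, so no expected output is shown.

```python
import io
import re
import tokenize


def token_names(source):
    """Return the token type names that tokenize reports for `source`."""
    readline = io.BytesIO(source.encode("utf-8")).readline
    return [tokenize.tok_name[tok.type] for tok in tokenize.tokenize(readline)]


# U+2118 is listed under Other_ID_Start and U+00B7 under Other_ID_Continue.
ident = "\u2118\u00b7"  # '℘·'

# What the language itself considers a valid identifier.
print(ident.isidentifier())

# What a plain \w+ pattern matches (a match object, or None).
print(re.fullmatch(r"\w+", ident))

# What tokenize makes of an assignment to that identifier.
print(token_names(ident + " = 1\n"))
```

Comparing the first two lines of output shows whether `\w` covers the same characters as the identifier rules; the third line shows how the installed tokenize module handles the difference.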

[issue9712] tokenize yield an ERRORTOKEN if the identifier starts with a non-ascii char

2010-08-30 Thread Benjamin Peterson
Benjamin Peterson added the comment: r84364
nosy: +benjamin.peterson
resolution: -> fixed
status: open -> closed

[issue9712] tokenize yield an ERRORTOKEN if the identifier starts with a non-ascii char

2010-08-30 Thread Florent Xicluna
New submission from Florent Xicluna:
from io import BytesIO
from tokenize import tokenize, tok_name
sample = 'éléphants = "un éléphant, deux éléphants, ..."\nprint(éléphants)\n'
sampleb = sample.encode('utf-8')
exec(sample) # output: un éléphant, deux éléphants, ...
exec(sampleb) # output: un
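The preview of the report is cut off before the tokenize call, so the following is only a sketch of how the same sample could be run through tokenize, not the reporter's original code. The `tokens` list is a name introduced here for illustration. On an interpreter that includes the fix from r84364, the identifier should be reported as a NAME token rather than an ERRORTOKEN.

```python
from io import BytesIO
from tokenize import tokenize, tok_name

# Same sample as in the report: an identifier starting with a non-ASCII letter.
sample = 'éléphants = "un éléphant, deux éléphants, ..."\nprint(éléphants)\n'
sampleb = sample.encode('utf-8')

# tokenize() expects a readline callable that returns bytes.
# Collect (token type name, token text) pairs (illustrative, not from the report).
tokens = [(tok_name[tok.type], tok.string)
          for tok in tokenize(BytesIO(sampleb).readline)]

for name, text in tokens:
    print(name, repr(text))
```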