[issue26152] A non-breaking space in a source

2016-07-24 Thread Nick Coghlan
Nick Coghlan added the comment: http://bugs.python.org/issue27582 is a later mention of the same problem that attracted patches before Adam noticed it was a repeat of this issue. Marking this as the duplicate, since the problem applies to more than just Unicode whitespace, and the problems

[issue26152] A non-breaking space in a source

2016-07-24 Thread Nick Coghlan
Changes by Nick Coghlan: resolution: -> duplicate; status: open -> closed

[issue26152] A non-breaking space in a source

2016-01-20 Thread Martin Panter
Martin Panter added the comment: The caret always points to the end of the token, I think. (nosy: +martin.panter)

[issue26152] A non-breaking space in a source

2016-01-20 Thread Adam Bartoš
Adam Bartoš added the comment: We have one particular invalid token, so why should it point to the next token rather than to the invalid one?

[issue26152] A non-breaking space in a source

2016-01-20 Thread Martin Panter
Martin Panter added the comment: Assuming Andrew is correct, it sounds like the tokenizer is treating the NBSP and the “2” as part of the same token, because NBSP is non-ASCII.

[issue26152] A non-breaking space in a source

2016-01-20 Thread Adam Bartoš
Adam Bartoš added the comment: It could still point to the first or the last byte of the invalid token rather than to the start of the next token. Also, with the Python implementation of the tokenizer in the tokenize module, we get an ERRORTOKEN containing a non-breaking space followed by a number
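The tokenize behaviour mentioned above can be checked with a short script. This is only a sketch: the helper name `tokenize_lenient` is made up for illustration, and the result depends on the Python version. Older releases are expected to emit an ERRORTOKEN for the non-breaking space, while newer releases may raise an error instead, so the helper handles both.

```python
import io
import tokenize


def tokenize_lenient(source):
    """Return (tokens, error): the tokens seen before any failure,
    plus the exception that stopped tokenizing (or None)."""
    tokens = []
    error = None
    try:
        for tok in tokenize.generate_tokens(io.StringIO(source).readline):
            tokens.append((tokenize.tok_name[tok.type], tok.string))
    except (tokenize.TokenError, SyntaxError) as exc:
        # Some Python versions reject the invalid character outright
        # instead of reporting it as an ERRORTOKEN.
        error = exc
    return tokens, error


# U+00A0 (NO-BREAK SPACE) sits between the comma and the 2.
tokens, error = tokenize_lenient("1,\u00a02\n")
for name, text in tokens:
    print(name, repr(text))
if error is not None:
    print("tokenizer raised:", error)
```

Printing `repr(text)` makes the otherwise invisible `'\xa0'` stand out in the token list.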

[issue26152] A non-breaking space in a source

2016-01-20 Thread Andrew Barnert
Andrew Barnert added the comment:
> Assuming Andrew is correct, it sounds like the tokenizer is treating the NBSP
> and the “2” as part of the same token, because NBSP is non-ASCII.
It's more complicated than that. When you get an invalid character, it splits the token up. So, in this case,

[issue26152] A non-breaking space in a source

2016-01-20 Thread Adam Bartoš
Adam Bartoš added the comment: That explains the message. But why is the caret in the wrong place?

[issue26152] A non-breaking space in a source

2016-01-19 Thread Adam Bartoš
New submission from Adam Bartoš: Consider the following code:

    >>> 1, 2
      File "<stdin>", line 1
        1, 2
           ^
    SyntaxError: invalid character in identifier

The error is due to the fact that the space before "2" is actually a non-breaking space. The error message and the position of the caret is
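Because the non-breaking space is invisible in a pasted session, a reproduction that spells it out as `\u00a0` may be easier to follow. The sketch below compiles the same expression and prints the fields of the resulting SyntaxError. The helper name `syntax_error_for` is hypothetical, and both the message wording and the reported offset vary between Python versions, so no particular output is assumed.

```python
def syntax_error_for(code):
    """Return the SyntaxError raised when compiling code, or None if it compiles."""
    try:
        compile(code, "<example>", "eval")
    except SyntaxError as exc:
        return exc
    return None


# U+00A0 (NO-BREAK SPACE) sits between the comma and the 2.
err = syntax_error_for("1,\u00a02")
if err is not None:
    print("message:", err.msg)
    print("line   :", repr(err.text))
    print("offset :", err.offset)  # 1-based column used to place the caret
```

Comparing `err.offset` with the position of `\xa0` in `err.text` shows whether a given interpreter points the caret at the invalid character or past it.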

[issue26152] A non-breaking space in a source

2016-01-19 Thread Andrew Barnert
Andrew Barnert added the comment: Ultimately, this is because the tokenizer works byte by byte instead of character by character, as far as possible. Since any byte >= 128 must be part of some non-ASCII character, and the only legal use for non-ASCII characters outside of quotes and comments
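The byte-level behaviour described here can be modelled in a few lines of Python. This is a simplified illustration, not CPython's actual tokenizer code: the function names are invented for the example, and the final check uses `str.isidentifier()` as a stand-in for the real identifier validation. It assumes the source is valid UTF-8.

```python
from typing import Tuple


def is_potential_identifier_byte(b: int) -> bool:
    """ASCII letters, digits, underscore, or any byte >= 128 (part of a non-ASCII character)."""
    return (
        ord("a") <= b <= ord("z")
        or ord("A") <= b <= ord("Z")
        or ord("0") <= b <= ord("9")
        or b == ord("_")
        or b >= 128
    )


def scan_identifier_candidate(source: bytes, start: int) -> Tuple[str, bool]:
    """Greedily collect identifier-like bytes from start, then validate the decoded text."""
    end = start
    while end < len(source) and is_potential_identifier_byte(source[end]):
        end += 1
    text = source[start:end].decode("utf-8")
    return text, text.isidentifier()


source = "1,\u00a02".encode("utf-8")
# Index 2 is the first byte of the encoded non-breaking space.
candidate, valid = scan_identifier_candidate(source, 2)
print(repr(candidate), valid)
```

In this model the two bytes of the non-breaking space and the following `2` are collected as one candidate, which then fails validation as a whole. That is one way an "invalid character in identifier" message could arise even though no identifier was intended.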