Antoine Pitrou [EMAIL PROTECTED] added the comment:
I don't understand the whole decoding machinery in the tokenizer, but
the patch looks ok to me. (tested in debug mode under Linux and Windows)
--
nosy: +pitrou
___
Python tracker [EMAIL PROTECTED]
Benjamin Peterson [EMAIL PROTECTED] added the comment:
The patch also looks pretty harmless to me. :)
--
nosy: +benjamin.peterson
___
Python tracker [EMAIL PROTECTED]
http://bugs.python.org/issue3594
___
Hye-Shik Chang [EMAIL PROTECTED] added the comment:
pitrou, that's because Python source code can't be correctly tokenized
when it's encoded in few odd encodings like iso-2022 or shift-jis which
utilizes \, (, ) and as second byte of two-byte character sequence.
For example, '\x81\\' is
Brett Cannon [EMAIL PROTECTED] added the comment:
Committed in r66209.
--
resolution: - accepted
status: open - closed
___
Python tracker [EMAIL PROTECTED]
http://bugs.python.org/issue3594
___
Changes by Brett Cannon [EMAIL PROTECTED]:
--
keywords: +needs review
___
Python tracker [EMAIL PROTECTED]
http://bugs.python.org/issue3594
___
___
Python-bugs-list
New submission from Brett Cannon [EMAIL PROTECTED]:
Turns out that PyTokenizer_FindEncoding() never properly succeeds
because the tok_state used by it does not have tok-filename set, which
is an error condition in the tokenizer. This error has been masked by
the one place the function is used,
Brett Cannon [EMAIL PROTECTED] added the comment:
I have not bothered to check if this exists in 2.6, but I don't see why
it would be any different.
--
type: - behavior
___
Python tracker [EMAIL PROTECTED]
http://bugs.python.org/issue3594
Brett Cannon [EMAIL PROTECTED] added the comment:
Turns out that the NULL return value can signal an error that manifests
itself as SyntaxError(encoding problem: with BOM) thanks to the lack
of tok-filename being set in Parser/tokenizer.c:fp_setreadl() which is
called by check_coding_spec() and
Brett Cannon [EMAIL PROTECTED] added the comment:
Attached is a patch that fixes where the error occurs. By opening the
file by either file name or file descriptor, the problem goes away. Once
this patch is accepted then PyErr_Occurred() should be added to all uses
of PyTokenizer_FindEncoding().