Pod added the comment:

Not the OP, but I consider this message a bug because it's confusing from the perspective of a user of the tokenize() function. If you give tokenize a readline() that returns a str, you get an error message that confusingly states that something inside tokenize must be a str and NOT bytes, even though the user gave it a readline that returns a str, not bytes. It looks like an internal bug.
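To make the mismatch concrete, here is a minimal sketch of what each function appears to expect. It assumes a Python 3 interpreter in which tokenize.generate_tokens is still importable; the sample source string and the variable names are made up for illustration, and the TypeError in the last step is taken from the traceback in this report rather than from any documented guarantee.

```python
import io
import tokenize

# Illustrative one-line source; any valid Python text would do.
source = "x = 1\n"

# tokenize.tokenize() wants a readline that returns bytes. It detects the
# encoding itself and emits an ENCODING token before the real tokens.
byte_tokens = list(tokenize.tokenize(io.BytesIO(source.encode("utf-8")).readline))

# tokenize.generate_tokens() wants a readline that returns str.
str_tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))

# Handing a str-returning readline to tokenize.tokenize() is the case that
# produces the confusing error. Catching TypeError is an assumption based on
# the traceback in this report.
try:
    list(tokenize.tokenize(io.StringIO(source).readline))
    str_into_tokenize_failed = False
except TypeError:
    str_into_tokenize_failed = True
```

The list() calls matter because tokenization may be lazy: without consuming the result, a bad readline might not be touched at all.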
Turns out it's because the contract changed from python2 to 3.

Personally, I'd been accidentally reading the python2 page for the tokenize library instead of the python3 one, and had been using tokenize.generate_tokens in my python 3 code, which accepts an io.StringIO just fine. When I realised my mistake and switched to the python3 version of the page, I noticed that generate_tokens is no longer supported, even though the code I had was working, and that the definition of tokenize had changed to match the old generate_tokens (along with a subtle change in the definition of the acceptable readline function).

So when I switched from tokenize.generate_tokens to tokenize.tokenize to try to use the library as intended, I got the same error as OP. Perhaps OP made a similar mistake?

To actually hit the error in question:

$ cat -n temp.py
     1  import tokenize
     2  import io
     3
     4
     5  byte_reader = io.BytesIO(b"test bytes generate_tokens")
     6  tokens = tokenize.generate_tokens(byte_reader.readline)
     7
     8  byte_reader = io.BytesIO(b"test bytes tokenize")
     9  tokens = tokenize.tokenize(byte_reader.readline)
    10
    11  byte_reader = io.StringIO("test string generate")
    12  tokens = tokenize.generate_tokens(byte_reader.readline)
    13
    14  str_reader = io.StringIO("test string tokenize")
    15  tokens = tokenize.tokenize(str_reader.readline)
    16
    17

$ python3 temp.py
Traceback (most recent call last):
  File "temp.py", line 15, in <module>
    tokens = tokenize.tokenize(str_reader.readline)
  File "C:\work\env\python\Python34_64\Lib\tokenize.py", line 467, in tokenize
    encoding, consumed = detect_encoding(readline)
  File "C:\work\env\python\Python34_64\Lib\tokenize.py", line 409, in detect_encoding
    if first.startswith(BOM_UTF8):
TypeError: startswith first arg must be str or a tuple of str, not bytes

----------
nosy: +Pod

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue23297>
_______________________________________