[issue37996] 2to3 introduces unwanted extra backslashes for unicode characters in regular expressions

2019-08-31 Thread Bob Kline
Bob Kline added the comment: In fact, I suppose it's possible that the warning as I worded it is still not restrictive enough, and that there are subtle dependencies between the fixers which would make the action of one of them render the code no longer safely fixable as Python 2 code by

[issue37996] 2to3 introduces unwanted extra backslashes for unicode characters in regular expressions

2019-08-31 Thread Bob Kline
Bob Kline added the comment: Thanks, I understand. However, this highlights something which had slipped under my radar. You get one shot at running a code set through the tool. You can't do what I was doing, which was to run the tool in "don't write" mode, then fix by hand some of the

[issue37996] 2to3 introduces unwanted extra backslashes for unicode characters in regular expressions

2019-08-31 Thread Ned Deily
Change by Ned Deily: resolution: -> not a bug, stage: -> resolved, status: open -> closed

[issue37996] 2to3 introduces unwanted extra backslashes for unicode characters in regular expressions

2019-08-31 Thread Matthew Barnett
Matthew Barnett added the comment: You wrote "the u had already been removed by hand". By removing the u in the _Python 2_ code, you changed that string from a Unicode string to a bytestring. In a bytestring, \u is not an escape; b"\u" == b"\\u". -- nosy: +mrabarnett
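The distinction Matthew Barnett describes can be checked with a small Python 3 sketch. It is only an illustration: the variable names are hypothetical, and the bytes literal is written with an explicit doubled backslash because recent Python 3 versions warn about an unrecognized \u escape in a bytes literal. A plain "..." literal in Python 2 behaves like the bytes literal here.

```python
# str literal: \u201C is an escape that produces a single character
# (LEFT DOUBLE QUOTATION MARK).
as_text = "\u201C"

# bytes literal: there is no \u escape, so the value is a backslash,
# the letter u, and four hex digits. Spelled with a doubled backslash
# to avoid the invalid-escape warning.
as_bytes = b"\\u201C"

print(len(as_text))    # one character
print(len(as_bytes))   # six bytes

# The six bytes are exactly the raw text \u201C ...
assert as_bytes.decode("ascii") == r"\u201C"

# ... and only become the real character if something interprets
# the escape later, for example the unicode_escape codec.
assert as_bytes.decode("unicode_escape") == as_text
```

So once the u prefix is removed in Python 2 source, the literal already holds a backslash followed by u, and doubling the backslash preserves that value rather than changing it.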

[issue37996] 2to3 introduces unwanted extra backslashes for unicode characters in regular expressions

2019-08-31 Thread Bob Kline
Bob Kline added the comment: Ah, this is worse than I first thought. It's not just converting code by adding extra backslashes to regular expression strings, where at least the regular expression engine will do what the original code was asking the Python parser to do (unless user code

[issue37996] 2to3 introduces unwanted extra backslashes for unicode characters in regular expressions

2019-08-31 Thread Bob Kline
Bob Kline added the comment: The original string had u"""...""" and the u had already been removed by hand in preparation for moving to Python 3.

[issue37996] 2to3 introduces unwanted extra backslashes for unicode characters in regular expressions

2019-08-31 Thread Bob Kline
New submission from Bob Kline:
-UNWANTED = re.compile("""['".,?!:;()[\]{}<>\u201C\u201D\u00A1\u00BF]+""")
+UNWANTED = re.compile("""['".,?!:;()[\]{}<>\\u201C\\u201D\\u00A1\\u00BF]+""")
The non-ASCII characters in the original string are perfectly legitimate str characters, using
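As a rough illustration of what the doubled backslashes in the diff above do under Python 3, the sketch below compares the two spellings. It assumes a shortened pattern containing only the four non-ASCII characters from the report, and the names converted, intended, and sample are hypothetical. The re module understands \uXXXX escapes in str patterns, so both forms match the same text even though the pattern strings differ.

```python
import re

# Spelling emitted by 2to3: the pattern text contains literal
# backslash-u sequences, which the regex engine interprets itself.
converted = "[\\u201C\\u201D\\u00A1\\u00BF]+"

# Spelling the reporter expected: the Python parser resolves the
# escapes, so the pattern text contains the characters directly.
intended = "[\u201C\u201D\u00A1\u00BF]+"

# The two pattern strings are not equal ...
assert converted != intended
assert converted == r"[\u201C\u201D\u00A1\u00BF]+"

# ... but they strip the same characters from a sample string.
sample = "\u00BFQu\u00E9? \u201Cquoted\u201D"
stripped_converted = re.sub(converted, "", sample)
stripped_intended = re.sub(intended, "", sample)
assert stripped_converted == stripped_intended == "Qu\u00E9? quoted"
```

This equivalence holds only when the string is used as a regular expression; anywhere else the doubled backslashes stay in the text as literal characters.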