[issue17615] String comparison performance regression

2013-04-09 Thread Neil Hodgson
Neil Hodgson added the comment: Windows is the only widely used OS that has a 16-bit wchar_t. I can't recall what OS/2 did, but Python doesn't support OS/2 any more.

[issue17615] String comparison performance regression

2013-04-08 Thread Neil Hodgson
Neil Hodgson added the comment: A quick rewrite showed the single-level case slightly faster (1%) on average, but it's less readable/maintainable. Perhaps taking a systematic approach to naming would allow Py_UCS1 to be deduced from PyUnicode_1BYTE_KIND and so avoid repeating the information

[issue17615] String comparison performance regression

2013-04-08 Thread Neil Hodgson
Neil Hodgson added the comment: Including the wmemcmp patch did not improve the times on MSC v.1600 32-bit; if anything, the performance was a little slower for the test I used: a=['C:/Users/Neil/Documents/λ','C:/Users/Neil/Documents/η']156 specialised: [0.9125948707773204, 0.8990815272107868

[issue17615] String comparison performance regression

2013-04-07 Thread Neil Hodgson
Neil Hodgson added the comment: The patch fixes the performance regression on Windows. The 1:1 case is better than either 3.2.4 or 3.3.1 downloads from python.org. Other cases are close to 3.2.4, losing at most around 2%. Measurements from 32-bit builds: ## Download 3.2.4 3.2.4 (default, Apr

[issue17615] String comparison performance regression

2013-04-04 Thread Neil Hodgson
Neil Hodgson added the comment: Looking at the assembler output from gcc 4.7 on Linux shows that it specialises the loop 9 times - once for each pair of kinds. This is why there was far less slow-down on Linux. Explicitly writing out the 9 loops is inelegant and would make accurate
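As an illustration of what "specialising the loop once for each pair of kinds" can look like when written by hand, here is a minimal sketch that uses a macro to stamp out the nine loops. It is not the patch attached to the issue; the `sketch_` names, the `SKETCH_DEFINE_COMPARE` macro, and the use of plain 1/2/4 integers for the kinds are assumptions made for this example only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical macro: generates one comparison function for a fixed pair of
   element types, so the loop body contains no run-time kind checks. */
#define SKETCH_DEFINE_COMPARE(name, type1, type2)                  \
    static int name(const type1 *s1, size_t len1,                  \
                    const type2 *s2, size_t len2)                  \
    {                                                              \
        size_t n = len1 < len2 ? len1 : len2;                      \
        for (size_t i = 0; i < n; i++) {                           \
            uint32_t c1 = s1[i], c2 = s2[i];                       \
            if (c1 != c2)                                          \
                return c1 < c2 ? -1 : 1;                           \
        }                                                          \
        return (len1 > len2) - (len1 < len2);                      \
    }

/* One loop per pair of kinds: 3 x 3 = 9 specialisations. */
SKETCH_DEFINE_COMPARE(sketch_cmp_1_1, uint8_t,  uint8_t)
SKETCH_DEFINE_COMPARE(sketch_cmp_1_2, uint8_t,  uint16_t)
SKETCH_DEFINE_COMPARE(sketch_cmp_1_4, uint8_t,  uint32_t)
SKETCH_DEFINE_COMPARE(sketch_cmp_2_1, uint16_t, uint8_t)
SKETCH_DEFINE_COMPARE(sketch_cmp_2_2, uint16_t, uint16_t)
SKETCH_DEFINE_COMPARE(sketch_cmp_2_4, uint16_t, uint32_t)
SKETCH_DEFINE_COMPARE(sketch_cmp_4_1, uint32_t, uint8_t)
SKETCH_DEFINE_COMPARE(sketch_cmp_4_2, uint32_t, uint16_t)
SKETCH_DEFINE_COMPARE(sketch_cmp_4_4, uint32_t, uint32_t)

/* The kinds (1, 2 or 4 bytes per character) are examined once, outside
   the loop, to pick the matching specialisation. */
static int sketch_compare(int kind1, const void *d1, size_t len1,
                          int kind2, const void *d2, size_t len2)
{
    switch (kind1 * 8 + kind2) {
    case 1 * 8 + 1: return sketch_cmp_1_1(d1, len1, d2, len2);
    case 1 * 8 + 2: return sketch_cmp_1_2(d1, len1, d2, len2);
    case 1 * 8 + 4: return sketch_cmp_1_4(d1, len1, d2, len2);
    case 2 * 8 + 1: return sketch_cmp_2_1(d1, len1, d2, len2);
    case 2 * 8 + 2: return sketch_cmp_2_2(d1, len1, d2, len2);
    case 2 * 8 + 4: return sketch_cmp_2_4(d1, len1, d2, len2);
    case 4 * 8 + 1: return sketch_cmp_4_1(d1, len1, d2, len2);
    case 4 * 8 + 2: return sketch_cmp_4_2(d1, len1, d2, len2);
    case 4 * 8 + 4: return sketch_cmp_4_4(d1, len1, d2, len2);
    default:
        assert(0 && "kind must be 1, 2 or 4");
        return 0;
    }
}
```

The macro keeps the nine loops textually identical, which addresses some of the maintenance worry about writing them out by hand, at the cost of code that is harder to step through in a debugger.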

[issue17615] String comparison performance regression

2013-04-03 Thread Neil Hodgson
Neil Hodgson added the comment: For 32 bits, whether wchar_t is signed shouldn't matter, as Unicode is only 21 bits, so no character will be seen as negative. On Windows, wchar_t is unsigned. C11 has char16_t and char32_t, which are both unsigned, but it doesn't include comparison functions
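The 21-bit argument can be checked with a small sketch. The helper names below are hypothetical and exist only for this example; the one outside fact relied on is that the highest Unicode code point is U+10FFFF.

```c
#include <assert.h>
#include <stdint.h>

/* Highest Unicode code point; it fits in 21 bits. */
#define SKETCH_MAX_CODE_POINT 0x10FFFF

/* Compare two code points held in signed 32-bit storage. */
static int sketch_cmp_signed32(int32_t a, int32_t b)
{
    assert(a >= 0 && b >= 0);   /* holds for every valid code point */
    return (a > b) - (a < b);
}

/* Compare the same code points held in unsigned 32-bit storage. */
static int sketch_cmp_unsigned32(uint32_t a, uint32_t b)
{
    return (a > b) - (a < b);
}
```

Because no valid code point sets the top bit of a 32-bit value, both functions order any pair of code points the same way.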

[issue17615] String comparison performance regression

2013-04-03 Thread Neil Hodgson
Neil Hodgson added the comment: For 32-bit Windows, the code generated for unicode_compare is quite slow. There are either 1 or 2 kind checks in each call to PyUnicode_READ and 2 calls to PyUnicode_READ inside the loop. A compiler may decide to move the kind checks out of the loop
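To make the cost concrete, here is a rough sketch of a generic loop in which every character read tests the string kind. It only mimics the shape of the problem: `sketch_read` is a hypothetical stand-in for a kind-dispatching reader, not the real PyUnicode_READ macro, and the enum is an assumption for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the three string kinds (bytes per character). */
enum sketch_kind { SKETCH_1BYTE = 1, SKETCH_2BYTE = 2, SKETCH_4BYTE = 4 };

/* Read character i. The kind is tested on every call, as a generic reader
   must do when the kind is only known at run time. */
static uint32_t sketch_read(enum sketch_kind kind, const void *data, size_t i)
{
    switch (kind) {
    case SKETCH_1BYTE: return ((const uint8_t *)data)[i];
    case SKETCH_2BYTE: return ((const uint16_t *)data)[i];
    default:           return ((const uint32_t *)data)[i];
    }
}

/* Generic comparison: two reads, hence two kind dispatches, per iteration
   unless the compiler hoists the checks out of the loop. */
static int sketch_compare_generic(enum sketch_kind k1, const void *d1, size_t len1,
                                  enum sketch_kind k2, const void *d2, size_t len2)
{
    size_t n = len1 < len2 ? len1 : len2;
    assert(d1 != NULL && d2 != NULL);
    for (size_t i = 0; i < n; i++) {
        uint32_t c1 = sketch_read(k1, d1, i);
        uint32_t c2 = sketch_read(k2, d2, i);
        if (c1 != c2)
            return c1 < c2 ? -1 : 1;
    }
    return (len1 > len2) - (len1 < len2);
}
```

Whether the dispatch stays inside the loop depends on the optimiser, which would be consistent with one compiler producing much slower code than another from the same source.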

[issue17615] String comparison performance regression

2013-04-02 Thread Neil Hodgson
New submission from Neil Hodgson: On Windows, non-equal comparisons (<, <=, >, >=) between strings with common prefixes are slower in Python 3.3 than 3.2. This is for both 32-bit and 64-bit builds. Performance on Linux has not decreased for the same code. The attached program tests comparisons

[issue17615] String comparison performance regression

2013-04-02 Thread Neil Hodgson
Neil Hodgson added the comment: The common cases are likely to be 1:1, 2:2, and 1:2. There is already a specialisation for 1:1. wmemcmp is widely available but is based on wchar_t so is for different widths on Windows and Unix. On Windows it would handle the 2:2 case
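A minimal sketch of the wmemcmp idea follows, assuming both strings are already stored as wchar_t arrays. The `sketch_` function names are hypothetical; `wmemcmp` itself is the standard function from `<wchar.h>`.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Compare two wide-character buffers: wmemcmp over the common prefix,
   then fall back to the lengths if the prefix is identical. */
static int sketch_wide_compare(const wchar_t *s1, size_t len1,
                               const wchar_t *s2, size_t len2)
{
    size_t n = len1 < len2 ? len1 : len2;
    int r;
    assert(s1 != NULL && s2 != NULL);
    r = wmemcmp(s1, s2, n);
    if (r != 0)
        return r < 0 ? -1 : 1;
    return (len1 > len2) - (len1 < len2);
}

/* Width of wchar_t in bytes on this platform. This decides which
   same-kind case (2:2 or 4:4) a wmemcmp fast path could serve. */
static size_t sketch_wchar_width(void)
{
    return sizeof(wchar_t);
}
```

Since the width of wchar_t differs between platforms, a fast path like this would have to be selected per platform rather than tied to a single string kind.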

[issue6664] readlines should understand Line Separator and Paragraph Separator characters

2009-08-07 Thread Neil Hodgson
New submission from Neil Hodgson nyamaton...@users.sourceforge.net: Unicode includes Line Separator U+2028 and Paragraph Separator U+2029 line ending characters. The readlines method of the file object returned by the built-in open does not treat these characters as line ends although the object

[issue3617] Add MS EULA to the list of third-party licenses in the Windows installer

2008-09-13 Thread Neil Hodgson
Neil Hodgson added the comment: The recommended addition includes the 'excluded license' section which appears unnecessary as Python does not distribute any source code redistributables, only the .DLL file which is a binary executable. Including this is likely to confuse