[issue10139] regex A|B : both A and B match, but B is wrongly preferred

2010-10-18 Thread Christos Georgiou
New submission from Χρήστος Γεωργίου (Christos Georgiou): This is based on this Stack Overflow answer: http://stackoverflow.com/questions/3957164/3963443#3963443. It also applies to Python 2.6. Searching for a regular expression that satisfies the mentioned SO question (a regular expression

[issue10139] regex A|B : both A and B match, but B is wrongly preferred

2010-10-18 Thread Christos Georgiou
Χρήστος Γεωργίου (Christos Georgiou) added the comment: For completeness' sake, I also provide the "(?:regex_n)" results:
>>> text = 'A***Z'
>>> re.compile('(?:(?<=^A).*(?=Z$))').search(text).group(0)  # regex_1
'***'
>>> re.compile('(?:(?<=^A).*)').search(text).group(0)  # regex_2
'***Z'
>>> re.

[issue10139] regex A|B : both A and B match, but B is wrongly preferred

2010-10-19 Thread Christos Georgiou
Χρήστος Γεωργίου (Christos Georgiou) added the comment: As I see it, it's more like: >>> re.search('a.*c|a.*|.*c', 'abc').group() producing 'bc' instead of 'abc'. Substitute "(?<=^A)" for "a" and "(?=Z$)" for "c" in the pattern above. In your example, the first part ('bc') does not match th

[issue10139] regex A|B : both A and B match, but B is wrongly preferred

2010-10-19 Thread Christos Georgiou
Χρήστος Γεωργίου (Christos Georgiou) added the comment: Georg, please re-open it. Focus on the difference between example regex_1|regex_2 (both matching; regex_1 is used as it should be), and regex_1|regex_3 (both matching; regex_3 is used incorrectly).

[issue10139] regex A|B : both A and B match, but B is wrongly preferred

2010-10-19 Thread Christos Georgiou
Χρήστος Γεωργίου (Christos Georgiou) added the comment: No, my mistake, you were right to close it. The more explicit version of the explanation: both regex_1 and regex_2 actually start matching at index 1, while regex_3 starts matching at index 0.
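
The index explanation can be checked with a short sketch. regex_1 and regex_2 are taken from the "(?:regex_n)" comment earlier in the thread; regex_3 is not visible in these truncated snippets, so the pattern used for it below is an assumption chosen to match the description "starts matching at index 0".

```python
import re

text = 'A***Z'

regex_1 = r'(?<=^A).*(?=Z$)'  # quoted in the thread
regex_2 = r'(?<=^A).*'        # quoted in the thread
regex_3 = r'.*(?=Z$)'         # ASSUMPTION: not shown in the snippets above

patterns = {
    'regex_1': regex_1,
    'regex_2': regex_2,
    'regex_3': regex_3,
    'regex_1|regex_2': regex_1 + '|' + regex_2,
    'regex_1|regex_3': regex_1 + '|' + regex_3,
}

# re.search scans start positions left to right; at each position the
# alternatives are tried in order. A later alternative that matches at an
# earlier position therefore beats an earlier alternative that can only
# match further right.
results = {}
for label, pattern in patterns.items():
    match = re.search(pattern, text)
    results[label] = (match.start(), match.group(0))
    print(f'{label:<16} start={match.start()} match={match.group(0)!r}')
```

Under that assumption, regex_1|regex_3 returns the regex_3 match because nothing in the lookbehind-anchored regex_1 can succeed at index 0.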

[issue10160] operator.attrgetter slower than lambda after adding dotted names ability

2010-10-20 Thread Christos Georgiou
New submission from Χρήστος Γεωργίου (Christos Georgiou): (Discovered in this Stack Overflow answer: http://stackoverflow.com/questions/3940518/3942509#3942509; check the comments too.) operator.attrgetter in its simplest form (i.e. with a single non-dotted name) needs more time to execute t
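
A rough way to reproduce this kind of comparison is sketched below. The Point class and the call count are hypothetical choices made for illustration, and the outcome depends heavily on the interpreter version, so no particular result is implied here.

```python
import operator
import timeit


class Point:
    """Hypothetical object with a single plain attribute."""

    def __init__(self, x):
        self.x = x


obj = Point(42)
by_attrgetter = operator.attrgetter("x")  # single non-dotted name
by_lambda = lambda o: o.x                 # the equivalent lambda

# Each candidate is called through the same wrapper lambda, so the
# wrapper's overhead is added equally to both measurements.
for label, func in (("attrgetter", by_attrgetter), ("lambda", by_lambda)):
    seconds = timeit.timeit(lambda: func(obj), number=200_000)
    print(f"{label:>10}: {seconds:.4f} s for 200,000 calls")
```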

[issue10160] operator.attrgetter slower than lambda after adding dotted names ability

2010-10-20 Thread Christos Georgiou
Χρήστος Γεωργίου (Christos Georgiou) added the comment: Here is the diff to Modules/operator.c, Doc/library/operator.rst and Lib/test/test_operator.py. As far as I could check, there are no leaks, but a more experienced eye in core development could not hurt. Also, obviously test_operato

[issue10160] operator.attrgetter slower than lambda after adding dotted names ability

2010-10-20 Thread Christos Georgiou
Changes by Χρήστος Γεωργίου (Christos Georgiou): Removed file: http://bugs.python.org/file19312/issue10160.diff

[issue10160] operator.attrgetter slower than lambda after adding dotted names ability

2010-10-20 Thread Christos Georgiou
Χρήστος Γεωργίου (Christos Georgiou) added the comment: Newer version of the diff, since I forgot some "if(0) fprintf" debug calls that shouldn't be there. Added file: http://bugs.python.org/file19313/issue10160.diff

[issue10160] operator.attrgetter slower than lambda after adding dotted names ability

2010-10-20 Thread Christos Georgiou
Χρήστος Γεωργίου (Christos Georgiou) added the comment: An explanation of the changes. The old code kept the operator.attrgetter arguments in the ag->attr member. If the argument count (ag->nattrs) was 1, the single argument was kept; if more than 1, a tuple of the original arguments was kep

[issue10160] operator.attrgetter slower than lambda after adding dotted names ability

2010-10-20 Thread Christos Georgiou
Χρήστος Γεωργίου (Christos Georgiou) added the comment: Modules/operator.c grows by ~70 lines, most of it the setup code for ag->attr; I also loop twice over the args of attrgetter_new, preferring fast code that runs once per attrgetter creation over temporary data. Alex's suggestion to make u

[issue10160] operator.attrgetter slower than lambda after adding dotted names ability

2010-10-22 Thread Christos Georgiou
Χρήστος Γεωργίου (Christos Georgiou) added the comment: A newer version of the patch with the following changes:
- single loop in the ag->attr setup phase of attrgetter_new; interning of the stored attribute names
- added two more tests of invalid attrgetter parameters (".attr", "attr.")
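
The general idea of doing the name preparation once, at creation time, can be illustrated in pure Python. This is only a sketch: the class name SplitAttrGetter is hypothetical, and the code is an analogue of the approach described in these comments, not the C code from the attached diff.

```python
import sys
from types import SimpleNamespace


class SplitAttrGetter:
    """Hypothetical pure-Python analogue; NOT the code from the patch."""

    def __init__(self, attr, *attrs):
        prepared = []
        # Single setup loop: validate, split dotted names and intern the
        # parts once, so that each call only has to walk prepared tuples.
        for name in (attr,) + attrs:
            if not isinstance(name, str):
                raise TypeError("attribute name must be a string")
            prepared.append(tuple(sys.intern(part) for part in name.split(".")))
        self._attrs = tuple(prepared)

    def __call__(self, obj):
        values = []
        for parts in self._attrs:
            value = obj
            for part in parts:
                value = getattr(value, part)
            values.append(value)
        # One name gives a bare value, several names give a tuple.
        return values[0] if len(values) == 1 else tuple(values)


inner = SimpleNamespace(b=2)
outer = SimpleNamespace(a=1, inner=inner)
single = SplitAttrGetter("a")(outer)
dotted = SplitAttrGetter("inner.b")(outer)
several = SplitAttrGetter("a", "inner.b")(outer)
```

The trade-off is the one the comments describe: slightly more work per getter creation in exchange for less work per call.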

[issue10160] operator.attrgetter slower than lambda after adding dotted names ability

2010-10-30 Thread Christos Georgiou
Χρήστος Γεωργίου (Christos Georgiou) added the comment: Thank you very much, Antoine, for your review. My comments in reply:
- the dead code: it's not dead, IIRC it ensures that at least one argument is given, otherwise it raises an exception.
- PyUnicode_GET_SIZE: you're right. The previous

[issue1602] windows console doesn't print utf8 (Py30a2)

2010-11-04 Thread Christos Georgiou
Χρήστος Γεωργίου (Christos Georgiou) added the comment: http://blogs.msdn.com/b/michkap/archive/2008/03/18/8306597.aspx If you want any kind of Unicode output in the console, the font must be an “official” MS console TTF (“official” as defined by the Windows version); I believe only Lucida C

[issue1602] windows console doesn't print utf8 (Py30a2)

2009-09-18 Thread Christos Georgiou
Χρήστος Γεωργίου (Christos Georgiou) added the comment: Another note: if one creates a dummy Stream object (having a softspace attribute and a write method that writes using os.write, as in http://stackoverflow.com/questions/878972/windows-cmd-encoding-change-causes-python-crash/1432462#1432462
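
A stream object of the kind described might look like the sketch below. The class name RawStream, the default file descriptor and the UTF-8 encoding are assumptions made for illustration; the linked answer targets Python 2, where the print statement used the softspace attribute, while this sketch is written for Python 3. What the truncated comment goes on to conclude is not shown here, and the sketch does not attempt to reproduce it.

```python
import os


class RawStream:
    """Hypothetical file-like object that bypasses the usual text layer."""

    softspace = 0  # only meaningful to the Python 2 print statement

    def __init__(self, fd=1, encoding="utf-8"):
        self.fd = fd              # 1 is standard output
        self.encoding = encoding  # ASSUMPTION: UTF-8 bytes are wanted

    def write(self, text):
        data = text.encode(self.encoding)
        # os.write may write fewer bytes than requested, so loop.
        while data:
            written = os.write(self.fd, data)
            data = data[written:]
        return len(text)

    def flush(self):
        pass  # os.write is unbuffered at the Python level


# Usage: print through the raw stream instead of sys.stdout.
print("Ελληνικά", file=RawStream())
```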

[issue6058] Add cp65001 to encodings/aliases.py

2009-12-22 Thread Christos Georgiou
Χρήστος Γεωργίου (Christos Georgiou) added the comment: Re Martin's question, I can offer the indirect wisdom of Michael Kaplan in this blog post: http://blogs.msdn.com/michkap/archive/2008/03/18/8306597.aspx where he mentions that the easiest way to output Unicode text in the Windows console