[issue17668] re.split loses characters matching ungrouped parts of a pattern

2014-09-14 Thread Serhiy Storchaka
Changes by Serhiy Storchaka: resolution: -> not a bug; stage: needs patch -> resolved; status: pending -> closed

[issue17668] re.split loses characters matching ungrouped parts of a pattern

2013-10-27 Thread Serhiy Storchaka
Changes by Serhiy Storchaka: status: open -> pending

[issue17668] re.split loses characters matching ungrouped parts of a pattern

2013-04-14 Thread Mike Hoy
Changes by Mike Hoy: nosy: +mikehoy

[issue17668] re.split loses characters matching ungrouped parts of a pattern

2013-04-10 Thread Tomasz J. Kotarba
Tomasz J. Kotarba added the comment: The example I gave was the simplest possible to illustrate my point but yes, you are correct, I often match the whole string as I do recursive matches. I do use non-capturing groups but they would not solve the problem I talked about. Anyway, I had solved

[issue17668] re.split loses characters matching ungrouped parts of a pattern

2013-04-08 Thread R. David Murray
R. David Murray added the comment: Only group the stuff you want to see in the result: >>> re.split(r'(^>.*$)', '>Homo sapiens catenin (cadherin-associated)') ['', '>Homo sapiens catenin (cadherin-associated)', ''] >>> re.split(r'^(>.*)$', '>Homo sapiens catenin (cadherin-associated)') ['', '>H

[issue17668] re.split loses characters matching ungrouped parts of a pattern

2013-04-08 Thread Tomasz J. Kotarba
Tomasz J. Kotarba added the comment: Hi, I can still see one piece of functionality I have mentioned missing. Using my first example, even when one uses '^(>(.*))$' one cannot get ['', '>Homo sapiens catenin (cadherin-associated)', ''] as one will get a four-element list and need to deal with
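
The four-element result mentioned above can be seen in a minimal sketch. The variable names (`header`, `nested`, `flat`) are illustrative and not from the thread. The second call uses a non-capturing inner group, which yields a three-element list for this simple example only; it is an assumption that this is an acceptable workaround, and it does not cover the recursive-matching use case the reporter describes elsewhere in the thread.

```python
import re

# Input string taken from the report; variable names are hypothetical.
header = '>Homo sapiens catenin (cadherin-associated)'

# Two capturing groups: the outer one keeps the '>', the inner one does not.
# re.split returns the text of every group, so both appear in the result.
nested = re.split(r'^(>(.*))$', header)
print(len(nested))  # 4
print(nested)

# Making the inner group non-capturing, (?:...), leaves only the outer group.
flat = re.split(r'^(>(?:.*))$', header)
print(flat)  # ['', '>Homo sapiens catenin (cadherin-associated)', '']
```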

[issue17668] re.split loses characters matching ungrouped parts of a pattern

2013-04-08 Thread R. David Murray
R. David Murray added the comment: As you pointed out, you can already get that behavior by enclosing the entire split expression in a group. I don't see that there is any functionality missing here.

[issue17668] re.split loses characters matching ungrouped parts of a pattern

2013-04-08 Thread Tomasz J. Kotarba
Tomasz J. Kotarba added the comment: I agree that introducing an example like that plus making some slight changes in wording would be a welcome change to the docs to clearly explain the current behaviour. Still, I maintain it would be useful to give users the option I described to allow them

[issue17668] re.split loses characters matching ungrouped parts of a pattern

2013-04-08 Thread R. David Murray
R. David Murray added the comment: >>> re.split('-', 'abc-def-jlk') ['abc', 'def', 'jlk'] >>> re.split('(-)', 'abc-def-jlk') ['abc', '-', 'def', '-', 'jlk'] Does that make it a bit clearer? Maybe we need an actual example in the docs. -- assignee: -> docs@python components: +Documenta
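
The two calls above can be run as a small script. This is a sketch with illustrative variable names (`text`, `without_group`, `with_group`) that are not from the thread. It also shows that when the whole pattern is wrapped in a capturing group, joining the pieces gives back the original string.

```python
import re

text = 'abc-def-jlk'

# No capturing group: the separators are consumed and dropped.
without_group = re.split('-', text)
print(without_group)  # ['abc', 'def', 'jlk']

# Capturing group around the separator: each match is kept as a list item.
with_group = re.split('(-)', text)
print(with_group)  # ['abc', '-', 'def', '-', 'jlk']

# With the entire pattern captured, nothing is lost from the input.
rejoined = ''.join(with_group)
print(rejoined == text)  # True
```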

[issue17668] re.split loses characters matching ungrouped parts of a pattern

2013-04-08 Thread Tomasz J. Kotarba
Tomasz J. Kotarba added the comment: Marking as open till I get your response. I hope you reconsider. Changes: resolution: invalid -> (none); status: closed -> open

[issue17668] re.split loses characters matching ungrouped parts of a pattern

2013-04-08 Thread Tomasz J. Kotarba
Tomasz J. Kotarba added the comment: Hi R. David Murray, Thanks for your reply. I just explained in my previous message to Matthew that documentation does actually support my view (i.e. it is an issue according to the documentation). Re. the issue you mentioned (discarding information concer

[issue17668] re.split loses characters matching ungrouped parts of a pattern

2013-04-08 Thread Tomasz J. Kotarba
Tomasz J. Kotarba added the comment: Hi Matthew, Thanks for such a quick reply. I know I can get the > by putting it in grouping parentheses. That's not the issue here. The documentation you quoted says that it splits the string by the occurrences _OF_PATTERN_ and that texts of all groups

[issue17668] re.split loses characters matching ungrouped parts of a pattern

2013-04-08 Thread R. David Murray
R. David Murray added the comment: Thanks for the report, but as Matt said it doesn't look like there is any bug here. The behavior you report is what the docs say it is, and it seems to me that your "most useful" suggestion would discard the information about the group match, making specifyi

[issue17668] re.split loses characters matching ungrouped parts of a pattern

2013-04-08 Thread Matthew Barnett
Matthew Barnett added the comment: It's not a bug. The documentation says """Split string by the occurrences of pattern. If capturing parentheses are used in pattern, then the text of all groups in the pattern are also returned as part of the resulting list.""" You're splitting on r'^>(.*)$',

[issue17668] re.split loses characters matching ungrouped parts of a pattern

2013-04-08 Thread Tomasz J. Kotarba
New submission from Tomasz J. Kotarba: Tested in 2.7 but possibly affects the other versions as well. A real life example (note the first character '>' being lost): >>> import re >>> re.split(r'^>(.*)$', '>Homo sapiens catenin (cadherin-associated)') produces: ['', 'Homo sapiens catenin (cadh
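
The report can be reproduced with a short script. The pattern and input string are from the submission; the variable names (`header`, `parts`, `whole`) are illustrative. The second call wraps the whole pattern in a group, which is the approach suggested in the replies.

```python
import re

# Input string from the report; variable names are hypothetical.
header = '>Homo sapiens catenin (cadherin-associated)'

# The full pattern '^>(.*)$' is the separator, so everything it matches is
# consumed. Only the text of the capturing group (.*) is returned, which is
# why the leading '>' does not appear in the result.
parts = re.split(r'^>(.*)$', header)
print(parts)  # ['', 'Homo sapiens catenin (cadherin-associated)', '']

# Grouping the entire pattern keeps the '>' in the returned text.
whole = re.split(r'(^>.*$)', header)
print(whole)  # ['', '>Homo sapiens catenin (cadherin-associated)', '']
```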