[issue22817] re.split fails with lookahead/behind

2015-03-02 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: re.split() with the r'(?<=CA)(?=GCTG)' pattern raises a ValueError in 3.5 (see issue22818). In future releases it could be changed to work with zero-width patterns (such as lookaround assertions). -- resolution: -> wont fix stage: -> resolved status:
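A minimal sketch of how the behaviour described above could be probed on whatever interpreter is at hand. The helper name try_split is hypothetical and not part of the re module. It assumes only that an interpreter which rejects a pattern that can match nothing but the empty string does so with ValueError, as the comment reports for 3.5; the re documentation states that support for splitting on such patterns was added in 3.7.

```python
import re

def try_split(pattern, text):
    """Return re.split(pattern, text), or None if the pattern is rejected.

    Hypothetical helper for illustration only.
    """
    try:
        return re.split(pattern, text)
    except ValueError:
        # Reported for 3.5: patterns that can only match the empty
        # string are rejected instead of being silently ignored.
        return None

result = try_split(r'(?<=CA)(?=GCTG)', 'CAGCTG')
if result is None:
    print('this interpreter rejects zero-width split patterns')
else:
    print(result)
```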

[issue22817] re.split fails with lookahead/behind

2014-11-08 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: It is possible to change this behavior (see example patch). With this patch: re.split(r'(?<=CA)(?=GCTG)', 'ACGTCAGCTGAAAAGCTGACGTACGT') ['ACGTCA', 'GCTGAAAA', 'GCTGACGTACGT'] re.split(r'\b', "the quick, brown fox") ['', 'the', ' ', 'quick', ', ',

[issue22817] re.split fails with lookahead/behind

2014-11-08 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: Previous attempts to solve this issue: issue852532, issue988761, issue3262. -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue22817 ___

[issue22817] re.split fails with lookahead/behind

2014-11-07 Thread Rex Dwyer
New submission from Rex Dwyer: I would like to split a DNA sequence with a restriction enzyme. A restriction enzyme can be described as, e.g., r'(?<=CA)(?=GCTG)'. I cannot get re.split to split on this pattern as Perl 5 does. -- components: Regular Expressions messages: 230831 nosy:

[issue22817] re.split fails with lookahead/behind

2014-11-07 Thread Ezio Melotti
Ezio Melotti added the comment: Can you provide a sample DNA sequence (or part of it), the exact code you used, the output you got, and what you expected?

[issue22817] re.split fails with lookahead/behind

2014-11-07 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: re.split(r'(?<=CA)(?=GCTG)', 'CAGCTG') ['CAGCTG'] I think the expected output is ['CA', 'GCTG']. -- nosy: +serhiy.storchaka
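One possible workaround that does not depend on how re.split treats zero-width matches is to collect the match positions with re.finditer and slice the string by hand. The sketch below is only an illustration under that assumption: the function name split_at_matches is hypothetical, it ignores capture groups (which re.split would include in its result), and the longer sequence is an illustrative input in which CA precedes both GCTG sites.

```python
import re

def split_at_matches(pattern, text):
    """Split text at every match of pattern, including zero-width matches.

    Hypothetical helper: a hand-rolled stand-in for re.split that
    ignores capture groups.
    """
    pieces = []
    last = 0
    for match in re.finditer(pattern, text):
        pieces.append(text[last:match.start()])  # text before the cut
        last = match.end()                       # resume after the match
    pieces.append(text[last:])                   # remainder after the last cut
    return pieces

site = r'(?<=CA)(?=GCTG)'  # cut between CA and GCTG
print(split_at_matches(site, 'CAGCTG'))
# ['CA', 'GCTG']
print(split_at_matches(site, 'ACGTCAGCTGAAACCCCAGCTGACGTACGT'))
# ['ACGTCA', 'GCTGAAACCCCA', 'GCTGACGTACGT']
```

Because the lookbehind requires a literal CA immediately before the cut, a GCTG preceded by anything else (for instance AAAAGCTG) is not a split point for this pattern.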

[issue22817] re.split fails with lookahead/behind

2014-11-07 Thread Rex Dwyer
Rex Dwyer added the comment: Sorry if I wasn't clear. s = 'ACGTCAGCTGAAAAGCTGACGTACGT' re.split(r'(?<=CA)(?=GCTG)', s) expected output is: acgtCA|GCTGaaacccCA|GCTGacgtacgt -> ['ACGTCA', 'GCTGAAAA', 'GCTGACGTACGT'] I would also like to be able to split a text on word boundaries: re.split(r'\b', "the

[issue22817] re.split fails with lookahead/behind

2014-11-07 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: This looks like one of the existing issues about zero-length matches (issue1647489, issue10328).