Multiline regex

2010-07-21 Thread Brandon Harris
I'm trying to read in and parse an ascii type file that contains information that can span several lines. Example: createNode animCurveTU -n test:master_globalSmooth; setAttr .tan 9; setAttr -s 4 .ktv[0:3] 101 0 163 0 169 0 201 0; setAttr -s 4 .kit[3] 10; setAttr -s 4 .kot[3] 10;

Re: Multiline regex

2010-07-21 Thread Rodrick Brown
Slurp the entire file into a string and pick out the fields you need. Sent from my iPhone 4. On Jul 21, 2010, at 10:42 AM, Brandon Harris brandon.har...@reelfx.com wrote: I'm trying to read in and parse an ascii type file that contains information that can span several lines. Example:

Re: Multiline regex

2010-07-21 Thread Brandon Harris
What do you mean by slurp the entire file? I'm trying to use regular expressions because line-by-line parsing will be too slow. An example file would have somewhere in the realm of 6 million lines of code. Brandon L. Harris Rodrick Brown wrote: Slurp the entire file into a string and pick
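Slurping here means reading the whole file into memory as one string, so that a single regular expression can search across line breaks. A minimal sketch, assuming the file fits in memory; the helper name `slurp` and the throwaway demo file are illustrative and not from the thread.

```python
import os
import tempfile


def slurp(path):
    """Return the entire contents of a text file as a single string."""
    with open(path, "r") as handle:
        return handle.read()


# Demonstration with a temporary file holding two lines of the sample
# quoted in the thread (the indentation of the second line is assumed).
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("createNode animCurveTU -n test:master_globalSmooth;\n")
    tmp.write("    setAttr .tan 9;\n")
    demo_path = tmp.name

text = slurp(demo_path)
os.remove(demo_path)
```

Once the data is in `text`, one pattern can span several lines, which is not possible when each line is matched separately.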

Re: Multiline regex

2010-07-21 Thread Eknath Venkataramani
On Wed, Jul 21, 2010 at 8:12 PM, Brandon Harris brandon.har...@reelfx.com wrote: I'm trying to read in and parse an ascii type file that contains information that can span several lines. Do you have to use only regex? If not, I'd certainly suggest 'pyparsing'. It's a pleasure to use and very

Re: Multiline regex

2010-07-21 Thread Brandon Harris
At the moment I'm trying to stick with built-in Python modules to create tools for a much larger pipeline on multiple OSes. Brandon L. Harris Eknath Venkataramani wrote: On Wed, Jul 21, 2010 at 8:12 PM, Brandon Harris brandon.har...@reelfx.com wrote:

RE: Multiline regex

2010-07-21 Thread Andreas Tawn
I'm trying to read in and parse an ascii type file that contains information that can span several lines. Example: createNode animCurveTU -n test:master_globalSmooth; setAttr .tan 9; setAttr -s 4 .ktv[0:3] 101 0 163 0 169 0 201 0; setAttr -s 4 .kit[3] 10; setAttr -s 4

Re: Multiline regex

2010-07-21 Thread Peter Otten
Brandon Harris wrote: I'm trying to read in and parse an ascii type file that contains information that can span several lines. Example: createNode animCurveTU -n test:master_globalSmooth; setAttr .tan 9; setAttr -s 4 .ktv[0:3] 101 0 163 0 169 0 201 0; setAttr -s 4 .kit[3]

Re: RE: Multiline regex

2010-07-21 Thread Brandon Harris
I could make it that simple, but that is also incredibly slow and on a file with several million lines, it takes somewhere in the league of half an hour to grab all the data. I need this to grab data from many, many files and return the data quickly. Brandon L. Harris Andreas Tawn wrote: I'm

RE: RE: Multiline regex

2010-07-21 Thread Andreas Tawn
I could make it that simple, but that is also incredibly slow and on a file with several million lines, it takes somewhere in the league of half an hour to grab all the data. I need this to grab data from many, many files and return the data quickly. Brandon L. Harris That's surprising. I

Re: Multiline regex

2010-07-21 Thread Brandon Harris
Could it be that there isn't just that type of data in the file? There are many different types; that is just one that I'm trying to grab. Brandon L. Harris Andreas Tawn wrote: I could make it that simple, but that is also incredibly slow and on a file with several million lines, it takes

RE: Multiline regex

2010-07-21 Thread Andreas Tawn
I could make it that simple, but that is also incredibly slow and on a file with several million lines, it takes somewhere in the league of half an hour to grab all the data. I need this to grab data from many, many files and return the data quickly. Brandon L. Harris That's surprising. I

Re: Multiline regex

2010-07-21 Thread Jeremy Sanders
Brandon Harris wrote: I'm trying to read in and parse an ascii type file that contains information that can span several lines. Example: What about something like this (you need re.MULTILINE): In [16]: re.findall('^([^ ].*\n([ ].*\n)+)', a, re.MULTILINE) Out[16]: [('createNode animCurveTU
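The re.MULTILINE approach can be tried on a small string. This is a sketch under assumptions, not the code from the post: continuation lines are assumed to be indented with spaces, the second createNode block is invented for illustration, and the pattern is a slight variant (a non-capturing inner group, so findall returns whole blocks as strings, and `[^ \n]` so a blank line cannot start a block).

```python
import re

# Sample modelled on the data quoted in the thread. The space indentation
# and the second block ("transform -n other") are assumptions.
sample = (
    "createNode animCurveTU -n test:master_globalSmooth;\n"
    "    setAttr .tan 9;\n"
    "    setAttr -s 4 .ktv[0:3] 101 0 163 0 169 0 201 0;\n"
    "createNode transform -n other;\n"
    "    setAttr .v 1;\n"
)

# With re.MULTILINE, ^ matches at the start of every line. A block is one
# line that does not begin with a space, followed by one or more lines
# that do begin with a space.
BLOCK = re.compile(r"^[^ \n].*\n(?:[ ].*\n)+", re.MULTILINE)

blocks = BLOCK.findall(sample)
for block in blocks:
    header = block.splitlines()[0]
    print(header)
```

Each element of `blocks` is the full text of one node, header line plus its indented attribute lines, ready for further per-block parsing.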

Re: Multiline regex

2010-07-21 Thread Steven D'Aprano
On Wed, 21 Jul 2010 10:06:14 -0500, Brandon Harris wrote: What do you mean by slurp the entire file? I'm trying to use regular expressions because line-by-line parsing will be too slow. An example file would have somewhere in the realm of 6 million lines of code. And you think trying to run

Multiline regex help

2005-03-03 Thread Yatima
Hey Folks, I've got some info in a bunch of files that kind of looks like so: Gibberish 53 MoreGarbage 12 RelevantInfo1 10/10/04 NothingImportant ThisDoesNotMatter 44 RelevantInfo2 22 BlahBlah 343 RelevantInfo3 23 Hubris Crap 34 and so on... Anyhow, these fields repeat several times in a given

Re: Multiline regex help

2005-03-03 Thread Kent Johnson
Yatima wrote: Hey Folks, I've got some info in a bunch of files that kind of looks like so: Gibberish 53 MoreGarbage 12 RelevantInfo1 10/10/04 NothingImportant ThisDoesNotMatter 44 RelevantInfo2 22 BlahBlah 343 RelevantInfo3 23 Hubris Crap 34 and so on... Anyhow, these fields repeat several times

Re: Multiline regex help

2005-03-03 Thread Steven Bethard
Yatima wrote: Hey Folks, I've got some info in a bunch of files that kind of looks like so: Gibberish 53 MoreGarbage 12 RelevantInfo1 10/10/04 NothingImportant ThisDoesNotMatter 44 RelevantInfo2 22 BlahBlah 343 RelevantInfo3 23 Hubris Crap 34 and so on... Anyhow, these fields repeat several times

Re: Multiline regex help

2005-03-03 Thread Yatima
On Thu, 03 Mar 2005 09:54:02 -0700, Steven Bethard [EMAIL PROTECTED] wrote: A possible solution, using the re module: py s = \ ... Gibberish ... 53 ... MoreGarbage ... 12 ... RelevantInfo1 ... 10/10/04 ... NothingImportant ... ThisDoesNotMatter ... 44 ... RelevantInfo2 ... 22 ...

Re: Multiline regex help

2005-03-03 Thread James Stroud
Have a look at martel, part of biopython. The world of bioinformatics is filled with files with structure like this. http://www.biopython.org/docs/api/public/Martel-module.html James On Thursday 03 March 2005 12:03 pm, Yatima wrote: On Thu, 03 Mar 2005 09:54:02 -0700, Steven Bethard [EMAIL

Re: Multiline regex help

2005-03-03 Thread Yatima
On Thu, 03 Mar 2005 07:14:50 -0500, Kent Johnson [EMAIL PROTECTED] wrote: Here is a way to create a list of [RelevantInfo, value] pairs: import cStringIO raw_data = '''Gibberish 53 MoreGarbage 12 RelevantInfo1 10/10/04 NothingImportant ThisDoesNotMatter 44 RelevantInfo2 22 BlahBlah

Re: Multiline regex help

2005-03-03 Thread James Stroud
I found the original paper for Martel: http://www.dalkescientific.com/Martel/ipc9/ On Thursday 03 March 2005 12:26 pm, James Stroud wrote: Have a look at martel, part of biopython. The world of bioinformatics is filled with files with structure like this.

Re: Multiline regex help

2005-03-03 Thread Steven Bethard
Yatima wrote: On Thu, 03 Mar 2005 09:54:02 -0700, Steven Bethard [EMAIL PROTECTED] wrote: A possible solution, using the re module: py s = \ ... Gibberish ... 53 ... MoreGarbage ... 12 ... RelevantInfo1 ... 10/10/04 ... NothingImportant ... ThisDoesNotMatter ... 44 ... RelevantInfo2 ... 22 ...

Re: Multiline regex help

2005-03-03 Thread Kent Johnson
Here is another attempt. I'm still not sure I understand what form you want the data in. I made a dict -> dict -> list structure so if you look up e.g. scores['10/11/04']['60'] you get a list of all the RelevantInfo2 values for Relevant1='10/11/04' and Relevant2='60'. The parser is a simple-minded
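A nested structure of this shape can be built with collections.defaultdict. The sketch below is not the parser from the post; the records are invented, and it assumes the innermost list collects the third field for each pair of the first two.

```python
from collections import defaultdict

# Hypothetical (RelevantInfo1, RelevantInfo2, RelevantInfo3) records;
# the values are made up for illustration.
records = [
    ("10/11/04", "60", "23"),
    ("10/11/04", "60", "31"),
    ("10/11/04", "45", "12"),
    ("10/10/04", "22", "23"),
]

# Outer dict keyed by the first field, inner dict keyed by the second,
# innermost list holding every third-field value seen for that pair.
scores = defaultdict(lambda: defaultdict(list))
for info1, info2, info3 in records:
    scores[info1][info2].append(info3)

print(scores["10/11/04"]["60"])
```

Using defaultdict avoids checking whether a key exists before appending; a plain dict with setdefault would work as well.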

Re: Multiline regex help

2005-03-03 Thread Yatima
On Thu, 03 Mar 2005 16:25:39 -0500, Kent Johnson [EMAIL PROTECTED] wrote: Here is another attempt. I'm still not sure I understand what form you want the data in. I made a dict -> dict -> list structure so if you look up e.g. scores['10/11/04']['60'] you get a list of all the RelevantInfo2

Re: Multiline regex help

2005-03-03 Thread Yatima
On Thu, 3 Mar 2005 12:26:37 -0800, James Stroud [EMAIL PROTECTED] wrote: Have a look at martel, part of biopython. The world of bioinformatics is filled with files with structure like this. http://www.biopython.org/docs/api/public/Martel-module.html James Thanks for the link. Steve and

Re: Multiline regex help

2005-03-03 Thread Steven Bethard
Kent Johnson wrote:
    for line in raw_data:
        if line.startswith('RelevantInfo1'):
            info1 = raw_data.next().strip()
        elif line.startswith('RelevantInfo2'):
            info2 = raw_data.next().strip()
        elif line.startswith('RelevantInfo3'):
            info3 = raw_data.next().strip()
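The same idea in Python 3 syntax, where raw_data.next() becomes next(raw_data) and cStringIO is replaced by io.StringIO. This is a sketch rather than the code from the thread: it assumes each label sits on its own line with its value on the following line, and it stores the values in a dict named `found` instead of three separate variables.

```python
import io

# Sample data from the thread, assuming one token per line.
raw_data = io.StringIO(
    "Gibberish\n53\nMoreGarbage\n12\nRelevantInfo1\n10/10/04\n"
    "NothingImportant\nThisDoesNotMatter\n44\nRelevantInfo2\n22\n"
    "BlahBlah\n343\nRelevantInfo3\n23\nHubris\nCrap\n34\n"
)

found = {}
for line in raw_data:
    if line.startswith("RelevantInfo"):
        # The value is on the line after its label. Pulling it from the
        # same iterator means the for loop resumes after the value line.
        found[line.strip()] = next(raw_data).strip()

print(found)
```

Because the for loop and next() share one iterator, each line is read exactly once, so the loop stays a single pass over the data.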

Re: Multiline regex help

2005-03-03 Thread Kent Johnson
Steven Bethard wrote: Kent Johnson wrote:
    for line in raw_data:
        if line.startswith('RelevantInfo1'):
            info1 = raw_data.next().strip()
        elif line.startswith('RelevantInfo2'):
            info2 = raw_data.next().strip()
        elif line.startswith('RelevantInfo3'):
            info3 =