Thanks everyone for the replies; they all worked well. I adopted the
string-splitting approach in preference to the regex one, as it seemed
to miss fewer of the edge cases. Thank you all once again for your help.
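
For anyone reading the archive later, the string-splitting idea can be
sketched roughly as follows. This is only a minimal illustration, not
the exact code from the replies: the function name split_fields is made
up for the example, and it assumes every data line starts with the tag
and ends with a one-token VR and a one-token VM.

```python
def split_fields(line):
    """Split '(gggg,eeee) description VR VM' into four fields.

    Hypothetical helper for illustration; assumes the tag contains no
    spaces and that VR and VM are the last two tokens on the line.
    """
    # The tag is everything up to the first space.
    tag, rest = line.split(' ', 1)
    # VR and VM are the last two whitespace-separated tokens;
    # whatever remains in the middle is the description.
    desc, vr, vm = rest.rsplit(None, 2)
    return [tag, desc, vr, vm]


line = '(0018,0014) Contrast/Bolus Administration Route Sequence SQ 1'
print(';'.join(split_fields(line)))
```

A line with too few tokens (the column header, for instance) raises
ValueError here, so real code would probably want to skip or log those.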




-----Original Message-----
From: Kent Johnson [mailto:[EMAIL PROTECTED] 
Sent: 27 June 2007 14:55
To: tutor@python.org; Gardner, Dean
Subject: Re: [Tutor] Regular Expression help

Gardner, Dean wrote:
> Hi
> 
> I have a text file that I would like to split up so that I can use it 
> in Excel to filter a certain field. However as it is a flat text file 
> I need to do some processing on it so that Excel can correctly import it.
> 
> File Example:
> tag             desc                    VR      VM
> (0012,0042) Clinical Trial Subject Reading ID LO 1
> (0012,0050) Clinical Trial Time Point ID LO 1
> (0012,0051) Clinical Trial Time Point Description ST 1
> (0012,0060) Clinical Trial Coordinating Center Name LO 1
> (0018,0010) Contrast/Bolus Agent LO 1
> (0018,0012) Contrast/Bolus Agent Sequence SQ 1
> (0018,0014) Contrast/Bolus Administration Route Sequence SQ 1
> (0018,0015) Body Part Examined CS 1
> 
> What I essentially want is to use python to process this file to give 
> me
> 
> 
> (0012,0042); Clinical Trial Subject Reading ID; LO; 1 (0012,0050); 
> Clinical Trial Time Point ID; LO; 1 (0012,0051); Clinical Trial Time 
> Point Description; ST; 1 (0012,0060); Clinical Trial Coordinating 
> Center Name; LO; 1 (0018,0010); Contrast/Bolus Agent; LO; 1 
> (0018,0012); Contrast/Bolus Agent Sequence; SQ ;1 (0018,0014); 
> Contrast/Bolus Administration Route Sequence; SQ; 1 (0018,0015); Body 
> Part Examined; CS; 1
> 
> so that I can import to excel using a delimiter.
> 
> This file is extremely long and all I essentially want to do is to 
> break it into its 'fields'
> 
> Now I suspect that regular expressions are the way to go but I have 
> only basic experience of using these and I have no idea what I should
> be doing.

This seems to work:

data = '''\
(0012,0042) Clinical Trial Subject Reading ID LO 1
(0012,0050) Clinical Trial Time Point ID LO 1
(0012,0051) Clinical Trial Time Point Description ST 1
(0012,0060) Clinical Trial Coordinating Center Name LO 1
(0018,0010) Contrast/Bolus Agent LO 1
(0018,0012) Contrast/Bolus Agent Sequence SQ 1
(0018,0014) Contrast/Bolus Administration Route Sequence SQ 1
(0018,0015) Body Part Examined CS 1'''.splitlines()

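import re

# Groups: tag "(n,n)", description (non-greedy), VR, VM.
fieldsRe = re.compile(r'^(\(\d+,\d+\)) (.*?) (\w+) (\d+)$')

for line in data:
    match = fieldsRe.match(line)
    if match:
        # Join the four captured fields with a single ';'.
        print(';'.join(match.group(1, 2, 3, 4)))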


I don't think you want the space after the ; that you put in your
example; Excel wants a single-character delimiter.
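
If you would rather not build the delimited line by hand, the standard
csv module can write it with a single-character delimiter. The sketch
below is just one way to do it and assumes Python 3; the function name
to_semicolon_csv is made up for the example, and the regex is the same
one as above.

```python
import csv
import io
import re

# Same pattern as above: tag, description (non-greedy), VR, VM.
FIELDS_RE = re.compile(r'^(\(\d+,\d+\)) (.*?) (\w+) (\d+)$')


def to_semicolon_csv(lines):
    """Return the matching lines as ';'-delimited text (hypothetical helper)."""
    out = io.StringIO()
    # delimiter must be a single character; ';' is used here.
    writer = csv.writer(out, delimiter=';', lineterminator='\n')
    for line in lines:
        match = FIELDS_RE.match(line)
        if match:
            writer.writerow(match.groups())
        # Lines that do not match (e.g. the column header) are skipped.
    return out.getvalue()


sample = [
    'tag             desc                    VR      VM',
    '(0018,0015) Body Part Examined CS 1',
]
print(to_semicolon_csv(sample), end='')
```

The csv writer also takes care of quoting if a description ever
contains the delimiter itself.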

Kent

