I think I have a solution.

File
############################
(0012,0042) Clinical Trial Subject Reading ID LO 1
(0012,0050) Clinical Trial Time Point ID LO 1
(0012,0051) Clinical Trial Time Point Description ST 1
(0012,0060) Clinical Trial Coordinating Center Name LO 1
(0018,0010) Contrast/Bolus Agent LO 1
(0018,0012) Contrast/Bolus Agent Sequence SQ 1
(0018,0014) Contrast/Bolus Administration Route Sequence SQ 1
(0018,0015) Body Part Examined CS 1


Script
#############################
#!/usr/bin/python

import re

# matchstr regex flow
# (\(\d+,\d+\))   # (0018,0014)
# \s              # [space]
# (.+)            # Contrast/Bolus Administration Route Sequence
# \s              # [space]
# ([a-z]{2})      # SQ - exactly two letters
# \s              # [space]
# (\d)            # 1 - single digit
# re.I            # case insensitive

matchstr = re.compile(r"(\(\d+,\d+\))\s(.+)\s([a-z]{2})\s(\d)", re.I)

with open('/tmp/file', 'r') as myfile:
    for line in myfile:
        regex_match = matchstr.match(line)
        if regex_match:
            # join the four captured fields with semicolons
            print(";".join(regex_match.groups()))


Output
#####################
(0012,0042);Clinical Trial Subject Reading ID;LO;1
(0012,0050);Clinical Trial Time Point ID;LO;1
(0012,0051);Clinical Trial Time Point Description;ST;1
(0012,0060);Clinical Trial Coordinating Center Name;LO;1
(0018,0010);Contrast/Bolus Agent;LO;1
(0018,0012);Contrast/Bolus Agent Sequence;SQ;1
(0018,0014);Contrast/Bolus Administration Route Sequence;SQ;1
(0018,0015);Body Part Examined;CS;1
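
Alternative without a regex
#############################
If the regex feels heavy, the same split can probably be done with plain
string methods, since the tag is always the first token and VR and VM are
always the last two. The following is only a sketch under that assumption:
the names SAMPLE, split_fields and convert are made up for illustration, and
the csv module is used here simply to write the ';' delimiter that Excel
will import.

```python
import csv
import io

# Two data lines from the example file plus the header row (illustrative only).
SAMPLE = """\
tag             desc                    VR      VM
(0012,0042) Clinical Trial Subject Reading ID LO 1
(0018,0014) Contrast/Bolus Administration Route Sequence SQ 1
"""


def split_fields(line):
    """Return [tag, desc, VR, VM] for a data line, or None for anything else."""
    line = line.strip()
    if not line.startswith("("):
        return None                     # header row or blank line
    tag, _, rest = line.partition(" ")  # tag is everything before the first space
    parts = rest.rsplit(None, 2)        # peel VR and VM off the right-hand end
    if len(parts) != 3:
        return None                     # line does not have the expected shape
    desc, vr, vm = parts
    return [tag, desc, vr, vm]


def convert(lines, out):
    """Write every recognised line to `out` as a ';'-delimited row."""
    writer = csv.writer(out, delimiter=";", lineterminator="\n")
    for line in lines:
        fields = split_fields(line)
        if fields:
            writer.writerow(fields)


buf = io.StringIO()
convert(SAMPLE.splitlines(), buf)
print(buf.getvalue(), end="")
```

To run it against the real file, pass an open file object as `lines` and
another one opened for writing as `out`. Unlike the regex above, this does
not insist that VM is a single digit, which may or may not be what you want.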


On 6/27/07, Gardner, Dean <[EMAIL PROTECTED]> wrote:

 Hi

I have a text file that I would like to split up so that I can use it in
Excel to filter a certain field. However as it is a flat text file I need to
do some processing on it so that Excel can correctly import it.

File Example:
tag             desc                    VR      VM
(0012,0042) Clinical Trial Subject Reading ID LO 1
(0012,0050) Clinical Trial Time Point ID LO 1
(0012,0051) Clinical Trial Time Point Description ST 1
(0012,0060) Clinical Trial Coordinating Center Name LO 1
(0018,0010) Contrast/Bolus Agent LO 1
(0018,0012) Contrast/Bolus Agent Sequence SQ 1
(0018,0014) Contrast/Bolus Administration Route Sequence SQ 1
(0018,0015) Body Part Examined CS 1

What I essentially want is to use python to process this file to give me

(0012,0042); Clinical Trial Subject Reading ID; LO; 1
(0012,0050); Clinical Trial Time Point ID; LO; 1
(0012,0051); Clinical Trial Time Point Description; ST; 1
(0012,0060); Clinical Trial Coordinating Center Name; LO; 1
(0018,0010); Contrast/Bolus Agent; LO; 1
(0018,0012); Contrast/Bolus Agent Sequence; SQ; 1
(0018,0014); Contrast/Bolus Administration Route Sequence; SQ; 1
(0018,0015); Body Part Examined; CS; 1

so that I can import it into Excel using a delimiter.

This file is extremely long and all I essentially want to do is to break
it into its 'fields'.

Now I suspect that regular expressions are the way to go, but I have only
basic experience of using them and I have no idea what I should be doing.

Can anyone help?

Thanks


_______________________________________________
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor

