I think I have a solution.

File
############################
(0012,0042) Clinical Trial Subject Reading ID LO 1
(0012,0050) Clinical Trial Time Point ID LO 1
(0012,0051) Clinical Trial Time Point Description ST 1
(0012,0060) Clinical Trial Coordinating Center Name LO 1
(0018,0010) Contrast/Bolus Agent LO 1
(0018,0012) Contrast/Bolus Agent Sequence SQ 1
(0018,0014) Contrast/Bolus Administration Route Sequence SQ 1
(0018,0015) Body Part Examined CS 1
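As a side note on the layout of these rows: the tag comes first and contains no spaces, and the VR and VM are always the last two whitespace-separated tokens. Assuming every row in the real file follows that shape (I have only seen the sample), a row can also be pulled apart with plain string splitting. This is only a rough sketch; the name split_row is made up for illustration.

```python
def split_row(line):
    """Split one 'tag desc VR VM' row into its four fields.

    Assumes the tag has no spaces and that VR and VM are the last
    two whitespace-separated tokens. Returns None for any row that
    does not have at least four tokens.
    """
    parts = line.split(None, 1)       # tag, then everything else
    if len(parts) != 2:
        return None
    tag, rest = parts
    tail = rest.rsplit(None, 2)       # description, VR, VM
    if len(tail) != 3:
        return None
    desc, vr, vm = tail
    return [tag, desc, vr, vm]


# Join the fields with the delimiter wanted for the Excel import.
print(";".join(split_row("(0018,0015) Body Part Examined CS 1")))
```

Unlike a pattern that insists on a parenthesised tag, this also lets a "tag desc VR VM" header line through as a header row, which may or may not be what you want.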
Script
#############################
#!/usr/bin/python

import re

#matchstr regex flow
# (\(\d+,\d+\))     # (0018,0014)
# \s                # [space]
# (..*)             # Contrast/Bolus Administration Route Sequence
# \s                # [space]
# ([a-z]{2})        # SQ - two letters and no more
# \s                # [space]
# (\d)              # 1 - single digit
# re.I)             # case insensitive

matchstr = re.compile(r"(\(\d+,\d+\))\s(..*)\s([a-z]{2})\s(\d)", re.I)

myfile = open('/tmp/file', 'r')

# Lines that do not match the pattern (a header line, for example) are skipped.
for line in myfile:
    regex_match = matchstr.match(line)
    if regex_match:
        # groups() returns the four captured fields in order
        print(";".join(regex_match.groups()))

myfile.close()

Output
#####################
(0012,0042);Clinical Trial Subject Reading ID;LO;1
(0012,0050);Clinical Trial Time Point ID;LO;1
(0012,0051);Clinical Trial Time Point Description;ST;1
(0012,0060);Clinical Trial Coordinating Center Name;LO;1
(0018,0010);Contrast/Bolus Agent;LO;1
(0018,0012);Contrast/Bolus Agent Sequence;SQ;1
(0018,0014);Contrast/Bolus Administration Route Sequence;SQ;1
(0018,0015);Body Part Examined;CS;1

On 6/27/07, Gardner, Dean <[EMAIL PROTECTED]> wrote:
Hi,

I have a text file that I would like to split up so that I can use it in Excel to filter on a certain field. However, as it is a flat text file, I need to do some processing on it so that Excel can import it correctly.

File Example:

tag desc VR VM
(0012,0042) Clinical Trial Subject Reading ID LO 1
(0012,0050) Clinical Trial Time Point ID LO 1
(0012,0051) Clinical Trial Time Point Description ST 1
(0012,0060) Clinical Trial Coordinating Center Name LO 1
(0018,0010) Contrast/Bolus Agent LO 1
(0018,0012) Contrast/Bolus Agent Sequence SQ 1
(0018,0014) Contrast/Bolus Administration Route Sequence SQ 1
(0018,0015) Body Part Examined CS 1

What I essentially want is to use Python to process this file to give me

(0012,0042); Clinical Trial Subject Reading ID; LO; 1
(0012,0050); Clinical Trial Time Point ID; LO; 1
(0012,0051); Clinical Trial Time Point Description; ST; 1
(0012,0060); Clinical Trial Coordinating Center Name; LO; 1
(0018,0010); Contrast/Bolus Agent; LO; 1
(0018,0012); Contrast/Bolus Agent Sequence; SQ; 1
(0018,0014); Contrast/Bolus Administration Route Sequence; SQ; 1
(0018,0015); Body Part Examined; CS; 1

so that I can import it into Excel using a delimiter. This file is extremely long, and all I essentially want to do is break it into its 'fields'.

Now I suspect that regular expressions are the way to go, but I have only basic experience of using these and I have no idea what I should be doing.

Can anyone help?

Thanks

_______________________________________________
Tutor maillist - Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor