Hi, I am new to Python and am trying to parse a file using regular expressions. Here is a sample record:

LOCUS       NM_005417    4145 bp    mRNA    linear    PRI 04-DEC-2005
DEFINITION  Homo sapiens v-src sarcoma (Schmidt-Ruppin A-2) viral oncogene
            homolog (avian) (SRC), transcript variant 1, mRNA.
ACCESSION   NM_005417
VERSION     NM_005417.3  GI:38202215
     CDS    450..2060
            /gene="SRC"
            /EC_number="2.7.1.112"
            /note="v-src avian sarcoma (Schmidt-Ruppin A-2) viral oncogene homolog;
            protooncogene SRC, Rous sarcoma;
            tyrosine-protein kinase SRC-1;
            tyrosine kinase pp60c-src;
            go_component: ubiquitin ligase complex [goid 0000151] [evidence NAS] [pmid 14976165];
            go_function: ATP binding [goid 0005524] [evidence IEA];
            go_function: nucleotide binding [goid 0000166] [evidence IEA];
            go_function: protein binding [goid 0005515] [evidence IPI] [pmid 15546919];
            go_function: protein binding [goid 0005515] [evidence IPI] [pmid 15749833];
            go_function: transferase activity [goid 0016740] [evidence IEA];
            go_function: SH3/SH2 adaptor activity [goid 0005070] [evidence TAS] [pmid 9020193];
            go_function: protein-tyrosine kinase activity [goid 0004713] [evidence IEA];
            go_function: protein-tyrosine kinase activity [goid 0004713] [evidence TAS] [pmid 9020193];
            go_process: protein kinase cascade [goid 0007243] [evidence TAS] [pmid 9020193];
            go_process: signal complex formation [goid 0007172] [evidence TAS] [pmid 9924018];
            go_process: protein amino acid phosphorylation [goid 0006468] [evidence IEA]"
            /codon_start=1
            /product="proto-oncogene tyrosine-protein kinase SRC"

I want to pull out LOCUS, go_process, go_function and go_component into columns so that I can fill my Postgres tables. Although I know SQL reasonably well, I am finding it difficult to get these elements out of the raw text. I had an idea that I thought might work, but it did not, probably because of my inexperience, so I am asking this forum for help.

My first attempt:

    import re
    f1 = open('this_raw_text','r')
    dat = f1.readlines()
    pat1 = re.compile('LOCUS')
    pat2 = re.compile('go_component')
    pat3 = re.compile('go_process')
    pat4 = re.compile('go_function')
    for line in dat:
        a = pat1.match(line)
        b = pat2.match(line)
        c = pat3.match(line)
        d = pat4.match(line)
        loc = a.group(1)
        comp = b.group(1)
        proc = c.group(1)
        func = d.group(1)
        print loc+'\t'+func
        print loc+'\t'+comp
        print loc+'\t'+func
        print loc+'\t'+proc

    Traceback (most recent call last):
      File "<pyshell#12>", line 6, in ?
        loc = a.group(1)
    IndexError: no such group

In the second attempt:

    >>> for line in dat:
            a = pat1.match(line)
            b = pat2.match(line)
            c = pat3.match(line)
            d = pat4.match(line)
            print a, b, c, d

    <_sre.SRE_Match object at 0x00D53330> None None None

For every remaining line, all four values are None. However, go_component, go_process and go_function are all present in the file. I do not know whether this is the correct procedure. Can anyone help me, please?

Thanks a lot.
M

_______________________________________________
Tutor maillist - Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor
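
For reference, two things in the attempts above explain the symptoms: `match()` only succeeds when the pattern is found at the very start of the string, so an indented `go_...` entry in the middle of a line gives None, and `group(1)` requires a parenthesised group in the pattern, which `'LOCUS'` does not have. A minimal sketch of one possible approach follows. It assumes Python 3, one record per string, and that every GO entry has the form `go_xxx: term [goid NNNNNNN]`. The names `SAMPLE`, `LOCUS_RE`, `GO_RE` and `extract_go_rows` are hypothetical, and the sample is a shortened, re-wrapped excerpt of the record above.

```python
import re

# Shortened, re-wrapped excerpt of the record in the post (illustrative only).
SAMPLE = '''LOCUS       NM_005417  4145 bp  mRNA  linear  PRI 04-DEC-2005
     CDS             450..2060
                     /gene="SRC"
                     /note="tyrosine kinase pp60c-src; go_component: ubiquitin
                     ligase complex [goid 0000151] [evidence NAS] [pmid
                     14976165]; go_function: ATP binding [goid 0005524]
                     [evidence IEA]; go_process: protein kinase cascade [goid
                     0007243] [evidence TAS] [pmid 9020193]"
'''

# The parentheses create group 1, which is what group(1) returns.
LOCUS_RE = re.compile(r'^LOCUS\s+(\S+)', re.MULTILINE)

# Category, term text, and GO id; the term stops before the first '['.
GO_RE = re.compile(
    r'(go_component|go_function|go_process):\s*([^\[;"]+?)\s*\[goid (\d+)\]'
)


def extract_go_rows(text):
    """Return (locus, category, term, goid) tuples for one record."""
    locus_match = LOCUS_RE.search(text)
    if locus_match is None:
        return []
    locus = locus_match.group(1)
    # Collapse all whitespace so entries wrapped across lines still match.
    flat = ' '.join(text.split())
    # findall scans the whole string, unlike match, which anchors at the start.
    return [(locus, cat, term, goid) for cat, term, goid in GO_RE.findall(flat)]


if __name__ == '__main__':
    for row in extract_go_rows(SAMPLE):
        # Tab-separated columns, ready to be loaded into a table.
        print('\t'.join(row))
```

Working on the whole record rather than line by line is a design choice here: the GO entries wrap across lines inside the /note field, so a per-line search could miss terms that are split by a line break.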