Sorry for the repost; I am still waiting to hear from some members.
A scientist suggested that I try Biopython. However, my question
is not just about GenBank sequences: what should I do if
I have to parse a paragraph for some expression?
Thanks again.
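Regular expressions are not specific to GenBank; the same tools work on any paragraph of text. A minimal sketch using only the standard `re` module: `re.findall` returns every non-overlapping match, and when the pattern contains one capturing group it returns just that group's text. The sample paragraph and the `[pmid ...]` pattern are illustrative choices.

```python
import re

# An arbitrary paragraph of free text (illustrative sample).
paragraph = (
    "SRC is annotated with protein binding [pmid 15546919] "
    "and signal complex formation [pmid 9924018]."
)

# The parentheses form a capturing group, so findall returns
# only the digits rather than the whole "[pmid ...]" text.
pmids = re.findall(r"\[pmid (\d+)\]", paragraph)
print(pmids)  # ['15546919', '9924018']
```

Use `re.finditer` instead when you also need the position of each match or several groups per match.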
Hi,
I am a new Python learner. I am trying to parse the
following file using regular expressions:
LOCUS       NM_005417               4145 bp    mRNA    linear   PRI 04-DEC-2005
DEFINITION  Homo sapiens v-src sarcoma (Schmidt-Ruppin A-2) viral oncogene
            homolog (avian) (SRC), transcript variant 1, mRNA.
ACCESSION   NM_005417
VERSION     NM_005417.3  GI:38202215
     CDS             450..2060
                     /gene="SRC"
                     /EC_number="2.7.1.112"
                     /note="v-src avian sarcoma
                     (Schmidt-Ruppin A-2) viral
                     oncogene homolog; protooncogene
                     SRC, Rous sarcoma;
                     tyrosine-protein kinase SRC-1;
                     tyrosine kinase pp60c-src;
                     go_component: ubiquitin ligase
                     complex [goid 0000151]
                     [evidence NAS] [pmid 14976165];
                     go_function: ATP binding [goid
                     0005524] [evidence IEA];
                     go_function: nucleotide binding
                     [goid 0000166] [evidence
                     IEA];
                     go_function: protein binding
                     [goid 0005515] [evidence IPI]
                     [pmid 15546919];
                     go_function: protein binding
                     [goid 0005515] [evidence IPI]
                     [pmid 15749833];
                     go_function: transferase activity
                     [goid 0016740] [evidence
                     IEA];
                     go_function: SH3/SH2 adaptor
                     activity [goid 0005070]
                     [evidence TAS] [pmid 9020193];
                     go_function: protein-tyrosine
                     kinase activity [goid
                     0004713] [evidence IEA];
                     go_function: protein-tyrosine
                     kinase activity [goid
                     0004713] [evidence TAS] [pmid
                     9020193];
                     go_process: protein kinase
                     cascade [goid 0007243]
                     [evidence TAS] [pmid 9020193];
                     go_process: signal complex
                     formation [goid 0007172]
                     [evidence TAS] [pmid 9924018];
                     go_process: protein amino acid
                     phosphorylation [goid
                     0006468] [evidence IEA]"
                     /codon_start=1
                     /product="proto-oncogene
                     tyrosine-protein kinase SRC"
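Because the GO entries wrap across several lines of the /note qualifier, it is usually easier to read the whole file as one string and unwrap the qualifier before looking for individual entries. A minimal sketch on a trimmed excerpt of the record; it assumes the /note text contains no embedded double quotes and that GenBank wrapped the text at spaces, so joining the pieces with single spaces restores the original wording.

```python
import re

# A trimmed excerpt of the record, with GenBank-style indentation.
record = """LOCUS       NM_005417               4145 bp    mRNA    linear   PRI 04-DEC-2005
     CDS             450..2060
                     /gene="SRC"
                     /note="v-src avian sarcoma
                     (Schmidt-Ruppin A-2) viral
                     go_function: ATP binding [goid
                     0005524] [evidence IEA]"
                     /codon_start=1
"""

# With re.MULTILINE, ^ matches at the start of every line.
locus = re.search(r"^LOCUS\s+(\S+)", record, re.MULTILINE).group(1)

# [^"]* also matches newlines, so this captures everything
# between /note=" and the next double quote.
note = re.search(r'/note="([^"]*)"', record).group(1)
# Collapse each newline plus its indentation into a single space.
note = " ".join(note.split())

print(locus)  # NM_005417
print(note)
```

In a real script, `record` would come from something like `open('this_raw_text').read()` rather than a literal string.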
I want to pull out LOCUS, go_process, go_function and
go_component into columns so that I can fill my
PostgreSQL tables. Although I know SQL reasonably well,
I am finding it difficult to extract these
elements from the raw text.
Here is an idea that I thought might work, but I could
not get it to work. I would appreciate this
forum's help.
import re
f1 = open('this_raw_text','r')
dat = f1.readlines()
pat1 = re.compile('LOCUS')
pat2 = re.compile('go_component')
pat3 = re.compile('go_process')
pat4 = re.compile('go_function')
for line in dat:
    a = pat1.match(line)
    b = pat2.match(line)
    c = pat3.match(line)
    d = pat4.match(line)
    loc = a.group(1)
    comp = b.group(1)
    proc = c.group(1)
    func = d.group(1)
    print loc+'\t'+func
    print loc+'\t'+comp
    print loc+'\t'+func
    print loc+'\t'+proc
Traceback (most recent call last):
  File "<pyshell#12>", line 6, in ?
    loc = a.group(1)
IndexError: no such group
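The IndexError comes from the patterns themselves: `group(1)` returns the text matched by the first parenthesized group, and 'LOCUS' contains no parentheses, so only group 0 (the whole match) exists. A small sketch; the sample line and the `LOCUS\s+(\S+)` pattern are assumptions about what the script meant to capture (the accession after LOCUS). Note also that on lines where the pattern does not match, `match` returns None, and calling `.group` on None raises AttributeError, so the result has to be checked first.

```python
import re

line = "LOCUS       NM_005417               4145 bp    mRNA"

# No parentheses: the pattern has no groups, so only group(0)
# (the whole match) exists and group(1) raises IndexError.
m = re.match("LOCUS", line)
print(m.group(0))  # LOCUS
try:
    m.group(1)
except IndexError as err:
    print("IndexError:", err)  # IndexError: no such group

# Parentheses define group 1: here, the word after LOCUS.
m = re.match(r"LOCUS\s+(\S+)", line)
print(m.group(1))  # NM_005417
```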
In the second attempt:
>>> for line in dat:
...     a = pat1.match(line)
...     b = pat2.match(line)
...     c = pat3.match(line)
...     d = pat4.match(line)
...     print a, b, c, d
<_sre.SRE_Match object at 0x00D53330> None None None
All the remaining lines print None for every pattern, even though
go_component, go_process and go_function do appear in
the file.
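Only the LOCUS line matches because `match` only tries to match at the very start of the string, whereas `search` scans the whole string. In an actual GenBank file the qualifier lines, including the go_ entries inside /note, are indented by 21 spaces (the indentation is easy to lose when pasting into an email), so `match` returns None for them. A minimal sketch with two sample lines; the `patterns` dict and its keys are illustrative names, not part of the original script.

```python
import re

# Two lines as they appear in a GenBank file: qualifier text is indented.
lines = [
    "LOCUS       NM_005417               4145 bp    mRNA\n",
    "                     go_function: ATP binding [goid\n",
]

pat = re.compile(r"go_function")
print(pat.match(lines[1]))   # None: match() only tries position 0
print(pat.search(lines[1]))  # a match object: search() scans the line

# A corrected loop: search anywhere in the line, and only call
# groups() when there actually is a match.
patterns = {
    "locus": re.compile(r"^LOCUS\s+(\S+)"),
    "go_term": re.compile(r"(go_component|go_function|go_process): (.*)"),
}
found = []
for line in lines:
    for name, regex in patterns.items():
        m = regex.search(line)
        if m is not None:
            found.append((name, m.groups()))
print(found)
```

Even with `search`, line-by-line matching cuts entries that wrap: here the term stops at "[goid" because the GO id sits on the next line. For these fields it is simpler to read the whole file with `f.read()` and unwrap the /note text before matching.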
I do not know whether this is the right approach.
Can anyone help me, please?
Thanks a lot.
M
_______________________________________________
Tutor maillist - [email protected]
http://mail.python.org/mailman/listinfo/tutor