On Wed, 7 Dec 2005, ps python wrote:
> I am a new python learner. i am trying to parse a file using regular
> expressions.

Hello,

Just as an aside: parsing Genbank flat files like this is not such a good idea, because you can get the Genbank XML files instead. For example, your locus NM_005417 has a perfectly good XML representation (using the GBSeq XML format):

    http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nucleotide&qty=1&c_start=1&list_uids=38202215&dopt=gbx&dispmax=5&sendto=

This format contains the same content as the human-readable text report, but structured in a way that makes it easier to extract elements if we use an XML parser like ElementTree.

    http://effbot.org/zone/element-index.htm

And even if that weren't available, we might also consider using the parsers that come with the BioPython project:

    http://www2.warwick.ac.uk/fac/sci/moac/currentstudents/peter_cock/python/genbank/
    http://www.biopython.org/docs/tutorial/Tutorial004.html#toc13

I guess I'm trying to say: you might not want to reinvent the wheel: it's been done several times already. *grin*

If you're doing this to learn regular expressions, that's fine too. Just be aware that those other modules are out there.

Let's look at the code.

> for line in dat:
>     a = pat1.match(line)
>     b = pat2.match(line)
>     c = pat3.match(line)
>     d = pat4.match(line)

Use the search() method, not the match() method. match() only succeeds if the pattern matches starting at the very beginning of the line, so it'll miss things if your pattern is in the middle somewhere. There's a discussion about this in the Regex HOWTO:

    http://www.amk.ca/python/howto/regex/
    http://www.amk.ca/python/howto/regex/regex.html#SECTION000720000000000000000

If you have more questions, please feel free to ask.
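
P.S. In case it helps, here is a minimal sketch of the match() versus search() difference. The sample line and the pattern are made up for illustration; they are not taken from your file or your pat1..pat4:

```python
import re

# Hypothetical sample line and pattern, invented for this example.
line = "LOCUS       NM_005417    4145 bp    mRNA"
pattern = re.compile(r"NM_\d+")

# match() only tries at the start of the string, and this line
# starts with "LOCUS", so there is no match here.
print(pattern.match(line))

# search() scans forward through the string until the pattern fits.
found = pattern.search(line)
if found is not None:
    print(found.group())
```

Both calls return None on failure, so it's worth checking the result before calling group() on it.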