[Taverna-users] any reliable way to parse out the name of the gene when getting fasta sequences from NCBI?

mcookson Sun, 07 Mar 2010 19:21:39 -0800

I am creating a Taverna workflow for doing nucleotide sequence analysis
on mayflies.  I am trying to add some logic to my script to get a good
idea of which sequences from different taxa are homologous (I can't query
HomoloGene because it doesn't have any sequences for mayflies).  Anyway,
I get the FASTA sequences with esearch, and at the top of each FASTA
sequence I get a title for the gene like this:
Potamanthellus caenoides voucher BYU:IGCEP220 28S ribosomal RNA gene,
partial sequence


Another is like this:
Paraleptophlebia submarginata voucher BYU:IGCEP243 28S ribosomal RNA
gene, partial sequence

Ideally, what I want is some way to get just the name of the gene, i.e.:
"28S ribosomal RNA gene"
That way it would be easier to know that these 2 sequences are probably
homologous.  I have written some logic to try to parse the actual name
out, but it isn't 100% reliable.  So I was wondering if anyone knows of
some way to get the name of the gene instead of this whole
header with codes for the school and whatnot.
Thanks in advance!!!
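
For reference, the kind of title parsing described above might look roughly
like the sketch below. It is only a heuristic, not a reliable answer to the
question: it assumes the title starts with a two-word "Genus species" name,
that qualifiers such as "voucher" are followed by exactly one token, and that
the gene description ends at the first comma. The function name
guess_gene_name and the qualifier list are made up for illustration.

```python
import re

# Qualifier keywords that may sit between the organism name and the gene
# description, each followed by one identifier token (assumed list; extend
# it for other headers you encounter).
QUALIFIER = re.compile(r"\b(?:voucher|isolate|strain|clone)\s+\S+\s*",
                       re.IGNORECASE)

def guess_gene_name(title):
    """Return a best-guess gene name from a FASTA title, or None."""
    # Rejoin titles that were wrapped across lines and collapse whitespace.
    text = " ".join(title.split())
    # Keep only the part before the first comma (drops ", partial sequence").
    text = text.split(",", 1)[0]
    # Drop the leading "Genus species" pair (assumes a two-word binomial).
    words = text.split(" ", 2)
    if len(words) < 3:
        return None
    # Remove "voucher XYZ"-style qualifiers from what is left.
    rest = QUALIFIER.sub("", words[2]).strip()
    return rest or None

titles = [
    "Potamanthellus caenoides voucher BYU:IGCEP220 28S ribosomal RNA gene,\n"
    "partial sequence",
    "Paraleptophlebia submarginata voucher BYU:IGCEP243 28S ribosomal RNA\n"
    "gene, partial sequence",
]
names = [guess_gene_name(t) for t in titles]
```

Both example titles above reduce to the same string under these assumptions,
but titles with subspecies names, multi-token qualifiers, or several genes in
one record would break it, which is why something better than header parsing
is being asked for.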


_______________________________________________
taverna-users mailing list
Web site: http://www.taverna.org.uk
Mailing lists: http://www.taverna.org.uk/taverna-mailing-lists/
