I am creating a Taverna workflow for nucleotide sequence analysis on mayflies. I am trying to add some logic to my script to work out which sequences from different taxa are homologous. (I can't query HomoloGene because it doesn't have any sequences for mayflies.) I get the FASTA sequences with esearch, and at the top of each FASTA sequence there is a title line for the gene, like this:

    Potamanthellus caenoides voucher BYU:IGCEP220 28S ribosomal RNA gene, partial sequence
Another looks like this:

    Paraleptophlebia submarginata voucher BYU:IGCEP243 28S ribosomal RNA gene, partial sequence

Ideally, what I want is some way to get just the name of the gene, i.e. "28S ribosomal RNA gene". That would make it easier to see that these two sequences are probably homologous. I have written some logic to parse the gene name out of the header, but it isn't 100% reliable. So I was wondering whether anyone knows of a way to get the gene name on its own, instead of the whole header with the codes for the school and so on.

Thanks in advance!

_______________________________________________
taverna-users mailing list
Web site: http://www.taverna.org.uk
Mailing lists: http://www.taverna.org.uk/taverna-mailing-lists/
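For illustration, here is a minimal sketch of the kind of header parsing described above. It is not the logic actually used in the workflow. It assumes every title line has the shape "<Genus> <species> voucher <CODE> <gene name>, <completeness>", as in the two examples, and that the voucher code contains no spaces. The function name extract_gene_name is hypothetical, and headers that do not fit the assumed shape simply return None, which is one reason this approach cannot be fully reliable.

```python
import re

# Assumed header shape (taken only from the two example title lines):
#   "<Genus> <species> voucher <CODE> <gene name>, <completeness>"
# The voucher code is assumed to be a single token without spaces.
_VOUCHER_RE = re.compile(r"\bvoucher\s+\S+\s+(?P<gene>[^,]+)")


def extract_gene_name(defline):
    """Return the gene name from a FASTA title line, or None if it does not match."""
    match = _VOUCHER_RE.search(defline)
    if match is None:
        return None
    return match.group("gene").strip()


headers = [
    "Potamanthellus caenoides voucher BYU:IGCEP220 28S ribosomal RNA gene, partial sequence",
    "Paraleptophlebia submarginata voucher BYU:IGCEP243 28S ribosomal RNA gene, partial sequence",
]
for header in headers:
    print(extract_gene_name(header))
```

Both example headers yield "28S ribosomal RNA gene", so sequences could be grouped by the returned string as a rough first guess at homology. Title lines without a voucher field, or with a different word order, would need additional patterns.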
