Hi Richard,

This is great. Thank you for the patch.

a
On Tue, Sep 27, 2011 at 8:04 PM, Richard Hayes <[email protected]> wrote:
> Hi all,
>
> Our group has previously written to the list after noticing a bug in
> Biomart 0.6 for sequence queries involving exons, such as CDS FASTA
> retrieval. Especially when requesting an entire transcriptome/proteome/etc.,
> a handful of sequences were returned either with exons missing entirely or
> split across multiple FASTA entries.
>
> With the help of Junjun Zhang, we were able to trace this to a problem with
> how BioMart versions 0.6 and 0.7 attempt to combine the results of batched
> SQL queries (e.g., the problem doesn't occur at all if one completely
> disables batching, which unfortunately introduces a significant performance
> hit). I have found a fix for the DatasetI.pm module. Essentially, hash keys
> were not sorted during the previous and current batch dataset attribute
> merger step, causing improper handling of transcript data when exons
> happened by chance to be split between SQL query batches.
>
> The attached patch file has been tested successfully on both version 0.6
> and version 0.7. Also, I have been able to successfully return correct,
> complete FASTA data files without any special filter/attribute orderBy
> constraints. This may be a quirk of our database, as exons for each
> transcript are batch loaded in the correct exon_rank order. If that is not
> the case for your data, you should, in addition to applying this patch, use
> orderBy constraints of "transcript_id_key, exon_rank" (similar to the Ensembl
> gene dataset configuration) on the "coding" and "peptide" structure
> exportables in your configuration.
>
> Best regards,
>
> --
> Richard D. Hayes, Ph.D.
> Joint Genome Institute / Lawrence Berkeley National Lab
> http://www.phytozome.net
>
> _______________________________________________
> Users mailing list
> [email protected]
> https://lists.biomart.org/mailman/listinfo/users
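
The failure mode described in the quoted message can be illustrated with a small sketch. This is not the DatasetI.pm patch (which is Perl and is not reproduced here); it is a minimal Python model, assuming rows of the hypothetical form (transcript_id, exon_rank, sequence) that arrive ordered by transcript and are cut into fixed-size batches. All function and variable names are made up for the illustration. The naive merge builds one entry per transcript per batch, so a transcript whose exons straddle a batch boundary is split across entries. The second version holds back the last transcript of each batch until the next batch shows it is complete, and sorts exons by rank before joining them.

```python
from itertools import groupby

# Hypothetical rows: (transcript_id, exon_rank, exon_sequence),
# ordered by transcript_id as an orderBy constraint would deliver them.
ROWS = [
    ("T1", 1, "ATG"), ("T1", 2, "GCC"),
    ("T2", 1, "ATG"), ("T2", 2, "AAA"), ("T2", 3, "TGA"),
    ("T3", 1, "ATGTAA"),
]

def batches(rows, size):
    """Yield consecutive slices of rows, mimicking batched SQL results."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]

def _entry(tid, exons):
    """Join exon sequences in exon_rank order into one (id, sequence) entry."""
    ordered = sorted(exons, key=lambda r: r[1])
    return (tid, "".join(r[2] for r in ordered))

def naive_merge(rows, size):
    """Emit one entry per transcript per batch (reproduces the split)."""
    entries = []
    for batch in batches(rows, size):
        for tid, group in groupby(batch, key=lambda r: r[0]):
            entries.append(_entry(tid, list(group)))
    return entries

def carry_over_merge(rows, size):
    """Carry an unfinished transcript across the batch boundary."""
    entries = []
    pending_id, pending = None, []
    for batch in batches(rows, size):
        for tid, group in groupby(batch, key=lambda r: r[0]):
            # A new transcript id means the pending one is complete.
            if tid != pending_id and pending:
                entries.append(_entry(pending_id, pending))
                pending = []
            pending_id = tid
            pending.extend(group)
    if pending:  # flush the final transcript
        entries.append(_entry(pending_id, pending))
    return entries

if __name__ == "__main__":
    print(naive_merge(ROWS, 3))       # T2 appears twice
    print(carry_over_merge(ROWS, 3))  # T2 appears once, complete
```

With a batch size of 3, the boundary falls between the first and second exon of T2, which is the "split by chance" situation the message describes.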
