Hi Richard,
This is great. Thank you for the patch.

a

On Tue, Sep 27, 2011 at 8:04 PM, Richard Hayes <[email protected]> wrote:

> Hi all,
>
> Our group has previously written to the list after noticing a bug in
> BioMart 0.6 for sequence queries involving exons, such as CDS FASTA
> retrieval. Especially when requesting an entire transcriptome/proteome/etc.,
> a handful of sequences were returned either with exons missing entirely or
> split across multiple FASTA entries.
>
> With the help of Junjun Zhang, we were able to trace this to a problem with
> how BioMart versions 0.6 and 0.7 attempt to combine the results of batched
> SQL queries (e.g., the problem doesn't occur at all if one completely
> disables batching, which unfortunately introduces a significant performance
> hit). I have found a fix for the DatasetI.pm module. Essentially, hash keys
> were not sorted during the previous and current batch dataset attribute
> merger step, causing improper handling of transcript data when exons
> happened by chance to be split between SQL query batches.
>
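
For readers following the thread, the failure mode described above can be illustrated with a toy sketch. This is not the patch and not the code in DatasetI.pm; the function name merge_batches and the (transcript_id, exon_rank, sequence) row layout are assumptions made only for illustration. It shows the general idea of merging rows from successive batches and then walking the keys in sorted order, so that a transcript whose exons are split between two batches still ends up as one complete entry.

```python
from collections import defaultdict


def merge_batches(batches):
    """Merge rows from successive query batches into one sequence per transcript.

    Hypothetical row layout: (transcript_id, exon_rank, sequence).
    """
    exons_by_transcript = defaultdict(list)

    # Accumulate across ALL batches before emitting anything, so exons that
    # land in a later batch are attached to the same transcript.
    for batch in batches:
        for transcript_id, exon_rank, sequence in batch:
            exons_by_transcript[transcript_id].append((exon_rank, sequence))

    merged = {}
    # Walk the keys in sorted order so the result does not depend on
    # whatever order the underlying hash happens to return them in.
    for transcript_id in sorted(exons_by_transcript):
        exons = sorted(exons_by_transcript[transcript_id], key=lambda e: e[0])
        merged[transcript_id] = "".join(seq for _, seq in exons)
    return merged


# Transcript T2 is split between the first and second batch.
example_batches = [
    [("T1", 1, "ATG"), ("T1", 2, "CCC"), ("T2", 1, "ATG")],
    [("T2", 2, "GGG"), ("T2", 3, "TAA")],
]
result = merge_batches(example_batches)
```

In this sketch T2 comes back as a single sequence even though its exons arrived in two batches.
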
> The attached patch file has been tested successfully on both version 0.6
> and version 0.7. Also, I have been able to successfully return correct,
> complete FASTA data files without any special filter/attribute orderBy
> constraints. This may be a quirk of our database, as exons for each
> transcript are batch loaded in the correct exon_rank order. If that is not
> the case for your data, you should, in addition to applying this patch, use
> orderBy constraints of "transcript_id_key, exon_rank" (similar to the Ensembl
> gene dataset configuration) on the "coding" and "peptide" structure
> exportables in your configuration.
>
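
The effect of an orderBy on "transcript_id_key, exon_rank" can be sketched the same way. Again, this is only an illustration under assumed names: cds_fasta and the (transcript_id_key, exon_rank, sequence) row layout are hypothetical and are not part of BioMart. Rows are ordered by transcript and then by exon rank before the exon sequences are concatenated into one FASTA entry per transcript.

```python
from itertools import groupby
from operator import itemgetter


def cds_fasta(rows):
    """Build FASTA text from (transcript_id_key, exon_rank, sequence) rows."""
    # Equivalent in spirit to ordering by "transcript_id_key, exon_rank".
    ordered = sorted(rows, key=itemgetter(0, 1))

    entries = []
    # groupby only groups adjacent rows, which is why the sort comes first.
    for transcript_id_key, group in groupby(ordered, key=itemgetter(0)):
        sequence = "".join(row[2] for row in group)
        entries.append(f">{transcript_id_key}\n{sequence}")
    return "\n".join(entries)


# Rows deliberately arrive out of order.
unordered_rows = [(2, 1, "ATG"), (1, 2, "TAA"), (1, 1, "ATG"), (2, 2, "TGA")]
fasta_text = cds_fasta(unordered_rows)
```

Without the sort, the same rows would produce one transcript split across several entries, which matches the symptom reported at the top of this thread.
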
> Best regards,
>
> --
> Richard D. Hayes, Ph.D.
> Joint Genome Institute / Lawrence Berkeley National Lab
> http://www.phytozome.net
>
> _______________________________________________
> Users mailing list
> [email protected]
> https://lists.biomart.org/mailman/listinfo/users
>
>
