Hi Ivan,

On Tue, Nov 08, 2016 at 01:20:23PM +0100, Andreas Tille wrote:
>
> When aligning and comparing two sequences using dnadiff, where one is an
> identical (or very similar) subset of another, we get strange results
> (however not on all genomes).
> For example, taking only the first chromosome of the S. Cerevisiae S288c
> reference and comparing it to the entire reference:
> dnadiff saccharomyces_cerevisiae-S288C.fa chr1.fa
>
> Results with (in version 3.23~dfsg-3):
> ...
> [Bases]
> TotalBases       12071326           230218
> AlignedBases     572185(4.74%)      51864(22.53%)
> UnalignedBases   11499141(95.26%)   178354(77.47%)
> ...
> AvgIdentity      97.65              97.65
> ...
>
> Whereas in version 3.23~dfsg-2 we get something that's more expected:
> ...
> [Bases]
> TotalBases       12071326           230218
> AlignedBases     572185(4.74%)      230218(100.00%)
> UnalignedBases   11499141(95.26%)   0(0.00%)
> ...
> AvgIdentity      100.00             100.00
> ...

That's a very helpful observation.  My first suspicion is that the
patches I took over from mugsy, which ships a code copy of mummer, are
responsible for the difference.  I'd like to wait for some comments from
the Debian Med team.  In any case, I can confirm that the version
currently in Debian testing (3.23+dfsg-1), which is the release candidate
for the next stable distribution, also reproduces the issue:

  $ dnadiff saccharomyces_cerevisiae-S288C.fa chr1.fa
  $ grep -A3 '^\[Bases\]' out.report
  [Bases]
  TotalBases       12071326           230218
  AlignedBases     572185(4.74%)      51931(22.56%)
  UnalignedBases   11499141(95.26%)   178287(77.44%)

> We found this to happen in de novo assemblies of the W303 PacBio dataset,
> as well as on a C. Elegans dataset, but not on E. Coli.
>
> I'm attaching the sample reference I used in the example above.

That's very helpful for reproducing the issue.

> P.S. The problem does not happen with the official Mummer 3.23 SourceForge
> code.

That is also a helpful hint.  It points in the same direction as my
assumption that the code diff between 3.23~dfsg-2 and 3.23~dfsg-3 might
be responsible and should most probably be reverted.

Kind regards

      Andreas.

-- 
http://fam-tille.de