Hi Andreas,

You're looking at a comparison between two FASTQ reads:

    @Rec1
    GCATGATATATACAAC
    +
    012345'FBcEFGHIJ

and

    @Rec1
    GCATGATATATACAAC
    +
    012345(FBcEFGHIJ


The format is roughly
  
    @[name]
    [N * Nucleotides]
    +
    [N * Phred quality scores]


So the difference is that the quality score for the 7th base (a 'T') is "'" 
instead of "(", corresponding to a Phred quality score of 6 instead of 7 
(calculated as ASCII value - 33). It is merely a coincidence that the 
difference falls right where the quality string switches from digits to 
other characters.
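
As a small illustration of the "ASCII value - 33" rule, the two differing 
quality characters can be decoded as below. This is only a sketch: the names 
PHRED_OFFSET and phred_score are made up for the example and are not taken 
from AdapterRemoval.

```cpp
// Offset of the Phred+33 encoding: score = ASCII value - 33.
constexpr int PHRED_OFFSET = 33;

// Decode one FASTQ quality character into its Phred score.
// (phred_score is a hypothetical helper, not an AdapterRemoval function.)
constexpr int phred_score(char c)
{
    return static_cast<int>(c) - PHRED_OFFSET;
}

// The two characters that differ at the 7th position of the reads above.
static_assert(phred_score('\'') == 6, "' (ASCII 39) decodes to 6");
static_assert(phred_score('(') == 7, "( (ASCII 40) decodes to 7");
```

The checks run at compile time, so the snippet simply has to compile for the 
two values to be confirmed.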

During the collapse step, in which two overlapping reads are merged, updated 
quality scores are calculated as a product of the scores for the two copies of 
the overlapping base-pair. This will either result in a higher quality score 
being assigned (for identical positions) or a lower quality score being 
assigned (for mismatching positions). 
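
To see why a product can raise the score for identical positions, here is a 
simplified sketch. It assumes the product is taken over error probabilities 
p = 10^(-Q/10) and that the errors in the two reads are independent; the 
formula actually used is more involved, and the function name is made up for 
the example. The mismatch case needs more care and is not covered here.

```cpp
// Simplified model (an assumption, not AdapterRemoval's exact formula):
// a Phred score Q corresponds to an error probability p = 10^(-Q/10).
// If both copies of a base agree and their errors are independent, then
// p1 * p2 = 10^(-(Q1 + Q2)/10), so multiplying the probabilities
// corresponds to adding the scores.
constexpr int merged_match_score(int q1, int q2)
{
    return q1 + q2;
}

// Two agreeing bases with scores 6 and 7 end up with a higher score.
static_assert(merged_match_score(6, 7) == 13, "6 + 7 == 13");
```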

The calculation to determine the updated quality score makes use of std::log10, 
which appears to produce slightly different results on i386 vs amd64, resulting 
in a small number of updated quality scores being off by one on i386 compared 
to amd64. I determined this simply by printing every single intermediate result 
for this calculation for binaries built with and without -m32.
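
The reason such a tiny difference can matter is that a floating-point result 
eventually has to become an integer quality score, and a value landing just 
on either side of a whole number then changes by one. The sketch below 
assumes the conversion is a plain truncation, which may not be what is 
actually used, and the two numbers are made up for illustration; they are not 
the real intermediate values from either build.

```cpp
// Convert a floating-point score to an integer by truncation.
// (Hypothetical helper; the conversion actually used may differ.)
constexpr int to_integer_score(double value)
{
    return static_cast<int>(value);
}

// Two results that differ only far past the decimal point, standing in
// for slightly different std::log10 outputs on two platforms.
constexpr double result_a = 6.999999999;
constexpr double result_b = 7.000000001;

static_assert(to_integer_score(result_a) == 6, "just below 7 truncates to 6");
static_assert(to_integer_score(result_b) == 7, "just above 7 truncates to 7");
```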

The solution I am taking is to simply pre-calculate the lookup table and 
include a hardcoded copy of that, instead of populating the table when it is 
first used. It is not the prettiest solution, but it ensures that results can 
be reproduced regardless of the architecture.
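
A toy sketch of what a hardcoded table looks like in general is shown below. 
The table contents (round(10 * log10(k)) for k = 1..10) and the names are 
invented for the example; this is not the actual table or its layout.

```cpp
#include <cstddef>

// Toy example only: the values are round(10 * log10(k)) for k = 1..10,
// written out by hand. This is NOT AdapterRemoval's actual table.
constexpr int HARDCODED_TABLE[] = { 0, 3, 5, 6, 7, 8, 8, 9, 10, 10 };
constexpr std::size_t TABLE_SIZE =
    sizeof(HARDCODED_TABLE) / sizeof(HARDCODED_TABLE[0]);

// Look up the precomputed value for k in [1, TABLE_SIZE]. No call to
// std::log10 happens at runtime, so every platform sees the same numbers.
constexpr int lookup(std::size_t k)
{
    return HARDCODED_TABLE[k - 1];
}

static_assert(TABLE_SIZE == 10, "ten hand-written entries");
static_assert(lookup(5) == 7, "10 * log10(5) is about 6.99, rounded to 7");
```

Because the values are literals in the source, a 32-bit and a 64-bit build 
read exactly the same integers, which is the point of the approach.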

I realize that my last comment was rather vague, but I hope that this makes it 
clear what the issue was.

See https://en.wikipedia.org/wiki/FASTQ_format for a detailed description of 
the format.

-- 
You are receiving this because you authored the thread.
Reply to this email directly or view it on GitHub:
https://github.com/MikkelSchubert/adapterremoval/issues/35#issuecomment-471948966
