Re: [Samtools-help] samtools calmd behavior and MD tag

2016-08-02 Thread Martin MOKREJŠ
Keiran Raine wrote: > >> On 2 Aug 2016, at 15:31, Martin MOKREJŠ > > wrote: >> >> "samtools calmd" is the first tool in my pipeline writing out the MD: tag, >> but as it turned out this is more about NM: tag being mistaken, and that >> was bwa mem's job. (I don't even k

Re: [Samtools-help] samtools calmd behavior and MD tag

2016-08-02 Thread Keiran Raine
> On 2 Aug 2016, at 15:31, Martin MOKREJŠ wrote: > > "samtools calmd" is the first tool in my pipeline writing out the MD: tag, > but as it turned out this is more about NM: tag being mistaken, and that > was bwa mem's job. (I don't even know how to tell bwa mem to output also MD: > tag at the f

Re: [Samtools-help] samtools calmd behavior and MD tag

2016-08-02 Thread Martin MOKREJŠ
James Bonfield wrote: > On Tue, Aug 02, 2016 at 01:44:01PM +0200, Martin MOKREJŠ wrote: >>> The warnings are there because it is correcting the aligner output. >>> The proper fix is to fix the aligners to produce the correct MD in the >>> first place, not to break calmd to be buggy in the same man

Re: [Samtools-help] samtools calmd behavior and MD tag

2016-08-02 Thread James Bonfield
On Tue, Aug 02, 2016 at 01:44:01PM +0200, Martin MOKREJŠ wrote: > >The warnings are there because it is correcting the aligner output. > >The proper fix is to fix the aligners to produce the correct MD in the > >first place, not to break calmd to be buggy in the same manner. > > Hmm, but bwa mem d

Re: [Samtools-help] samtools calmd behavior and MD tag

2016-08-02 Thread Keiran Raine
Hi Martin, Adding 'calmdnmrecompindetonly=1' will increase performance further as it will only recompute the MD/NM values if the reference section has ambiguity/N within the span of the reads. Regards, Keiran Raine Principal Bioinformatician Cancer Genome Project Wellcome Trust Sanger Institut
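The condition Keiran describes (recompute MD/NM only when the reference stretch under a read contains an ambiguity code or N) can be sketched as below. This is only an illustration of the check, not biobambam2's code: the function name `span_has_ambiguity` and the 0-based half-open coordinates are assumptions made for the example.

```python
def span_has_ambiguity(reference, start, end):
    """Return True if reference[start:end] holds any base other than A, C, G, T.

    Hypothetical helper: `start`/`end` are 0-based, half-open coordinates of
    the reference stretch covered by one read's alignment.
    """
    return any(base not in "ACGT" for base in reference[start:end].upper())


reference = "ACGTNACGT"
# A read covering positions 2..5 overlaps the N, so its tags would be recomputed.
overlaps_n = span_has_ambiguity(reference, 2, 6)
# A read covering positions 5..8 sees only A/C/G/T, so its tags would be left alone.
clean_span = span_has_ambiguity(reference, 5, 9)
```

Reads whose span is plain A/C/G/T are skipped, which is where the performance gain mentioned above would come from.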

Re: [Samtools-help] samtools calmd behavior and MD tag

2016-08-02 Thread Martin MOKREJŠ
Heng Li wrote: > samtools calmd does not really work with name sorted files. It spent all the > 12 hours to repeatedly load reference fasta. Hmm, although I did it in the pipeline originally in these steps: samtools sort -@ $xthreads -n -m 4G -O bam -T "$sample" -o "$sample".realignedtogether.B

Re: [Samtools-help] samtools calmd behavior and MD tag

2016-08-02 Thread Heng Li
samtools calmd does not really work with name sorted files. It spent all the 12 hours to repeatedly load reference fasta. Heng On Aug 2, 2016, at 9:47 AM, Martin MOKREJŠ wrote: > Keiran Raine wrote: >> For BAM in/out yes: >> >> inputthreads=<[1]>: input helper threads (for inputfo
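A rough way to see why sort order matters here: if a tool keeps only one reference sequence in memory and fetches a new one whenever the reference name changes between consecutive records, the number of fetches depends on how the records are ordered. The model below is an assumption made for illustration (the function name `count_reference_loads` is hypothetical), not a description of the calmd source.

```python
def count_reference_loads(rnames):
    """Count sequence fetches under a one-sequence cache.

    Assumed model: a new reference sequence is loaded every time the
    reference name differs from that of the previous record.
    """
    loads = 0
    current = None
    for rname in rnames:
        if rname != current:
            loads += 1
            current = rname
    return loads


# Coordinate-sorted: all chr1 records first, then all chr2 records.
coordinate_sorted = ["chr1", "chr1", "chr1", "chr2", "chr2", "chr2"]
# Name-sorted: consecutive records may map to different chromosomes.
name_sorted = ["chr1", "chr2", "chr1", "chr2", "chr1", "chr2"]

loads_coordinate = count_reference_loads(coordinate_sorted)  # 2
loads_name = count_reference_loads(name_sorted)              # 6
```

Under this model a coordinate-sorted file needs one fetch per reference sequence, while a name-sorted file can need close to one fetch per record.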

Re: [Samtools-help] samtools calmd behavior and MD tag

2016-08-02 Thread Martin MOKREJŠ
Keiran Raine wrote: > For BAM in/out yes: > > inputthreads=<[1]>: input helper threads (for inputformat=bam > only, default: 1) > outputthreads=<[1]> : output helper threads (for outputformat=bam > only, default: 1) bamsort fixmates=1 calmdnm=1 calmdnmreference="$reference"

Re: [Samtools-help] samtools calmd behavior and MD tag

2016-08-02 Thread Keiran Raine
For BAM in/out yes: inputthreads=<[1]>: input helper threads (for inputformat=bam only, default: 1) outputthreads=<[1]> : output helper threads (for outputformat=bam only, default: 1) Keiran Raine Principal Bioinformatician Cancer Genome Project Wellcome Trust Sanger Inst

Re: [Samtools-help] samtools calmd behavior and MD tag

2016-08-02 Thread Martin MOKREJŠ
Opened as https://github.com/lh3/bwa/issues/82, feel free to chime in.

Re: [Samtools-help] samtools calmd behavior and MD tag

2016-08-02 Thread Martin MOKREJŠ
Hi Keiran, cool, does it run in multiple threads? "samtools calmd" runs in a single thread in my hands, so this would certainly be a reason to change my pipeline. Keiran Raine wrote: > Hi James, Martin, > > As James indicates BWA doesn't actually report the correct MD tags as it > changes

Re: [Samtools-help] samtools calmd behavior and MD tag

2016-08-02 Thread Keiran Raine
Hi James, Martin, As James indicates, BWA doesn't actually report the correct MD tags as it changes all ambiguity and N positions to ACGT (by some reproducible method). I had requested addition of functionality to biobambam(2) to recalculate the MD if the read spanned any ambiguity/N positio
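To illustrate the effect described above, here is a minimal sketch that computes MD and NM for a gapless alignment (no insertions, deletions or clipping), once against a reference where the N has been replaced by a concrete base and once against the original reference. The function name `md_nm_gapless` is hypothetical, and treating any position involving N as a mismatch is an assumption chosen to match the behaviour discussed in this thread.

```python
def md_nm_gapless(read, ref):
    """Compute (MD, NM) for a read aligned to `ref` without gaps.

    Assumption: a position counts as a match only when both bases are
    identical and unambiguous (A, C, G or T); anything else is a mismatch.
    """
    if len(read) != len(ref):
        raise ValueError("gapless sketch needs read and reference of equal length")
    md_parts = []
    run = 0  # length of the current run of matching bases
    nm = 0   # number of mismatches (edit distance without indels)
    for read_base, ref_base in zip(read.upper(), ref.upper()):
        if read_base == ref_base and read_base in "ACGT":
            run += 1
        else:
            md_parts.append(str(run))
            md_parts.append(ref_base)  # MD records the reference base
            run = 0
            nm += 1
    md_parts.append(str(run))
    return "".join(md_parts), nm


read = "ACGT"
# Reference as the aligner saw it, with N replaced by a concrete base.
tags_substituted = md_nm_gapless(read, "ACGT")  # ("4", 0)
# Original reference, still containing the N.
tags_original = md_nm_gapless(read, "ACNT")     # ("2N1", 1)
```

The same read thus gets different tags depending on which version of the reference is consulted, which is the kind of discrepancy the calmd warnings point at.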

Re: [Samtools-help] samtools calmd behavior and MD tag

2016-08-02 Thread Martin MOKREJŠ
James Bonfield wrote: > On Tue, Aug 02, 2016 at 11:57:52AM +0200, Martin MOKREJŠ wrote: >> Could samtools calmd apply the following logic for bwa-processed input? >> Get positions of all N's in the read. >> Do not complain about those positions which are based on N's. >> Do report other position

Re: [Samtools-help] samtools calmd behavior and MD tag

2016-08-02 Thread James Bonfield
On Tue, Aug 02, 2016 at 11:57:52AM +0200, Martin MOKREJŠ wrote: > Could samtools calmd apply the following logic for bwa-processed input? > Get positions of all N's in the read. > Do not complain about those positions which are based on N's. > Do report other positions. I don't entirely underst
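One possible reading of the quoted proposal, sketched for a gapless alignment only: collect the mismatching positions and separate those where the read base is N from all others, so that only the latter would be reported. The function name `split_mismatches` and the 0-based read offsets are assumptions for this example; it is not how calmd behaves.

```python
def split_mismatches(read, ref):
    """Split mismatching positions into N-based ones and all others.

    Returns (n_based, other), two lists of 0-based read offsets. A position
    where the read base is N always counts as N-based.
    """
    n_based, other = [], []
    for offset, (read_base, ref_base) in enumerate(zip(read.upper(), ref.upper())):
        if read_base == "N":
            n_based.append(offset)
        elif read_base != ref_base:
            other.append(offset)
    return n_based, other


# Offset 1 is an N in the read; offset 3 is a genuine T-versus-A difference.
n_positions, reportable = split_mismatches("ANGT", "ACGA")  # ([1], [3])
```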

Re: [Samtools-help] samtools calmd behavior and MD tag

2016-08-02 Thread Martin MOKREJŠ
Hi James, thank you for a detailed analysis. James Bonfield wrote: > On Thu, Jul 28, 2016 at 10:53:53AM +0200, Martin MOKREJŠ wrote: >> $ samtools view dedup.realigned.calmd.bam | grep >> HWI-x:xxx:x:1:1106:17285:41358 >> HWI-x:xxx:x:1:1106:17285:41358 145 1