Hi, WebKittens! In the YouTube Media Source Extensions conformance tests there is one called 36.AppendOpusAudioOutOfOrder where two audio media segments are appended out of order to a SourceBuffer: First, a segment with the PTS ranges [10, 20) is added. Then, another one with [0, 10) is added.
(I have rounded the actual timestamps to near integers for easier understanding). Almost at the very end of the process the buffered ranges are like this: [ 0, 9) [10, 20) At this point, SourceBuffer::sourceBufferPrivateDidReceiveSample() is called with the last audio frame, that has PTS=9 and DUR=1. The execution reaches this conditional block: ``` if (trackBuffer.highestPresentationTimestamp.isValid() && trackBuffer.highestPresentationTimestamp <= presentationTimestamp) { ``` trackBuffer.highestPresentationTimestamp contains the highest PTS so far within the current segment. The condition is true (9 <= 9) as expected for sequentially appended frames. Inside there is this block of code: ``` MediaTime highestBufferedTime = trackBuffer.buffered.maximumBufferedTime(); PresentationOrderSampleMap::iterator_range range; if (highestBufferedTime - trackBuffer.highestPresentationTimestamp < trackBuffer.lastFrameDuration) range = trackBuffer.samples.presentationOrder().findSamplesWithinPresentationRangeFromEnd(trackBuffer.highestPresentationTimestamp, frameEndTimestamp); else range = trackBuffer.samples.presentationOrder().findSamplesWithinPresentationRange(trackBuffer.highestPresentationTimestamp, frameEndTimestamp); if (range.first != trackBuffer.samples.presentationOrder().end()) erasedSamples.addRange(range.first, range.second); ``` The first if block there is an optimization, it decides whether to do a binary search in the entire collection of MediaSample's or do a linear search starting with the MediaSample with the highest PTS (which is faster when appends always occur at the end), but the result is the same in both cases. presentationOrder() is a std::map<MediaTime, MediaSample>. findSamplesWithinPresentationRange(beginTime, endTime) and its *FromEnd counterpart both return a pair of STL-style iterators which cover a range of MediaSample objects whose presentation timestamps sit in the range (beginTime, endTime] (beginTime is exclusive, endTime is inclusive). Then, it marks those MediaSample objects (frames) for deletion. My question is... shouldn't the range ends inclusivity be the other way around i.e. [beginTime, endTime)? As I understand it, the point of that part of the algorithm is to delete old samples that are -- even partially -- in the presentation time range of the newly appended one, but using (beginTime, endTime] fails to accomplish that in two cases: a) If there is an existing MediaSample with PTS=9 and DUR=1 it will not be removed because beginTime (=9) is exclusive. b) If there is an existing MediaSample with PTS=10 and DUR=1 it WILL be removed even though there is no overlap with the sample being appended (PTS=9 DUR=1) because endTime (=10) is inclusive. This is exactly what is making the YTTV test fail in my case. Before: [ 0, 9) [10, 20) Expected result after adding [9, 10): [0, 20) Actual result in WebKit: [ 0, 10) [11, 20) -- Alicia. _______________________________________________ webkit-dev mailing list webkit-dev@lists.webkit.org https://lists.webkit.org/mailman/listinfo/webkit-dev