We have problems getting audio and video (AV) in sync on our hardware decoders with certain MP4 files.
We have implemented audio and video sinks based on a hardware decoder. The decoders share a common clock (STC). When I feed audio and video buffers with timestamps (PTS), the hardware outputs the decoded frames according to this STC.
To build the PTS for the hardware from the buffer PTS, we:
- On a SEGMENT event, store the segment in the element and reset the hardware STC to 0 (for both audio and video)
… and send the buffer with this new PTS to hardware
Is this correct?
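For reference, here is a minimal sketch of how a buffer PTS is usually mapped to running time with the stored segment, which is the kind of value I would expect to hand to a decoder whose STC restarts at 0. It mirrors what I understand gst_segment_to_running_time() to do for forward playback (rate > 0) only. SketchSegment, SKETCH_TIME_NONE and sketch_to_running_time are made-up names for this illustration, not GStreamer API, and this is not necessarily the exact computation our sinks perform.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the GstSegment fields used here (all in ns). */
typedef struct {
    uint64_t start;   /* segment start */
    uint64_t stop;    /* segment stop, SKETCH_TIME_NONE if unset */
    uint64_t offset;  /* offset applied to start */
    uint64_t base;    /* running time accumulated before this segment */
    double   rate;    /* playback rate; only rate > 0 is handled */
} SketchSegment;

#define SKETCH_TIME_NONE UINT64_MAX

/* Map a buffer PTS to running time for forward playback:
 *   running_time = (pts - (start + offset)) / rate + base
 * Returns false if the PTS is invalid or lies outside the segment. */
bool sketch_to_running_time(const SketchSegment *seg, uint64_t pts,
                            uint64_t *running_time)
{
    uint64_t first = seg->start + seg->offset;

    if (pts == SKETCH_TIME_NONE || seg->rate <= 0.0)
        return false;
    if (pts < first)
        return false;  /* before the segment */
    if (seg->stop != SKETCH_TIME_NONE && pts > seg->stop)
        return false;  /* after the segment */

    uint64_t rt = pts - first;
    if (seg->rate != 1.0)
        rt = (uint64_t)((double)rt / seg->rate);

    *running_time = rt + seg->base;
    return true;
}
```

With the second segment shown below (start=0:00:00.083333333, base=0, rate=1.0), a buffer with PTS 0:00:00.083333333 would map to running time 0 under this sketch, and a buffer with PTS 0 would fall outside the segment.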
As mentioned above, this works for most streams, but we found some where AV is not in sync (compared to playback of the same streams with VLC or Windows Media Player). What these streams seem to have in common is the elst box (edit list) in the MP4 file.
In most cases I get two SEGMENT events before the first video buffer, for example:
time segment start=0:00:00.000000000, offset=0:00:00.000000000, stop=99:99:99.999999999, rate=1.000000, applied_rate=1.000000, flags=0x00, time=0:00:00.000000000, base=0:00:00.000000000, position 0:00:00.000000000, duration 99:99:99.999999999
time segment start=0:00:00.083333333, offset=0:00:00.000000000, stop= 0:05:00.083333333, rate=1.000000, applied_rate=1.000000, flags=0x00, time=0:00:00.000000000, base=0:00:00.000000000, position 0:00:00.083333333, duration 99:99:99.999999999