When rtspsrc streams are piped into mp4mux, the resulting mp4 files have widely varying sample durations and many negative composition time offsets. Stepping frame-by-frame through these mp4s is inconsistent in video players such as QuickTime: a step often fails to advance to the next frame, or skips frames. Also, in some cases the PTS values in the mp4 go backwards (momentarily decreasing instead of monotonically increasing).
The root of the issue appears to be that DTS is set to local clock time when rtspsrc receives the segment. mp4mux uses the intervals between consecutive DTS values to determine sample duration. Since the time between receiving segments can vary considerably, the mp4 sample duration varies in parallel. (This variation is exaggerated by mp4mux because it pulls up the DTS of the last received segment for each frame.)
The PTS is generated from the RTP timestamp, which tracks the remote camera's clock. In our pipeline, DTS > PTS regularly (a negative composition time offset means the sample's PTS is earlier than its DTS). Consequently, the mp4 sample composition time offsets are regularly negative and vary widely to counteract the variation in sample durations. Also, as the clocks drift apart, the sample composition time offsets grow in magnitude.
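To make the effect concrete, here is a minimal sketch in plain Python (not GStreamer code). It assumes sample duration is the gap between consecutive DTS values and that the composition offset is PTS minus DTS, as in the ISO base media "ctts" box. The timestamps are made up for illustration: a 25 fps camera clock for PTS and jittery arrival times for DTS.

```python
def mp4_sample_tables(dts, pts):
    """Return (durations, composition_offsets) for samples in decode order."""
    # Duration of sample i is the gap to the next sample's DTS.
    durations = [later - earlier for earlier, later in zip(dts, dts[1:])]
    # Composition offset is PTS minus DTS; negative when DTS is later than PTS.
    offsets = [p - d for p, d in zip(pts, dts)]
    return durations, offsets


# Hypothetical values in milliseconds.
pts_ms = [0, 40, 80, 120, 160]   # from RTP timestamps: evenly spaced
dts_ms = [5, 62, 85, 150, 171]   # from local arrival time: jittery

durations, offsets = mp4_sample_tables(dts_ms, pts_ms)
print(durations)  # [57, 23, 65, 21]
print(offsets)    # [-5, -22, -5, -30, -11]
```

The PTS values are perfectly regular, yet the durations swing between 21 and 65 ms, and the offsets swing in the opposite direction to compensate.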
My temporary solution is to set DTS=PTS in mp4mux (minor modifications to gst_qt_pad_adjust_buffer_dts() in gstqtmux.c to adjust DTS for every frame). This fix produces a much "cleaner" mp4 that steps frame-by-frame consistently. This fix does not handle B-frames, which is ok for our streams.
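The idea behind the workaround, sketched in plain Python rather than as the actual gstqtmux.c change (the function name is hypothetical): copy PTS into DTS, but only when PTS never decreases in arrival order, since a decreasing PTS suggests frame reordering (B-frames).

```python
def force_dts_to_pts(pts):
    """Return DTS values equal to PTS, refusing streams with reordered frames."""
    # A PTS that goes backwards in arrival order suggests B-frames,
    # for which DTS = PTS would break decode order.
    if any(later < earlier for earlier, later in zip(pts, pts[1:])):
        raise ValueError("PTS is not monotonic; stream may contain B-frames")
    return list(pts)


pts_ms = [0, 40, 80, 120, 160]  # hypothetical 25 fps stream, milliseconds
dts_ms = force_dts_to_pts(pts_ms)

fixed_durations = [b - a for a, b in zip(dts_ms, dts_ms[1:])]
fixed_offsets = [p - d for p, d in zip(pts_ms, dts_ms)]
print(fixed_durations)  # [40, 40, 40, 40]
print(fixed_offsets)    # [0, 0, 0, 0, 0]
```

With DTS = PTS every sample gets the camera's frame interval as its duration and a zero composition offset, which matches the "cleaner" mp4 described above.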
First question: has anyone had problems with this DTS/PTS behavior when piping RTP -> mp4?
Second question: is there a more general solution that can be applied in rtspsrc so there's more correspondence between DTS and PTS?
Third question: are there ramifications to the temporary solution of setting DTS=PTS in mp4mux that I've missed, besides ruining B-frame support? Relatedly, is it possible to extend the solution to support B-frames by recognizing gaps in PTS progressions and shifting DTS to preserve decode order?
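One possible shape for the B-frame extension, as an offline sketch in plain Python. This is an assumption for illustration, not something mp4mux or rtspsrc does: take the PTS values in arrival (decode) order, use the sorted PTS values as candidate DTS values so decode order is preserved, then shift them all back by the smallest amount that makes every DTS <= PTS. The function name and timestamps are hypothetical, and a live pipeline would need a bounded reorder window instead of seeing the whole stream.

```python
def derive_dts(pts_decode_order):
    """Derive increasing DTS values from PTS values given in decode order."""
    # Sorted PTS values are evenly spaced and increasing: good DTS candidates.
    candidate = sorted(pts_decode_order)
    # Shift back so that no sample is decoded after it must be presented.
    shift = max(d - p for d, p in zip(candidate, pts_decode_order))
    return [d - shift for d in candidate]


# Hypothetical I P B B P B B stream at 25 fps, milliseconds, decode order.
pts_ms = [0, 120, 40, 80, 240, 160, 200]
dts_ms = derive_dts(pts_ms)

reorder_offsets = [p - d for p, d in zip(pts_ms, dts_ms)]
print(dts_ms)           # [-40, 0, 40, 80, 120, 160, 200]
print(reorder_offsets)  # [40, 120, 0, 0, 120, 0, 0]
```

The first DTS comes out negative here, so a real muxer would additionally have to offset all timestamps or record the shift some other way.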
Re: RTP DTS/PTS result in varying mp4 frame durations
On Thursday, 16 March 2017 at 06:10 +0000, Jim Morris wrote:
> My temporary solution is to set DTS=PTS in mp4mux (minor
> modifications to gst_qt_pad_adjust_buffer_dts() in gstqtmux.c to
> adjust DTS for every frame). This fix produces a much "cleaner" mp4
> that steps frame-by-frame consistently. This fix does not handle B-
> frames, which is ok for our streams.
This is a known issue. Someone needs to look at it and find a
solution. I think the h264 depayloader could at least detect the case
without B-frames, and set the DTS to the "jitter-free" PTS value.
The other plausible solution is to leave DTS as none and let h264parse
fix it for us. That would need work on baseparse, which annoyingly
tries to create a DTS when there is none. I personally have no idea
why baseparse does timestamping; most of the time it gets it wrong.