I am streaming audio from one computer (the audio source) to another (the playback computer) over a LAN using RTP over UDP. Both computers synchronize their clocks via NTP to a local stratum 1 (GPS-based) time server that I built and that resides on the LAN. NTP jitter is 30-50 microseconds. Because the machines are this tightly synchronized, I can set the rtpjitterbuffer mode to 4 (“synced – assume synchronized sender and receiver clocks”). On the receiving system, I set “provide-clock=false” on the pipeline's audio sink to force the pipeline to use the system clock. On the sending computer, I also set “provide-clock=false” on the source element. In this way both pipelines use GstSystemClock, which gets its time from the local NTP-disciplined clock. My pipelines are built with gst-launch-1.0, and both systems run GStreamer 1.10.4.
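For concreteness, a sender/receiver pair along these lines is sketched as a small Python helper that assembles gst-launch-1.0 pipeline descriptions. The element choices (alsasrc/alsasink, rtpL16pay/rtpL16depay, udpsrc/udpsink), port 5004, the host address, and the 48 kHz stereo L16 caps are assumptions for illustration only, and the helper names are hypothetical. The resulting strings could be passed to gst-launch-1.0 or to Gst.parse_launch() from code.

```python
VALID_SLAVE_METHODS = {"resample", "skew", "none"}


def receiver_pipeline(port=5004, rate=48000, channels=2,
                      slave_method="none", drift_tolerance_us=40000):
    """Hypothetical receiver description; element choices are assumptions."""
    if slave_method not in VALID_SLAVE_METHODS:
        raise ValueError(f"unknown slave-method: {slave_method}")
    caps = (f"application/x-rtp,media=audio,clock-rate={rate},"
            f"encoding-name=L16,channels={channels}")
    return " ! ".join([
        f'udpsrc port={port} caps="{caps}"',
        "rtpjitterbuffer mode=synced",   # mode 4: sender/receiver clocks assumed synced
        "rtpL16depay",
        "audioconvert",
        # provide-clock=false: the sink is slaved to the pipeline (system) clock
        f"alsasink provide-clock=false slave-method={slave_method} "
        f"drift-tolerance={drift_tolerance_us}",
    ])


def sender_pipeline(host="192.168.1.20", port=5004, rate=48000, channels=2):
    """Hypothetical sender description; element choices are assumptions."""
    return " ! ".join([
        "alsasrc provide-clock=false",   # keep GstSystemClock as the pipeline clock
        "audioconvert",                  # converts to the S16BE format L16 needs
        f"audio/x-raw,rate={rate},channels={channels}",
        "rtpL16pay",
        f"udpsink host={host} port={port}",
    ])


print("gst-launch-1.0 " + sender_pipeline())
print("gst-launch-1.0 " + receiver_pipeline(slave_method="none"))
```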
On the clients, when a sink has “provide-clock=false”, it is slaved to the pipeline clock using the method selected with its “slave-method” property. I have experimented with the values “resample”, “skew”, and “none”. I get occasional audible glitches with resample and skew. With skew, there is a small “tick” or “pop” that occurs fairly often, perhaps every 10 seconds. When the audio is muted (the source sends zeros) the noise disappears, so I assume the sink is moving its playback pointer, which produces an audible discontinuity whenever the samples are non-zero. With slave-method set to resample, the audio is fine for longer, but then there are several seconds of garbled, glitchy audio before normal playback resumes. With slave-method set to none, I hear the fewest audible noises (if any). I have done listening tests for hours to characterize this behavior, so I am pretty confident in these observations.
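To check whether a playout-pointer jump could explain the clicks, here is a toy model (not GstAudioBaseSink's actual skew code): skipping a few samples in a non-zero signal produces one large sample-to-sample step, while the same skip in silence produces no step at all, which matches what I hear when the audio is muted. The 48 kHz rate, the 1 kHz test tone, and the 12-sample jump are assumed values.

```python
import math

RATE = 48_000  # assumed sample rate in Hz


def sine(n, freq=1_000.0, amp=0.5):
    """n samples of a test tone at the assumed RATE."""
    return [amp * math.sin(2 * math.pi * freq * i / RATE) for i in range(n)]


def drop_samples(buf, at, count):
    """Toy playout-pointer jump: skip `count` samples starting at index `at`."""
    return buf[:at] + buf[at + count:]


def max_step(buf):
    """Largest sample-to-sample change; an isolated large step is heard as a click."""
    return max(abs(b - a) for a, b in zip(buf, buf[1:]))


tone = sine(480)
silence = [0.0] * 480
print("tone, no jump:       ", max_step(tone))
print("tone, 12-sample jump:", max_step(drop_samples(tone, 48, 12)))
print("silence, same jump:  ", max_step(drop_samples(silence, 48, 12)))
```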
To get an idea of what else is controlling the resampling and skewing, I looked at GstAudioBaseSink. This has several related properties, e.g.:
“alignment-threshold” - Timestamp alignment threshold in nanoseconds. Default value: 40000000
“discont-wait” - A window of time in nanoseconds to wait before creating a discontinuity as a result of breaching the drift-tolerance. Default value: 1000000000
“drift-tolerance” - Controls the amount of time in microseconds that clocks are allowed to drift before resynchronisation happens. Default value: 40000
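Note that these three properties do not share a unit: alignment-threshold and discont-wait are given in nanoseconds, while drift-tolerance is given in microseconds. A small conversion helper (names hypothetical) makes the defaults comparable, showing that all of them are 40 milliseconds or more, and expresses 100 microseconds in each property's own unit.

```python
# Units as given in the GstAudioBaseSink property descriptions quoted above.
UNIT_NS = {"ns": 1, "us": 1_000, "ms": 1_000_000}

DEFAULTS = {
    "alignment-threshold": (40_000_000, "ns"),
    "discont-wait": (1_000_000_000, "ns"),
    "drift-tolerance": (40_000, "us"),
}


def to_ms(value, unit):
    """Convert a property value expressed in `unit` to milliseconds."""
    return value * UNIT_NS[unit] / UNIT_NS["ms"]


def from_us(microseconds, unit):
    """Express a duration given in microseconds in a property's own unit."""
    return microseconds * UNIT_NS["us"] // UNIT_NS[unit]


for name, (value, unit) in DEFAULTS.items():
    print(f"{name}: default {value} {unit} = {to_ms(value, unit)} ms; "
          f"100 us = {from_us(100, unit)} {unit}")
```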
The definitions of these parameters, e.g. “alignment-threshold” and “drift-tolerance”, are not entirely clear to me. I would like to understand them in detail, but I cannot find any documentation about them beyond what I have copied above from the online docs. More about this below.
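My current guess at how alignment-threshold and discont-wait interact, pieced together from the descriptions above, is written out as a simplified model. It is an assumption to be corrected, not the GstAudioBaseSink implementation, and the class and method names are hypothetical: a buffer whose timestamp is within alignment-threshold of where the previous buffer ended is written contiguously (its timestamp error is ignored), while a larger error only causes a resync once it has persisted for discont-wait. In this model a discont-wait of 0 resyncs immediately. drift-tolerance, which concerns clock slaving, is not modelled here.

```python
class AlignmentModel:
    """Hypothetical, simplified model of timestamp alignment in an audio sink."""

    def __init__(self, alignment_threshold_ns=40_000_000,
                 discont_wait_ns=1_000_000_000):
        self.threshold = alignment_threshold_ns
        self.wait = discont_wait_ns
        self.discont_since = None  # timestamp of the first out-of-threshold buffer

    def place(self, timestamp_ns, expected_ns):
        """Return where the buffer is written: expected (contiguous) or its timestamp."""
        if abs(timestamp_ns - expected_ns) < self.threshold:
            self.discont_since = None  # small error: ignore it, stay contiguous
            return expected_ns
        if self.discont_since is None:
            self.discont_since = timestamp_ns  # start the discont-wait window
        if timestamp_ns - self.discont_since >= self.wait:
            self.discont_since = None  # error persisted: accept the discontinuity
            return timestamp_ns
        return expected_ns  # still waiting: keep writing contiguously


m = AlignmentModel()
print(m.place(1_000_100, 1_000_000))          # 100 ns error: aligned
print(m.place(100_000_000, 0))                # 100 ms error: waiting
print(m.place(1_100_000_000, 1_000_000_000))  # still off after 1 s: resync
```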
My intent for the system I am building is to have multiple endpoints on the LAN, each with its own sink. The sinks are actually part of the SAME loudspeaker system, e.g. one sink in the left speaker and one in the right. In this case, playback from all sinks must be tightly synchronized. This is audio, and left-right timing differences of even 1 millisecond create audible effects, so I would like to keep the synchronization error to about 1/10th of that, or 100 microseconds. This is a much stricter requirement than, for example, multiroom playback, where the system will seem to be “in sync” as long as the synchronization error is below about 20-40 milliseconds. That is not the case with my setup.
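To put the 100 microsecond target in perspective, it can be expressed in samples and as an equivalent difference in acoustic path length. A 48 kHz sample rate and a speed of sound of 343 m/s are assumptions; at those values 100 microseconds is 4.8 samples, or about 3.4 cm, and the 30-50 microsecond NTP jitter on each machine is already 30-50% of that budget.

```python
RATE_HZ = 48_000        # assumed sample rate
SPEED_OF_SOUND = 343.0  # m/s, dry air at about 20 degrees C


def us_to_samples(us, rate=RATE_HZ):
    """Number of sample periods spanned by a delay in microseconds."""
    return us * rate / 1_000_000


def us_to_cm(us):
    """Distance sound travels in `us` microseconds, in centimetres."""
    return SPEED_OF_SOUND * us / 1_000_000 * 100


for label, us in [("target", 100), ("audible L/R offset", 1_000),
                  ("multiroom tolerance", 40_000),
                  ("NTP jitter (low)", 30), ("NTP jitter (high)", 50)]:
    print(f"{label}: {us} us = {us_to_samples(us):.2f} samples, "
          f"{us_to_cm(us):.2f} cm")
```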
To try to achieve my synchronization goals, I set:
and left the other parameters alone. This results in the glitchy audio described above when slave-method is set to resample or skew. When I relax drift-tolerance to 500 or 1000, the glitches still occur, just less frequently. I am concerned that a drift tolerance of, e.g., 1000 will not keep multiple sinks sufficiently synchronized.
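Under one simple (assumed) reading of drift-tolerance, each sink may drift by up to that amount against the shared clock before it corrects itself. If two sinks drift in opposite directions, their relative offset can then reach twice the tolerance, so under this reading the per-sink tolerance would have to be about 50 microseconds to meet a 100 microsecond target. The helper names are hypothetical.

```python
TARGET_US = 100  # desired maximum left/right offset


def worst_case_pair_offset_us(tolerance_us):
    """Worst case if each of two sinks may independently drift by +/- tolerance_us."""
    return 2 * tolerance_us


def max_tolerance_for_target_us(target_us=TARGET_US):
    """Largest per-sink tolerance that keeps the worst-case pair offset within target."""
    return target_us / 2


for tol in (100, 500, 1_000, 40_000):  # 40000 is the default drift-tolerance
    offset = worst_case_pair_offset_us(tol)
    print(f"drift-tolerance={tol}: worst-case offset {offset} us, "
          f"meets target: {offset <= TARGET_US}")
print("per-sink tolerance needed:", max_tolerance_for_target_us(), "us")
```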
What does the property “alignment-threshold” actually do? It is not clear from the documentation.
I assume the glitches are happening when resampling or skewing is taking place, and otherwise there is no resampling/skewing taking place. Is that correct? Can I improve this behavior by changing some other property of GstAudioBaseSink?
It seems that resample and skew are needed to account for differences between the pipeline clock and the playback rate of the sink. It seems plausible that estimating the difference between these rates over time, and then resampling to account for the LONG-TERM rate difference, would be a better approach. I assume that what I am experiencing is the effect of occasional corrections that are “too drastic”, resulting in the glitchy audio I am hearing. Is there any way to implement some kind of long-term averaged resampling based on the rate at which the sink's buffer is depleted, either with gst-launch or in code (e.g. if my application were written in C++)?
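As a sketch of the kind of long-term correction I have in mind (a generic technique, not something I know GStreamer to offer in this form): periodically record pairs of (pipeline clock time, samples consumed by the device), fit a least-squares line to get the device's actual consumption rate, smooth that estimate with an exponential moving average, and derive a resampling ratio that changes only very slowly. The function names, the 100 ms observation interval, the jitter pattern, and the smoothing factor are all assumptions.

```python
def estimate_drift_ppm(times_s, samples_played, nominal_rate):
    """Least-squares slope of samples consumed vs. clock time, as a ppm rate error."""
    n = len(times_s)
    if n < 2 or n != len(samples_played):
        raise ValueError("need at least two matching observations")
    mean_t = sum(times_s) / n
    mean_s = sum(samples_played) / n
    cov = sum((t - mean_t) * (s - mean_s) for t, s in zip(times_s, samples_played))
    var = sum((t - mean_t) ** 2 for t in times_s)
    measured_rate = cov / var  # samples the device actually consumes per second
    return (measured_rate / nominal_rate - 1.0) * 1e6


class SmoothedDrift:
    """Exponential moving average, so the correction changes only gradually."""

    def __init__(self, alpha=0.05):
        self.alpha = alpha
        self.value = None

    def update(self, ppm):
        if self.value is None:
            self.value = ppm
        else:
            self.value += self.alpha * (ppm - self.value)
        return self.value


def resample_ratio(drift_ppm):
    """Output samples per input sample that keep the sink's buffer level steady."""
    return 1.0 + drift_ppm * 1e-6


# Synthetic check: a device running 50 ppm fast, observed every 100 ms for 10 s,
# with a deterministic +/-2 sample reading jitter.
RATE = 48_000
times = [i * 0.1 for i in range(100)]
jitter = [(i * 7) % 5 - 2 for i in range(100)]
played = [RATE * (1 + 50e-6) * t + j for t, j in zip(times, jitter)]

smoother = SmoothedDrift()
ppm = smoother.update(estimate_drift_ppm(times, played, RATE))
print(f"estimated drift: {ppm:.2f} ppm, ratio {resample_ratio(ppm):.8f}")
```

The ratio assumes the sender runs exactly at its nominal rate relative to the shared clock; only the receiving device's drift is corrected.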
I recently watched a presentation from the 2015 GStreamer Conference in Dublin by Sebastian Dröge (“Synchronised multi-room media playback and distributed live media processing and mixing”). Sebastian mentioned some new NTP-based methods, built around netclock, for slaving the pipeline clock; these are (or will be) implemented in GStreamer and can be used by elements such as the RTP payloaders/depayloaders. Since I am using gst-launch, these are probably unavailable to me, and in any case they may not yet be available in version 1.10. I would very much like to learn more about them, especially if they can be applied to my problem. I would also appreciate any and all feedback on how to achieve my goal of inter-client synchrony of 100 microseconds or better using GStreamer. Is that possible?
gstreamer-devel mailing list