appsrc performance issue

appsrc performance issue

mksafavi
Hi.

When I push a buffer to appsrc, it takes too long (about 20 ms per buffer).
I want this step to be zero-copy.

After processing, my data is stored in a physically contiguous array
(data->dest_vptr). I wrap this array in a GstMemory and append it to
out_buffer, which I push to appsrc.

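/* Wrap the accelerator output without copying. No destroy notify is set, so
 * data->dest_vptr must stay valid until downstream releases the buffer. */
out_buffer = gst_buffer_new();
gst_buffer_append_memory(out_buffer,
    gst_memory_new_wrapped(GST_MEMORY_FLAG_READONLY, data->dest_vptr,
                           DEST_BUF_SIZE, 0, DEST_BUF_SIZE, NULL, NULL));
source = gst_bin_get_by_name(GST_BIN(data->sink), "testsource");
/* gst_app_src_push_buffer() takes ownership of out_buffer. */
ret = gst_app_src_push_buffer(GST_APP_SRC(source), out_buffer);
/* gst_bin_get_by_name() returns a new reference, so release it. */
gst_object_unref(source);
return ret;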

In my pipeline, I have an accelerator that takes 19 ms to finish its job.
filesrc -> decoder -> encoder -> filesink: 10 ms per frame
filesrc -> decoder -> appsink -> [accelerator] -> appsrc -> encoder -> filesink: 50 ms per frame (I expected approximately 30 ms)
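
The expectation of roughly 30 ms comes from adding the two stage times quoted above. A small sketch of that arithmetic, with illustrative function names that are not part of the original code:

```c
/* Per-frame cost if the accelerator simply adds its own time to the
 * baseline decode/encode pipeline. All values are in milliseconds. */
int
expected_frame_ms(int baseline_ms, int accel_ms)
{
    return baseline_ms + accel_ms;
}

/* Time per frame that neither the baseline pipeline nor the accelerator
 * accounts for. */
int
unexplained_ms(int measured_ms, int baseline_ms, int accel_ms)
{
    return measured_ms - expected_frame_ms(baseline_ms, accel_ms);
}
```

With the numbers from this post, expected_frame_ms(10, 19) gives 29 and unexplained_ms(50, 10, 19) gives 21, which is close to the 20 ms observed for each push to appsrc.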

appsrc parameters:
testsource = gst_bin_get_by_name(GST_BIN(data->sink), "testsource");
g_object_set(testsource, "format", GST_FORMAT_TIME, "block", FALSE, NULL);
gst_object_unref(testsource);

Am I doing an unnecessary copy somewhere, or are my appsrc settings wrong?
I also tested with a fakesink directly after appsrc and confirmed that
appsrc is the bottleneck.

Thanks.



--
Sent from: http://gstreamer-devel.966125.n4.nabble.com/
_______________________________________________
gstreamer-devel mailing list
[hidden email]
https://lists.freedesktop.org/mailman/listinfo/gstreamer-devel

Re: appsrc performance issue

Nicolas Dufresne-5
On Tuesday, 7 January 2020 at 00:16 -0600, mksafavi wrote:

> Am I doing an unnecessary copy somewhere? or having bad appsrc parameter
> settings?
> I also checked with fake sink directly after appsrc and made sure the appsrc
> is bottlenecking.

Did you locate a copy? Do you notice higher CPU usage if you go straight
from appsink to appsrc (skipping the accelerator)? (GST_DEBUG="*PERF*:5")
A copy can happen in this case if the decoder requires GstVideoMeta. Here's
example code using appsink that enables GstVideoMeta (to allow zero-copy):

https://gitlab.freedesktop.org/mesa/kmscube/blob/master/gst-decoder.c#L242

This all depends on the decoder being used, which you have abstracted
away here.


Re: appsrc performance issue

Nicolas Dufresne-5
On Tuesday, 7 January 2020 at 14:00 -0500, Nicolas Dufresne wrote:

> Did you locate a copy? Do you notice higher CPU usage if you go straight
> from appsink to appsrc (skipping the accelerator)? (GST_DEBUG="*PERF*:5")
> A copy can happen in this case if the decoder requires GstVideoMeta. Here's
> example code using appsink that enables GstVideoMeta (to allow zero-copy):
>
> https://gitlab.freedesktop.org/mesa/kmscube/blob/master/gst-decoder.c#L242
>
> This all depends on the decoder being used, which you have abstracted
> away here.

I forgot: you might also want to proxy the allocation query, since it is
possible that the decoder was writing directly into buffers allocated by
the encoder, which may save a copy too.


Re: appsrc performance issue

mksafavi
Thanks.
I'll look into CPU usage and report back.
I don't know how to proxy the allocation query, though.


I'm using the Xilinx VCU decoder/encoder on the Ultrascale ZCU104 board.

filesrc location=\"%s\" ! qtdemux name=demux demux.video_0 ! h264parse !
queue ! omxh264dec ! appsink caps=\"%s\" name=testsink",

" appsrc name=testsource caps=\"%s\" ! omxh264enc num-slices=1 cpb-size=4000
max-qp=45  scaling-list=0 control-rate=1 b-frames=1 gop-mode=2 qp-mode=2
quant-i-frames=28 prefetch-buffer=true periodicity-idr=60 gop-length=60
target-bitrate=1600 ! video/x-h264, profile=main ! filesink location=\"%s\"
",




Re: appsrc performance issue

Nicolas Dufresne-5
On Wednesday, 8 January 2020 at 11:07 -0600, mksafavi wrote:
> thanks.
> I'll look into CPU usage and report back.
> I don't know about how to proxy query allocation though.
>
>
> I'm using the Xilinx VCU decoder/encoder on the Ultrascale ZCU104 board.

Then you really want to proxy the allocation. The encoder requires
larger padding than what the decoder produces by default. With the Xilinx
branch, this is already fully handled as far as I'm aware, but for that
you need to proxy at least the allocation query. To do so, add a probe
on the sink pad of the appsink, and then run that query on the appsrc
element. I believe that should be sufficient, but I have never tried it
on that board.

Have you considered implementing a filter element instead? The other day
I wrote a quick minimal filter for that kind of use case, but I haven't
published it yet; it doesn't even have a build system. But maybe it can
be useful to you. Note that the buffers are not writable in this example;
comment out the code in my_minimal_init() if you need them writable (for
that you need the latest release, since you need a gst-omx without the
NO_SHARE flag set on buffers).

https://gitlab.collabora.com/nicolas/minimal


Re: appsrc performance issue

mksafavi
Sorry for the late reply.

> Then you really want to proxy the allocation. The encoder requires
> larger padding then what the decoder produce by default.

I take a buffer from appsink, then wrap the accelerator output in another
buffer that I push to appsrc.
I think my decoder and encoder do not use the same buffer pool.

I think our design has another flaw: we cannot share buffers with the DMA
from a userspace driver. To fix this without doing a copy, we're going to
set up the SMMU (which works with the VFIO driver) to initialize DMA with
a physical address, but I'm not sure how it will perform.

> With Xilinx branch, it's fully handled already for what I'm aware, but for
> that, you need to proxy at least the allocation query. To do so, you have
> to add a probe on the sink pad of that appsink, and then run that query on
> the appsrc element. I believe that should be sufficient, but I never
> tried on that board.

So I should just take the allocation query that appsink receives and run
it on appsrc?

> Have you considered implementing a filter element instead ?
When I first started this project I wanted to implement a plugin, but the
application approach felt easier.
If I fix the current issues with appsrc, is there still a performance
difference between the two approaches?

> I wrote the other a quick minimal filter for that kind of use case, but
> haven't published it yet, it does not even have a build system yet. But
> maybe it can be useful to you. Note that the buffer are not writable in
> this example, comment out the code in my_minimal_init() if you need
> writable init (and for that you need the latest release, since you need a
> gst-omx without the NO_SHARE flags set on buffers)
Thanks, I'll be glad to check it out.
> https://gitlab.collabora.com/nicolas/minimal
I think the repo is private, though.




Re: appsrc performance issue

Nicolas Dufresne-5
On Wednesday, 15 January 2020 at 12:39 -0600, mksafavi wrote:

> > Have you considered implementing a filter element instead ?
> when I first started this project I wanted to implement a plugin. but the
> application way felt easier.
> If I fix current issues with appsrc, do they have performance differences?

The penalty of appsink/appsrc is mostly a queue and a thread. The rest
will depend on your application code.

> > https://gitlab.collabora.com/nicolas/minimal
> I think the repo is private, though.

Fixed, forgot to click "Save changes", oops.
