Re: [vpp-dev] Gerrit review for memif DMA acceleration

2022-12-02 Thread Marvin Liu
Damjan, 
The external-memory VFIO mapping action can be moved to the main thread through an RPC call.
From the host-stack usage perspective, pre-allocation is not enough for session segments: the mapped
size may vary, and these segments are dynamically allocated and freed as sessions are created and
destroyed.
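To illustrate the deferral pattern being discussed, here is a minimal standalone sketch (not VPP code; all names here are invented for illustration). In VPP the analogous entry point is vl_api_rpc_call_main_thread(), but this sketch deliberately avoids VPP APIs: the worker marshals its arguments and enqueues a callback, and the main loop later drains the queue and performs the mapping.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Standalone sketch of the "RPC to main thread" pattern: a worker copies
 * its arguments and enqueues a callback; the main loop drains the queue
 * and runs it.  In VPP the analogous entry point is
 * vl_api_rpc_call_main_thread (fp, data, data_length). */

typedef struct
{
  void (*fp) (void *arg);
  uint8_t data[64]; /* marshaled argument bytes */
} rpc_msg_t;

#define RPC_Q_LEN 16
static rpc_msg_t rpc_q[RPC_Q_LEN];
static int rpc_q_count;

/* Worker side: marshal the arguments and defer the call. */
static void
rpc_call_main_thread (void (*fp) (void *), void *data, size_t len)
{
  assert (rpc_q_count < RPC_Q_LEN && len <= sizeof (rpc_q[0].data));
  rpc_q[rpc_q_count].fp = fp;
  memcpy (rpc_q[rpc_q_count].data, data, len);
  rpc_q_count++;
}

/* Main-loop side: execute all deferred calls. */
static void
rpc_drain_main_thread (void)
{
  for (int i = 0; i < rpc_q_count; i++)
    rpc_q[i].fp (rpc_q[i].data);
  rpc_q_count = 0;
}

/* Hypothetical request standing in for an external-memory VFIO mapping. */
typedef struct
{
  uint64_t vaddr, size;
  int done;
} map_req_t;

static map_req_t last_map;

static void
map_on_main_thread (void *arg)
{
  last_map = *(map_req_t *) arg; /* real code would issue the VFIO map here */
  last_map.done = 1;
}
```

The point of the shape is that the worker never blocks on the mapping; it only pays for an argument copy, and the main thread serializes all mapping ioctls.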

Regards,
Marvin

> -Original Message-
> From: Damjan Marion 
> Sent: Friday, December 2, 2022 9:22 PM
> To: Liu, Yong 
> Cc: vpp-dev@lists.fd.io
> Subject: Re: Gerrit review for memif DMA acceleration
> 
> 
> Please pre-allocate segments, don’t do it at runtime from a worker thread…
> 
> —
> Damjan
> 
> > On 01.12.2022., at 10:09, Liu, Yong  wrote:
> >
> > Sure, I think it is possible, as only a few batches are needed for a
> > typical workload.
> > For dynamically mapping extended memory, I think this function is still
> > needed, as a new segment is allocated from system memory when a new
> > stream is established.  This happens in a worker thread.
> >
> >> -Original Message-
> >> From: Damjan Marion 
> >> Sent: Wednesday, November 30, 2022 9:23 PM
> >> To: Liu, Yong 
> >> Cc: vpp-dev@lists.fd.io
> >> Subject: Re: Gerrit review for memif DMA acceleration
> >>
> >>
> >> Thanks,
> >>
> >> Dynamically allocating physical memory from a worker thread is not
> >> something we do today, and I don't think it is the right way to do it.
> >> Even for buffer pools we don't do that. Can you simply pre-allocate a
> >> reasonable amount of physical memory on startup instead?
> >>
> >> —
> >> Damjan
> >>
> >>
> >>> On 30.11.2022., at 10:20, Liu, Yong  wrote:
> >>>
> >>> Hi Damjan,
> >>> The VFIO map function can now be called from a worker thread in some
> >>> cases: for example, allocating physical memory for a DMA batch and then
> >>> mapping it when no device is attached, or when a new session segment is
> >>> attached to the VPP process.  That is why I use the event logger for
> >>> VFIO logging.
> >>> These numbers were collected on my sample server; we are modifying the
> >>> CSTI case for DMA usage and will share more official numbers later.
> >>>
> >>> 1C memif l2patch No-DMA
> >>> Frame size:  64        128       256        512        1024       1518
> >>> Rate:        8.00Mpps  6.49Mpps  4.69Mpps   3.23Mpps   2.37Mpps   1.96Mpps
> >>>              4.09Gbps  6.65Gbps  9.62Gbps   13.24Gbps  19.43Gbps  23.86Gbps
> >>>
> >>> 1C memif l2patch DMA
> >>> Frame size:  64        128       256        512        1024       1518
> >>> Rate:        8.65Mpps  8.60Mpps  8.54Mpps   8.22Mpps   8.36Mpps   7.61Mpps
> >>>              4.43Gbps  8.81Gbps  17.49Gbps  33.67Gbps  68.51Gbps  92.39Gbps
> >>>
> >>> Regards,
> >>> Marvin
> >>>
>  -Original Message-
>  From: Damjan Marion 
>  Sent: Tuesday, November 29, 2022 10:45 PM
>  To: Liu, Yong 
>  Cc: vpp-dev@lists.fd.io
>  Subject: Re: Gerrit review for memif DMA acceleration
> 
> 
>  Hi Marvin,
> 
>  For a start, can you use standard vlib logging instead of elog? All this
>  logging stuff is not perf-critical.
> 
>  Also, can you share some perf comparison between standard CPU path
> >> and
>  DSA accelerated memif?
> 
>  Thanks,
> 
>  Damjan
> 
> 
> > On 29.11.2022., at 09:05, Liu, Yong  wrote:
> >
> > Hi Damjan and community,
> > To extend the usage of the recently introduced DMA infrastructure, I
> > uploaded several patches for review.  Before your review, let me briefly
> > introduce these patches.
> > In review 37572, a new VFIO mapping function for extended memory is
> > added.  This kind of memory may come from another process (like memif
> > regions) or be dynamically allocated (like hoststack shared segments).
> > In review 37573, VFIO-based DSA devices are supported, for scenarios
> > that need full control of the DSA resources.  Compared to assigning work
> > queues from a single idxd instance to multiple processes, this way can
> > guarantee resources.
> > In review 37574, the CBDMA device, which supports only the PCI device
> > model, is supported.  The usage of CBDMA in hoststack and memif depends
> > on 37572.
> > In review 37731, new datapath functions are added in the memif input and
> > tx nodes.  These functions follow the async model and are chosen if the
> > option "use_dma" is added when creating the memif interface.
> > Gerrit link:
> >
> > https://gerrit.fd.io/r/c/vpp/+/37572
> > https://gerrit.fd.io/r/c/vpp/+/37573

Re: [vpp-dev] Gerrit review for memif DMA acceleration

2022-12-01 Thread Marvin Liu
Sure, I think it is possible, as only a few batches are needed for a typical workload.
For dynamically mapping extended memory, I think this function is still needed, as a new
segment is allocated from system memory when a new stream is established.  This happens
in a worker thread.

> -Original Message-
> From: Damjan Marion 
> Sent: Wednesday, November 30, 2022 9:23 PM
> To: Liu, Yong 
> Cc: vpp-dev@lists.fd.io
> Subject: Re: Gerrit review for memif DMA acceleration
> 
> 
> Thanks,
> 
> Dynamically allocating physical memory from a worker thread is not
> something we do today, and I don't think it is the right way to do it.
> Even for buffer pools we don't do that. Can you simply pre-allocate a
> reasonable amount of physical memory on startup instead?
> 
> —
> Damjan
> 
> 
> > On 30.11.2022., at 10:20, Liu, Yong  wrote:
> >
> > Hi Damjan,
> > The VFIO map function can now be called from a worker thread in some
> > cases: for example, allocating physical memory for a DMA batch and then
> > mapping it when no device is attached, or when a new session segment is
> > attached to the VPP process.  That is why I use the event logger for
> > VFIO logging.
> > These numbers were collected on my sample server; we are modifying the
> > CSTI case for DMA usage and will share more official numbers later.
> >
> > 1C memif l2patch No-DMA
> > Frame size:  64        128       256        512        1024       1518
> > Rate:        8.00Mpps  6.49Mpps  4.69Mpps   3.23Mpps   2.37Mpps   1.96Mpps
> >              4.09Gbps  6.65Gbps  9.62Gbps   13.24Gbps  19.43Gbps  23.86Gbps
> >
> > 1C memif l2patch DMA
> > Frame size:  64        128       256        512        1024       1518
> > Rate:        8.65Mpps  8.60Mpps  8.54Mpps   8.22Mpps   8.36Mpps   7.61Mpps
> >              4.43Gbps  8.81Gbps  17.49Gbps  33.67Gbps  68.51Gbps  92.39Gbps
> >
> > Regards,
> > Marvin
> >
> >> -Original Message-
> >> From: Damjan Marion 
> >> Sent: Tuesday, November 29, 2022 10:45 PM
> >> To: Liu, Yong 
> >> Cc: vpp-dev@lists.fd.io
> >> Subject: Re: Gerrit review for memif DMA acceleration
> >>
> >>
> >> Hi Marvin,
> >>
> >> For a start, can you use standard vlib logging instead of elog? All this
> >> logging stuff is not perf-critical.
> >>
> >> Also, can you share some perf comparison between standard CPU path
> and
> >> DSA accelerated memif?
> >>
> >> Thanks,
> >>
> >> Damjan
> >>
> >>
> >>> On 29.11.2022., at 09:05, Liu, Yong  wrote:
> >>>
> >>> Hi Damjan and community,
> >>> To extend the usage of the recently introduced DMA infrastructure, I
> >>> uploaded several patches for review.  Before your review, let me
> >>> briefly introduce these patches.
> >>> In review 37572, a new VFIO mapping function for extended memory is
> >>> added.  This kind of memory may come from another process (like memif
> >>> regions) or be dynamically allocated (like hoststack shared segments).
> >>> In review 37573, VFIO-based DSA devices are supported, for scenarios
> >>> that need full control of the DSA resources.  Compared to assigning
> >>> work queues from a single idxd instance to multiple processes, this way
> >>> can guarantee resources.
> >>> In review 37574, the CBDMA device, which supports only the PCI device
> >>> model, is supported.  The usage of CBDMA in hoststack and memif depends
> >>> on 37572.
> >>> In review 37731, new datapath functions are added in the memif input
> >>> and tx nodes.  These functions follow the async model and are chosen if
> >>> the option "use_dma" is added when creating the memif interface.
> >>> Gerrit link:
> >>>
> >>> https://gerrit.fd.io/r/c/vpp/+/37572
> >>> https://gerrit.fd.io/r/c/vpp/+/37573
> >>> https://gerrit.fd.io/r/c/vpp/+/37574
> >>> https://gerrit.fd.io/r/c/vpp/+/37731

Re: [vpp-dev] Gerrit review for memif DMA acceleration

2022-11-30 Thread Marvin Liu
Hi Damjan,
The VFIO map function can now be called from a worker thread in some cases: for example,
allocating physical memory for a DMA batch and then mapping it when no device is attached,
or when a new session segment is attached to the VPP process.  That is why I use the event
logger for VFIO logging.
These numbers were collected on my sample server; we are modifying the CSTI case for DMA
usage and will share more official numbers later.

1C memif l2patch No-DMA
Frame size:  64        128       256        512        1024       1518
Rate:        8.00Mpps  6.49Mpps  4.69Mpps   3.23Mpps   2.37Mpps   1.96Mpps
             4.09Gbps  6.65Gbps  9.62Gbps   13.24Gbps  19.43Gbps  23.86Gbps

1C memif l2patch DMA
Frame size:  64        128       256        512        1024       1518
Rate:        8.65Mpps  8.60Mpps  8.54Mpps   8.22Mpps   8.36Mpps   7.61Mpps
             4.43Gbps  8.81Gbps  17.49Gbps  33.67Gbps  68.51Gbps  92.39Gbps
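For reference, the Gbps rows above follow arithmetically from the Mpps rows (L2 throughput in Gbps = Mpps x frame bytes x 8 / 1000). A one-line helper makes the conversion explicit; this is illustrative only and not from the patches:

```c
#include <assert.h>

/* L2 throughput conversion: rate in Mpps times frame size in bytes,
 * times 8 bits per byte, divided by 1000 to go from Mbit/s to Gbit/s. */
static double
mpps_to_gbps (double mpps, unsigned frame_bytes)
{
  return mpps * frame_bytes * 8.0 / 1000.0;
}
```

For example, 8.65 Mpps at 64-byte frames gives about 4.43 Gbps, matching the first DMA column above.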

Regards,
Marvin

> -Original Message-
> From: Damjan Marion 
> Sent: Tuesday, November 29, 2022 10:45 PM
> To: Liu, Yong 
> Cc: vpp-dev@lists.fd.io
> Subject: Re: Gerrit review for memif DMA acceleration
> 
> 
> Hi Marvin,
> 
> For a start, can you use standard vlib logging instead of elog? All this
> logging stuff is not perf-critical.
> 
> Also, can you share some perf comparison between standard CPU path and
> DSA accelerated memif?
> 
> Thanks,
> 
> Damjan
> 
> 
> > On 29.11.2022., at 09:05, Liu, Yong  wrote:
> >
> > Hi Damjan and community,
> > To extend the usage of the recently introduced DMA infrastructure, I
> > uploaded several patches for review.  Before your review, let me briefly
> > introduce these patches.
> > In review 37572, a new VFIO mapping function for extended memory is
> > added.  This kind of memory may come from another process (like memif
> > regions) or be dynamically allocated (like hoststack shared segments).
> > In review 37573, VFIO-based DSA devices are supported, for scenarios
> > that need full control of the DSA resources.  Compared to assigning work
> > queues from a single idxd instance to multiple processes, this way can
> > guarantee resources.
> > In review 37574, the CBDMA device, which supports only the PCI device
> > model, is supported.  The usage of CBDMA in hoststack and memif depends
> > on 37572.
> > In review 37731, new datapath functions are added in the memif input and
> > tx nodes.  These functions follow the async model and are chosen if the
> > option "use_dma" is added when creating the memif interface.
> >  Gerrit link:
> > https://gerrit.fd.io/r/c/vpp/+/37572
> > https://gerrit.fd.io/r/c/vpp/+/37573
> > https://gerrit.fd.io/r/c/vpp/+/37574
> > https://gerrit.fd.io/r/c/vpp/+/37731
> >  Best Regards,
> > Marvin
> 


-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.
View/Reply Online (#22258): https://lists.fd.io/g/vpp-dev/message/22258
Mute This Topic: https://lists.fd.io/mt/95330357/21656
Group Owner: vpp-dev+ow...@lists.fd.io
Unsubscribe: https://lists.fd.io/g/vpp-dev/leave/1480452/21656/631435203/xyzzy 
[arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-



[vpp-dev] Gerrit review for memif DMA acceleration

2022-11-29 Thread Marvin Liu
Hi Damjan and community,
To extend the usage of the recently introduced DMA infrastructure, I uploaded several
patches for review.  Before your review, let me briefly introduce these patches.

In review 37572, a new VFIO mapping function for extended memory is added. This kind of
memory may come from another process (like memif regions) or be dynamically allocated
(like hoststack shared segments).
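As a rough sketch of what such a mapping helper ultimately hands to the kernel, the following builds a VFIO type-1 IOMMU map request. The struct and flag names are from the standard Linux UAPI header linux/vfio.h; the helper itself and its identity iova choice are illustrative assumptions, not code from the patch:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <linux/vfio.h>

/* Build the VFIO_IOMMU_MAP_DMA request that a mapping helper for
 * externally allocated memory (a memif region, a hoststack segment)
 * would pass to ioctl (container_fd, VFIO_IOMMU_MAP_DMA, &map). */
static struct vfio_iommu_type1_dma_map
make_dma_map_req (uint64_t vaddr, uint64_t iova, uint64_t size)
{
  struct vfio_iommu_type1_dma_map map;
  memset (&map, 0, sizeof (map));
  map.argsz = sizeof (map);
  map.flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE;
  map.vaddr = vaddr; /* process virtual address of the segment */
  map.iova = iova;   /* device-visible address; identity-mapped here */
  map.size = size;   /* must be a multiple of the IOMMU page size */
  return map;
}
```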
In review 37573, VFIO-based DSA devices are supported, for scenarios that need full
control of the DSA resources. Compared to assigning work queues from a single idxd
instance to multiple processes, this way can guarantee resources.
In review 37574, the CBDMA device, which supports only the PCI device model, is supported.
The usage of CBDMA in hoststack and memif depends on 37572.
In review 37731, new datapath functions are added in the memif input and tx nodes. These
functions follow the async model and are chosen if the option "use_dma" is added when
creating the memif interface.
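The async model described above can be sketched in standalone C (illustrative only; every name here is invented, and memcpy stands in for the DMA engine): the node submits a batch of copies and returns immediately, and a later poll delivers completion and fires a callback that finalizes the descriptors.

```c
#include <assert.h>
#include <string.h>

/* Illustrative async-copy model: submit enqueues work and returns at
 * once; poll "completes" it by performing the copies (memcpy stands in
 * for the DMA hardware) and firing the completion callback. */

typedef struct
{
  const void *src;
  void *dst;
  unsigned len;
} copy_desc_t;

typedef struct
{
  copy_desc_t descs[8];
  unsigned n_descs;
  void (*done) (void *opaque); /* completion callback */
  void *opaque;
  int pending;
} copy_batch_t;

static void
batch_submit (copy_batch_t *b)
{
  b->pending = 1; /* real hardware would start the transfer here */
}

static void
batch_poll (copy_batch_t *b)
{
  if (!b->pending)
    return;
  for (unsigned i = 0; i < b->n_descs; i++)
    memcpy (b->descs[i].dst, b->descs[i].src, b->descs[i].len);
  b->pending = 0;
  b->done (b->opaque); /* e.g. mark ring descriptors consumed */
}

static void
mark_done (void *opaque)
{
  *(int *) opaque = 1;
}
```

The key property is that nothing is complete at submit time: the node must keep its descriptors alive until the poll-side callback runs.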

Gerrit link:
https://gerrit.fd.io/r/c/vpp/+/37572
https://gerrit.fd.io/r/c/vpp/+/37573
https://gerrit.fd.io/r/c/vpp/+/37574
https://gerrit.fd.io/r/c/vpp/+/37731

Best Regards,
Marvin

-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.
View/Reply Online (#22252): https://lists.fd.io/g/vpp-dev/message/22252
Mute This Topic: https://lists.fd.io/mt/95330357/21656
Group Owner: vpp-dev+ow...@lists.fd.io
Unsubscribe: https://lists.fd.io/g/vpp-dev/leave/1480452/21656/631435203/xyzzy 
[arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-