> On 4 Feb 2019, at 14:19, Jerin Jacob Kollanukkaran <jer...@marvell.com> wrote:
> 
> On Sun, 2019-02-03 at 21:13 +0100, Damjan Marion wrote:
>> 
>>> On 3 Feb 2019, at 20:13, Saxena, Nitin <nitin.sax...@cavium.com> wrote:
>>> 
>>> Hi Damjan,
>>> 
>>> See function octeontx_fpa_bufpool_alloc(), called by octeontx_fpa_dequeue(). 
>>> It's a single read instruction to get the pointer to the data.
>> 
>> Yeah, I saw that, and today the VPP buffer manager can grab up to 16 buffer 
>> indices with one instruction, so no big deal here....
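>> 
>> From the caller's side that is just one batch request; here is a minimal usage 
>> sketch of the vlib buffer API (only the usage -- the wide loads happen inside 
>> vlib_buffer_alloc, and the function/variable names below are just examples):
>> 
>>   #include <vlib/vlib.h>
>> 
>>   static void
>>   grab_and_release_buffers (vlib_main_t *vm)
>>   {
>>     u32 bi[16];
>> 
>>     /* ask the buffer manager for a batch of up to 16 buffer indices */
>>     u32 n_alloc = vlib_buffer_alloc (vm, bi, 16);
>> 
>>     /* ... hand the buffers to a node, fill them in, etc. ... */
>> 
>>     /* return the whole batch to the pool */
>>     vlib_buffer_free (vm, bi, n_alloc);
>>   }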
>> 
>>> Similarly, octeontx_fpa_bufpool_free() is also a single write instruction. 
>>> 
>>>> So, if you are able to prove with numbers that the current software solution 
>>>> is low-performing and that you are confident you can do significantly 
>>>> better, I will be happy to work with you on implementing support for the 
>>>> hardware buffer manager.
>>> First of all, I welcome your patch, as we were also trying to remove the 
>>> latency introduced by memcpy_x4() of the buffer template. As I said earlier, 
>>> the hardware buffer coprocessor is also used by other packet engines, hence 
>>> its support has to be added in VPP. I am looking for suggestions on how to 
>>> resolve this. 
>> 
>> You can hardly get any suggestions from my side if you keep ignoring the 
>> questions I asked in my previous email to get a better understanding of 
>> what your hardware does.
>> 
>> "It is hardware so it is fast" is not real argument, we need real datapoints 
>> before investing time into this area....
> 
> 
> Adding more details on the HW mempool manager attributes:
> 
> 1) Semantically, a HW mempool manager is the same as a SW mempool manager.
> 2) HW mempool managers have the same "alloc/dequeue" and "free/enqueue" 
> operations as a SW mempool manager.
> 3) HW mempool managers can also work with a SW per-core local cache scheme 
> (see the sketch after this list).
> 4) User metadata initialization is not done in HW; SW needs to do it before 
> free() or after alloc().
> 5) Typically there is an operation to "not free" the packet after Tx, which 
> can be used as the back end for packet cloning (aka reference-count schemes).
> 6) How a HW pool manager improves performance:
> - MP/MC can work without locks (HW takes care of that internally).
> - HW frees the buffer on Tx, unlike the SW mempool case where a core does it, 
> so it saves the CPU cycles spent on packet Tx and the cost of bringing the 
> packet back into the L1 cache.
> - On the Rx side, HW allocs/dequeues the packet from the mempool; no SW 
> intervention is required.
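> 
> As a concrete sketch of points 2-4, this is roughly how an application would 
> create an mbuf pool backed by a HW mempool driver while keeping a SW per-core 
> cache (the pool name, the sizes and the "octeontx_fpavf" ops name are just 
> assumptions for illustration):
> 
>   #include <rte_lcore.h>
>   #include <rte_mbuf.h>
>   #include <rte_mempool.h>
> 
>   static struct rte_mempool *
>   create_hw_backed_pktmbuf_pool (void)
>   {
>     /* object/metadata init is done in SW here (point 4); alloc/free of the
>      * raw buffers then goes through the HW ops (point 2), while a per-core
>      * SW cache of 256 objects sits in front of the HW pool (point 3) */
>     return rte_pktmbuf_pool_create_by_ops ("pkt_pool",
>                                            8192, /* number of mbufs */
>                                            256,  /* per-core cache size */
>                                            0,    /* app private area size */
>                                            RTE_MBUF_DEFAULT_BUF_SIZE,
>                                            rte_socket_id (),
>                                            "octeontx_fpavf");
>   }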
> 
> In terms of abstraction, the DPDK mempool manager already abstracts SW and HW 
> mempools through a static struct rte_mempool_ops.
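> 
> For reference, a mempool driver plugs into that abstraction roughly like this 
> (the hw_pool_* names are made-up stubs; only the rte_mempool_ops fields and 
> the MEMPOOL_REGISTER_OPS macro are actual DPDK API):
> 
>   #include <rte_mempool.h>
> 
>   /* illustrative stubs -- a real driver would program the HW pool here */
>   static int hw_pool_alloc (struct rte_mempool *mp) { return 0; }
>   static void hw_pool_free (struct rte_mempool *mp) { }
>   static int hw_pool_enqueue (struct rte_mempool *mp,
>                               void * const *obj_table, unsigned int n)
>   { return 0; } /* "free" objects back to the HW pool */
>   static int hw_pool_dequeue (struct rte_mempool *mp,
>                               void **obj_table, unsigned int n)
>   { return 0; } /* "alloc" objects from the HW pool */
>   static unsigned hw_pool_get_count (const struct rte_mempool *mp)
>   { return 0; }
> 
>   static struct rte_mempool_ops hw_pool_ops = {
>     .name = "example_hw_pool", /* selected via rte_mempool_set_ops_byname() */
>     .alloc = hw_pool_alloc,
>     .free = hw_pool_free,
>     .enqueue = hw_pool_enqueue,
>     .dequeue = hw_pool_dequeue,
>     .get_count = hw_pool_get_count,
>   };
> 
>   MEMPOOL_REGISTER_OPS (hw_pool_ops);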
> 
> Limitations:
> 1) Some NPU packet-processing HW can work only with a HW mempool manager (i.e. 
> it cannot work with a SW mempool manager, because on Rx the HW itself allocates 
> from the mempool manager to form the packet).
> 
> Using the DPDK abstractions makes it possible to write agnostic software that 
> works across NPU and CPU models.

VPP is not a DPDK application, so that doesn't work for us. DPDK is just one 
optional device-driver access method, and I hear more and more people asking 
for VPP without DPDK.

We can implement hardware buffer manager support in VPP, but honestly I'm not 
convinced it will bring enough value to justify the time investment. I would 
like somebody to prove me wrong, but with real data, not with statements like 
"it is hardware so it is faster".

-- 
Damjan
