Dear Kingwel,

We finally managed to look at your mheap patches, sorry for the delay.

Still, we are not 100% convinced that there are bugs in the mheap code.
Please note that the mheap code is stable, changes infrequently, and has been 
used for years.

It would really help if you could provide test vectors for each issue you 
observed. It will be much easier to understand the problem and confirm the fix 
if we are able to reproduce it in a controlled environment.

Thanks,

Damjan

 
> From: <vpp-dev@lists.fd.io> on behalf of Kingwel Xie <kingwel....@ericsson.com>
> Date: Thursday, 19 April 2018 at 03:19
> To: "Damjan Marion (damarion)" <damar...@cisco.com>
> Cc: "vpp-dev@lists.fd.io" <vpp-dev@lists.fd.io>
> Subject: Re: [vpp-dev] mheap performance issue and fixup
>  
> Hi Damjan,
>  
> We will do it asap. Actually, we are quite new to VPP and don't yet know how 
> to file a bug report or contribute code.
>  
> Regards,
> Kingwel
>  
> From: vpp-dev@lists.fd.io [mailto:vpp-dev@lists.fd.io] On Behalf Of Damjan 
> Marion
> Sent: Wednesday, April 18, 2018 11:30 PM
> To: Kingwel Xie <kingwel....@ericsson.com>
> Cc: vpp-dev@lists.fd.io
> Subject: Re: [vpp-dev] mheap performance issue and fixup
>  
> Dear Kingwel, 
>  
> Thank you for your email. It would be really appreciated if you could submit 
> your changes to Gerrit, preferably with each point in a separate patch.
> That will be the best place to discuss those changes...
>  
> Thanks in advance,
>  
> -- 
> Damjan
>  
> On 16 Apr 2018, at 10:13, Kingwel Xie <kingwel....@ericsson.com> wrote:
>  
> Hi all,
>  
> We recently worked on the GTPU tunnel, with a target of creating 2M tunnels. 
> It is not as easy as it looks, and it took us quite some time to figure it 
> out. The biggest problem we found is in mheap, which as you know is the 
> low-level memory management layer of VPP. We believe it makes sense to share 
> what we found and what we've done to improve the performance of mheap.
>  
> First of all, mheap is fast. It has a well-designed small object cache and 
> multi-level free lists to speed up get/put. However, as discussed on this 
> mailing list before, it has a performance issue when dealing with 
> align/align_offset allocations. We traced the problem to the pointer 
> 'rewrite' in gtpu_tunnel_t. This rewrite is a vector and is required to be 
> aligned to a 64B cache line, therefore with a 4-byte align offset. We 
> realized that the free list must be very long, containing many mheap_elts, 
> but unfortunately without a single element that fits all three 
> prerequisites: size, align, and align offset. In this case, each allocation 
> has to traverse all elements until it reaches the end of the list. As a 
> result, you might observe that each allocation takes more than 100000 
> clocks/call in 'show memory verbose'. That indicates the allocation takes 
> far too long; it should be 200~300 clocks/call in general. You should also 
> notice that 'per-attempt' is quite high, even more than 100.
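>  
> To make the failing pattern concrete, here is a minimal sketch of the kind 
> of request that triggers the slow path (not the exact VPP call site; the 
> size and the out-of-memory flag are illustrative), using the clib entry 
> point that takes both an alignment and an align offset:
>  
>   #include <vppinfra/mem.h>
>  
>   /* A 64B cache-line-aligned allocation with a 4-byte align offset for
>    * the vector header, like the 'rewrite' vector.  When no free-list
>    * element satisfies size, align and align_offset at once, the search
>    * walks the whole list. */
>   void *p = clib_mem_alloc_aligned_at_offset
>     (128 /* size, illustrative */, 64 /* align */, 4 /* align_offset */,
>      1 /* os_out_of_memory_on_failure */);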
>  
> The fix is straightforward and simple: as discussed on this mailing list 
> before, allocate 'rewrite' from a pool instead of from mheap. Frankly 
> speaking, that looks like a workaround rather than a real fix, so we spent 
> some time fixing the problem thoroughly. The idea is to add a few more bytes 
> to the originally requested block size, so that mheap always looks in a 
> bigger free list, where a suitable block can most likely be located right 
> away. The question then becomes: how big should this extra size be? It 
> should be at least align + align_offset, which is not hard to understand. 
> But after careful analysis, we think it is better to be as in the code 
> below:
>  
> mheap.c:545
>   word modifier = (align > MHEAP_USER_DATA_WORD_BYTES
>                    ? align + align_offset + sizeof (mheap_elt_t)
>                    : 0);
>   bin = user_data_size_to_bin_index (n_user_bytes + modifier);
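>  
> A quick worked example of the modifier (sizes are illustrative: we assume 
> MHEAP_USER_DATA_WORD_BYTES == 8 and sizeof (mheap_elt_t) == 8 on a 64-bit 
> build):
>  
>   /* Requesting 60 user bytes, 64B-aligned, with a 4-byte align offset:
>    *   modifier = 64 + 4 + sizeof (mheap_elt_t) = 76
>    * so the bin lookup is done for 60 + 76 = 136 bytes.  Any element
>    * found in that bin can absorb the worst-case alignment padding and
>    * still leave room for a complete free element in the low split. */
>   uword n_user_bytes = 60;
>   uword align = 64, align_offset = 4;
>   word modifier = align + align_offset + sizeof (mheap_elt_t);
>   uword bin = user_data_size_to_bin_index (n_user_bytes + modifier);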
>  
> The reason for the extra sizeof (mheap_elt_t) is to avoid lo_free_size being 
> too small to hold a complete free element. You will understand this if you 
> know how mheap_get_search_free_bin works; I am not going to go through the 
> details here. In short, every lookup in the free list will locate a suitable 
> element; in other words, the hit rate of the free list will be almost 100%, 
> and 'per-attempt' will always be around 1. The test results look very 
> promising. Please see below, after adding 2M GTPU tunnels and 2M routing 
> entries:
>  
> Thread 0 vpp_main
> 13689507 objects, 3048367k of 3505932k used, 243663k free, 243656k reclaimed, 
> 106951k overhead, 4194300k capacity
>   alloc. from small object cache: 47325868 hits 65271210 attempts (72.51%) 
> replacements 8266122
>   alloc. from free-list: 21879233 attempts, 21877898 hits (99.99%), 21882794 
> considered (per-attempt 1.00)
>   alloc. low splits: 13355414, high splits: 512984, combined: 281968
>   alloc. from vector-expand: 81907
>   allocs: 69285673 276.00 clocks/call
>   frees: 55596166 173.09 clocks/call
> Free list:
> bin 3:
> 20(82220170 48)
> total 1
> bin 273:
> 28340k(80569efc 60)
> total 1
> bin 276:
> 215323k(8c88df6c 44)
> total 1
> Total count in free bin: 3
>  
> You can see, as pointed out before, that the hit rate is very high (>99.9%) 
> and per-attempt is ~1. Furthermore, the total number of elements in the free 
> list is only 3.
>  
> Apart from what we discussed above, we also made some other improvements and 
> bug fixes to mheap (the first and fourth fixes are sketched in code right 
> after this list):
>  
> - Bug fix: the macros MHEAP_ELT_OVERHEAD_BYTES and MHEAP_MIN_USER_DATA_BYTES 
>   are wrongly defined. In fact, MHEAP_ELT_OVERHEAD_BYTES should be 
>   (STRUCT_OFFSET_OF (mheap_elt_t, user_data)).
> - Bug fix: mheap_bytes_overhead calculates the total overhead wrongly; it 
>   should be the number of elements * MHEAP_ELT_OVERHEAD_BYTES.
> - Do not make a new element if hi_free_size is smaller than 4 times 
>   MHEAP_MIN_USER_DATA_BYTES. This avoids memory fragmentation.
> - Bug fix: register_node.c:336 wrongly uses the vector body memory; it 
>   should be: clib_mem_is_heap_object (vec_header (r->name, 0)).
> - Bug fix: dpo_stack_from_node in dpo.c leaks the parent_indices vector.
> - Some fixes and improvements to format_mheap, to show more information 
>   about the heap.
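>  
> As promised above, a sketch of the first and fourth fixes (illustrative, 
> against the sources we based this on; the ASSERT wrapper in the second 
> snippet is only a stand-in for the surrounding code):
>  
>   /* vppinfra: the per-element overhead is the header up to the start of
>    * the user data, not a hard-coded constant. */
>   #define MHEAP_ELT_OVERHEAD_BYTES (STRUCT_OFFSET_OF (mheap_elt_t, user_data))
>  
>   /* register_node.c:336: r->name is a vector, so the heap object to
>    * test is the vector header, not the vector body pointer. */
>   ASSERT (clib_mem_is_heap_object (vec_header (r->name, 0)));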
>  
> The code including all the fixes is tentatively in our private code base. It 
> can of course be shared if wanted.
>  
> We would really appreciate any comments!
>  
> Regards,
> Kingwel
>  
>  
