Re: [vpp-dev] VPP Memory usage
Dear Rubina,

OK, so it looks like the usage within the ACL heap between the start and
the end of your tests increases by 1K, which is tiny - so indeed, what you
are seeing in the RSS increase could be the allocations from within the
bihash (which does its own memory management).

About the decreasing RX, I already wrote what I think is happening: as you
reach the session limit, an increasing number of sessions end up as
transient and are reused for new transient session creation as new traffic
hits, and this is expensive (at this point in time).

I will need to experiment a bit more and add more telemetry for debugging
this kind of issue - I will do that in a week's time or so, after I am done
with other current work items, and send you an update.

--a

On 8/21/18, Rubina Bianchi wrote:
> Hi dear Andrew
>
> I ran the previous scenario once again and logged the previous outputs plus
> "vppctl show acl memory".
> I am also sending you the T-rex configs and the 60s and 1500s outputs.
>
> As you can see, during this period RSS is increasing and Total-rx is
> decreasing.
> I don't think I got my answer: why does my Total-rx decrease during my
> test while RSS still increases?
>
> Thanks,
> Sincerely
Re: [vpp-dev] VPP Memory usage
Hi dear Andrew

I ran the previous scenario once again and logged the previous outputs plus
"vppctl show acl memory".
I am also sending you the T-rex configs and the 60s and 1500s outputs.

As you can see, during this period RSS is increasing and Total-rx is
decreasing.
I don't think I got my answer: why does my Total-rx decrease during my
test while RSS still increases?

Thanks,
Sincerely

From: Andrew Yourtchenko
Sent: Monday, August 20, 2018 10:25 PM
To: Rubina Bianchi
Cc: vpp-dev@lists.fd.io
Subject: Re: [vpp-dev] VPP Memory usage

Dear Rubina,

On 8/20/18, Rubina Bianchi wrote:
> Hi dear Andrew
>
> What we talked about before was "Worker Thread Deadlock".

We had that discussion in March or May. :-)

The one I had in mind was another thread, starting with your mail on
January 30. I forwarded it to you unicast :-)

>
> I tried to test the scenario as you explained and started with 1M entries,
> and after that I doubled it at each run.
> When I tested with a 4M entry size, I logged two things:
> 1. ps aux | grep vpp
> 2. First 5 lines of "vppctl show acl-plugin session"
>
> At first, I ran VPP and configured it with the script that I attached to
> the previous email.
> After that I ran my logger script.
> Finally I ran Trex with this command: ./t-rex-64 --cfg cfg/trex_config.yaml
> -f cap2/sfr.yaml -m 50 -c 3 -d 1 -p
> After tracing the VPP logs I found some signs of leakage. I mean that in the
> VPP logs, RSS (6th parameter in the ps aux output) is increasing continuously
> (sometimes more and sometimes less) while, on the other side, Trex Total-Rx
> is decreasing at the same time.
> After about 3000 seconds, I stopped Trex and waited until the session table
> was cleared. But no change in RSS happened.
> Then I ran Trex again without any change, and again I saw the increase of
> RSS while the Trex Total-Rx was decreasing.

Based on the counters, in this test we are continuously churning
through the half-open sessions, because we are hitting the maximum
session limit. Session creation is quite expensive (at least at this
point, I did not optimize that code much yet).

>
> This is my RAM status when vpp is stopped:
> root@debian-hp:~# free -m
>              total       used       free     shared    buffers     cached
> Mem:        129135       3414     125721         12         99        591
> -/+ buffers/cache:       2723     126412
> Swap:         2518          0       2518
>
> I also attached my logs to this email. These logs are gathered every 20
> seconds.
>
> With a 40M entry size I saw this behavior too, but it happens much faster
> than with a 4M entry size.

Yes, because you create more sessions and use more buckets, I think
(though this is speculation at this point, since we don't have the
memory outputs).

What is the maximum number of simultaneous sessions on the T-rex and
what is the connections-per-second rate?

> I also have a question about your phrase "Using this method you can
> arrive at the number of maximum connections that your memory configuration
> can support".
> Is there any formula to configure init.conf in an efficient way? Because VPP
> didn't return any error about misconfiguration.

No, there is no formula, unfortunately - hence I cannot print an
error about a misconfiguration.

You can use "show acl memory", as I described in the other mail, to
see what the memory usage in the session bihash is and what the
number of active elements is - could you have a look at doing that?

--a

>
> Thanks,
> Sincerely
Re: [vpp-dev] VPP Memory usage
Dear Rubina,

On 8/20/18, Rubina Bianchi wrote:
> Hi dear Andrew
>
> What we talked about before was "Worker Thread Deadlock".

We had that discussion in March or May. :-)

The one I had in mind was another thread, starting with your mail on
January 30. I forwarded it to you unicast :-)

>
> I tried to test the scenario as you explained and started with 1M entries,
> and after that I doubled it at each run.
> When I tested with a 4M entry size, I logged two things:
> 1. ps aux | grep vpp
> 2. First 5 lines of "vppctl show acl-plugin session"
>
> At first, I ran VPP and configured it with the script that I attached to
> the previous email.
> After that I ran my logger script.
> Finally I ran Trex with this command: ./t-rex-64 --cfg cfg/trex_config.yaml
> -f cap2/sfr.yaml -m 50 -c 3 -d 1 -p
> After tracing the VPP logs I found some signs of leakage. I mean that in the
> VPP logs, RSS (6th parameter in the ps aux output) is increasing continuously
> (sometimes more and sometimes less) while, on the other side, Trex Total-Rx
> is decreasing at the same time.
> After about 3000 seconds, I stopped Trex and waited until the session table
> was cleared. But no change in RSS happened.
> Then I ran Trex again without any change, and again I saw the increase of
> RSS while the Trex Total-Rx was decreasing.

Based on the counters, in this test we are continuously churning
through the half-open sessions, because we are hitting the maximum
session limit. Session creation is quite expensive (at least at this
point, I did not optimize that code much yet).

>
> This is my RAM status when vpp is stopped:
> root@debian-hp:~# free -m
>              total       used       free     shared    buffers     cached
> Mem:        129135       3414     125721         12         99        591
> -/+ buffers/cache:       2723     126412
> Swap:         2518          0       2518
>
> I also attached my logs to this email. These logs are gathered every 20
> seconds.
>
> With a 40M entry size I saw this behavior too, but it happens much faster
> than with a 4M entry size.

Yes, because you create more sessions and use more buckets, I think
(though this is speculation at this point, since we don't have the
memory outputs).

What is the maximum number of simultaneous sessions on the T-rex and
what is the connections-per-second rate?

> I also have a question about your phrase "Using this method you can
> arrive at the number of maximum connections that your memory configuration
> can support".
> Is there any formula to configure init.conf in an efficient way? Because VPP
> didn't return any error about misconfiguration.

No, there is no formula, unfortunately - hence I cannot print an
error about a misconfiguration.

You can use "show acl memory", as I described in the other mail, to
see what the memory usage in the session bihash is and what the
number of active elements is - could you have a look at doing that?

--a

>
> Thanks,
> Sincerely
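The question above about simultaneous sessions and connections per second can be turned into a rough estimate. The sketch below applies Little's law (average number in the system = arrival rate x average time in the system) and assumes that a session stays in the table for roughly the flow duration plus the idle timeout. That lifetime model, the function names, and the example numbers are illustrative assumptions, not something measured in this thread; only the 20-second UDP idle timeout comes from the posted init.conf.

```python
def estimated_concurrent_sessions(cps: float,
                                  mean_flow_seconds: float,
                                  idle_timeout_seconds: float) -> float:
    """Little's law estimate of the steady-state session count.

    Assumption: each session occupies a table entry for the flow duration
    plus the idle timeout that has to elapse before it is expired.
    """
    return cps * (mean_flow_seconds + idle_timeout_seconds)


def exceeds_session_limit(cps: float,
                          mean_flow_seconds: float,
                          idle_timeout_seconds: float,
                          max_entries: int) -> bool:
    """True if the estimated steady-state count is above the configured limit."""
    estimate = estimated_concurrent_sessions(
        cps, mean_flow_seconds, idle_timeout_seconds)
    return estimate > max_entries


# Hypothetical load: 10,000 new UDP flows per second lasting 1 second each,
# combined with the 20 second UDP idle timeout from the posted configuration.
udp_estimate = estimated_concurrent_sessions(10_000, 1.0, 20)
print(f"estimated concurrent UDP sessions: {udp_estimate:.0f}")
```

If an estimate like this is well above the configured max-entries value, the table would be expected to sit at its limit for the whole run, which matches the churn described in the message above.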
Re: [vpp-dev] VPP Memory usage
Hi dear Andrew

What we talked about before was "Worker Thread Deadlock".

I tried to test the scenario as you explained and started with 1M entries,
and after that I doubled it at each run.
When I tested with a 4M entry size, I logged two things:
1. ps aux | grep vpp
2. First 5 lines of "vppctl show acl-plugin session"

At first, I ran VPP and configured it with the script that I attached to
the previous email.
After that I ran my logger script.
Finally I ran Trex with this command: ./t-rex-64 --cfg cfg/trex_config.yaml
-f cap2/sfr.yaml -m 50 -c 3 -d 1 -p
After tracing the VPP logs I found some signs of leakage. I mean that in the
VPP logs, RSS (6th parameter in the ps aux output) is increasing continuously
(sometimes more and sometimes less) while, on the other side, Trex Total-Rx
is decreasing at the same time.
After about 3000 seconds, I stopped Trex and waited until the session table
was cleared. But no change in RSS happened.
Then I ran Trex again without any change, and again I saw the increase of
RSS while the Trex Total-Rx was decreasing.

This is my RAM status when vpp is stopped:
root@debian-hp:~# free -m
             total       used       free     shared    buffers     cached
Mem:        129135       3414     125721         12         99        591
-/+ buffers/cache:       2723     126412
Swap:         2518          0       2518

I also attached my logs to this email. These logs are gathered every 20
seconds.

With a 40M entry size I saw this behavior too, but it happens much faster
than with a 4M entry size.

I also have a question about your phrase "Using this method you can
arrive at the number of maximum connections that your memory configuration
can support".
Is there any formula to configure init.conf in an efficient way? Because VPP
didn't return any error about misconfiguration.

Thanks,
Sincerely

From: Andrew Yourtchenko
Sent: Sunday, August 19, 2018 8:28 AM
To: Rubina Bianchi
Cc: vpp-dev@lists.fd.io
Subject: Re: [vpp-dev] VPP Memory usage

Dear Rubina,

The ACL plugin does all the necessary allocations at startup for all data
structures except the connection bihash.

You would need to check the current number of connections as your test
progresses. I believe we had a communication a while ago regarding the
gradual growth of background memory usage within the bihash data structure
as you churn through random addresses. Since then there were some changes
aimed to address this. Please verify what the current total session count
looks like in “show acl-plugin sessions” as your test progresses - based on
what you described I think it continuously increases.

If the bihash memory requirement for active connections goes above what is
available from the OS, then there is no feedback to the user code (acl
plugin) other than a full crash.

The only safeguard I could come up with against this situation is the
maximum connection count, which is checked before attempting to insert an
entry into the bihash.

Your current value is 40 million, which is quite a lot, while the hash table
heap size is 17 gigabytes. This might not be enough to hold all the 40
million entries as the churn progresses and you need to create more buckets.

I suggest you keep all the other parameters as they are, start with a
maximum connections value of 1 million, rerun the test, and monitor the
memory usage within the ACL plugin heap (“show acl-plugin memory”) - it
should stabilize over time at some value and there should be no crash. The
exact usage will depend on the distribution of session entries over buckets
(note that in the worst case you may have one entry per bucket, which may
give a lot of overhead). Note that value. If you stop the traffic, as the
session count goes down to zero, the memory should get released.

Then double the max conn count and recheck the behavior same as above - the
usage probably would be about double the previous one.

Using this method you can arrive at the number of maximum connections that
your memory configuration can support, and get a gauge of how much memory
you would need for the target amount of connections.

If in the initial iteration test you observe the memory usage never
stabilizing, or if you see that the memory is not being released as the
connection count goes down to zero, then it would be a bug, which we will
need to further troubleshoot - though from your description so far it seems
more a case of tuning the parameters.

So please apply the method above and let me know how it goes! Thanks!

--a
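The logger script mentioned in the message above was sent as an attachment and is not part of this archive. As a minimal sketch of what such a logger might look like, the code below samples the two things the message says were recorded every 20 seconds: the RSS column of `ps aux` for the vpp process, and the first five lines of `vppctl show acl-plugin sessions`. The function names, the way the vpp process is matched, and the output format are assumptions made for illustration.

```python
import os
import subprocess
import time
from typing import Optional, Tuple


def parse_ps_line(line: str) -> Optional[Tuple[int, str]]:
    """Return (rss_kib, command) for one `ps aux` line, or None if unparsable.

    `ps aux` columns: USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND,
    so RSS is the 6th field (index 5), reported in KiB.
    """
    fields = line.split(None, 10)
    if len(fields) < 11 or not fields[5].isdigit():
        return None  # header line or malformed row
    return int(fields[5]), fields[10]


def is_vpp(command: str) -> bool:
    """Assumption: the VPP process is the one whose executable is named 'vpp'."""
    return os.path.basename(command.split()[0]) == "vpp"


def first_lines(text: str, count: int = 5) -> str:
    """Keep only the first `count` lines of a command's output."""
    return "\n".join(text.splitlines()[:count])


def sample_once() -> str:
    """Take one sample: vpp RSS from ps plus the head of the session output."""
    ps_out = subprocess.run(["ps", "aux"], capture_output=True,
                            text=True, check=True).stdout
    rss = [parsed[0] for parsed in map(parse_ps_line, ps_out.splitlines())
           if parsed is not None and is_vpp(parsed[1])]
    sessions = subprocess.run(["vppctl", "show", "acl-plugin", "sessions"],
                              capture_output=True, text=True).stdout
    return f"{time.time():.0f} rss_kib={rss}\n{first_lines(sessions)}\n"


def log_forever(path: str, interval_seconds: int = 20) -> None:
    """Append one sample to `path` every `interval_seconds` seconds."""
    while True:
        with open(path, "a") as log:
            log.write(sample_once())
        time.sleep(interval_seconds)
```

Calling `log_forever("vpp_usage.log")` (a hypothetical file name) on the VPP host would produce a log comparable to the one described above. Matching on the executable name rather than using `grep vpp` avoids also matching the grep process itself.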
Re: [vpp-dev] VPP Memory usage
Dear Rubina,

The ACL plugin does all the necessary allocations at startup for all data
structures except the connection bihash.

You would need to check the current number of connections as your test
progresses. I believe we had a communication a while ago regarding the
gradual growth of background memory usage within the bihash data structure
as you churn through random addresses. Since then there were some changes
aimed to address this. Please verify what the current total session count
looks like in “show acl-plugin sessions” as your test progresses - based on
what you described I think it continuously increases.

If the bihash memory requirement for active connections goes above what is
available from the OS, then there is no feedback to the user code (acl
plugin) other than a full crash.

The only safeguard I could come up with against this situation is the
maximum connection count, which is checked before attempting to insert an
entry into the bihash.

Your current value is 40 million, which is quite a lot, while the hash table
heap size is 17 gigabytes. This might not be enough to hold all the 40
million entries as the churn progresses and you need to create more buckets.

I suggest you keep all the other parameters as they are, start with a
maximum connections value of 1 million, rerun the test, and monitor the
memory usage within the ACL plugin heap (“show acl-plugin memory”) - it
should stabilize over time at some value and there should be no crash. The
exact usage will depend on the distribution of session entries over buckets
(note that in the worst case you may have one entry per bucket, which may
give a lot of overhead). Note that value. If you stop the traffic, as the
session count goes down to zero, the memory should get released.

Then double the max conn count and recheck the behavior same as above - the
usage probably would be about double the previous one.

Using this method you can arrive at the number of maximum connections that
your memory configuration can support, and get a gauge of how much memory
you would need for the target amount of connections.

If in the initial iteration test you observe the memory usage never
stabilizing, or if you see that the memory is not being released as the
connection count goes down to zero, then it would be a bug, which we will
need to further troubleshoot - though from your description so far it seems
more a case of tuning the parameters.

So please apply the method above and let me know how it goes! Thanks!

--a

> On 19 Aug 2018, at 07:26, Rubina Bianchi wrote:
>
> Hi dear VPP
>
> I configured vpp stable/1807 and added a permit+reflect acl on the input and
> output of my network interfaces. I configured vpp with 9 CPUs (1 main and 8
> worker CPUs). My init.conf is:
>
> vppctl>
> set acl-plugin session table max-entries 4000
> set acl-plugin session table hash-table-buckets 100
> set acl-plugin session table hash-table-memory 17179869184
> set acl-plugin session timeout udp idle 20
> set acl-plugin session timeout tcp idle 120
> set acl-plugin session timeout tcp transient 30
>
> vpp_api_test>
> acl_add_replace permit
> acl_add_replace permit+reflect
>
> acl_interface_add_del TenGigabitEthernet3/0/0 add output acl 1
> acl_interface_add_del TenGigabitEthernet3/0/1 add output acl 1
> acl_interface_add_del TenGigabitEthernet3/0/0 add input acl 1
> acl_interface_add_del TenGigabitEthernet3/0/1 add input acl 1
>
> exec set interface l2 bridge TenGigabitEthernet3/0/0 1
> exec set interface l2 bridge TenGigabitEthernet3/0/1 1
> exec set int state TenGigabitEthernet3/0/0 up
> exec set int state TenGigabitEthernet3/0/1 up
>
> My startup.conf is pasted in this link: https://paste.ubuntu.com/p/MhQDyqF6Xd/
>
> I used Trex as the traffic generator as follows:
> ./t-rex-64 --cfg cfg/trex_config.yaml -f cap2/sfr.yaml -m 50 -c 3 -d 3600 -p
>
> During execution of my test, Total-rx continuously decreased and after a
> while it reached 0. I checked the vpp status and it had received a SIGKILL
> signal from the OS.
> I monitored vpp memory and it was increasing until it crashed.
> Does acl_plugin session management have any memory leak problem?
>
> Regards,
> Rubina

-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.

View/Reply Online (#10214): https://lists.fd.io/g/vpp-dev/message/10214
Mute This Topic: https://lists.fd.io/mt/24729023/21656
Group Owner: vpp-dev+ow...@lists.fd.io
Unsubscribe: https://lists.fd.io/g/vpp-dev/unsub [arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-
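The doubling procedure described in the reply above (start at 1 million maximum connections, let the ACL heap usage stabilize, note the value, double, repeat) can be sketched as a small search loop. The callback `measure_stable_usage` is hypothetical: it stands for one full manual test run at a given limit, returning the stabilized usage read from “show acl-plugin memory”. Nothing here is a VPP API, and the per-session cost used in the usage example is an invented number for illustration only.

```python
from typing import Callable, List, Optional, Tuple


def find_supported_max_connections(
    measure_stable_usage: Callable[[int], int],
    memory_budget_bytes: int,
    start: int = 1_000_000,
    max_steps: int = 16,
) -> Tuple[Optional[int], List[Tuple[int, int]]]:
    """Double the connection limit until the stabilized usage exceeds the budget.

    Returns the largest tested limit that fit within the budget (None if even
    the starting value did not fit) and the (limit, usage) history, which
    gives a gauge of memory needed per target connection count.
    """
    history: List[Tuple[int, int]] = []
    best: Optional[int] = None
    limit = start
    for _ in range(max_steps):  # bounded so a bad callback cannot loop forever
        usage = measure_stable_usage(limit)
        history.append((limit, usage))
        if usage > memory_budget_bytes:
            break
        best = limit
        limit *= 2
    return best, history


# Usage with a stand-in measurement: assume, purely for illustration, that
# usage grows linearly at 400 bytes per session.
def fake_measure(limit: int) -> int:
    return limit * 400


best, history = find_supported_max_connections(
    fake_measure, memory_budget_bytes=17179869184)
print(best, len(history))
```

In a real run each call of the callback is an entire test iteration, including stopping the traffic and checking that the memory is released as the session count returns to zero. If the usage never stabilizes for a given limit, that iteration has no meaningful return value and, as the reply notes, points to a bug rather than a tuning problem.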
[vpp-dev] VPP Memory usage
Hi dear VPP

I configured vpp stable/1807 and added a permit+reflect acl on the input and
output of my network interfaces. I configured vpp with 9 CPUs (1 main and 8
worker CPUs). My init.conf is:

vppctl>
set acl-plugin session table max-entries 4000
set acl-plugin session table hash-table-buckets 100
set acl-plugin session table hash-table-memory 17179869184
set acl-plugin session timeout udp idle 20
set acl-plugin session timeout tcp idle 120
set acl-plugin session timeout tcp transient 30

vpp_api_test>
acl_add_replace permit
acl_add_replace permit+reflect

acl_interface_add_del TenGigabitEthernet3/0/0 add output acl 1
acl_interface_add_del TenGigabitEthernet3/0/1 add output acl 1
acl_interface_add_del TenGigabitEthernet3/0/0 add input acl 1
acl_interface_add_del TenGigabitEthernet3/0/1 add input acl 1

exec set interface l2 bridge TenGigabitEthernet3/0/0 1
exec set interface l2 bridge TenGigabitEthernet3/0/1 1
exec set int state TenGigabitEthernet3/0/0 up
exec set int state TenGigabitEthernet3/0/1 up

My startup.conf is pasted in this link: https://paste.ubuntu.com/p/MhQDyqF6Xd/

I used Trex as the traffic generator as follows:
./t-rex-64 --cfg cfg/trex_config.yaml -f cap2/sfr.yaml -m 50 -c 3 -d 3600 -p

During execution of my test, Total-rx continuously decreased and after a
while it reached 0. I checked the vpp status and it had received a SIGKILL
signal from the OS.
I monitored vpp memory and it was increasing until it crashed.
Does acl_plugin session management have any memory leak problem?

Regards,
Rubina

-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.

View/Reply Online (#10213): https://lists.fd.io/g/vpp-dev/message/10213
Mute This Topic: https://lists.fd.io/mt/24729023/21656
Group Owner: vpp-dev+ow...@lists.fd.io
Unsubscribe: https://lists.fd.io/g/vpp-dev/unsub [arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-