Re: [vpp-dev] VPP Memory usage

2018-08-22 Thread Andrew Yourtchenko
Dear Rubina,

OK, so it looks like the usage within the ACL heap between the start and
the end of your tests increases by 1K, which is tiny - so indeed what
you are seeing in the RSS increase could be the allocations from
within bihash (which does its own memory management).

As for the decreasing RX, I already wrote what I think is happening: as
you reach the session limit, an increasing number of sessions end up as
transient and are reused to create new transient sessions as the new
traffic hits, and this is expensive (at this point in time).

I will need to experiment a bit more to add more telemetry for
debugging this kind of issue - I will do that in a week's time or so
after I am done with other current work items and send you an update.

--a

On 8/21/18, Rubina Bianchi  wrote:
> Hi dear Andrew
>
> I ran the previous scenario once again and logged the previous outputs plus
> "vppctl show acl memory".
> I am also sending you the T-rex configs and the 60s and 1500s outputs.
>
> As you can see, during this period RSS is increasing and Total-rx is
> decreasing.
> I assume I didn't get my answer. Why does my Total-rx decrease during my
> test while RSS still increases?
>
> Thanks,
> Sincerely
> 

Re: [vpp-dev] VPP Memory usage

2018-08-21 Thread Rubina Bianchi
Hi dear Andrew

I ran the previous scenario once again and logged the previous outputs plus 
"vppctl show acl memory".
I am also sending you the T-rex configs and the 60s and 1500s outputs.

As you can see, during this period RSS is increasing and Total-rx is decreasing.
I assume I didn't get my answer. Why does my Total-rx decrease during my test 
while RSS still increases?

Thanks,
Sincerely


Re: [vpp-dev] VPP Memory usage

2018-08-20 Thread Andrew Yourtchenko
Dear Rubina,

On 8/20/18, Rubina Bianchi  wrote:
> Hi dear Andrew
>
> What we talked about before was the "Worker Thread Deadlock".

We had that discussion in March or May. :-)

The one I had in mind was another thread, starting with your mail on
January 30. I forwarded it to you unicast :-)

>
> I tested the scenario as you explained: I started with a 1M entry size and
> doubled it at each run.
> When I tested with the 4M entry size, I logged two things:
> 1. ps aux | grep vpp
> 2. First 5 lines of "vppctl show acl-plugin session"
>
> First, I ran VPP and configured it with the script that I attached to my
> previous email.
> After that I ran my logger script.
> Finally I ran Trex with this command: ./t-rex-64 --cfg cfg/trex_config.yaml
> -f cap2/sfr.yaml -m 50 -c 3 -d 1 -p
> After tracing the VPP logs I found some signs of leakage: in the VPP logs,
> RSS (the 6th column of the ps aux output) increases continuously
> (sometimes more and sometimes less), while Trex Total-Rx decreases at the
> same time.
> After about 3000 seconds, I stopped Trex and waited until the session table
> was cleared, but RSS did not change.
> Then I ran Trex again without any changes, and again I saw RSS increase
> while the Trex Total-Rx decreased.

Based on the counters, in this test we are continuously churning
through the half-open sessions, because we are hitting the maximum
session limit. Session creation is quite expensive (at least at this
point; I have not optimized that code much yet).

>
> This is my RAM status when VPP is stopped:
> root@debian-hp:~# free -m
>              total       used       free     shared    buffers     cached
> Mem:        129135       3414     125721         12         99        591
> -/+ buffers/cache:       2723     126412
> Swap:         2518          0       2518
>
> I also attached my logs to this email. These logs are gathered every 20
> seconds.
>
> With the 40M entry size I saw this behavior too, but it happens much faster
> than with the 4M entry size.

Yes, because you create more sessions and use more buckets, I think
(though this is speculation at this point, since we don't have the
memory outputs).

What is the maximum number of simultaneous sessions on the T-rex and
what is the connections-per-second rate?
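
The two numbers asked for above can be related with Little's law
(average concurrent sessions = arrival rate x session lifetime). The
following is only a rough sketch under stated assumptions: the connection
rate is a made-up placeholder, the lifetime is taken to be the "tcp idle"
timeout from this thread, and the function names are hypothetical.

```python
# Rough steady-state estimate via Little's law:
#   concurrent sessions ~= connections per second * session lifetime.
# The connection rate used below is a placeholder, not a measured value.

def steady_state_sessions(conn_per_sec: float, lifetime_sec: float) -> float:
    """Average number of sessions alive at once for a given arrival rate."""
    return conn_per_sec * lifetime_sec


def hits_session_limit(conn_per_sec: float, lifetime_sec: float,
                       max_entries: int) -> bool:
    """True if the estimated steady-state population exceeds the table limit."""
    return steady_state_sessions(conn_per_sec, lifetime_sec) > max_entries


if __name__ == "__main__":
    cps = 50_000       # hypothetical connections per second from the generator
    tcp_idle = 120     # seconds, the "tcp idle" timeout quoted in this thread
    print(steady_state_sessions(cps, tcp_idle))
    print(hits_session_limit(cps, tcp_idle, 4_000_000))
```

If the estimate is above the configured maximum, the table would be
expected to stay full and keep recycling sessions, which matches the
churn described above.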

> I also have a question about your phrase "Using this method you can
> arrive at the number of maximum connections that your memory configuration
> can support".
> Is there a formula for configuring init.conf efficiently? I ask because VPP
> did not return any error about misconfiguration.

No, there is no formula, unfortunately - hence I cannot print an
error about a misconfiguration.

You can use "show acl memory", as I described in the other mail, to
see what the memory usage in the session bihash is and what the
number of active elements is - could you have a look at doing that?

--a


Re: [vpp-dev] VPP Memory usage

2018-08-20 Thread Rubina Bianchi
Hi dear Andrew

What we talked about before was the "Worker Thread Deadlock".

I tested the scenario as you explained: I started with a 1M entry size and 
doubled it at each run.
When I tested with the 4M entry size, I logged two things:
1. ps aux | grep vpp
2. First 5 lines of "vppctl show acl-plugin session"

First, I ran VPP and configured it with the script that I attached to my 
previous email.
After that I ran my logger script.
Finally I ran Trex with this command: ./t-rex-64 --cfg cfg/trex_config.yaml  -f 
cap2/sfr.yaml -m 50 -c 3 -d 1 -p
After tracing the VPP logs I found some signs of leakage: in the VPP logs, 
RSS (the 6th column of the ps aux output) increases continuously 
(sometimes more and sometimes less), while Trex Total-Rx decreases at the 
same time.
After about 3000 seconds, I stopped Trex and waited until the session table 
was cleared, but RSS did not change.
Then I ran Trex again without any changes, and again I saw RSS increase 
while the Trex Total-Rx decreased.

This is my RAM status when VPP is stopped:
root@debian-hp:~# free -m
             total       used       free     shared    buffers     cached
Mem:        129135       3414     125721         12         99        591
-/+ buffers/cache:       2723     126412
Swap:         2518          0       2518

I also attached my logs to this email. These logs are gathered every 20 seconds.
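
The attached logger script is not reproduced in this thread. As a hedged
illustration only, the RSS value can be pulled out of each logged
"ps aux" line as sketched below. The sample lines in the usage part are
invented for the example, and the function names are hypothetical; on
Linux the RSS column of "ps aux" is reported in KiB.

```python
# Sketch: extract the RSS column (6th field) from logged `ps aux` lines.
# Column order of `ps aux`:
#   USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND

def parse_rss_kib(ps_aux_line: str) -> int:
    """Return the RSS field of one `ps aux` output line, in KiB."""
    # Split into at most 11 fields so a COMMAND with spaces stays in one piece.
    fields = ps_aux_line.split(None, 10)
    return int(fields[5])


def rss_growth_kib(ps_aux_lines) -> int:
    """Difference in RSS between the last and the first logged sample."""
    samples = [parse_rss_kib(line) for line in ps_aux_lines]
    return samples[-1] - samples[0]


if __name__ == "__main__":
    # Invented sample lines, 20 seconds apart, for illustration only.
    log = [
        "root 1234 99.0 5.1 52000000 6700000 ? Rsl 10:00 12:34 /usr/bin/vpp -c /etc/vpp/startup.conf",
        "root 1234 99.0 5.2 52000000 6712000 ? Rsl 10:00 12:54 /usr/bin/vpp -c /etc/vpp/startup.conf",
    ]
    print(rss_growth_kib(log))
```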

With the 40M entry size I saw this behavior too, but it happens much faster 
than with the 4M entry size.
I also have a question about your phrase "Using this method you can arrive 
at the number of maximum connections that your memory configuration can 
support".
Is there a formula for configuring init.conf efficiently? I ask because VPP 
did not return any error about misconfiguration.

Thanks,
Sincerely






Re: [vpp-dev] VPP Memory usage

2018-08-19 Thread Andrew Yourtchenko
Dear Rubina,

The ACL plugin does all the necessary allocations at startup for all data 
structures except the connection bihash.

You would need to check the current number of the connections as your test 
progresses. I believe we had a communication a while ago regarding the gradual 
growth of background memory usage within the bihash data structure as you churn 
through random addresses. Since then there were some changes aimed to address 
this. Please check what the current total session count looks like in 
“show acl-plugin sessions” as your test progresses - based on what you 
described I think it continuously increases.

If the bihash memory requirement for active connections goes above what is 
available from the OS, then there is no feedback to the user code (acl plugin) 
other than a full crash.

The only safeguard I could come up with against this situation is the maximum 
connection count, which is checked before attempting to insert an entry into 
the bihash.

Your current value is 40 million which is quite a lot, while the hash table 
heap size is 17 gigabytes. This might not be enough to hold all the 40 million 
entries as the churn progresses and you need to create more buckets.
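
To put the two numbers above side by side, here is a back-of-the-envelope
calculation of the memory budget per session entry. It is only a sketch:
the real per-entry footprint inside the bihash (buckets, chaining,
fragmentation) is not stated in this thread, so the result is a budget,
not a measurement. The function name is hypothetical.

```python
# Memory budget per session entry for the configuration discussed above.
# This is plain division; it says nothing about what the bihash really needs.

HEAP_BYTES = 17179869184     # hash-table-memory value from the thread (16 GiB)
MAX_ENTRIES = 40_000_000     # maximum connection count discussed above


def bytes_per_entry(heap_bytes: int, max_entries: int) -> float:
    """Average number of heap bytes available per session entry."""
    return heap_bytes / max_entries


if __name__ == "__main__":
    # Roughly 429.5 bytes per entry, which has to cover the entry itself
    # plus any bucket overhead.
    print(round(bytes_per_entry(HEAP_BYTES, MAX_ENTRIES), 1))
```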

I suggest you keep all the other parameters as they are and start with the 
value of maximum connections of 1 million and rerun the test, and monitor the 
memory usage within the ACL plugin heap (“show acl-plugin memory”) - it should 
stabilize over time at some value and there should be no crash. The exact usage 
will depend on the distribution of session entries over buckets (note that in 
the worst case you may have one entry per bucket, which may give a lot of 
overhead). Make a note of that value.

If you stop the traffic, as the session count goes down to zero, the memory 
should get released.

Then double the max conn count and recheck the behavior in the same way as above - the 
usage would probably be about double the previous one.

Using this method you can arrive at the number of maximum connections that your 
memory configuration can support, and get a gauge of how much memory you would 
need for the target amount of connections.
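
The doubling procedure above can be turned into a simple extrapolation.
The sketch below assumes the stabilized usage grows roughly linearly with
the connection limit (as suggested by "about double"), which may not hold
in practice. All measurement values in the usage part are invented
placeholders, and the function names are hypothetical.

```python
# Sketch: extrapolate from two (max_connections, stabilized_heap_bytes)
# measurements to the connection count a given memory budget could hold.
# Assumes a linear relationship; the sample numbers are invented.

def marginal_bytes_per_session(first, second) -> float:
    """Slope between two (max_connections, stabilized_bytes) measurements."""
    (n1, m1), (n2, m2) = first, second
    return (m2 - m1) / (n2 - n1)


def max_sessions_for_budget(first, second, budget_bytes: int) -> int:
    """Linear extrapolation of the largest session count that fits the budget."""
    slope = marginal_bytes_per_session(first, second)
    n1, m1 = first
    fixed_overhead = m1 - slope * n1      # usage extrapolated to zero sessions
    return int((budget_bytes - fixed_overhead) / slope)


if __name__ == "__main__":
    run_1m = (1_000_000, 500_000_000)     # placeholder measurement
    run_2m = (2_000_000, 900_000_000)     # placeholder measurement
    print(marginal_bytes_per_session(run_1m, run_2m))
    print(max_sessions_for_budget(run_1m, run_2m, 17179869184))
```

Taking more than two measurements and checking that the slope stays
roughly constant would show whether the linear assumption is reasonable.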

If in the initial iteration test you observe the memory usage never stabilizing 
or if you see that the memory is not being released as the connection count 
goes down to zero, then it would be a bug, which we will need to further 
troubleshoot - though from your description so far it seems more a case of 
tuning the parameters. So please apply the method above and let me know how it 
goes! Thanks!
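
The two bug criteria above (usage never stabilizing, or memory not being
released once the session count is back at zero) could be checked on the
logged samples roughly as follows. This is a sketch only: the window
size, the tolerances and the function names are arbitrary assumptions,
not values from the ACL plugin.

```python
# Sketch: check logged memory samples against the two criteria above.
# Window and tolerance values are arbitrary choices for illustration.

def has_stabilized(samples, window: int = 5, tolerance: float = 0.01) -> bool:
    """True if the last `window` samples differ by at most `tolerance`
    (as a fraction of their maximum)."""
    if len(samples) < window:
        return False
    tail = samples[-window:]
    return (max(tail) - min(tail)) <= tolerance * max(tail)


def released_after_drain(baseline: float, after_drain: float,
                         tolerance: float = 0.05) -> bool:
    """True if usage after the sessions drained is back near the baseline."""
    return after_drain <= baseline * (1 + tolerance)


if __name__ == "__main__":
    growing = [100, 200, 300, 400, 500, 600]
    settled = [100, 300, 500, 500, 501, 500, 500, 500]
    print(has_stabilized(growing), has_stabilized(settled))
    print(released_after_drain(1000, 1030), released_after_drain(1000, 2000))
```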

--a

-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.

View/Reply Online (#10214): https://lists.fd.io/g/vpp-dev/message/10214
Mute This Topic: https://lists.fd.io/mt/24729023/21656
Group Owner: vpp-dev+ow...@lists.fd.io
Unsubscribe: https://lists.fd.io/g/vpp-dev/unsub  [arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-


[vpp-dev] VPP Memory usage

2018-08-18 Thread Rubina Bianchi
Hi dear VPP


I configured vpp stable/1807 and added a permit+reflect ACL on the input and output 
of my network interfaces. I configured vpp with 9 CPUs (1 main and 8 worker 
CPUs). My init.conf is:


vppctl>

set acl-plugin session table max-entries 4000
set acl-plugin session table hash-table-buckets 100
set acl-plugin session table hash-table-memory 17179869184
set acl-plugin session timeout udp idle 20
set acl-plugin session timeout tcp idle 120
set acl-plugin session timeout tcp transient 30


vpp_api_test>

acl_add_replace permit
acl_add_replace permit+reflect

acl_interface_add_del TenGigabitEthernet3/0/0 add output acl 1
acl_interface_add_del TenGigabitEthernet3/0/1 add output acl 1
acl_interface_add_del TenGigabitEthernet3/0/0 add input acl 1
acl_interface_add_del TenGigabitEthernet3/0/1 add input acl 1

exec set interface l2 bridge TenGigabitEthernet3/0/0 1
exec set interface l2 bridge TenGigabitEthernet3/0/1 1
exec set int state TenGigabitEthernet3/0/0 up
exec set int state TenGigabitEthernet3/0/1 up

My startup.conf is pasted in this link: https://paste.ubuntu.com/p/MhQDyqF6Xd/


I used Trex as the traffic generator as follows:

./t-rex-64 --cfg cfg/trex_config.yaml  -f cap2/sfr.yaml -m 50 -c 3 -d 3600 -p


During the execution of my test, Total-rx continuously decreased and after a while 
it reached 0. I checked the vpp status and found it had received a SIGKILL signal from the OS.

I monitored vpp memory and it kept increasing until vpp crashed.

Does the acl_plugin session management have a memory leak?


Regards,

Rubina
-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.

View/Reply Online (#10213): https://lists.fd.io/g/vpp-dev/message/10213
Mute This Topic: https://lists.fd.io/mt/24729023/21656
Group Owner: vpp-dev+ow...@lists.fd.io
Unsubscribe: https://lists.fd.io/g/vpp-dev/unsub  [arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-