Re: mlx4en, timer irq @100%... (11.0 stuck on high network load ???)

2017-09-07 Thread Julien Charbon
Hi Ben, On 8/31/17 12:04 PM, Ben RUBSON wrote: >> On 28 Aug 2017, at 11:27, Julien Charbon wrote: >> >> On 8/28/17 10:25 AM, Ben RUBSON wrote: On 16 Aug 2017, at 11:02, Ben RUBSON wrote: > On 15 Aug 2017, at 23:33, Julien Charbon wrote: > > On 8/11/17 11:32 AM, Ben RUBSO

Re: mlx4en, timer irq @100%... (11.0 stuck on high network load ???)

2017-08-31 Thread Ben RUBSON
> On 28 Aug 2017, at 11:27, Julien Charbon wrote: > > On 8/28/17 10:25 AM, Ben RUBSON wrote: >>> On 16 Aug 2017, at 11:02, Ben RUBSON wrote: >>> On 15 Aug 2017, at 23:33, Julien Charbon wrote: On 8/11/17 11:32 AM, Ben RUBSON wrote: >> On 08 Aug 2017, at 13:33, Julien Charbo

Re: mlx4en, timer irq @100%... (11.0 stuck on high network load ???)

2017-08-28 Thread Julien Charbon
Hi Ben, On 8/28/17 10:25 AM, Ben RUBSON wrote: >> On 16 Aug 2017, at 11:02, Ben RUBSON wrote: >> >>> On 15 Aug 2017, at 23:33, Julien Charbon wrote: >>> >>> On 8/11/17 11:32 AM, Ben RUBSON wrote: > On 08 Aug 2017, at 13:33, Julien Charbon wrote: > > On 8/8/17 10:31 AM, Hans Petter

Re: mlx4en, timer irq @100%... (11.0 stuck on high network load ???)

2017-08-28 Thread Ben RUBSON
> On 16 Aug 2017, at 11:02, Ben RUBSON wrote: > >> On 15 Aug 2017, at 23:33, Julien Charbon wrote: >> >> On 8/11/17 11:32 AM, Ben RUBSON wrote: On 08 Aug 2017, at 13:33, Julien Charbon wrote: On 8/8/17 10:31 AM, Hans Petter Selasky wrote: > > Suggested fix attached. >>

Re: mlx4en, timer irq @100%... (11.0 stuck on high network load ???)

2017-08-16 Thread Ben RUBSON
> On 15 Aug 2017, at 23:33, Julien Charbon wrote: > > On 8/11/17 11:32 AM, Ben RUBSON wrote: >>> On 08 Aug 2017, at 13:33, Julien Charbon wrote: >>> >>> On 8/8/17 10:31 AM, Hans Petter Selasky wrote: Suggested fix attached. >>> >>> I agree we your conclusion. Just for the record, m

Re: mlx4en, timer irq @100%... (11.0 stuck on high network load ???)

2017-08-15 Thread Julien Charbon
Hi Ben, On 8/11/17 11:32 AM, Ben RUBSON wrote: >> On 08 Aug 2017, at 13:33, Julien Charbon wrote: >> >> On 8/8/17 10:31 AM, Hans Petter Selasky wrote: >>> >>> Suggested fix attached. >> >> I agree with your conclusion. Just for the record, more precisely this >> regression seems to have been int

Re: mlx4en, timer irq @100%... (11.0 stuck on high network load ???)

2017-08-11 Thread Ben RUBSON
> On 08 Aug 2017, at 13:33, Julien Charbon wrote: > > Hi, > > On 8/8/17 10:31 AM, Hans Petter Selasky wrote: >> >> >> Suggested fix attached. > > I agree we your conclusion. Just for the record, more precisely this > regression seems to have been introduced with: > (...) > Thus good catch,

Re: mlx4en, timer irq @100%... (11.0 stuck on high network load ???)

2017-08-08 Thread Hans Petter Selasky
On 08/08/17 13:56, Slawa Olhovchenkov wrote: > On Tue, Aug 08, 2017 at 01:49:08PM +0200, Hans Petter Selasky wrote: >> On 08/08/17 13:33, Slawa Olhovchenkov wrote: >>> TW_RUNLOCK(V_tw_lock); >>> and >>> if (INP_INFO_TRY_WLOCK(&V_tcbinfo)) { >>> >>> `inp` can be invalidated, freed and this pointer may be invalid? >> If

Re: mlx4en, timer irq @100%... (11.0 stuck on high network load ???)

2017-08-08 Thread Julien Charbon
Hi, On 8/8/17 10:31 AM, Hans Petter Selasky wrote: > On 08/08/17 10:06, Ben RUBSON wrote: >>> On 08 Aug 2017, at 10:02, Hans Petter Selasky wrote: >>> >>> On 08/08/17 10:00, Ben RUBSON wrote: (kgdb) print *twq_2msl.tqh_first $2 = { tw_inpcb = 0xf8031c570740, >>> >>> print *

Re: mlx4en, timer irq @100%... (11.0 stuck on high network load ???)

2017-08-08 Thread Slawa Olhovchenkov
On Tue, Aug 08, 2017 at 01:49:08PM +0200, Hans Petter Selasky wrote: > On 08/08/17 13:33, Slawa Olhovchenkov wrote: > > TW_RUNLOCK(V_tw_lock); > > and > > if (INP_INFO_TRY_WLOCK(&V_tcbinfo)) { > > > > `inp` can be invalidated, freed and this pointer may be invalid? > > If you look one line up th

Re: mlx4en, timer irq @100%... (11.0 stuck on high network load ???)

2017-08-08 Thread Hans Petter Selasky
On 08/08/17 13:33, Slawa Olhovchenkov wrote: > TW_RUNLOCK(V_tw_lock); > and > if (INP_INFO_TRY_WLOCK(&V_tcbinfo)) { > > `inp` can be invalidated, freed and this pointer may be invalid? If you look one line up there is a pcbref ?? --HPS

Re: mlx4en, timer irq @100%... (11.0 stuck on high network load ???)

2017-08-08 Thread Slawa Olhovchenkov
On Tue, Aug 08, 2017 at 10:31:33AM +0200, Hans Petter Selasky wrote: > Here is the conclusion: > > The following code is going in an infinite loop: > > > > for (;;) { > > TW_RLOCK(V_tw_lock); > > tw = TAILQ_FIRST(&V_twq_2msl); > > if (tw =

Re: mlx4en, timer irq @100%... (11.0 stuck on high network load ???)

2017-08-08 Thread Ben RUBSON
> On 08 Aug 2017, at 10:31, Hans Petter Selasky wrote: > > On 08/08/17 10:06, Ben RUBSON wrote: >>> On 08 Aug 2017, at 10:02, Hans Petter Selasky wrote: >>> >>> On 08/08/17 10:00, Ben RUBSON wrote: kgdb) print *twq_2msl.tqh_first $2 = { tw_inpcb = 0xf8031c570740, >>> >>>

Re: mlx4en, timer irq @100%... (11.0 stuck on high network load ???)

2017-08-08 Thread Hans Petter Selasky
On 08/08/17 10:06, Ben RUBSON wrote: On 08 Aug 2017, at 10:02, Hans Petter Selasky wrote: On 08/08/17 10:00, Ben RUBSON wrote: (kgdb) print *twq_2msl.tqh_first $2 = { tw_inpcb = 0xf8031c570740, print *twq_2msl.tqh_first->tw_inpcb (kgdb) print *twq_2msl.tqh_first->tw_inpcb $3 = { i

Re: 11.0 stuck on high network load

2016-10-14 Thread Slawa Olhovchenkov
On Fri, Oct 14, 2016 at 11:48:38AM +0200, Julien Charbon wrote: > >>> Also, using dtrace is too complex in production (it needs a complex startup > >>> under screen and output capture) and for many people. > >>> kdb_backtrace() has far less administrative overhead. > >> > >> I still think it is overkill

Re: 11.0 stuck on high network load

2016-10-14 Thread Julien Charbon
Hi, On 10/14/16 11:35 AM, Slawa Olhovchenkov wrote: > On Thu, Oct 13, 2016 at 06:14:29PM +0200, Julien Charbon wrote: >> On 10/13/16 5:17 PM, Slawa Olhovchenkov wrote: >>> On Thu, Oct 13, 2016 at 05:06:00PM +0200, Julien Charbon wrote: >>> >> will give you that trace in the core, and without

Re: 11.0 stuck on high network load

2016-10-14 Thread Slawa Olhovchenkov
On Thu, Oct 13, 2016 at 06:14:29PM +0200, Julien Charbon wrote: > On 10/13/16 5:17 PM, Slawa Olhovchenkov wrote: > > On Thu, Oct 13, 2016 at 05:06:00PM +0200, Julien Charbon wrote: > > > will give you that trace in the core, and without INVARIANT then it is > better to use dtrace: > >>>

Re: 11.0 stuck on high network load

2016-10-13 Thread Julien Charbon
On 10/13/16 5:17 PM, Slawa Olhovchenkov wrote: > On Thu, Oct 13, 2016 at 05:06:00PM +0200, Julien Charbon wrote: > will give you that trace in the core, and without INVARIANT then it is better to use dtrace: $ cat tcp-twstart-dropped.d fbt::tcp_twstart:entry /args[0]-

Re: 11.0 stuck on high network load

2016-10-13 Thread Slawa Olhovchenkov
On Thu, Oct 13, 2016 at 05:06:00PM +0200, Julien Charbon wrote: > >> will give you that trace in the core, and without INVARIANT then it is > >> better to use dtrace: > >> > >> $ cat tcp-twstart-dropped.d > >> fbt::tcp_twstart:entry > >> /args[0]->t_inpcb->inp_flags & 0x0400/ > >> { > >> sta

Re: 11.0 stuck on high network load

2016-10-13 Thread Julien Charbon
Hi Slawa, On 10/13/16 4:38 PM, Slawa Olhovchenkov wrote: > On Thu, Oct 13, 2016 at 01:56:21PM +0200, Julien Charbon wrote: Something like: >>> >>> Yes, thanks! >> >> Proposed changes added in the review: >> >> https://reviews.freebsd.org/D8211 >> >> tell me when you have three days witho

Re: 11.0 stuck on high network load

2016-10-13 Thread Slawa Olhovchenkov
On Thu, Oct 13, 2016 at 01:56:21PM +0200, Julien Charbon wrote: > >> Something like: > > > > Yes, thanks! > > Proposed changes added in the review: > > https://reviews.freebsd.org/D8211 > > tell me when you have three days without issue with this change. > > >> tcp_detach() { > >> > >> .

Re: 11.0 stuck on high network load

2016-10-13 Thread Julien Charbon
Hi Slawa, On 10/12/16 5:42 PM, Slawa Olhovchenkov wrote: > On Wed, Oct 12, 2016 at 05:17:35PM +0200, Julien Charbon wrote: > >> I see, thus just for the context: The TCP stack in sys/dev/cxgb* >> is a >> TOE (TCP Offload Engine?) TCP stack for Chelsio NICs, it is a >>

Re: 11.0 stuck on high network load

2016-10-12 Thread Navdeep Parhar
> I see, thus just for the context: The TCP stack in sys/dev/cxgb* is a > TOE (TCP Offload Engine?) TCP stack for Chelsio NICs, it is a > separate/side TCP stack that is used only with TCP_OFFLOAD option. > > This TOE TCP stack actually has its own set of detach()/input() > functions and seems

Re: 11.0 stuck on high network load

2016-10-12 Thread Slawa Olhovchenkov
On Wed, Oct 12, 2016 at 05:17:35PM +0200, Julien Charbon wrote: > I see, thus just for the context: The TCP stack in sys/dev/cxgb* > is a > TOE (TCP Offload Engine?) TCP stack for Chelsio NICs, it is a > separate/side TCP stack that is used only with TCP_OFFL

Re: 11.0 stuck on high network load

2016-10-12 Thread Julien Charbon
Hi Slawa, On 10/12/16 3:01 PM, Slawa Olhovchenkov wrote: > On Wed, Oct 12, 2016 at 02:35:11PM +0200, Julien Charbon wrote: >> On 10/12/16 2:13 PM, Slawa Olhovchenkov wrote: >>> On Wed, Oct 12, 2016 at 02:06:59PM +0200, Julien Charbon wrote: > sofree() calls tcp_usr_detach(), and in tcp_usr

Re: 11.0 stuck on high network load

2016-10-12 Thread Slawa Olhovchenkov
On Wed, Oct 12, 2016 at 02:35:11PM +0200, Julien Charbon wrote: > > Hi Slawa, > > On 10/12/16 2:13 PM, Slawa Olhovchenkov wrote: > > On Wed, Oct 12, 2016 at 02:06:59PM +0200, Julien Charbon wrote: > >>> sofree() calls tcp_usr_detach(), and in tcp_usr_detach() we have an > >>> unexpected INP_

Re: 11.0 stuck on high network load

2016-10-12 Thread Julien Charbon
Hi Slawa, On 10/12/16 2:13 PM, Slawa Olhovchenkov wrote: > On Wed, Oct 12, 2016 at 02:06:59PM +0200, Julien Charbon wrote: >>> sofree() calls tcp_usr_detach(), and in tcp_usr_detach() we have an >>> unexpected INP_TIMEWAIT. >> >> I see, thus just for the context: The TCP stack in sy

Re: 11.0 stuck on high network load

2016-10-12 Thread Slawa Olhovchenkov
On Wed, Oct 12, 2016 at 02:06:59PM +0200, Julien Charbon wrote: > > sofree() calls tcp_usr_detach(), and in tcp_usr_detach() we have an > > unexpected INP_TIMEWAIT. > > I see, thus just for the context: The TCP stack in sys/dev/cxgb* is a > TOE (TCP Offload Engine?) TCP stack f

Re: 11.0 stuck on high network load

2016-10-12 Thread Julien Charbon
Hi Slawa, On 10/12/16 11:52 AM, Slawa Olhovchenkov wrote: > On Wed, Oct 12, 2016 at 11:42:38AM +0200, Julien Charbon wrote: >> On 10/12/16 11:29 AM, Slawa Olhovchenkov wrote: >>> On Wed, Oct 12, 2016 at 11:19:48AM +0200, Julien Charbon wrote: >>> > if INP_WLOCK is like spinlock -- this is de

Re: 11.0 stuck on high network load

2016-10-12 Thread Slawa Olhovchenkov
On Wed, Oct 12, 2016 at 11:42:38AM +0200, Julien Charbon wrote: > On 10/12/16 11:29 AM, Slawa Olhovchenkov wrote: > > On Wed, Oct 12, 2016 at 11:19:48AM +0200, Julien Charbon wrote: > > > >>> if INP_WLOCK is like spinlock -- this is dead lock. > >>> if INP_WLOCK is like mutex -- thread1 reshedule

Re: 11.0 stuck on high network load

2016-10-12 Thread Julien Charbon
Hi Slawa, On 10/12/16 10:40 AM, Slawa Olhovchenkov wrote: > On Wed, Oct 12, 2016 at 10:18:18AM +0200, Julien Charbon wrote: >> On 10/11/16 2:11 PM, Slawa Olhovchenkov wrote: >>> On Tue, Oct 11, 2016 at 09:20:17AM +0200, Julien Charbon wrote: Then threads are competing for the INP_WLOCK loc

Re: 11.0 stuck on high network load

2016-10-12 Thread Julien Charbon
On 10/12/16 11:29 AM, Slawa Olhovchenkov wrote: > On Wed, Oct 12, 2016 at 11:19:48AM +0200, Julien Charbon wrote: > >>> if INP_WLOCK is like a spinlock -- this is a deadlock. >>> if INP_WLOCK is like a mutex -- thread1 is rescheduled. >> >> Thanks, I understand your question now. No, an interrupt cannot by

Re: 11.0 stuck on high network load

2016-10-12 Thread Slawa Olhovchenkov
On Wed, Oct 12, 2016 at 11:19:48AM +0200, Julien Charbon wrote: > > if INP_WLOCK is like a spinlock -- this is a deadlock. > > if INP_WLOCK is like a mutex -- thread1 is rescheduled. > > Thanks, I understand your question now. No, an interrupt cannot bypass a > lock: Here INP_WLOCK is like a mutex -- threa

Re: 11.0 stuck on high network load

2016-10-12 Thread Slawa Olhovchenkov
On Wed, Oct 12, 2016 at 10:18:18AM +0200, Julien Charbon wrote: > > Hi Slawa, > > On 10/11/16 2:11 PM, Slawa Olhovchenkov wrote: > > On Tue, Oct 11, 2016 at 09:20:17AM +0200, Julien Charbon wrote: > >> Then threads are competing for the INP_WLOCK lock. For the example, > >> let's say the thre

Re: 11.0 stuck on high network load

2016-10-12 Thread Julien Charbon
Hi Slawa, On 10/11/16 2:11 PM, Slawa Olhovchenkov wrote: > On Tue, Oct 11, 2016 at 09:20:17AM +0200, Julien Charbon wrote: >> Then threads are competing for the INP_WLOCK lock. For the example, >> let's say the thread A wants to run tcp_input()/in_pcblookup_mbuf() and >> racing for this INP_WL

Re: 11.0 stuck on high network load

2016-10-11 Thread Slawa Olhovchenkov
On Tue, Oct 11, 2016 at 09:20:17AM +0200, Julien Charbon wrote: > Then threads are competing for the INP_WLOCK lock. For the example, > let's say the thread A wants to run tcp_input()/in_pcblookup_mbuf() and > racing for this INP_WLOCK: > > https://github.com/freebsd/freebsd/blob/release/11.0.0

Re: 11.0 stuck on high network load

2016-10-11 Thread Julien Charbon
Hi Slawa, On 10/10/16 7:35 PM, Slawa Olhovchenkov wrote: > On Mon, Oct 10, 2016 at 05:44:21PM +0200, Julien Charbon wrote: can check the current other usages of goto findpcb in tcp_input(). The rationale here being: - Behavior before the patch: If the inp we found was delet

Re: 11.0 stuck on high network load

2016-10-10 Thread Slawa Olhovchenkov
On Mon, Oct 10, 2016 at 05:44:21PM +0200, Julien Charbon wrote: > >> can check the current other usages of goto findpcb in tcp_input(). The > >> rationale here being: > >> > >> - Behavior before the patch: If the inp we found was deleted then goto > >> findpcb. > >> - Behavior after the patch:

Re: 11.0 stuck on high network load

2016-10-10 Thread Julien Charbon
Hi, On 10/10/16 4:29 PM, Slawa Olhovchenkov wrote: > On Mon, Oct 10, 2016 at 04:03:39PM +0200, Julien Charbon wrote: >> On 10/10/16 3:32 PM, Slawa Olhovchenkov wrote: >>> On Mon, Oct 10, 2016 at 01:26:12PM +0200, Julien Charbon wrote: On 10/6/16 1:10 PM, Slawa Olhovchenkov wrote: > On T

Re: 11.0 stuck on high network load

2016-10-10 Thread Slawa Olhovchenkov
On Mon, Oct 10, 2016 at 04:03:39PM +0200, Julien Charbon wrote: > > Hi Slawa, > > On 10/10/16 3:32 PM, Slawa Olhovchenkov wrote: > > On Mon, Oct 10, 2016 at 01:26:12PM +0200, Julien Charbon wrote: > >> On 10/6/16 1:10 PM, Slawa Olhovchenkov wrote: > >>> On Thu, Oct 06, 2016 at 09:28:06AM +0200,

Re: 11.0 stuck on high network load

2016-10-10 Thread Julien Charbon
Hi Slawa, On 10/10/16 3:32 PM, Slawa Olhovchenkov wrote: > On Mon, Oct 10, 2016 at 01:26:12PM +0200, Julien Charbon wrote: >> On 10/6/16 1:10 PM, Slawa Olhovchenkov wrote: >>> On Thu, Oct 06, 2016 at 09:28:06AM +0200, Julien Charbon wrote: >>> 2. thread1: In tcp_close() the inp is marked w

Re: 11.0 stuck on high network load

2016-10-10 Thread Slawa Olhovchenkov
On Mon, Oct 10, 2016 at 01:26:12PM +0200, Julien Charbon wrote: > > Hi, > > On 10/6/16 1:10 PM, Slawa Olhovchenkov wrote: > > On Thu, Oct 06, 2016 at 09:28:06AM +0200, Julien Charbon wrote: > > > >> 2. thread1: In tcp_close() the inp is marked with INP_DROPPED flag, the > >> process continues

Re: 11.0 stuck on high network load

2016-10-10 Thread Slawa Olhovchenkov
On Mon, Oct 10, 2016 at 01:26:12PM +0200, Julien Charbon wrote: > > Hi, > > On 10/6/16 1:10 PM, Slawa Olhovchenkov wrote: > > On Thu, Oct 06, 2016 at 09:28:06AM +0200, Julien Charbon wrote: > > > >> 2. thread1: In tcp_close() the inp is marked with INP_DROPPED flag, the > >> process continues

Re: 11.0 stuck on high network load

2016-10-10 Thread Julien Charbon
Hi, On 10/6/16 1:10 PM, Slawa Olhovchenkov wrote: > On Thu, Oct 06, 2016 at 09:28:06AM +0200, Julien Charbon wrote: > >> 2. thread1: In tcp_close() the inp is marked with INP_DROPPED flag, the >> process continues and calls INP_WUNLOCK() here: >> >> https://github.com/freebsd/freebsd/blob/rele

Re: 11.0 stuck on high network load

2016-10-07 Thread Slawa Olhovchenkov
On Thu, Oct 06, 2016 at 09:28:06AM +0200, Julien Charbon wrote: > Thanks again to Slawa, for his numerous debug reports and always > questioning my explanations. His last question directly led to this > finding. He is testing a quick workaround patch to check if there is more. Thanks very matc

Re: 11.0 stuck on high network load

2016-10-06 Thread Julien Charbon
Hi, On 9/28/16 1:59 PM, Slawa Olhovchenkov wrote: > On Wed, Sep 28, 2016 at 12:06:47PM +0200, Julien Charbon wrote: >> >> I am still trying to reproduce your issue, without success so far. Thanks to Slawa's efforts and multiple debug reports, we are starting to see the bottom of this issue and it seems

Re: 11.0 stuck on high network load

2016-10-06 Thread Slawa Olhovchenkov
On Thu, Oct 06, 2016 at 09:28:06AM +0200, Julien Charbon wrote: > 2. thread1: In tcp_close() the inp is marked with INP_DROPPED flag, the > process continues and calls INP_WUNLOCK() here: > > https://github.com/freebsd/freebsd/blob/releng/11.0/sys/netinet/tcp_subr.c#L1568 Look also to sys/netin

Re: 11.0 stuck on high network load

2016-10-06 Thread Julien Charbon
Hi Hiren, On 10/6/16 9:44 AM, hiren panchasara wrote: > On 10/06/16 at 09:28P, Julien Charbon wrote: >> On 9/28/16 1:59 PM, Slawa Olhovchenkov wrote: >>> On Wed, Sep 28, 2016 at 12:06:47PM +0200, Julien Charbon wrote: I am still trying to reproduce your issue, without success so far.

Re: 11.0 stuck on high network load

2016-10-06 Thread hiren panchasara
On 10/06/16 at 09:51P, Julien Charbon wrote: > > Hi Hiren, > > On 10/6/16 9:44 AM, hiren panchasara wrote: > > On 10/06/16 at 09:28P, Julien Charbon wrote: > >> On 9/28/16 1:59 PM, Slawa Olhovchenkov wrote: > >>> On Wed, Sep 28, 2016 at 12:06:47PM +0200, Julien Charbon wrote: > > I a

Re: 11.0 stuck on high network load

2016-10-06 Thread hiren panchasara
On 10/06/16 at 09:28P, Julien Charbon wrote: > > Hi, > > On 9/28/16 1:59 PM, Slawa Olhovchenkov wrote: > > On Wed, Sep 28, 2016 at 12:06:47PM +0200, Julien Charbon wrote: > >> > >> I am still trying to reproduce your issue, without success so far. > > Thanks to Slawa's efforts and multiple deb

Re: 11.0 stuck on high network load

2016-09-30 Thread Torfinn Ingolfsen
On Wed, 28 Sep 2016 12:06:47 +0200 Julien Charbon wrote: > > I am still trying to reproduce your issue, without success so far. All: please remember to trim your quotes. Thank you. Carry on. -- Regards, Torfinn Ingolfsen

Re: 11.0 stuck on high network load

2016-09-28 Thread Slawa Olhovchenkov
On Wed, Sep 28, 2016 at 12:06:47PM +0200, Julien Charbon wrote: > > Tracing command intr pid 12 tid 100026 td 0xf8011424b500 > > sched_switch() at 0x804c956d = sched_switch+0x6ad/frame > > 0xfe3876f0 > > mi_switch() at 0x804a8d92 = mi_switch+0xd2/frame 0xfe3877

Re: 11.0 stuck on high network load

2016-09-28 Thread Julien Charbon
Hi Slawa, On 9/26/16 7:22 PM, Slawa Olhovchenkov wrote: > On Mon, Sep 26, 2016 at 11:33:12AM +0200, Julien Charbon wrote: >> On 9/25/16 2:46 PM, Slawa Olhovchenkov wrote: >>> On Fri, Sep 23, 2016 at 11:01:43PM +0300, Slawa Olhovchenkov wrote: > On 9/21/16 9:51 PM, Slawa Olhovchenkov wrote: >

Re: 11.0 stuck on high network load

2016-09-26 Thread Slawa Olhovchenkov
On Mon, Sep 26, 2016 at 11:33:12AM +0200, Julien Charbon wrote: > > Hi Slawa, > > On 9/25/16 2:46 PM, Slawa Olhovchenkov wrote: > > On Fri, Sep 23, 2016 at 11:01:43PM +0300, Slawa Olhovchenkov wrote: > >>> On 9/21/16 9:51 PM, Slawa Olhovchenkov wrote: > On Wed, Sep 21, 2016 at 09:11:24AM +

Re: 11.0 stuck on high network load

2016-09-26 Thread Slawa Olhovchenkov
On Mon, Sep 26, 2016 at 01:57:03PM +0200, Julien Charbon wrote: > > Hi Slawa, > > On 9/25/16 2:46 PM, Slawa Olhovchenkov wrote: > > On Fri, Sep 23, 2016 at 11:01:43PM +0300, Slawa Olhovchenkov wrote: > >> On Wed, Sep 21, 2016 at 11:25:18PM +0200, Julien Charbon wrote: > >>> > >>> On 9/21/16 9:5

Re: 11.0 stuck on high network load

2016-09-26 Thread Slawa Olhovchenkov
On Mon, Sep 26, 2016 at 11:33:12AM +0200, Julien Charbon wrote: > >>> - tcp_input()/tcp_twstart()/tcp_tw_2msl_scan(reuse=1) > >> > >> My current hypothesis: > >> > >> nginx does a write() (or maybe a close()?) on a socket, the kernel locks the > >> first inp in V_twq_2msl, a callout for pfslowtimo() fires on the

Re: 11.0 stuck on high network load

2016-09-26 Thread Slawa Olhovchenkov
On Mon, Sep 26, 2016 at 10:51:07AM +0200, Julien Charbon wrote: > > 1049 kqread- I 145:58.35 nginx: worker process (nginx) > > 1050 kqread- I 136:33.36 nginx: worker process (nginx) > > 1051 kqread- I 140:59.73 nginx: worker process (nginx) > > 1052 kqread- I

Re: 11.0 stuck on high network load

2016-09-26 Thread Julien Charbon
Hi Slawa, On 9/25/16 2:46 PM, Slawa Olhovchenkov wrote: > On Fri, Sep 23, 2016 at 11:01:43PM +0300, Slawa Olhovchenkov wrote: >> On Wed, Sep 21, 2016 at 11:25:18PM +0200, Julien Charbon wrote: >>> >>> On 9/21/16 9:51 PM, Slawa Olhovchenkov wrote: On Wed, Sep 21, 2016 at 09:11:24AM +0200, Ju

Re: 11.0 stuck on high network load

2016-09-26 Thread Julien Charbon
Hi Slawa, On 9/25/16 2:46 PM, Slawa Olhovchenkov wrote: > On Fri, Sep 23, 2016 at 11:01:43PM +0300, Slawa Olhovchenkov wrote: >>> On 9/21/16 9:51 PM, Slawa Olhovchenkov wrote: On Wed, Sep 21, 2016 at 09:11:24AM +0200, Julien Charbon wrote: > You can also use Dtrace and lockstat (especi

Re: 11.0 stuck on high network load

2016-09-26 Thread Julien Charbon
On 9/25/16 8:58 PM, Slawa Olhovchenkov wrote: > On Fri, Sep 23, 2016 at 10:16:56PM +0300, Slawa Olhovchenkov wrote: > >> On Thu, Sep 22, 2016 at 01:20:45PM +0300, Slawa Olhovchenkov wrote: >> >>> On Thu, Sep 22, 2016 at 12:04:40PM +0200, Julien Charbon wrote: >>> >> These paths can indeed com

Re: 11.0 stuck on high network load

2016-09-26 Thread Julien Charbon
Hi Slawa, On 9/23/16 9:16 PM, Slawa Olhovchenkov wrote: > On Thu, Sep 22, 2016 at 01:20:45PM +0300, Slawa Olhovchenkov wrote: > >> On Thu, Sep 22, 2016 at 12:04:40PM +0200, Julien Charbon wrote: >> > These paths can indeed compete for the same INP lock, as both > tcp_tw_2msl_scan() cal

Re: 11.0 stuck on high network load

2016-09-25 Thread Slawa Olhovchenkov
On Fri, Sep 23, 2016 at 10:16:56PM +0300, Slawa Olhovchenkov wrote: > On Thu, Sep 22, 2016 at 01:20:45PM +0300, Slawa Olhovchenkov wrote: > > > On Thu, Sep 22, 2016 at 12:04:40PM +0200, Julien Charbon wrote: > > > > > >> These paths can indeed compete for the same INP lock, as both > > > >> tcp

Re: 11.0 stuck on high network load

2016-09-25 Thread Slawa Olhovchenkov
On Fri, Sep 23, 2016 at 11:01:43PM +0300, Slawa Olhovchenkov wrote: > On Wed, Sep 21, 2016 at 11:25:18PM +0200, Julien Charbon wrote: > > > > > Hi Slawa, > > > > On 9/21/16 9:51 PM, Slawa Olhovchenkov wrote: > > > On Wed, Sep 21, 2016 at 09:11:24AM +0200, Julien Charbon wrote: > > >> You can

Re: 11.0 stuck on high network load

2016-09-23 Thread Slawa Olhovchenkov
On Wed, Sep 21, 2016 at 11:25:18PM +0200, Julien Charbon wrote: > > Hi Slawa, > > On 9/21/16 9:51 PM, Slawa Olhovchenkov wrote: > > On Wed, Sep 21, 2016 at 09:11:24AM +0200, Julien Charbon wrote: > >> You can also use Dtrace and lockstat (especially with the lockstat -s > >> option): > >> > >>

Re: 11.0 stuck on high network load

2016-09-23 Thread Slawa Olhovchenkov
On Thu, Sep 22, 2016 at 01:20:45PM +0300, Slawa Olhovchenkov wrote: > On Thu, Sep 22, 2016 at 12:04:40PM +0200, Julien Charbon wrote: > > > >> These paths can indeed compete for the same INP lock, as both > > >> tcp_tw_2msl_scan() calls always start with the first inp found in > > >> twq_2msl li

Re: 11.0 stuck on high network load

2016-09-22 Thread Slawa Olhovchenkov
On Thu, Sep 22, 2016 at 12:04:40PM +0200, Julien Charbon wrote: > >> These paths can indeed compete for the same INP lock, as both > >> tcp_tw_2msl_scan() calls always start with the first inp found in > >> twq_2msl list. But in both cases, this first inp should be quickly used > >> and its lock

Re: 11.0 stuck on high network load

2016-09-22 Thread Julien Charbon
Hi Slawa, On 9/22/16 11:53 AM, Slawa Olhovchenkov wrote: > On Wed, Sep 21, 2016 at 11:25:18PM +0200, Julien Charbon wrote: >> On 9/21/16 9:51 PM, Slawa Olhovchenkov wrote: >>> On Wed, Sep 21, 2016 at 09:11:24AM +0200, Julien Charbon wrote: You can also use Dtrace and lockstat (especially w

Re: 11.0 stuck on high network load

2016-09-22 Thread Slawa Olhovchenkov
On Wed, Sep 21, 2016 at 11:25:18PM +0200, Julien Charbon wrote: > > Hi Slawa, > > On 9/21/16 9:51 PM, Slawa Olhovchenkov wrote: > > On Wed, Sep 21, 2016 at 09:11:24AM +0200, Julien Charbon wrote: > >> You can also use Dtrace and lockstat (especially with the lockstat -s > >> option): > >> > >>

Re: 11.0 stuck on high network load

2016-09-22 Thread Slawa Olhovchenkov
On Thu, Sep 22, 2016 at 11:28:38AM +0200, Julien Charbon wrote: > >>> What is the purpose of not skipping a locked tcptw in this loop? > >> > >> If I understand your question correctly: According to your pmcstat > >> result, tcp_tw_2msl_scan() currently struggles with a write lock > >> (__rw_wlock_hard) and

Re: 11.0 stuck on high network load

2016-09-22 Thread Julien Charbon
Hi Slawa, On 9/21/16 10:31 AM, Slawa Olhovchenkov wrote: > On Wed, Sep 21, 2016 at 09:11:24AM +0200, Julien Charbon wrote: >> On 9/20/16 10:26 PM, Slawa Olhovchenkov wrote: >>> On Tue, Sep 20, 2016 at 10:00:25PM +0200, Julien Charbon wrote: On 9/19/16 10:43 PM, Slawa Olhovchenkov wrote: >>>

Re: 11.0 stuck on high network load

2016-09-21 Thread Julien Charbon
Hi Slawa, On 9/21/16 9:51 PM, Slawa Olhovchenkov wrote: > On Wed, Sep 21, 2016 at 09:11:24AM +0200, Julien Charbon wrote: >> You can also use Dtrace and lockstat (especially with the lockstat -s >> option): >> >> https://wiki.freebsd.org/DTrace/One-Liners#Kernel_Locks >> https://www.freebsd.org

Re: 11.0 stuck on high network load

2016-09-21 Thread Slawa Olhovchenkov
On Wed, Sep 21, 2016 at 09:11:24AM +0200, Julien Charbon wrote: > > You can also use Dtrace and lockstat (especially with the lockstat -s > option): > > https://wiki.freebsd.org/DTrace/One-Liners#Kernel_Locks > https://www.freebsd.org/cgi/man.cgi?query=lockstat&manpath=FreeBSD+11.0-RELEASE > >

Re: 11.0 stuck on high network load

2016-09-21 Thread Slawa Olhovchenkov
On Wed, Sep 21, 2016 at 09:11:24AM +0200, Julien Charbon wrote: > > Hi Slawa, > > On 9/20/16 10:26 PM, Slawa Olhovchenkov wrote: > > On Tue, Sep 20, 2016 at 10:00:25PM +0200, Julien Charbon wrote: > >> On 9/19/16 10:43 PM, Slawa Olhovchenkov wrote: > >>> On Mon, Sep 19, 2016 at 10:32:13PM +0200

Re: 11.0 stuck on high network load

2016-09-21 Thread Julien Charbon
Hi Slawa, On 9/20/16 10:26 PM, Slawa Olhovchenkov wrote: > On Tue, Sep 20, 2016 at 10:00:25PM +0200, Julien Charbon wrote: >> On 9/19/16 10:43 PM, Slawa Olhovchenkov wrote: >>> On Mon, Sep 19, 2016 at 10:32:13PM +0200, Julien Charbon wrote: > @ CPU_CLK_UNHALTED_CORE [4653445 samples] >>

Re: 11.0 stuck on high network load

2016-09-20 Thread Julien Charbon
Hi Slawa, On 9/19/16 10:43 PM, Slawa Olhovchenkov wrote: > On Mon, Sep 19, 2016 at 10:32:13PM +0200, Julien Charbon wrote: >> >>> @ CPU_CLK_UNHALTED_CORE [4653445 samples] >>> >>> 51.86% [2413083] lock_delay @ /boot/kernel.VSTREAM/kernel >>> 100.0% [2413083] __rw_wlock_hard >>> 100.0% [

Re: 11.0 stuck on high network load

2016-09-20 Thread Slawa Olhovchenkov
On Tue, Sep 20, 2016 at 10:00:25PM +0200, Julien Charbon wrote: > > Hi Slawa, > > On 9/19/16 10:43 PM, Slawa Olhovchenkov wrote: > > On Mon, Sep 19, 2016 at 10:32:13PM +0200, Julien Charbon wrote: > >> > >>> @ CPU_CLK_UNHALTED_CORE [4653445 samples] > >>> > >>> 51.86% [2413083] lock_delay @ /

Re: 11.0 stuck on high network load

2016-09-19 Thread Slawa Olhovchenkov
On Mon, Sep 19, 2016 at 10:32:13PM +0200, Julien Charbon wrote: > > > @ CPU_CLK_UNHALTED_CORE [4653445 samples] > > > > 51.86% [2413083] lock_delay @ /boot/kernel.VSTREAM/kernel > > 100.0% [2413083] __rw_wlock_hard > > 100.0% [2413083]tcp_tw_2msl_scan > >99.99% [2412958] pf

Re: 11.0 stuck on high network load

2016-09-19 Thread Julien Charbon
Hi Slawa, On 9/16/16 9:03 PM, Slawa Olhovchenkov wrote: > On Fri, Sep 16, 2016 at 11:30:53AM -0700, hiren panchasara wrote: > >> On 09/16/16 at 09:18P, Slawa Olhovchenkov wrote: >>> On Thu, Sep 15, 2016 at 12:06:33PM +0300, Slawa Olhovchenkov wrote: >>> On Thu, Sep 15, 2016 at 11:59:38AM +

Re: 11.0 stuck on high network load

2016-09-19 Thread Hans Petter Selasky
On 09/18/16 22:46, Slawa Olhovchenkov wrote: > Is the userbase compatible? Can I recompile only the kernel? Yes, only the kernel. --HPS

Re: 11.0 stuck on high network load

2016-09-18 Thread Slawa Olhovchenkov
On Sun, Sep 18, 2016 at 10:38:58PM +0200, Hans Petter Selasky wrote: > On 09/18/16 20:10, Slawa Olhovchenkov wrote: > > On Sun, Sep 18, 2016 at 07:50:08PM +0200, Hans Petter Selasky wrote: > > > >> Hi, > >> > >> Got some tips regarding this thread. > >> > >> Some things you can try: > >> > >> 1) C

Re: 11.0 stuck on high network load

2016-09-18 Thread Hans Petter Selasky
On 09/18/16 20:10, Slawa Olhovchenkov wrote: > On Sun, Sep 18, 2016 at 07:50:08PM +0200, Hans Petter Selasky wrote: >> Hi, >> Got some tips regarding this thread. >> Some things you can try: >> 1) Compile kernel from projects/hps_head instead of your 11-stable? > How much does it differ from 11-stable? Hi,

Re: 11.0 stuck on high network load

2016-09-18 Thread Slawa Olhovchenkov
On Sun, Sep 18, 2016 at 07:50:08PM +0200, Hans Petter Selasky wrote: > Hi, > > Got some tips regarding this thread. > > Some things you can try: > > 1) Compile kernel from projects/hps_head instead of your 11-stable? How much does it differ from 11-stable? > 2) Set net.inet.tcp.per_cpu_timers=1

11.0 stuck on high network load

2016-09-18 Thread Hans Petter Selasky
Hi, Got some tips regarding this thread. Some things you can try: 1) Compile kernel from projects/hps_head instead of your 11-stable? 2) Set net.inet.tcp.per_cpu_timers=1 If the system just hangs, it is pretty likely that the timers are going in a loop due to typical use after free. Please

Re: 11.0 stuck on high network load

2016-09-18 Thread Slawa Olhovchenkov
On Fri, Sep 16, 2016 at 12:11:55PM -0700, hiren panchasara wrote: > + jch@ > On 09/16/16 at 10:03P, Slawa Olhovchenkov wrote: > > On Fri, Sep 16, 2016 at 11:30:53AM -0700, hiren panchasara wrote: > > > > > On 09/16/16 at 09:18P, Slawa Olhovchenkov wrote: > > > > On Thu, Sep 15, 2016 at 12:06:33P

Re: 11.0 stuck on high network load

2016-09-17 Thread Slawa Olhovchenkov
On Fri, Sep 16, 2016 at 02:48:49PM -0700, hiren panchasara wrote: > On 09/16/16 at 02:46P, hiren panchasara wrote: > > On 09/16/16 at 11:30P, Slawa Olhovchenkov wrote: > > > On Fri, Sep 16, 2016 at 12:11:55PM -0700, hiren panchasara wrote: > > > > > > > > > > > As I suspected, this looks like a

Re: 11.0 stuck on high network load

2016-09-16 Thread hiren panchasara
On 09/16/16 at 02:46P, hiren panchasara wrote: > On 09/16/16 at 11:30P, Slawa Olhovchenkov wrote: > > On Fri, Sep 16, 2016 at 12:11:55PM -0700, hiren panchasara wrote: > > > > > > > > As I suspected, this looks like a hang trying to lock V_tcbinfo. > > > > > > I'm ccing Julien here who worked on

Re: 11.0 stuck on high network load

2016-09-16 Thread hiren panchasara
On 09/16/16 at 11:30P, Slawa Olhovchenkov wrote: > On Fri, Sep 16, 2016 at 12:11:55PM -0700, hiren panchasara wrote: > > > > > As I suspected, this looks like a hang trying to lock V_tcbinfo. > > > > I'm ccing Julien here who worked on WLOCK -> RLOCK transition to improve > > performance for sho

Re: 11.0 stuck on high network load

2016-09-16 Thread Slawa Olhovchenkov
On Fri, Sep 16, 2016 at 12:11:55PM -0700, hiren panchasara wrote: > > As I suspected, this looks like a hang trying to lock V_tcbinfo. > > I'm ccing Julien here who worked on WLOCK -> RLOCK transition to improve > performance for short-lived connections. I am not too sure if that's the > problem

Re: 11.0 stuck on high network load

2016-09-16 Thread hiren panchasara
+ jch@ On 09/16/16 at 10:03P, Slawa Olhovchenkov wrote: > On Fri, Sep 16, 2016 at 11:30:53AM -0700, hiren panchasara wrote: > > > On 09/16/16 at 09:18P, Slawa Olhovchenkov wrote: > > > On Thu, Sep 15, 2016 at 12:06:33PM +0300, Slawa Olhovchenkov wrote: > > > > > > > On Thu, Sep 15, 2016 at 11:59

Re: 11.0 stuck on high network load

2016-09-16 Thread Slawa Olhovchenkov
On Fri, Sep 16, 2016 at 11:30:53AM -0700, hiren panchasara wrote: > On 09/16/16 at 09:18P, Slawa Olhovchenkov wrote: > > On Thu, Sep 15, 2016 at 12:06:33PM +0300, Slawa Olhovchenkov wrote: > > > > > On Thu, Sep 15, 2016 at 11:59:38AM +0300, Konstantin Belousov wrote: > > > > > > > On Thu, Sep 15

Re: 11.0 stuck on high network load

2016-09-16 Thread Slawa Olhovchenkov
On Fri, Sep 16, 2016 at 11:30:53AM -0700, hiren panchasara wrote: > On 09/16/16 at 09:18P, Slawa Olhovchenkov wrote: > > On Thu, Sep 15, 2016 at 12:06:33PM +0300, Slawa Olhovchenkov wrote: > > > > > On Thu, Sep 15, 2016 at 11:59:38AM +0300, Konstantin Belousov wrote: > > > > > > > On Thu, Sep 15

Re: 11.0 stuck on high network load

2016-09-16 Thread hiren panchasara
On 09/16/16 at 09:18P, Slawa Olhovchenkov wrote: > On Thu, Sep 15, 2016 at 12:06:33PM +0300, Slawa Olhovchenkov wrote: > > > On Thu, Sep 15, 2016 at 11:59:38AM +0300, Konstantin Belousov wrote: > > > > > On Thu, Sep 15, 2016 at 12:35:04AM +0300, Slawa Olhovchenkov wrote: > > > > On Sun, Sep 04, 2

Re: 11.0 stuck on high network load

2016-09-16 Thread Eugene Grosbein
17.09.2016 1:18, Slawa Olhovchenkov wrote: ~^B don't break to debugger. Make sure your kernel config has: # Solaris implements a new BREAK which is initiated by a character # sequence CR ~ ^b which is similar to a familiar pattern used on # Sun servers by the Remote Console. There are FreeBSD

Re: 11.0 stuck on high network load

2016-09-16 Thread Slawa Olhovchenkov
On Thu, Sep 15, 2016 at 12:06:33PM +0300, Slawa Olhovchenkov wrote: > On Thu, Sep 15, 2016 at 11:59:38AM +0300, Konstantin Belousov wrote: > > > On Thu, Sep 15, 2016 at 12:35:04AM +0300, Slawa Olhovchenkov wrote: > > > On Sun, Sep 04, 2016 at 06:46:12PM -0700, hiren panchasara wrote: > > > > > >

Re: 11.0 stuck on high network load

2016-09-15 Thread Slawa Olhovchenkov
On Thu, Sep 15, 2016 at 11:59:38AM +0300, Konstantin Belousov wrote: > On Thu, Sep 15, 2016 at 12:35:04AM +0300, Slawa Olhovchenkov wrote: > > On Sun, Sep 04, 2016 at 06:46:12PM -0700, hiren panchasara wrote: > > > > > On 09/05/16 at 12:57P, Slawa Olhovchenkov wrote: > > > > I am try using 11.0 o

Re: 11.0 stuck on high network load

2016-09-15 Thread Konstantin Belousov
On Thu, Sep 15, 2016 at 12:35:04AM +0300, Slawa Olhovchenkov wrote: > On Sun, Sep 04, 2016 at 06:46:12PM -0700, hiren panchasara wrote: > > > On 09/05/16 at 12:57P, Slawa Olhovchenkov wrote: > > > I am try using 11.0 on Dual E5-2620 (no X2APIC). > > > Under high network load and may be addtional c

Re: 11.0 stuck on high network load

2016-09-14 Thread Slawa Olhovchenkov
On Thu, Sep 15, 2016 at 02:33:07AM +0300, Oleksandr V. Typlyns'kyi wrote: > Sep 5 Sep 5, 2016 at 00:57 Slawa Olhovchenkov wrote: > > > I am try using 11.0 on Dual E5-2620 (no X2APIC). > > Under high network load and may be addtional conditional system go to > > unresponsible state -- no reaction

Re: 11.0 stuck on high network load

2016-09-14 Thread Oleksandr V. Typlyns'kyi
Sep 5 Sep 5, 2016 at 00:57 Slawa Olhovchenkov wrote: > I am try using 11.0 on Dual E5-2620 (no X2APIC). > Under high network load and may be addtional conditional system go to > unresponsible state -- no reaction to network and console (USB IPMI > emulation). INVARIANTS give to high overhad. Is th

Re: 11.0 stuck on high network load

2016-09-14 Thread Slawa Olhovchenkov
On Wed, Sep 14, 2016 at 11:23:20PM +0100, Gary Palmer wrote: > On Thu, Sep 15, 2016 at 01:13:35AM +0300, Slawa Olhovchenkov wrote: > > On Wed, Sep 14, 2016 at 03:04:20PM -0700, hiren panchasara wrote: > > > > > On 09/15/16 at 12:57P, Slawa Olhovchenkov wrote: > > > > On Wed, Sep 14, 2016 at 02:43

Re: 11.0 stuck on high network load

2016-09-14 Thread Gary Palmer
On Thu, Sep 15, 2016 at 01:13:35AM +0300, Slawa Olhovchenkov wrote: > On Wed, Sep 14, 2016 at 03:04:20PM -0700, hiren panchasara wrote: > > > On 09/15/16 at 12:57P, Slawa Olhovchenkov wrote: > > > On Wed, Sep 14, 2016 at 02:43:06PM -0700, hiren panchasara wrote: > > > > > > > On 09/15/16 at 12:35
