Re: Stuck in Needbuf state, trying to understand (6.7)

Bryan Linton Sun, 28 Jun 2020 00:59:39 -0700

On 2020-06-27 19:29:31, Bob Beck <[email protected]> wrote:
> 
> No. 
> 
> I know *exactly* what needbuf is, but to attempt to diagnose what your
> problem is we need exact details, especially:
> 
> 1) The configuration of your system including all the details of the
> filesystems you have mounted, all options used, etc.
> 
> 2) The script you are using to generate the problem (not a paraphrasing
> of what you think the script does) and what filesystems it is using.
>


Not the OP, but this problem sounds almost exactly like the bug I
reported last year.

There is a detailed list of steps I used to reproduce the bug in
the following bug report.

        https://marc.info/?l=openbsd-bugs&m=156412299418191

I was even able to bisect and identify the commit which first
caused the breakage for me.


---8<---

CVSROOT:        /cvs
Module name:    src
Changes by:     [email protected]    2019/05/08 06:40:57

Modified files:
        sys/kern       : vfs_bio.c vfs_biomem.c

Log message:
Modify the buffer cache to always flip recovered DMA buffers high.

This also modifies the backoff logic to only back off what is requested
and not a "minimum" amount. Tested by me, benno@, tedu@ and a ports build
by naddy@.

ok tedu@

---8<---

However, I have since migrated away from using vnd(4)s since I was
able to find other solutions that worked for my use cases.  So I 
may not be able to provide much additional information other than
what is contained in the above bug report.
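
For anyone trying to reproduce this, the kind of sequence discussed in
this thread can be sketched roughly as follows. This is only an
illustrative sketch, not the script from the bug report above: the
image path, vnd unit, mount point, image size, and copied tree are all
assumptions. By default it only prints the commands instead of running
them.

```shell
#!/bin/sh
# Hypothetical sketch, NOT the script from the bug report.
# All names below (IMG, VND, MNT, the copied tree) are assumptions.
# With DRY_RUN=1 (the default) commands are printed, never executed.
DRY_RUN=${DRY_RUN:-1}
IMG=${IMG:-/tmp/vnd-test.img}   # assumed backing file for the vnd(4) device
VND=${VND:-vnd0}                # assumed free vnd(4) unit
MNT=${MNT:-/mnt}                # assumed empty mount point

# Log each command before (optionally) running it, in the spirit of
# the debug_system() wrapper quoted later in this thread.
run() {
    echo "running: $*"
    [ "$DRY_RUN" = 1 ] || "$@"
}

# One create/write/teardown cycle on a file-backed disk.
vnd_cycle() {
    run dd if=/dev/zero of="$IMG" bs=1m count=2048 &&
    run vnconfig "$VND" "$IMG" &&
    run newfs "/dev/r${VND}c" &&
    run mount "/dev/${VND}c" "$MNT" &&
    run cp -R /usr/src/sys "$MNT/" &&
    run umount "$MNT" &&
    run vnconfig -u "$VND"
}
```

Calling vnd_cycle prints the seven commands in order. Setting
DRY_RUN=0 would actually run them, which needs root and should only be
done on a scratch machine, since the point is to see whether a writer
ends up stuck in needbuf.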

-- 
Bryan

> 
> 
> On Sat, Jun 27, 2020 at 08:09:18PM -0400, sven falempin wrote:
> > On Fri, Jun 26, 2020 at 7:35 PM sven falempin <[email protected]>
> > wrote:
> > 
> > >
> > >
> > > On Fri, Jun 26, 2020 at 5:22 PM Stuart Henderson <[email protected]>
> > > wrote:
> > >
> > >> On 2020/06/26 15:30, sven falempin wrote:
> > >> > behavior confirmed on current.
> > >> >
> > > >> > Once the process stalls (could be anything writing to the vnconfig
> > > >> > disk: cp, umount), a few other calls like df or ps may hang, never
> > > >> > the same. Seen with both sp and mp kernels, reproduced on today's
> > > >> > snapshots.
> > >>
> > >> vnconfig is used as part of "make release", many builds are done every
> > >> week using this so it's not a general problem with vnconfig.
> > >>
> > >> Can you show some commands or a script to trigger the behaviour?
> > >>
> > >
> > > > The perl script uses system() to call:
> > >
> > > vnconfig.
> > > mount.
> > > > umount. <- seen hanging
> > > > cp. <- seen hanging
> > > > tar. <- seen hanging
> > > > svn up. <- seen hanging
> > > and dd.
> > > newfs.
> > >
> > > really nothing fancy, only stuff writing to disk got stuck.
> > >
> > > > At one point it does a chroot, but it never hangs near that; most of
> > > > the time it hangs before.
> > >
> > > The script has been used like 1000 times on 6.0 and maybe twice more on
> > > 6.4.
> > >
> > > > I have absolutely no idea what the 'needbuf' state in top is.
> > > >
> > > > The script hangs at a random position, always while writing into the
> > > > vnconfig disk.
> > >
> > > > I have no idea how to reproduce it outside the perl script, so maybe it
> > > > is related to some devious perl stdin/stdout buffering.
> > > >
> > > > Nevertheless, there's about a 5% chance that the script will work
> > > > (slowly).
> > >
> > > > Most of the system calls go through a logging routine:
> > >
> > > sub debug_system {
> > >   $logger->debug('running: '.join(' ', @_));
> > >   return system(@_);
> > > }
> > >
> > > > so I can easily put things inside to try to understand the issue.
> > >
> > > > It is really strange behavior, and the device must be shut down by
> > > > cutting power.
> > > > Something really odd: I run syslogd with a memory buffer, and syslogc
> > > > on that buffer is stuck too when the device is stuck (but it is
> > > > supposed to be mostly already-allocated memory).
> > >
> > > > It's really as if the vm does not want to give out any more buckets
> > > > (<- I don't know what I'm talking about here, but it looks like
> > > > anything that doesn't malloc is OK: the computer replies to ping, can
> > > > do a few things for a while, and then hangs completely).
> > >
> > > > I ran the 6.7 release on a VM somewhere and on another device with many
> > > > perl scripts, and they work.
> > > >
> > > > Only this one fails 95% of the time and is VERY VERY slow when it works.
> > > > Compared to what I saw in /usr/src, the vnconfig image is big (I forgot
> > > > to copy df -h), like 2GB.
> > >
> > 
> > 
> > I put ktrace in front of the perl system call
> > 
> > and I was able to recover an 800MB trace
> > 
> > $ kdump -f ./trace.out | tail -20
> > kdump: realloc: Cannot allocate memory
> >  25955         UNKNOWN(1634890859)
> >  72466 ?????????     CALL  syscall()
> > 
> > 
> > Could that be of some use?
> > 
> > 
> > -- 
> > --
> > ---------------------------------------------------------------------------------------------------------------------
> > Knowing is not enough; we must apply. Willing is not enough; we must do
> 
>
