On Tue, 18 Oct 2011, Emmanuel Dreyfus wrote:
Hisashi T Fujinaka wrote:
I'm sorry I wasn't paying attention to this earlier. I've seen reboots
on my amd64 netbsd-5 server under probably heavy load (building
userspace or rebuilding all my pkgsrc packages, for example) but haven't
seen a core fi
Hisashi T Fujinaka wrote:
> I'm sorry I wasn't paying attention to this earlier. I've seen reboots
> on my amd64 netbsd-5 server under probably heavy load (building
> userspace or rebuilding all my pkgsrc packages, for example) but haven't
> seen a core file (I probably disabled that) or any othe
On Tue, Oct 18, 2011 at 10:05:24AM -0700, Hisashi T Fujinaka wrote:
> I'm sorry I wasn't paying attention to this earlier. I've seen reboots
> on my amd64 netbsd-5 server under probably heavy load (building
> userspace or rebuilding all my pkgsrc packages, for example) but haven't
> seen a core fil
On Tue, 18 Oct 2011, Emmanuel Dreyfus wrote:
Emmanuel Dreyfus wrote:
When running perfused stress test (a build of NetBSD over a glusterfs
volume), memory gets low, and the machine hangs.
I made some progress, thanks to Manuel Bouyer's suggestions. There are
code paths where pagedaemon sleeps
Emmanuel Dreyfus wrote:
> When running perfused stress test (a build of NetBSD over a glusterfs
> volume), memory gets low, and the machine hangs.
I made some progress, thanks to Manuel Bouyer's suggestions. There are
code paths where pagedaemon sleeps for memory. During my tests, I never
spotted
Masao Uebayashi wrote:
> > I do not see any kernel thread stuck anymore, so is that the problem?
> No, it isn't.
While we are at it, here is the change I made to avoid ioflush getting
stuck in PUFFS. That is against netbsd-5 with not-yet-pulled-up
pn->pn_sizemtx change (the patch was edited fo
On Sat, Sep 17, 2011 at 12:07 PM, Emmanuel Dreyfus wrote:
> Masao Uebayashi wrote:
>
>> My understanding is that the swap I/O code path including underlying
>> bdevs must *never* allocate memory to handle I/O, to resolve such a
>> resource shortage situation. Or the proc doing swap I/O would get
Masao Uebayashi wrote:
> My understanding is that the swap I/O code path including underlying
> bdevs must *never* allocate memory to handle I/O, to resolve such a
> resource shortage situation. Or the proc doing swap I/O would get
> stuck somewhere.
I do not see any kernel thread stuck anymore,
On Sat, Sep 17, 2011 at 10:28 AM, Emmanuel Dreyfus wrote:
> Masao Uebayashi wrote:
>
>> So what I can think of now is, the underlying bdev can't finish I/O
>> because it allocates memory to handle I/O requests?
>
> I already spotted two points where ioflush was stuck in PUFFS code path,
> and I a
Masao Uebayashi wrote:
> So what I can think of now is, the underlying bdev can't finish I/O
> because it allocates memory to handle I/O requests?
I already spotted two points where ioflush was stuck in PUFFS code path,
and I added code so that it can act immediately or get an error. That
helps,
On Thu, Sep 15, 2011 at 2:40 AM, Masao Uebayashi wrote:
> On Thu, Sep 15, 2011 at 2:13 AM, Emmanuel Dreyfus wrote:
>> Masao Uebayashi wrote:
>>
>>> You're faulting on a busy (PG_BUSY) anon page whose owner is also an
>>> anon. I suppose the page is being swapped either in or out...
>>
>> The qu
Masao Uebayashi wrote:
> > The question is why does it fail to complete?
> Hmm, I don't see any relevant "if (PG_WANTED) wakeup(pg);" in swap
> code path, like that done in genfs_getpages().
Does that mean that we have processes that can sleep waiting for a
page and never wake up whatever happ
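The busy/wanted handshake being discussed can be illustrated with a small
userland model. This is only a sketch under stated assumptions: it uses
pthreads instead of the kernel sleep/wakeup primitives, and struct model_page
and the model_page_* functions are hypothetical names, not UVM code. It shows
why a missing "if wanted, wake up" step on the owner side would leave a
waiter asleep forever.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical userland stand-in for a page with busy/wanted flags. */
struct model_page {
	pthread_mutex_t lock;
	pthread_cond_t cv;
	bool busy;	/* models PG_BUSY: an owner is doing I/O on the page */
	bool wanted;	/* models PG_WANTED: somebody sleeps on the page */
};

void
model_page_init(struct model_page *pg, bool busy)
{
	pthread_mutex_init(&pg->lock, NULL);
	pthread_cond_init(&pg->cv, NULL);
	pg->busy = busy;
	pg->wanted = false;
}

/* Waiter side: record interest, then sleep until the page is not busy. */
void
model_page_wait(struct model_page *pg)
{
	pthread_mutex_lock(&pg->lock);
	while (pg->busy) {
		pg->wanted = true;
		pthread_cond_wait(&pg->cv, &pg->lock);
	}
	pthread_mutex_unlock(&pg->lock);
}

/*
 * Owner side: clear busy and wake sleepers if anyone asked.
 * Dropping the "if (pg->wanted)" branch models the suspected bug:
 * waiters would then never be woken.
 */
void
model_page_unbusy(struct model_page *pg)
{
	pthread_mutex_lock(&pg->lock);
	pg->busy = false;
	if (pg->wanted) {
		pg->wanted = false;
		pthread_cond_broadcast(&pg->cv);
	}
	pthread_mutex_unlock(&pg->lock);
}
```

In this model one thread would call model_page_wait() while the thread that
finishes the I/O calls model_page_unbusy(); the waiter only returns once the
owner has cleared the busy flag and signalled.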
On Thu, Sep 15, 2011 at 2:13 AM, Emmanuel Dreyfus wrote:
> Masao Uebayashi wrote:
>
>> You're faulting on a busy (PG_BUSY) anon page whose owner is also an
>> anon. I suppose the page is being swapped either in or out...
>
> The question is why does it fail to complete?
Hmm, I don't see any rel
Masao Uebayashi wrote:
> You're faulting on a busy (PG_BUSY) anon page whose owner is also an
> anon. I suppose the page is being swapped either in or out...
The question is why does it fail to complete?
--
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org
On Wed, Sep 14, 2011 at 9:27 PM, Emmanuel Dreyfus wrote:
> Emmanuel Dreyfus wrote:
>
>> 1 3 0 4 cacaf300 glusterfsd anonget2
>
> I can see in src/sys/uvm/uvm_fault.c that anonget2 means sleeping on a
> page. But how is it supposed to wake up?
You're faulting on a bu
Emmanuel Dreyfus wrote:
> 1 3 0 4 cacaf300 glusterfsd anonget2
I can see in src/sys/uvm/uvm_fault.c that anonget2 means sleeping on a
page. But how is it supposed to wake up?
--
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org
Takeshi Nakayama wrote:
> "sysctl -w vm.swapout=0" helps you?
It yields an interesting result: I now have a partial deadlock, which is
limited to the PUFFS filesystem. It is now possible to log in and work in
the shell, something that was not possible with vm.swapout=1.
However, even after memory
>>> m...@netbsd.org (Emmanuel Dreyfus) wrote
> I had a look at src/sys/uvm/uvm_glue.c changes between netbsd-5 and
> -current. Revision 1.140-1.141 is interesting: netbsd-5 has a swappable
> u-area, and the change drops that "feature" to fix issues.
>
> The problem is documented in kern/38828
> h
Emmanuel Dreyfus wrote:
[using mlockall(2) to prevent the userland filesystem from being swapped out]
> It seemed to work, but after intensive testing I still see deadlocks,
> and a printf added in uvm_swapout() shows that perfused still gets
> swapped out despite the mlockall call. Is that a bug?
>
> This
Manuel Bouyer wrote:
> You have mlock(2) for this. I think ntpd uses it, you can have a look here.
> Of course you don't want to mlock a big process ...
You meant mlockall(2)
It seemed to work, but after intensive testing I still see deadlocks,
and a printf added in uvm_swapout() shows that per
Manuel Bouyer wrote:
> > Do you suggest to use mlock(NULL, (size_t)-1) ?
> this probably won't work. Look at how ntpd uses it ...
Using mlockall(MCL_CURRENT|MCL_FUTURE) on perfused fixes the problem. No
more deadlocks, the swap is used when memory gets low.
It was not even necessary to do the s
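For reference, a minimal sketch of the mlockall(MCL_CURRENT|MCL_FUTURE) call
mentioned in this message. The helper name lock_all_memory is hypothetical and
this is not the actual perfused change; it only shows the standard call and
its error handling.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

/*
 * Hypothetical helper (name not from the thread): wire all current and
 * future pages of the calling process so that they cannot be paged out.
 * Returns 0 on success, -1 on failure with errno as set by mlockall(2).
 */
int
lock_all_memory(void)
{
	if (mlockall(MCL_CURRENT | MCL_FUTURE) == -1) {
		int saved_errno = errno;	/* fprintf() may clobber errno */

		fprintf(stderr, "mlockall: %s\n", strerror(saved_errno));
		errno = saved_errno;
		return -1;
	}
	return 0;
}
```

A daemon would typically call this early in main(), before it allocates much
memory. The call can fail without sufficient privileges or when the
locked-memory resource limit is too low, so the return value should be
checked.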
Hello. Can you use the mlock(2) system call for this purpose? This
looks like just what it was intended for.
Does it actually work in our system?
-Brian
On Sep 5, 7:37am, Emmanuel Dreyfus wrote:
} Subject: Re: netbsd-5 deadlocks when memory is low
} On Fri, Sep 02, 2011 at 06:54:55AM
On Mon, Sep 05, 2011 at 09:17:29AM +, Emmanuel Dreyfus wrote:
> On Mon, Sep 05, 2011 at 10:46:50AM +0200, Manuel Bouyer wrote:
> > You have mlock(2) for this. I think ntpd uses it, you can have a look here.
> > Of course you don't want to mlock a big process ...
>
> Do you suggest to use mlock
On Mon, Sep 05, 2011 at 10:46:50AM +0200, Manuel Bouyer wrote:
> You have mlock(2) for this. I think ntpd uses it, you can have a look here.
> Of course you don't want to mlock a big process ...
Do you suggest to use mlock(NULL, (size_t)-1) ?
--
Emmanuel Dreyfus
m...@netbsd.org
On Mon, Sep 05, 2011 at 07:37:08AM +, Emmanuel Dreyfus wrote:
> On Fri, Sep 02, 2011 at 06:54:55AM +0200, Emmanuel Dreyfus wrote:
> > A common case of deadlocks is ioflush waiting for the filesystem and the
> > filesystem waiting for memory.
>
> I made some progress in my understanding of the
On Fri, Sep 02, 2011 at 06:54:55AM +0200, Emmanuel Dreyfus wrote:
> A common case of deadlocks is ioflush waiting for the filesystem and the
> filesystem waiting for memory.
I made some progress in my understanding of the issue: the filesystem
server (perfused) gets swapped out and this is why all
On Fri, Sep 02, 2011 at 12:55:24AM -0400, Mouse wrote:
> That won't fix it; it'll just make it less likely to deadlock. If
> there is nothing to flush _except_ to the user filesystem that wants
> memory, the system is still wedged.
Sure, but at least it will remove a whole set of deadlocks, where
> If the above is correct, then we have this situation: [...]
> If I am correct, then a possible fix could be to have one ioflush
> thread per filesystem. This would ensure that a user filesystem
> awaiting memory would not prevent ioflush work on local filesystem.
That won't fix it; it'll just
David Holland wrote:
> The right fix is to not let the kernel wait for userspace. In this
> case that's probably not trivial.
A common case of deadlocks is ioflush waiting for the filesystem and the
filesystem waiting for memory.
As I understand, pagedaemon is responsible for scheduling memory
On Wed, Aug 31, 2011 at 09:06:14AM +, David Holland wrote:
> The right fix is to not let the kernel wait for userspace. In this
> case that's probably not trivial.
I am trying to track non-local vnodes being paged, and exclude them
from the pagedaemon activation trigger. However I have trouble to
On Mon, Aug 29, 2011 at 03:49:09AM +0200, Emmanuel Dreyfus wrote:
> > because those requests have ended up in your file server which is
> > waiting for page daemon and thus deadlock? just a wild guess, though.
>
> So the right fix would be to have a uvm.user_paging count for pages from
> non
Manuel Bouyer wrote:
> > Code from HEAD works much better: instead of freezing, it kills big
> > processes. It would be nice if we could reach that stage on netbsd-5:
> > having a system freezing is bad. It does not even reboot on its own.
> Are big processes killed even if there's free pages in
hi,
> YAMAMOTO Takashi wrote:
>
>> because those requests have ended up in your file server which is
>> waiting for page daemon and thus deadlock? just a wild guess, though.
>
> So the right fix would be to have a uvm.user_paging count for pages from
> non kernel FS, and exclude that count fro
YAMAMOTO Takashi wrote:
> because those requests have ended up in your file server which is
> waiting for page daemon and thus deadlock? just a wild guess, though.
So the right fix would be to have a uvm.user_paging count for pages from
non kernel FS, and exclude that count from the test?
--
hi,
> YAMAMOTO Takashi wrote:
>
>> the pagedaemon stays sleeping because there's already enough paging
>> requests in-progress. (see "paging=" in the ddb show uvm output.)
>> it (reasonably) assumes i/o will complete "soon".
>
> The offending code would be
yes.
>
> void
> uvm_kick_pdaemon
YAMAMOTO Takashi wrote:
> the pagedaemon stays sleeping because there's already enough paging
> requests in-progress. (see "paging=" in the ddb show uvm output.)
> it (reasonably) assumes i/o will complete "soon".
The offending code would be
void
uvm_kick_pdaemon(void)
{
KASSERT(mu
hi,
> I have been dealing with a deadlock problem on netbsd-5, which I thought was
> related to PUFFS, but it seems it is another problem, hence the new thread.
>
> When running perfused stress test (a build of NetBSD over a glusterfs
> volume), memory gets low, and the machine hangs. I can see io
On Sun, Aug 28, 2011 at 12:44:22PM +, Emmanuel Dreyfus wrote:
> [...]
> Code from HEAD works much better: instead of freezing, it kills big
> processes. It would be nice if we could reach that stage on netbsd-5:
> having a system freezing is bad. It does not even reboot on its own.
Are big pr
I have been dealing with a deadlock problem on netbsd-5, which I thought was
related to PUFFS, but it seems it is another problem, hence the new thread.
When running perfused stress test (a build of NetBSD over a glusterfs
volume), memory gets low, and the machine hangs. I can see ioflush
is sleepi