vmstat still stalls (Re: more weird bugs with mmap-ing via NFS)

2006-03-23 Thread Mikhail Teterin
On Tuesday, 21 March 2006 20:09, Matthew Dillon wrote: > 'vmstat 1' while the program is running would tell us if VM faults > are creating an issue. This problem -- vmstat and `systat -vm' occasionally stalling the entire system -- did not go away, it just became less frequent and s

Re: flushing "anonymous" buffers over NFS is rejected by server (more weird bugs with mmap-ing via NFS)

2006-03-23 Thread Mikhail Teterin
On Wednesday, 22 March 2006 15:20, Matthew Dillon wrote: > The only real solution is to make the NFS client aware of the > restricted user id exported by the server by requiring that the > same uid be specified in the mount command the client uses to > mount the NFS partition. Th

Re: flushing "anonymous" buffers over NFS is rejected by server (more weird bugs with mmap-ing via NFS)

2006-03-23 Thread Matthew Dillon
:This doesn't work with modes like 446 (which allow writing by everyone :not in a particular group). It should work just fine. The client validated the creds as of the original operation (such as the mmap() or the original write()). Regardless of what happens after that, if the creds

Re: flushing "anonymous" buffers over NFS is rejected by server (more weird bugs with mmap-ing via NFS)

2006-03-22 Thread Peter Jeremy
On Wed, 2006-Mar-22 15:33:49 -0800, Matthew Dillon wrote: > solution. Basically the server would have to accept root creds but > instead of translating them to a fixed uid it should allow the > I/O operation to run as long as some non-root user would be able to > do the I/O op. This does
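
The server-side rule quoted above (accept root credentials, but allow the I/O only if some non-root user would be able to do it) can be illustrated with a small permission check. This is only a sketch of the idea under stated assumptions, not code from any NFS server: the function name `some_nonroot_user_could_write` is hypothetical, and it assumes the file's group has at least one non-root member.

```python
import stat


def some_nonroot_user_could_write(mode: int, owner_uid: int) -> bool:
    """Return True if at least one non-root user could write the file.

    Hypothetical helper: it looks only at the classic Unix mode bits
    and the owner's uid, ignoring ACLs and read-only exports.
    """
    # World-writable: any user outside the owner and group qualifies.
    if mode & stat.S_IWOTH:
        return True
    # Group-writable: assumes the group has a non-root member.
    if mode & stat.S_IWGRP:
        return True
    # Owner-writable only: counts only if the owner is not root.
    return bool(mode & stat.S_IWUSR) and owner_uid != 0


if __name__ == "__main__":
    # Mode 446 (r--r--rw-), as mentioned elsewhere in the thread.
    print(some_nonroot_user_could_write(0o446, owner_uid=1000))
```

A check of this shape only decides whether the write is plausible for someone; it cannot tell which user originally dirtied the buffer, which is the limitation the thread is discussing.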

Re: flushing "anonymous" buffers over NFS is rejected by server (more weird bugs with mmap-ing via NFS)

2006-03-22 Thread Matthew Dillon
:What about different users accessing the same share from the same client? : : -mi Yah, you're right. That wouldn't work. It would have to be a server-side solution. Basically the server would have to accept root creds but instead of translating them to a fixed uid it should all

Re: flushing "anonymous" buffers over NFS is rejected by server (more weird bugs with mmap-ing via NFS)

2006-03-22 Thread Matthew Dillon
:So, the problem is, the dirtied buffers _sometimes_ lose their owner and thus :become root-owned. When the NFS client tries to flush them out, the NFS :server (by default suspecting remote roots of being evil) rejects the :flushing, which brings the client to its weak knees. : :1. Do the yet u

flushing "anonymous" buffers over NFS is rejected by server (more weird bugs with mmap-ing via NFS)

2006-03-22 Thread Mikhail Teterin
On Wednesday, 22 March 2006 14:03, Matthew Dillon wrote: > I consider it a bug. I think the only way to reliably fix the problem > is to give the client the ability to specify the uid to issue RPCs with > in the NFS mount command, to match what the export does. So, the problem is, t

Re: more weird bugs with mmap-ing via NFS

2006-03-22 Thread Matthew Dillon
:So mmap is just a more "reliable" way to trigger this problem, right? : :Is not this, like, a major bug? A file can be opened, written to for a while, :and then -- at a semi-random moment -- the log will drop across the road? :Ouch... : :Thanks a lot to all concerned for helping solve this probl

Re: more weird bugs with mmap-ing via NFS

2006-03-22 Thread Mikhail Teterin
On Wednesday, 22 March 2006 12:23, Matthew Dillon wrote: > My guess is that you are exporting the filesystem as a particular > user id that is not root (i.e. you do not have -maproot=root: in the > exports line on the server). Yes, indeed, re-exporting with -maproot=0 leads to normal

Re: more weird bugs with mmap-ing via NFS

2006-03-22 Thread Matthew Dillon
My guess is that you are exporting the filesystem as a particular user id that is not root (i.e. you do not have -maproot=root: in the exports line on the server). What is likely happening is that the NFS client is trying to push out the pages using the root uid rather than the

Re: more weird bugs with mmap-ing via NFS

2006-03-22 Thread Oliver Fromme
Mikhail Teterin <[EMAIL PROTECTED]> wrote: > (no softupdates). It was created with `-O1 -b 65536 -f 8192' as it is > intended > for large files and needs no ACLs (hence no UFS1). Those values are very suboptimal. When creating a file system for large files, you should rather decrease the inod

Re: more weird bugs with mmap-ing via NFS

2006-03-22 Thread Oliver Fromme
Matthew Dillon <[EMAIL PROTECTED]> wrote: > There are a number of problems using a block size of 65536. First of > all, I think you can only safely do it if you use a TCP mount, also > assuming the TCP buffer size is appropriately large to hold an entire > packet. For UDP moun

Re: more weird bugs with mmap-ing via NFS

2006-03-22 Thread Kostik Belousov
On Tue, Mar 21, 2006 at 09:07:48PM -0500, Mikhail Teterin wrote: > On Tuesday, 21 March 2006 20:53, Matthew Dillon wrote: > > Ah ha. That's the problem. I don't know why you are getting a write > > error, but that is preventing the client from cleaning out the dirty > > buffers.

Re: more weird bugs with mmap-ing via NFS

2006-03-21 Thread Mikhail Teterin
On Tuesday, 21 March 2006 20:53, Matthew Dillon wrote: > Ah ha. That's the problem. I don't know why you are getting a write > error, but that is preventing the client from cleaning out the dirty > buffers. The number of dirty buffers continues to rise and the client > is ju

Re: more weird bugs with mmap-ing via NFS

2006-03-21 Thread Matthew Dillon
:>tcpdump -s 4096 -n -i -l port 2049 : :Now I am thoroughly confused, the lines are very repetitive: : :tcpdump: verbose output suppressed, use -v or -vv for full protocol decode :listening on em0, link-type EN10MB (Ethernet), capture size 4096 bytes :20:41:55.788436 IP 172.21.128.43.2049 > 1

Re: more weird bugs with mmap-ing via NFS

2006-03-21 Thread Mikhail Teterin
On Tuesday, 21 March 2006 20:09, Matthew Dillon wrote: > If neither of those are an issue then I would guess that the problem > could be related to the NFSv3 2-phase commit protocol. A way to test > that would be to mount with NFSv2 and see if the problem still occurs. Adding -2

Re: more weird bugs with mmap-ing via NFS

2006-03-21 Thread Mikhail Teterin
On Tuesday, 21 March 2006 20:09, Matthew Dillon wrote: > If the network bandwidth is still going full bore then the program is > doing something. NFS retries would not account for it. A simple > test for that would be to ^Z the program once it gets into this state > and see

Re: more weird bugs with mmap-ing via NFS

2006-03-21 Thread Matthew Dillon
:The file stops growing, but the network bandwidth remains at 20Mb/s. `Netstat :-s' on the client, had the following to say (udp and ip only): If the network bandwidth is still going full bore then the program is doing something. NFS retries would not account for it. A simple test f

Re: more weird bugs with mmap-ing via NFS

2006-03-21 Thread Mikhail Teterin
On Tuesday, 21 March 2006 19:25, Matthew Dillon wrote: > If the program works over a local > filesystem but fails to produce data in the output file on an NFS > mount (but completes otherwise), then there is a bug in NFS somewhere. > If the problem is simply due to the program

Re: more weird bugs with mmap-ing via NFS

2006-03-21 Thread Jon Dama
From Mikhail Teterin <[EMAIL PROTECTED]>, Tue, Mar 21, 2006 at 06:58:01PM -0500: > I'll try the TCP mount, workaround. If it helps, we can assume, our UDP NFS > is > broken for sustained high bandwidth writes :-( What? I think you misunderstood. UDP NFS fares poorly under network congestio

Re: more weird bugs with mmap-ing via NFS

2006-03-21 Thread Matthew Dillon
:I don't specify either, but the default is UDP, is not it? Yes, the default is UDP. :> Now imagine a client that experiences this problem only :> sometimes. Modern hardware, but for some reason (network :> congestion?) some frames are still lost if sent back-to-back. :> (Realtek chipset on

Re: more weird bugs with mmap-ing via NFS

2006-03-21 Thread Mikhail Teterin
On Tuesday, 21 March 2006 18:48, Patrick M. Hausen wrote: > Are you using TCP or UDP for your NFS mounts? Ok, I just tried tcp as follows: mount_nfs -r 8192 -w 8192 -U -otcp,intr,tcp pandora:/backup /backup (oops, twice :-) The symptoms are largely the same. The file stopped growi

Re: more weird bugs with mmap-ing via NFS

2006-03-21 Thread Mikhail Teterin
On Tuesday, 21 March 2006 18:48, Patrick M. Hausen wrote: > On Tue, Mar 21, 2006 at 06:26:45PM -0500, Mikhail Teterin wrote: > > The problem is about same with 32K and 16K packets. With 8K packets, the > > thing kind-of works (although trying to `systat -vm' still stalls disk > > access), but

Re: more weird bugs with mmap-ing via NFS

2006-03-21 Thread Patrick M. Hausen
Hi! On Tue, Mar 21, 2006 at 06:26:45PM -0500, Mikhail Teterin wrote: > The problem is about same with 32K and 16K packets. With 8K packets, the > thing > kind-of works (although trying to `systat -vm' still stalls disk access), but > the outgoing traffic is over 20Mb/s on average -- MUCH more,

Re: more weird bugs with mmap-ing via NFS

2006-03-21 Thread Mikhail Teterin
On Tuesday, 21 March 2006 17:56, Matthew Dillon wrote: > For UDP mounts, 65536 is too large (the UDP data length can > only be 65536 bytes. For that matter, the *IP* packet itself can > not exceed 65535 bytes. So 65536 will not work with a UDP mount. Well, then the mount should
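
The size limit discussed above can be checked with a little arithmetic. The sketch below assumes IPv4 with a minimal 20-byte header (no options) and the standard 8-byte UDP header; the helper name `fits_in_one_udp_datagram` and the `rpc_overhead` parameter are hypothetical, and the real RPC/NFS header overhead is not modelled.

```python
# IPv4 stores the total packet length in a 16-bit field.
IP_MAX_TOTAL_LENGTH = 0xFFFF          # 65535 bytes
IPV4_MIN_HEADER = 20                  # header without options
UDP_HEADER = 8

# Largest payload a single UDP datagram can carry over IPv4.
MAX_UDP_PAYLOAD = IP_MAX_TOTAL_LENGTH - IPV4_MIN_HEADER - UDP_HEADER


def fits_in_one_udp_datagram(nfs_io_size: int, rpc_overhead: int = 0) -> bool:
    """Return True if an NFS read/write of this size fits in one datagram.

    rpc_overhead is a placeholder for the RPC and NFS headers, which
    shrink the usable space further; 0 gives the most optimistic answer.
    """
    return nfs_io_size + rpc_overhead <= MAX_UDP_PAYLOAD


if __name__ == "__main__":
    for size in (8192, 16384, 32768, 65536):
        print(size, fits_in_one_udp_datagram(size))
```

Even with zero RPC overhead, a 65536-byte transfer exceeds the largest possible UDP payload, while the 8K, 16K, and 32K sizes tried in the thread fit, so the failures at those smaller sizes need a different explanation.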

Re: more weird bugs with mmap-ing via NFS

2006-03-21 Thread Matthew Dillon
:When the client is in this state it remains quite usable except for the :following: : : 1) Trying to start `systat 1 -vm' stalls ALL access to local disks, : apparently -- no new programs can start, and the running ones : can not access any data either; attempts to Ctrl-C

more weird bugs with mmap-ing via NFS

2006-03-21 Thread Mikhail Teterin
> When I mount with large read and write sizes: > > mount_nfs -r 65536 -w 65536 -U -ointr pandora:/backup /backup > > it changes -- for the worse. Short time into it -- the file stops growing > according to the `ls -sl' run on the NFS server (pandora) at exactly 3200 > FS blocks (the FS was c