On Tuesday 28 March 2006 05:27, Peter Jeremy wrote:
> I'd suggest that each mmap be capable of storing several hundred msec of
> data as a minimum (maybe 10MB input and 5MB output, preferably more).
> Synchronisation can be done by writing tokens into pipes shared with the
> mmap's, optimised by
On Sat, 2006-Mar-25 21:39:27 +1100, Peter Jeremy wrote:
>What happens if you simulate read-ahead yourself? Have your main
>program fork and the child access pages slightly ahead of the parent
>but do nothing else.
I suspect something like this may be the best approach for your application.
My su
On Saturday 25 March 2006 06:46 pm, Peter Jeremy wrote:
= My guess is that the read-ahead algorithms are working but aren't doing
= enough read-ahead to cope with "read a bit, do some cpu-intensive processing
= and repeat" at 25MB/sec so you're winding up with a degree of serialisation
= where the I/
On Fri, 2006-Mar-24 15:18:00 -0500, Mikhail Teterin wrote:
>On the machine, where both mzip and the disk run at only 50%, the disk is a
>plain SATA drive (mzip's state goes from "RUN" to "vnread" and back).
...
> 18 users    Load  0.46 0.53 0.60    24 ???   15:15
>
>Mem:KB  REAL
On Sat, 2006-Mar-25 09:20:13 -0500, Mikhail Teterin wrote:
>I'm sorry, that should be http://aldan.algebra.com/~mi/mzip.c -- I checked
>this time :-(
It doesn't look like it's doing anything especially weird. As Matt
pointed out, creating files with mmap() is not a good idea because the
syncer
On Sat, 2006-Mar-25 10:29:17 -0800, Matthew Dillon wrote:
>Really odd. Note that if your disk can only do 25 MBytes/sec, the
>calculation is: 2052167894 / 25MB = ~80 seconds, not ~60 seconds
>as you would expect from your numbers.
systat was reporting 25-26 MB/sec. dd'ing the underl
Mikhail Teterin wrote this message on Sat, Mar 25, 2006 at 09:20 -0500:
> = The downside is that touching an uncached page triggers a trap which may
> = not be as efficient as reading a block of data through the filesystem
> = interface, and I/O errors are delivered via signals (which may not be as
:The results here are weird. With 1GB RAM and a 2GB dataset, the
:timings seem to depend on the sequence of operations: reading is
:significantly faster, but only when the data was mmap'd previously.
:There's one outlier that I can't easily explain.
:...
:Peter Jeremy
Really odd. Note that i
On Saturday 25 March 2006 05:39 am, Peter Jeremy wrote:
= On Fri, 2006-Mar-24 15:18:00 -0500, Mikhail Teterin wrote:
= >which there is not with the read. Read also requires fairly large
= >buffers in the user space to be efficient -- *in addition* to the
= >buffers in the kernel.
=
= I disagree.
On Fri, 2006-Mar-24 15:18:00 -0500, Mikhail Teterin wrote:
>which there is not with the read. Read also requires fairly large buffers in
>the user space to be efficient -- *in addition* to the buffers in the kernel.
I disagree. With a filesystem read, the kernel is solely responsible
for handli
On Fri, 2006-Mar-24 10:00:20 -0800, Matthew Dillon wrote:
>Ok. The next test is to NOT do umount/remount and then use a data set
>that is ~2x system memory (but can still be mmap'd by grep). Rerun
>the data set multiple times using grep and grep --mmap.
The results here are weird. W
> > May be the OS needs "reclaim-behind" for the sequential case?
> > This way you can mmap many many pages and use a much smaller
> > pool of physical pages to back them. The idea is for the VM
> > to reclaim pages N-k..N-1 when page N is accessed and allow
> > the same process to reuse this page
Matthew Dillon wrote:
> It is possible that the kernel believes the VM system to be too loaded
> to issue read-aheads, as a consequence of your blowing out of the system
> caches.
See attachment for the snapshot of `systat 1 -vm' -- it stays like that for
most of the compression run tim
:On an amd64 system running about 6-week old -stable, both behave
:pretty much identically. In both cases, systat reports that the disk
:is about 96% busy whilst loading the cache. In the cache case, mmap
:is significantly faster.
:
:...
:turion% ls -l /6_i386/var/tmp/test
:-rw-r--r-- 1 peter
On Thu, 2006-Mar-23 15:16:11 -0800, Matthew Dillon wrote:
>FreeBSD. To determine which of the two is more likely, you have to
>run a smaller data set (like 600MB of data on a system with 1GB of ram),
>and use the unmount/mount trick to clear the cache before each grep test.
On an amd6
:I thought one serious advantage to this situation for sequential read
:mmap() is to madvise(MADV_DONTNEED) so that the pages don't have to
:wait for the clock hands to reap them. On a large Solaris box I used
:to have the non-pleasure of running the VM page scan rate was high, and
:I suggested t
> : time fgrep meowmeowmeow /home/oh.0.dump
> : 2.167u 7.739s 1:25.21 11.6% 70+3701k 23663+0io 6pf+0w
> : time fgrep --mmap meowmeowmeow /home/oh.0.dump
> : 1.552u 7.109s 2:46.03 5.2% 18+1031k 156+0io 106327pf+0w
> :
> :Use a big enough file to bust the memory caching (oh.
On Thu, Mar 23, 2006 at 03:16:11PM -0800, Matthew Dillon wrote:
> In any case, this sort of test is not really a good poster child for how
> to use mmap(). Nobody in their right mind uses mmap() on datasets that
> they expect to be uncacheable and which are accessed sequentially. It's
:Yes, they both do work fine, but time gives very different stats for each. In
:my experiments, the total CPU time is noticeably less with mmap, but the
:elapsed time is (much) greater. Here are results from FreeBSD-6.1/amd64 --
:notice the large number of page faults, because the system does no
On Thursday 23 March 2006 15:48, Matthew Dillon wrote:
> Well, I don't know about FreeBSD, but both grep cases work just fine on
> DragonFly.
Yes, they both do work fine, but time gives very different stats for each. In
my experiments, the total CPU time is noticeably less with mmap, b
:Actually, I cannot agree here -- quite the opposite seems true. When running
:locally (no NFS involved) my compressor with the `-1' flag (fast, least
:effective compression), the program easily compresses faster than it can
:read.
:
:The Opteron CPU is about 50% idle, *and so is the disk* pr
On Tuesday 21 March 2006 17:48, Matthew Dillon wrote:
> Reading via mmap() is very well optimized.
Actually, I cannot agree here -- quite the opposite seems true. When running
locally (no NFS involved) my compressor with the `-1' flag (fast, least
effective compression), the program
On Tuesday 21 March 2006 20:09, Matthew Dillon wrote:
> 'vmstat 1' while the program is running would tell us if VM faults
> are creating an issue.
This problem -- vmstat and `systat -vm' occasionally stalling the entire
system -- did not go away, it just became less frequent and s
On Wednesday 22 March 2006 15:20, Matthew Dillon wrote:
> The only real solution is to make the NFS client aware of the
> restricted user id exported by the server by requiring that the
> same uid be specified in the mount command the client uses to
> mount the NFS partition. Th
:This doesn't work with modes like 446 (which allow writing by everyone
:not in a particular group).
It should work just fine. The client validated the creds as of the
original operation (such as the mmap() or the original write()).
Regardless of what happens after that, if the creds
On Wed, 2006-Mar-22 15:33:49 -0800, Matthew Dillon wrote:
> solution. Basically the server would have to accept root creds but
> instead of translating them to a fixed uid it should allow the
> I/O operation to run as long as some non-root user would be able to
> do the I/O op.
This does
:What about different users accessing the same share from the same client?
:
: -mi
Yah, you're right. That wouldn't work. It would have to be a server-side
solution. Basically the server would have to accept root creds but
instead of translating them to a fixed uid it should all
:So, the problem is, the dirtied buffers _sometimes_ lose their owner and thus
:become root-owned. When the NFS client tries to flush them out, the NFS
:server (by default suspecting remote roots of being evil) rejects the
:flushing, which brings the client to its knees.
:
:1. Do the yet u
On Wednesday 22 March 2006 14:03, Matthew Dillon wrote:
> I consider it a bug. I think the only way to reliably fix the problem
> is to give the client the ability to specify the uid to issue RPCs with
> in the NFS mount command, to match what the export does.
So, the problem is, t
:So mmap is just a more "reliable" way to trigger this problem, right?
:
:Is not this, like, a major bug? A file can be opened, written to for a while,
:and then -- at a semi-random moment -- the log will drop across the road?
:Ouch...
:
:Thanks a lot to all concerned for helping solve this probl
On Wednesday 22 March 2006 12:23, Matthew Dillon wrote:
> My guess is that you are exporting the filesystem as a particular
> user id that is not root (i.e. you do not have -maproot=root: in the
> exports line on the server).
Yes, indeed, re-exporting with -maproot=0 leads to normal
My guess is that you are exporting the filesystem as a particular
user id that is not root (i.e. you do not have -maproot=root: in the
exports line on the server).
What is likely happening is that the NFS client is trying to push out
the pages using the root uid rather than the
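For reference, an exports(5) line of the kind under discussion might look like this -- the path and network are made up for illustration; `-maproot=0` maps remote root to uid 0, matching the re-export that fixed the problem:

```
/backup -maproot=0 -network 172.21.128.0 -mask 255.255.255.0
```

Without a -maproot entry, mountd maps remote root to the unprivileged uid -2 by default, which is exactly what leaves the client's root-owned dirty buffers unflushable.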
Mikhail Teterin <[EMAIL PROTECTED]> wrote:
> (no softupdates). It was created with `-O1 -b 65536 -f 8192' as it is
> intended
> for large files and needs no ACLs (hence no UFS1).
Those values are very suboptimal. When creating a file system
for large files, you should rather decrease the inod
Matthew Dillon <[EMAIL PROTECTED]> wrote:
> There are a number of problems using a block size of 65536. First of
> all, I think you can only safely do it if you use a TCP mount, also
> assuming the TCP buffer size is appropriately large to hold an entire
> packet. For UDP moun
On Tue, Mar 21, 2006 at 09:07:48PM -0500, Mikhail Teterin wrote:
> On Tuesday 21 March 2006 20:53, Matthew Dillon wrote:
> > Ah ha. That's the problem. I don't know why you are getting a write
> > error, but that is preventing the client from cleaning out the dirty
> > buffers.
On Tuesday 21 March 2006 20:53, Matthew Dillon wrote:
> Ah ha. That's the problem. I don't know why you are getting a write
> error, but that is preventing the client from cleaning out the dirty
> buffers. The number of dirty buffers continues to rise and the client
> is ju
:>tcpdump -s 4096 -n -i em0 -l port 2049
:
:Now I am thoroughly confused, the lines are very repetitive:
:
:tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
:listening on em0, link-type EN10MB (Ethernet), capture size 4096 bytes
:20:41:55.788436 IP 172.21.128.43.2049 > 1
On Tuesday 21 March 2006 20:09, Matthew Dillon wrote:
> If neither of those are an issue then I would guess that the problem
> could be related to the NFSv3 2-phase commit protocol. A way to test
> that would be to mount with NFSv2 and see if the problem still occurs.
Adding -2
On Tuesday 21 March 2006 20:09, Matthew Dillon wrote:
> If the network bandwidth is still going full bore then the program is
> doing something. NFS retries would not account for it. A simple
> test for that would be to ^Z the program once it gets into this state
> and see
:The file stops growing, but the network bandwidth remains at 20Mb/s. `Netstat
:-s' on the client, had the following to say (udp and ip only):
If the network bandwidth is still going full bore then the program is
doing something. NFS retries would not account for it. A simple
test f
On Tuesday 21 March 2006 19:25, Matthew Dillon wrote:
> If the program works over a local
> filesystem but fails to produce data in the output file on an NFS
> mount (but completes otherwise), then there is a bug in NFS somewhere.
> If the problem is simply due to the program
From Mikhail Teterin <[EMAIL PROTECTED]>, Tue, Mar 21, 2006 at 06:58:01PM -0500:
> I'll try the TCP-mount workaround. If it helps, we can assume our UDP NFS
> is broken for sustained high-bandwidth writes :-(
What? I think you misunderstood. UDP NFS fares poorly under
network congestio
:I don't specify either, but the default is UDP, isn't it?
Yes, the default is UDP.
:> Now imagine a client that experiences this problem only
:> sometimes. Modern hardware, but for some reason (network
:> congestion?) some frames are still lost if sent back-to-back.
:> (Realtek chipset on
On Tuesday 21 March 2006 18:48, Patrick M. Hausen wrote:
> Are you using TCP or UDP for your NFS mounts?
Ok, I just tried tcp as follows:
mount_nfs -r 8192 -w 8192 -U -otcp,intr,tcp pandora:/backup /backup
(oops, twice :-)
The symptoms are largely the same. The file stopped growi
On Tuesday 21 March 2006 18:48, Patrick M. Hausen wrote:
> On Tue, Mar 21, 2006 at 06:26:45PM -0500, Mikhail Teterin wrote:
> > The problem is about the same with 32K and 16K packets. With 8K packets, the
> > thing kind-of works (although trying to `systat -vm' still stalls disk
> > access), but
Hi!
On Tue, Mar 21, 2006 at 06:26:45PM -0500, Mikhail Teterin wrote:
> The problem is about the same with 32K and 16K packets. With 8K packets, the
> thing
> kind-of works (although trying to `systat -vm' still stalls disk access), but
> the outgoing traffic is over 20Mb/s on average -- MUCH more,
On Tuesday 21 March 2006 17:56, Matthew Dillon wrote:
> For UDP mounts, 65536 is too large (the UDP data length can
> only be 65535 bytes. For that matter, the *IP* packet itself can
> not exceed 65535 bytes). So 65536 will not work with a UDP mount.
Well, then the mount should
On Tuesday 21 March 2006 17:48, Matthew Dillon wrote:
> :Actually, it does. The program tells it that I don't care to read what's
> :currently there, by specifying the PROT_WRITE flag only.
>
> That's an architectural flag. Very few architectures actually support
> write-only memor
:When the client is in this state it remains quite usable except for the
:following:
:
: 1) Trying to start `systat 1 -vm' stalls ALL access to local disks,
: apparently -- no new programs can start, and the running ones
: can not access any data either; attempts to Ctrl-C
:
: [Moved from -current to -stable]
:
:On Tuesday 21 March 2006 16:23, Matthew Dillon wrote:
:> You might be doing just writes to the mmap()'d memory, but the system
:> doesn't know that.
:
:Actually, it does. The program tells it that I don't care to read what's
:currentl
> When I mount with large read and write sizes:
>
> mount_nfs -r 65536 -w 65536 -U -ointr pandora:/backup /backup
>
> it changes -- for the worse. Short time into it -- the file stops growing
> according to the `ls -sl' run on the NFS server (pandora) at exactly 3200
> FS blocks (the FS was c
[Moved from -current to -stable]
On Tuesday 21 March 2006 16:23, Matthew Dillon wrote:
> You might be doing just writes to the mmap()'d memory, but the system
> doesn't know that.
Actually, it does. The program tells it that I don't care to read what's
currently there, b