> " " == Michael Eisler <[EMAIL PROTECTED]> writes:
> What if someone has written to multiple, non-contiguous regions
> of a page?
This has been foreseen and hence we only allow 1 contiguous region per
page. If somebody tries to schedule a write to a second region that is
not contiguous
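The one-contiguous-region-per-page rule described above can be sketched as follows. This is purely illustrative (the names `dirty_region` and `page_schedule_write` are hypothetical, not the actual Linux 2.2 NFS client code): each page tracks at most one contiguous dirty byte range, and a write that does not touch the existing range forces a flush first.

```c
#include <assert.h>

/* Hypothetical sketch of the per-page dirty-region rule: each cached
 * page holds at most one contiguous dirty byte range. A new write
 * either merges with the existing range (overlapping or adjacent) or
 * forces the old range to be flushed to the server first. */

struct dirty_region {
    unsigned int start;  /* first dirty byte within the page */
    unsigned int end;    /* one past the last dirty byte */
    int valid;           /* does this page hold a dirty region? */
};

/* Returns 1 if the pending region had to be flushed before the new
 * write could be recorded, 0 if the write simply merged. */
static int page_schedule_write(struct dirty_region *r,
                               unsigned int start, unsigned int end)
{
    int flushed = 0;

    /* A merge is only possible when the ranges overlap or touch. */
    if (r->valid && (start > r->end || end < r->start)) {
        /* Non-contiguous: the old region goes to the server first. */
        r->valid = 0;
        flushed = 1;
    }
    if (!r->valid) {
        r->start = start;
        r->end = end;
        r->valid = 1;
    } else {
        if (start < r->start) r->start = start;
        if (end > r->end)     r->end = end;
    }
    return flushed;
}
```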
> > " " == Michael Eisler <[EMAIL PROTECTED]> writes:
>
> >> I'm not clear on why you want to enforce page alignedness
> >> though? As long as writes respect the lock boundaries (and not
> >> page boundaries) why would use of a page cache change matters?
>
> > For the reason
> " " == Michael Eisler <[EMAIL PROTECTED]> writes:
>> I'm not clear on why you want to enforce page alignedness
>> though? As long as writes respect the lock boundaries (and not
>> page boundaries) why would use of a page cache change matters?
> For the reason that was poin
> Yes. fs/read_write calls the NFS subsystem. The problem then is that
> NFS uses the generic_file_{read,write,mmap}() interfaces. These are
> what enforce use of the page cache.
So, don't use generic*() when locking is active. It's what most other
UNIX-based NFS clients do. Even if it is "stupi
On 16 Sep 2000, Trond Myklebust wrote:
>
> Yes. fs/read_write calls the NFS subsystem. The problem then is that
> NFS uses the generic_file_{read,write,mmap}() interfaces. These are
> what enforce use of the page cache.
>
> You could drop these functions, but that would mean designing an
> ent
> I'm not a Linux kernel literate. However, I found your
> assertion surprising. Does procfs do page i/o as well?
No. An fs isn't required to use the page cache at all. It makes life a lot saner
to do so
> " " == Michael Eisler <[EMAIL PROTECTED]> writes:
> I'm not a Linux kernel literate. However, I found your
> assertion surprising. Does procfs do page i/o as well?
No. It has its own setup.
> file.c in fs/nfs suggests that the Linux VFS has non-page
> interfaces in add
> > " " == Michael Eisler <[EMAIL PROTECTED]> writes:
>
> > Focus on correctness and do the expedient thing first, which
> > is:
> > - The first time a file is locked, flush dirty pages
> >to the server, and then invalidate the page cache
>
> This would be
> " " == Michael Eisler <[EMAIL PROTECTED]> writes:
> Focus on correctness and do the expedient thing first, which
> is:
> - The first time a file is locked, flush dirty pages
> to the server, and then invalidate the page cache
This would be implemented with the
> > " " == James Yarbrough <[EMAIL PROTECTED]> writes:
>
> > What is done for bypassing the cache when the size of a file
> > lock held by the reading/writing process is not a multiple of
> > the caching granularity? Consider two different clients with
> > processes shari
> " " == James Yarbrough <[EMAIL PROTECTED]> writes:
> What is done for bypassing the cache when the size of a file
> lock held by the reading/writing process is not a multiple of
> the caching granularity? Consider two different clients with
> processes sharing a file an
> This will always invalidate the page cache whenever we try to obtain
> the lock, hence you are guaranteed that the cache will be reread after
> the lock was grabbed.
> After unlocking however one needs no guarantees other than ensuring
> that any modifications were committed while we held the l
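The lock/unlock semantics described above can be sketched in miniature. This is a hedged illustration, not the real client code: `nfs_flush` and `nfs_invalidate` stand in for the actual operations, and the counters only exist so the ordering can be observed.

```c
#include <assert.h>

/* Illustrative model of the locking coherency rules: taking a lock
 * commits local changes and drops cached data; releasing it only
 * needs to commit changes made while the lock was held. */

struct fake_file {
    int dirty;          /* pages with local modifications */
    int cached;         /* pages currently in the page cache */
    int flushes;        /* writes back to the server */
    int invalidations;  /* cache purges */
};

static void nfs_flush(struct fake_file *f)      { f->dirty = 0;  f->flushes++; }
static void nfs_invalidate(struct fake_file *f) { f->cached = 0; f->invalidations++; }

/* Taking the lock: flush, then invalidate, so the next read is
 * guaranteed to go back to the server. */
static void nfs_lock(struct fake_file *f)
{
    nfs_flush(f);
    nfs_invalidate(f);
}

/* Releasing the lock: the only guarantee needed is that changes made
 * under the lock reach the server; the cache may stay populated. */
static void nfs_unlock(struct fake_file *f)
{
    nfs_flush(f);
}
```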
> " " == Michael Eisler <[EMAIL PROTECTED]> writes:
> The fix still does not provide coherency guarantees in all
> situations, and at minimum, there ought to be a way to force
> the client provide a coherency guarantee.
Yes. I came to the same conclusion after having sent it o
Jeff Epler <[EMAIL PROTECTED]> writes:
> > > Is there a solution that would allow the kind of guarantee our
> > > software wants with non-linux nfsds without the cache-blowing
> > > that the change I'm suggesting causes?
Trond:
> > As you can see, the idea is to look at whether or
> > " " == Jeff Epler <[EMAIL PROTECTED]> writes:
>
> > Is there a solution that would allow the kind of guarantee our
> > software wants with non-linux nfsds without the cache-blowing
> > that the change I'm suggesting causes?
>
> How about something like the following compro
Theodore Y. Ts'o writes:
> From: "Albert D. Cahalan" <[EMAIL PROTECTED]>
>> The ext2 inode has 6 obviously free bytes, 6 that are only used
>> on filesystems marked as Hurd-type, and 8 that seem to be claimed
>> by competing security and EA projects. So, being wasteful, it would
>> be possible to
Date: Thu, 14 Sep 2000 17:03:11 +0200 (CEST)
From: Trond Myklebust <[EMAIL PROTECTED]>
For the timestamps, yes, but inode caching will take most of that
hit. After all, the only time stat() reads from disk is when the inode
has completely fallen out of the cache.
For commonly used
> " " == Theodore Y Ts'o <[EMAIL PROTECTED]> writes:
>Would it perhaps make sense to use one of these last 'free'
>fields as a pointer to an 'inode extension'? If you still
>want ext2fs to be able to accommodate new projects and
>ideas, then it seems that
Date: Thu, 14 Sep 2000 15:09:35 +0200 (CEST)
From: Trond Myklebust <[EMAIL PROTECTED]>
Would it perhaps make sense to use one of these last 'free' fields
as a pointer to an 'inode extension'?
If you still want ext2fs to be able to accommodate new projects and
ideas, then it seem
> " " == Theodore Y Ts'o <[EMAIL PROTECTED]> writes:
> There has been some talk of doubling the size of the ext2
> inode, which will of course cause some backwards compatibility
> problems and would mean that you would only be able to use
> certain advanced features on new
From: "Albert D. Cahalan" <[EMAIL PROTECTED]>
Date: Wed, 13 Sep 2000 19:20:42 -0400 (EDT)
The ext2 inode has 6 obviously free bytes, 6 that are only used
on filesystems marked as Hurd-type, and 8 that seem to be claimed
by competing security and EA projects. So, being wastef
On Thu, Sep 14, 2000 at 11:51:20AM +0200, Trond Myklebust wrote:
> The reason I'm sceptical to this and other in-core type solutions is
> that knfsd doesn't keep files open for long: basically it opens a file
> and close it upon processing of each and every request from a
> client. You can therefo
> " " == Andi Kleen <[EMAIL PROTECTED]> writes:
> On Thu, Sep 14, 2000 at 10:49:59AM +0200, Trond Myklebust
> wrote:
>> > " " == Albert D Cahalan <[EMAIL PROTECTED]> writes:
>>
>> > The ext2 inode has 6 obviously free bytes, 6 that are only
>> > used on filesyste
On Thu, Sep 14, 2000 at 10:49:59AM +0200, Trond Myklebust wrote:
> > " " == Albert D Cahalan <[EMAIL PROTECTED]> writes:
>
> > The ext2 inode has 6 obviously free bytes, 6 that are only used
> > on filesystems marked as Hurd-type, and 8 that seem to be
> > claimed by competing
> " " == Albert D Cahalan <[EMAIL PROTECTED]> writes:
> The ext2 inode has 6 obviously free bytes, 6 that are only used
> on filesystems marked as Hurd-type, and 8 that seem to be
> claimed by competing security and EA projects. So, being
> wasteful, it would be possible t
> " " == Jeff Epler <[EMAIL PROTECTED]> writes:
> Is there a solution that would allow the kind of guarantee our
> software wants with non-linux nfsds without the cache-blowing
> that the change I'm suggesting causes?
How about something like the following compromise?
I haven
On Wed, Sep 13, 2000 at 11:56:56PM +0200, Trond Myklebust wrote:
> After consultation with the appropriate authorities, it turns out that
> the Sun-code based clients do in fact turn off data caching entirely
> when using NLM file locking.
Entirely? That's interesting, because for our consistenc
I think that both the NFS server changes that Trond is suggesting and the
NFS client changes that I am suggesting have their place.
The fact that the tuple (mtime, size) is used to test for
unchangedness of a file indicates that the people who designed NFS
understood that just using mtime
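The ordinary (mtime, size) coherency check described above can be sketched as below. The structure and function names are illustrative, not the real fattr code; as the thread points out, this check can miss a change that leaves both fields identical within one second of resolution.

```c
#include <assert.h>
#include <time.h>

/* Illustrative sketch of the standard NFS client coherency test:
 * cached data is trusted only while both mtime and size match what
 * the server last reported. */

struct cached_attrs {
    time_t mtime;  /* server-reported modification time (1s units) */
    long   size;   /* server-reported file size in bytes */
};

static int cache_still_valid(const struct cached_attrs *cached,
                             const struct cached_attrs *server)
{
    return cached->mtime == server->mtime
        && cached->size  == server->size;
}
```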
Trond Myklebust writes:
> > " " == Albert D Cahalan <[EMAIL PROTECTED]> writes:
>> 20 bits * 3 timestamps == 60 bits; 60 bits <= 8 bytes
>>
>> So you do need 8 bytes.
>
> Yes. Assuming that you want to implement the microseconds field on
> all 3 timestamps. For NFS I would only need that preci
> " " == Andi Kleen <[EMAIL PROTECTED]> writes:
> Steal a couple of bytes from what ?
From the 32-bit storage area currently allocated to i_generation on
the on-disk ext2fs inode (as per Ted's suggestion). With current
disk/computing speeds, you probably don't need the full 32 bits to
> " " == Jeff Epler <[EMAIL PROTECTED]> writes:
> But "ctime and file size are the same" does not prove that the
> file is unchanged. That's the root of this problem, and why
> NFS_CACHEINV(inode) is not enough to ensure coherency.
> Furthermore, according to NFSv4, what
> " " == Theodore Y Ts'o <[EMAIL PROTECTED]> writes:
> Why? i_generation is a field which is only used by the NFS
> server. As far as ext2 is concerned, it's a black box value.
> Currently, as I understand things (and you're much more the
> expert on the NFS code than I
On Wed, Sep 13, 2000 at 10:35:10PM +0200, Trond Myklebust wrote:
> > No, it does not help. i_generation is only in the NFS file
> > handle, but not in the fattr/file id this is used for cache
> > checks. The NFS file handle has to stay identical anyways, as
> > long as the inod
On Wed, Sep 13, 2000 at 12:42:15PM +0200, Trond Myklebust wrote:
>
> Is there any 'standard' function for reading the microseconds fields
> in userland? I couldn't find anything with a brief search in the
> X/Open docs.
>
Both Digital OSF/1 and Solaris use 3 undocumented spare fields in the
stru
Date: Wed, 13 Sep 2000 22:35:10 +0200 (CEST)
From: Trond Myklebust <[EMAIL PROTECTED]>
You might be able to steal a couple of bytes and then rewrite ext2fs
to mask those out from the 'i_generation' field, but it would mean that
you could no longer boot your old 2.2.16 kernel withou
> " " == Andi Kleen <[EMAIL PROTECTED]> writes:
> On Wed, Sep 13, 2000 at 11:51:45AM -0400, Theodore Y. Ts'o
> wrote:
>>
>> So this is a really stupid question, but I'll ask it anyway.
>> If you just need a cookie, is there any way that you might be
>> able to steal
> " " == Albert D Cahalan <[EMAIL PROTECTED]> writes:
> 20 bits * 3 timestamps == 60 bits; 60 bits <= 8 bytes
> So you do need 8 bytes.
Yes. Assuming that you want to implement the microseconds field on all
3 timestamps. For NFS I would only need that precision on mtime.
Does anyb
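The arithmetic above checks out: 10^6 < 2^20 (1048576), so a microseconds value fits in 20 bits, and the three sub-second fields together occupy 60 bits, within a single 8-byte word. A packing helper could look like the following sketch; the names are illustrative, not an actual ext2 on-disk layout.

```c
#include <assert.h>
#include <stdint.h>

/* 10^6 < 2^20, so each microseconds field needs only 20 bits, and the
 * atime/mtime/ctime sub-second parts fit in 60 bits of one 64-bit
 * word. Illustrative only; not a real on-disk format. */

static uint64_t pack_usec(uint32_t atime_us, uint32_t mtime_us,
                          uint32_t ctime_us)
{
    const uint64_t mask = (1u << 20) - 1;  /* 20-bit field mask */
    return ((uint64_t)(atime_us & mask))
         | ((uint64_t)(mtime_us & mask) << 20)
         | ((uint64_t)(ctime_us & mask) << 40);
}

static uint32_t unpack_mtime_us(uint64_t packed)
{
    return (uint32_t)((packed >> 20) & ((1u << 20) - 1));
}
```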
On Wed, Sep 13, 2000 at 11:51:45AM -0400, Theodore Y. Ts'o wrote:
>
> So this is a really stupid question, but I'll ask it anyway. If you
> just need a cookie, is there any way that you might be able to steal a
> few bits from i_generation field for that purpose? (This assumes that
> we only wo
Date: Wed, 13 Sep 2000 12:54:49 +0200 (CEST)
From: Trond Myklebust <[EMAIL PROTECTED]>
Don't forget that 2^20 > 10^6, hence if you really want units of
microseconds, you actually only need to save 3 bytes worth of data per
timestamp.
For the purposes of NFS, however, the
Trond Myklebust writes:
> > " " == Albert D Cahalan <[EMAIL PROTECTED]> writes:
>> It would not be reasonable to try extending ext2 to 64-bit
>> times, but milliseconds might be doable. You'd need 4 bytes to
>> support the 3 standard time stamps.
>>
>> Going to microseconds would require 8 fr
> " " == Jeff Epler <[EMAIL PROTECTED]> writes:
> On Wed, Sep 13, 2000 at 02:53:02PM +0200, Trond Myklebust
> wrote:
>> No. Things fall in and out of the inode cache all the
>> time. That's a vicious circle that's going to lead to a lot of
>> unnecessary traffic.
>
On Wed, Sep 13, 2000 at 02:53:02PM +0200, Trond Myklebust wrote:
> No. Things fall in and out of the inode cache all the time. That's a
> vicious circle that's going to lead to a lot of unnecessary traffic.
The traffic is not "unnecessary" if it's needed to work with today's NFS
servers and get c
> " " == Jamie Lokier <[EMAIL PROTECTED]> writes:
> If we're getting serious about adding finer-grained mtimes to
> ext2, could we please consider using those bits as a way to
> know, for sure, whether a file has changed?
Agreed.
> Btw, the whole cache coherency thing wo
Trond Myklebust wrote:
> > It would not be reasonable to try extending ext2 to 64-bit
> > times, but milliseconds might be doable. You'd need 4 bytes to
> > support the 3 standard time stamps.
>
> > Going to microseconds would require 8 free bytes, which I don't
> > think
On Wed, Sep 13, 2000 at 12:42:15PM +0200, Trond Myklebust wrote:
> > " " == Andi Kleen <[EMAIL PROTECTED]> writes:
>
> > On Tue, Sep 12, 2000 at 09:10:56PM +0200, Trond Myklebust
> > wrote:
> >> Making mtime a true 64-bit cookie on Linux servers would be a
> >> solution that
> " " == Albert D Cahalan <[EMAIL PROTECTED]> writes:
> It would not be reasonable to try extending ext2 to 64-bit
> times, but milliseconds might be doable. You'd need 4 bytes to
> support the 3 standard time stamps.
> Going to microseconds would require 8 free bytes, wh
> " " == Andi Kleen <[EMAIL PROTECTED]> writes:
> On Tue, Sep 12, 2000 at 09:10:56PM +0200, Trond Myklebust
> wrote:
>> Making mtime a true 64-bit cookie on Linux servers would be a
>> solution that works on all clients.
> Making mtime 64bit would also be useful for lo
Trond Myklebust writes:
> Relying on the sub-second timing fields is a much more
> implementation-specific. It depends on the capabilities of both the
> filesystem and the server OS.
> Linux and the knfsd server code could easily be modified to provide
> such a service, but the problem (as I've s
On Tue, Sep 12, 2000 at 09:10:56PM +0200, Trond Myklebust wrote:
> Making mtime a true 64-bit cookie on Linux servers would be a solution
> that works on all clients.
Making mtime 64bit would also be useful for local parallel make runs,
the current second resolution leads to race conditions on b
On Tue, Sep 12, 2000 at 09:10:56PM +0200, Trond Myklebust wrote:
> > " " == Alan Cox <[EMAIL PROTECTED]> writes:
>
> > Providing everyone is careful to hold a lock I think it is
>
> > lockf() is a read barrier providing the local cache is flushed,
> > the unlock is a write bar
On Tue, Sep 12, 2000 at 07:08:09PM +0200, Trond Myklebust wrote:
> This is a known issue, and is not easy to fix. Neither of the
> solutions you propose are correct since they will both cause a cache
> invalidation. This is not the same as cache coherency checking.
>
>
> The correct solution in
> > The fix we have found is to *either* have NFS_CACHEINV(inode)
> > change inode-> i_mtime to an artificial value (0) or to call
> > nfs_zap_caches instead. Since I am not sure which fix is
> > appropriate, I'm not enclosing an actual patch.
>
> This is a known issue, and i
> " " == Alan Cox <[EMAIL PROTECTED]> writes:
> Providing everyone is careful to hold a lock I think it is
> lockf() is a read barrier providing the local cache is flushed,
> the unlock is a write barrier providing the local cache is
> flushed first. Providing all users a
> " " == Jeff Epler <[EMAIL PROTECTED]> writes:
> Is there any solution that's available today, and doesn't
> depend on using Linux in the server? I suspect that we will
> have to distribute a modified nfs client with our app, and
> we're prepared to accept the cache inva
> " " == Jeff Epler <[EMAIL PROTECTED]> writes:
> But "ctime and file size are the same" does not prove that the
^ mtime
> file is unchanged. That's the root of this problem, and why
> NFS_CACHEINV(inode) is not enough to ensure coherency.
Yes, but I just demo
> If we have a purely Linux-specific hack to ensure cache coherency,
> that will still corrupt the cache on those *NIX clients that use
> ordinary cache coherency checking (i.e. checking mtime + file size)
> rather than cache invalidation.
It's what Solaris implements and what SunOS back down to 3
> " " == Jeff Epler <[EMAIL PROTECTED]> writes:
> I believe I have discovered a problem in the Linux 2.2 nfs
> client. The short story is that NFS_CACHEINV(inode) does not
> "act as a cache coherency point" (as nfs/file.c:nfs_lock()'s
> comment implies) when multiple modi