Hi all,
While I understand that the ncache does add some disparity to the views
of various processes, I think it would be worth tracking down exactly
what is happening here. It seems like this particular error is one that
we should be able to avoid.
I agree with RobL that the ncache isn't go
On Mon, Aug 28, 2006 at 08:37:16PM +0200, Phil Carns wrote:
> There really isn't much way around this with the local ncache
> approach. Maybe the stock release should have the ncache disabled
> if this workload will be common (having one client delete a
> particular file and then a different clien
I think so. When one node deletes a file, it does not send out messages
to invalidate the cache in all of the other clients, so those still have
a cached (no longer valid) entry.
If those other clients then look up the file, the lookup will succeed (as if
another client had won the race to create it),
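
To make the stale-entry case above concrete, here is a toy model in C. It is only a sketch under stated assumptions, not the actual ncache code: the names toy_server, toy_client, toy_create, toy_lookup and toy_remove are all made up for illustration, and the "server" holds just one name. The only behavior it borrows from the description above is that a remove invalidates the cache of the client that did the remove and nobody else's.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define MAX_NAME 64

/* Hypothetical toy "server": the authoritative namespace, at most one name. */
struct toy_server {
    char name[MAX_NAME];
    bool exists;
};

/* Hypothetical per-client name cache: remembers one name it has resolved. */
struct toy_client {
    char cached[MAX_NAME];
    bool valid;
};

/* Create the name on the server. */
void toy_create(struct toy_server *s, const char *name)
{
    assert(strlen(name) < MAX_NAME);
    snprintf(s->name, sizeof(s->name), "%s", name);
    s->exists = true;
}

/* Look up a name: a cache hit is answered locally without asking the server. */
bool toy_lookup(struct toy_client *c, const struct toy_server *s,
                const char *name)
{
    if (c->valid && strcmp(c->cached, name) == 0)
        return true;                      /* possibly stale */
    if (s->exists && strcmp(s->name, name) == 0) {
        snprintf(c->cached, sizeof(c->cached), "%s", name);
        c->valid = true;                  /* remember for next time */
        return true;
    }
    return false;
}

/* Remove a name: only the removing client's own cache is invalidated. */
void toy_remove(struct toy_client *c, struct toy_server *s, const char *name)
{
    if (s->exists && strcmp(s->name, name) == 0)
        s->exists = false;
    if (c->valid && strcmp(c->cached, name) == 0)
        c->valid = false;
}
```

With two clients that have both looked up "file", a toy_remove by the first leaves the second still answering toy_lookup with success even though the server no longer has the name, which is the disparity being discussed.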
On Mon, Aug 28, 2006 at 04:28:32PM -0400, Pete Wyckoff wrote:
> So yeah, the file gets deleted by just one task. Then they all
> simultaneously try to create it again.
That's also what happened when noncontig_coll2 was failing. We did OK
until a different process tried to open the file that anoth
[EMAIL PROTECTED] wrote on Mon, 28 Aug 2006 19:02 +0200:
> What happens on each iteration? Does the code at some point delete a
> file with a particular name and then create a new one with the same name?
Each iter (of which there are 200, but it fails on #2 or 3) does:
task0
rm file
Pete Wyckoff wrote:
The simul code, test #14, does a shared create: all processes
try to do "creat(file, 0644)" at the same time through the VFS.
There is no O_EXCL, so what should happen here is that they all
succeed, although under the hood, all but one will probably have
to unwind the SYS_CRE
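
For anyone who wants to poke at the shared-create pattern without the full simul harness, a minimal sketch in C follows. It is not the simul test itself: shared_create is a hypothetical helper, it uses fork() on one node rather than MPI tasks on several, and the children are only roughly concurrent. What it does show is the expected semantics: with no O_EXCL, every creat() on the same path should succeed.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: fork nprocs children that each call
 * creat(path, 0644), and return how many of them succeeded.
 */
int shared_create(const char *path, int nprocs)
{
    assert(path != NULL && nprocs > 0);

    int spawned = 0;
    for (int i = 0; i < nprocs; i++) {
        pid_t pid = fork();
        if (pid < 0)
            break;                    /* stop spawning, reap what we have */
        if (pid == 0) {
            /* Child: no O_EXCL, so an existing file is not an error. */
            int fd = creat(path, 0644);
            if (fd < 0)
                _exit(1);
            close(fd);
            _exit(0);
        }
        spawned++;
    }

    int ok = 0;
    for (int i = 0; i < spawned; i++) {
        int status = 0;
        if (wait(&status) > 0 && WIFEXITED(status) &&
            WEXITSTATUS(status) == 0)
            ok++;
    }
    return ok;
}
```

On a local file system, shared_create(path, 4) should come back as 4. Pointing path at a mounted volume and deleting the file from a different client between rounds is the part that this single-node sketch cannot reproduce.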
Hi Pete,
I think I can take a look at that.
RobL mentioned something similar with another MPI-based program that does
not work with the ncache, and I believe he has committed a workaround
(environment variable based) to selectively enable/disable the ncache.
thanks,
Murali
On Mon, 28 Aug 2006, Pete Wy
The simul code, test #14, does a shared create: all processes
try to do "creat(file, 0644)" at the same time through the VFS.
There is no O_EXCL, so what should happen here is that they all
succeed, although under the hood, all but one will probably have
to unwind the SYS_CREATE when they notice t