Hi Philip,

thanks for looking at this and for your insight.  The number of aspects
to keep in mind about this patch is growing, we have to disentangle.

I will send a minimal one-line patch to just fix the bug and do nothing
else.  We should get that one in quickly.
That would also be a candidate for -stable, i think.
I hope to come round to that tonight.

Then i will send two cleanup patches to remove useless stuff
and put the code into the right place, not changing any functionality.
That will make the cleanup easier to review.

Finally, we can work out how to do the optimization.
Probably, that will naturally factorize into two steps:

 (1) Use the information available in the userland buffer
     to avoid the getdents(2) syscall, *without* assuming
     monotonicity.

 (2) Further optimize searching when monotonicity is available.

Yours,
  Ingo


Philip Guenther wrote on Mon, Nov 04, 2013 at 12:26:34AM -0800:
> On Sat, Nov 2, 2013 at 5:48 PM, Ingo Schwarze <schwa...@usta.de> wrote:

>> Here is an updated patch which now works correctly with Otto's
>> regression test, with the new test i just committed, and with
>> the test from the Perl test suite Andrew pointed out, even with
>> threads enabled.  It also survived quite some manual testing.
> ...
>> Comments?  Tests?  OKs?
> ...
>>  void
>>  seekdir(DIR *dirp, long loc)
>>  {
>> +       struct dirent *dp;
>> +
>> +       /*
>> +        * First check whether the directory entry to seek for
>> +        * is still buffered in the directory structure in memory.
>> +        */
>> +
>>         _MUTEX_LOCK(&dirp->dd_lock);
>> -       __seekdir(dirp, loc);
>> +       dp = (struct dirent *)dirp->dd_buf;
>> +       if (dirp->dd_size > 0 && dp->d_off <= loc) {
>> +               for (dirp->dd_loc = 0;
>> +                    dirp->dd_loc < dirp->dd_size;
>> +                    dirp->dd_loc += dp->d_reclen) {
>> +                       dp = (struct dirent *)(dirp->dd_buf + dirp->dd_loc);
>> +                       if (dp->d_off < loc)
>> +                               continue;

> This diff assumes that the d_off values for directory entries are
> monotonically increasing as you advance through the directory.  That
> is not guaranteed and is false for our implementation for some
> filesystems.  The d_off values for NFS come straight from the NFS
> server, and the NFS RFCs place no requirement on the cookies returned
> by the server other than that the zero cookie indicates the first
> entry.  (An OpenBSD NFS server returns cookies from the underlying
> filesystem, so this will be difficult to see on a normal system.)
> 
> The current tmpfs code will also cause problems for this diff: the
> d_off values for tmpfs are practically random, being derived from
> kernel pointers returned by malloc(9).  That will need to change, as
> kernel pointer values shouldn't be exposed to userland like that, but
> it should serve as a way to confirm that this behavior causes problems
> for the optimized seekdir().
> 
> One possibility is for the kernel to tell userland whether d_off
> values are sure to be monotonically increasing for the opened
> directory and then have userland only use the opimization when that's
> the case.  We could add a pathconf/fpathconf name to get that
> information: fpathconf(fd, _PC_DIRECTORY_MONOTONIC_OFFSETS) anyone?

Reply via email to