[zfs-code] cannot mount 'mypool': Input/output error

2007-11-15 Thread Ricardo Correia
Hi Nabeel,

Nabeel Saad wrote:
> So, we do ./run.sh & and then the zpool and zfs commands are available.  My 
> ZFS questions come here, once we run the create command, I get the error 
> directly: 
> 
> [root]# zpool create mypool sda 
> fuse: mount failed: Invalid argument 
> cannot mount 'mypool': Input/output error 

Are you sure this is what you wanted to do?

If you already had a pool on sda and you create another one on the same 
device, you will overwrite the old pool and will no longer be able to access 
the data you had there...

Perhaps you wanted to do "zpool import" instead?

> It seems the issue is with the mounting, and I can't understand why: 
> 
> [root]# zfs mount mypool 
> fuse: mount failed: Invalid argument 
> cannot mount 'mypool': Input/output error 

Before mounting the first zfs-fuse filesystem, you have to do "modprobe 
fuse".

> [root]# zfs allow 
> unrecognized command 'allow' 

This functionality is not available in the latest zfs-fuse version.

> Any thoughts or suggestions would be much appreciated! 

One suggestion: this mailing list is meant for discussing the ZFS code 
itself. If you have questions regarding zfs-fuse, please use the dedicated 
discussion group here:

http://groups.google.com/group/zfs-fuse/about

Thanks,
Ricardo

--
    * Ricardo Manuel Correia *

Lustre Engineering Group
*Sun Microsystems, Inc.*
Portugal

Ricardo.M.Correia at Sun.COM




[zfs-code] 512 byte IOs

2007-10-24 Thread Ricardo Correia
Hi,

In the process of testing the Lustre DMU-OSS with a write-intensive 
workload, I have seen a performance issue where IOs were being sent to 
disk in 512-byte sizes (even though we are currently doing 4K writes per 
transaction).

I have noticed that vdev_queue.c is not able to aggregate IOs, 
perhaps because vdev_file_io_start() is not doing asynchronous I/O.

To try to fix this, I have added ZIO_STAGE_VDEV_IO_START to the list of 
async I/O stages, which somewhat improved the number of IO aggregations, 
but not nearly enough. It seems that for some reason the number of nodes 
in vq_pending_tree and vq_deadline_tree doesn't go much above 1, even 
though the disk is always busy.

I have also noticed that the 1 GB file produced by this benchmark had >2 
million blocks, with an average block size (as reported by zdb -bbc) of 
524 bytes or so, instead of the 128 KB block size I expected. Even 
manually setting the "recordsize" property to 128 KB (which was already 
the default) didn't have any effect.
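
For scale, the block counts above follow directly from the file size; a 
trivial standalone check (my own illustration, not part of the benchmark):

#include <stdio.h>

int
main(void)
{
        unsigned long long file_size = 1ULL << 30;      /* the 1 GB test file */

        printf("512-byte blocks: %llu\n", file_size / 512);
        printf("128 KB blocks:   %llu\n", file_size / (128 * 1024));
        return (0);
}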

After changing the Lustre DMU code to call dmu_object_alloc() with a 
blocksize of 128 KB, throughput improved *a lot*.

Strangely (to me, at least), it seems that in ZFS all regular files are 
being created with 512-byte data block sizes, and that the "recordsize" 
property only affects the maximum write size per transaction in 
zfs_write(). Is this correct?

Comments and suggestions are welcome :)

Regards,
Ricardo



[zfs-code] Corrupted pools

2007-06-20 Thread Ricardo Correia
Ricardo Correia wrote:
>> # zpool status
>>  pool: media
>> state: FAULTED
>> scrub: none requested
>> config:
>>
>>        NAME        STATE     READ WRITE CKSUM
>>        media       UNAVAIL      0     0     0  insufficient replicas
>>          raidz1    UNAVAIL      0     0     0  corrupted data
>>            sda     ONLINE       0     0     0
>>            sdb     ONLINE       0     0     0
>>            sdc     ONLINE       0     0     0
>>            sdd     ONLINE       0     0     0
> 

Another weird behaviour, after rebooting:

> # zpool import -f media
> cannot import 'media': one or more devices is currently unavailable
> # zpool status
> no pools available
> # cfdisk /dev/sda
>  Disk Drive: /dev/sda
>   Size: 500107862016 bytes, 500.1 GB
>Pri/Log   Free Space  500105.25
> Same for sdb/c/d

After another reboot (and this is really strange):

> # zpool import
>  pool: media
>id: 18446744072804078091
> state: FAULTED
> action: The pool cannot be imported due to damaged devices or data.
> config:
>        media       UNAVAIL  insufficient replicas
>          raidz1    UNAVAIL  corrupted data
>            sda     ONLINE
>            sdb     ONLINE
>            sdc     ONLINE
>            sdd     ONLINE
> # zpool import media
> cannot import 'media': pool may be in use from other system
> # zpool import -f media
> cannot import 'media': one or more devices is currently unavailable 


> When I do less -f /dev/sda, I see "raidz", "/dev/sdb" etc, after lots of 
> gibberish, and I can successfully do grep someknownfilename -a|less, so it 
> seems that things were indeed written, and to this disk.

Any ideas?



[zfs-code] Corrupted pools

2007-06-20 Thread Ricardo Correia
Hi,

I recently received reports of 2 users who experienced corrupted raid-z 
pools with ZFS-FUSE and I'm having trouble reproducing the problem or 
even figuring out what the cause is.

One of the users experienced corruption only by rebooting the system:

> # zpool status
>  pool: media
> state: FAULTED
> scrub: none requested
> config:
>
>        NAME        STATE     READ WRITE CKSUM
>        media       UNAVAIL      0     0     0  insufficient replicas
>          raidz1    UNAVAIL      0     0     0  corrupted data
>            sda     ONLINE       0     0     0
>            sdb     ONLINE       0     0     0
>            sdc     ONLINE       0     0     0
>            sdd     ONLINE       0     0     0

At first I thought it was a problem of device names being renamed (caused by 
a different order of disk detection at boot), but I believe in that case ZFS 
would report the drive as UNAVAIL.

Anyway, exporting and re-importing didn't work:

> # zpool import
>  pool: media
>id: 18446744072804078091
> state: FAULTED
> action: The pool cannot be imported due to damaged devices or data.
> config:
>        media       UNAVAIL  insufficient replicas
>          raidz1    UNAVAIL  corrupted data
>            sda     ONLINE
>            sdb     ONLINE
>            sdc     ONLINE
>            sdd     ONLINE

Another user experienced a similar problem but in a different circumstance:

He had a raid-z pool with 2 drives and while the system was idle he 
removed one of the drives. zfs-fuse doesn't notice the drive is removed 
until it tries to read or write to the device, so "zpool status" showed 
the drive was still online. Anyway, after a slightly confusing sequence 
of events (replugging the drive, zfs-fuse crashing(?!), and some other 
weirdness), the end result was the same:

 > pool: pool
 > state: UNAVAIL
 > scrub: none requested
 > config:
 >
 >        NAME        STATE     READ WRITE CKSUM
 >        pool        UNAVAIL      0     0     0  insufficient replicas
 >          raidz1    UNAVAIL      0     0     0  corrupted data
 >            sdc2    ONLINE       0     0     0
 >            sdd2    ONLINE       0     0     0

I tried to reproduce this but I can't. When I remove a USB drive from a 
raid-z pool, zfs-fuse correctly shows READ/WRITE failures. I also tried 
killing zfs-fuse, changing the order of the drives and then starting 
zfs-fuse again, but after exporting and importing it never corrupted the pool 
(although it found checksum errors on the drive that was unplugged, of 
course).

Something that might be useful to know: zfs-fuse uses each block device 
as if it were a normal file and calls fsync() on the file descriptor 
when necessary (like in vdev_file.c), but this only guarantees that the 
kernel buffers are flushed; it doesn't actually send the flush command 
to the disk (unfortunately there's no DKIOCFLUSHWRITECACHE ioctl 
equivalent in Linux). Anyway, the possibility that this is the problem 
seems very remote to me (and it wouldn't explain the second case).

Do you have any idea what the problem could be or how I can determine 
the cause? I'm stuck at this point, and the first user seems to have 
lost 280 GB of data (he didn't have a backup).

Regards,
Ricardo Correia



[zfs-code] Refactor zfs_zget()

2007-05-10 Thread Ricardo Correia
On Wednesday 09 May 2007 22:27:44 Mark Maybee wrote:
> Could you file an RFE for this via the opensolaris bug-reporting
> interface?

It was filed with the Change Request ID 646.

Thanks.



[zfs-code] Refactor zfs_zget()

2007-05-09 Thread Ricardo Correia
Hi Pawel,

On Wednesday 09 May 2007 23:40:53 Pawel Jakub Dawidek wrote:
> Simple test that does open(O_CREAT)/fstat()/unlink()/fstat() seems to
> work fine on both FreeBSD and Solaris. Maybe your problem is somewhere
> else?

I understand that, but the FUSE low-level API doesn't use vnodes to identify 
files; it uses the inode number.

So when a program calls fstat() it generates a GETATTR event in FUSE that must 
be handled by a function with this prototype:

--
int zfsfuse_getattr(fuse_req_t req, fuse_ino_t ino, struct fuse_file_info 
*fi);
--

So the file is identified by the "ino" parameter (the inode number). If you're 
wondering what the "fi" parameter is for - in this handler it's reserved for 
future use and is always NULL.

So in order to get the associated vnode I must call zfs_zget() first, at the 
top of almost every FUSE handler:

...

vfs_t *vfs = (vfs_t *) fuse_req_userdata(req);
zfsvfs_t *zfsvfs = vfs->vfs_data;

ZFS_ENTER(zfsvfs);
znode_t *znode;

int error = zfs_zget(zfsvfs, ino, &znode, B_TRUE);
...

The problem is that zfs_zget() will return ENOENT if the inode has been 
unlinked, which is why I had to add that boolean parameter.



[zfs-code] Refactor zfs_zget()

2007-05-09 Thread Ricardo Correia
On Wednesday 09 May 2007 04:57:53 Ricardo Correia wrote:
> 2) At the end of zfs_zget(), if the requested object number is not found,
> it allocates a new znode with that object number. This shouldn't happen in
> any FUSE operation.

Apparently, I didn't (and I still don't) fully understand this part of the 
code, but I still need it after all.

So I propose only a very simple change - an added boolean parameter that 
allows zfs_zget() to return unlinked objects.

See the attached patch (and sorry for the mismatched paths WRT the OpenSolaris 
tree).

Regards,
Ricardo Correia
-- next part --
A non-text attachment was scrubbed...
Name: zget.patch
Type: text/x-diff
Size: 9112 bytes
Desc: not available
URL: 
<http://mail.opensolaris.org/pipermail/zfs-code/attachments/20070509/ee9855cc/attachment.bin>


[zfs-code] Refactor zfs_zget()

2007-05-09 Thread Ricardo Correia
Hi,

Since almost all operations in the FUSE low-level API identify files by inode 
number, I've been using zfs_zget() to get the corresponding znode/vnode in 
order to call the appropriate VFS function in zfs_vnops.c.

However, there are some cases where zfs_zget() behaves slightly differently 
from what I need:

1) If zp->z_unlinked != 0 then zfs_zget() returns ENOENT. I need it to return 
the znode anyway; otherwise an open() followed by an unlink() followed by an 
fstat() would return ENOENT (see the small demo after this list).

2) At the end of zfs_zget(), if the requested object number is not found, it 
allocates a new znode with that object number. This shouldn't happen in any 
FUSE operation.
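
To make item 1 concrete, here is the POSIX behaviour that has to keep 
working, as a small standalone program (my own illustration, not part of the 
attached patch):

#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

int
main(void)
{
        struct stat st;
        int fd = open("testfile", O_CREAT | O_WRONLY, 0644);

        if (fd < 0) {
                perror("open");
                return (1);
        }
        if (unlink("testfile") < 0) {
                perror("unlink");
                return (1);
        }
        /* The file no longer has a name, but the open descriptor must stay
         * fully usable.  In zfs-fuse this fstat() becomes a GETATTR on the
         * unlinked inode, which is why zfs_zget() has to be willing to
         * return unlinked znodes instead of ENOENT. */
        if (fstat(fd, &st) < 0) {
                perror("fstat after unlink");
                return (1);
        }
        printf("fstat after unlink ok, size=%lld\n", (long long)st.st_size);
        (void) close(fd);
        return (0);
}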

So, to overcome these problems, I refactored zfs_zget() into 
zfs_zget_common() with an added flag parameter, in order to keep the 
zfs_zget() interface the same.

Is there any chance to apply the attached patch (or something similar) to the 
OpenSolaris ZFS codebase?

Thanks.

Regards,
Ricardo Correia
-- next part --
A non-text attachment was scrubbed...
Name: zfs_zget.patch
Type: text/x-diff
Size: 2427 bytes
Desc: not available
URL: 
<http://mail.opensolaris.org/pipermail/zfs-code/attachments/20070509/8504fed3/attachment.bin>


[zfs-code] Assertion due to timer overflow

2007-02-23 Thread Ricardo Correia
Ricardo Correia wrote:
> Today one zfs-fuse experienced a problem with a timer overflow on a
One zfs-fuse *user* :)

> In arc.c, arc_reclaim_thread() there is an ASSERT(growtime > 0) which
> fails in these 2 situations:

By the way, if my analysis is correct and lbolt wraps around to -2^31
(and not 0) then, once arc.no_grow becomes TRUE, it will never become
FALSE again (for another 200+ days).

So this could be a real problem even on production machines, I believe.



[zfs-code] Assertion due to timer overflow

2007-02-23 Thread Ricardo Correia
Hi,

Today one zfs-fuse experienced a problem with a timer overflow on a
32-bit machine with >300 days of uptime.

In arc.c, arc_reclaim_thread() there is an ASSERT(growtime > 0) which
fails in these 2 situations:

- In kernel context, if lbolt wraps around to a negative value. Since
clock_t has 32 bits on 32-bit machines, this will happen after 248.6
days of uptime with a 100 Hz clock tick.

- In user context (ztest), lbolt is defined to be (gethrtime() >> 23),
which gives us a 119 Hz clock tick, so it will wrap around to a negative
value if the machine has more than 208.9 days of uptime.

The attached patch should fix this issue.
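
For reference, the figures above follow from simple arithmetic on a signed 
32-bit tick counter; a small standalone check (my own illustration, not part 
of the patch):

#include <stdint.h>
#include <stdio.h>

int
main(void)
{
        /* Kernel context: clock_t is 32 bits, lbolt advances at 100 Hz. */
        double kernel_days = (double)INT32_MAX / 100 / 60 / 60 / 24;

        /* Userland (ztest): lbolt is gethrtime() >> 23, i.e. one tick per
         * 2^23 ns, which is roughly 119.2 ticks per second. */
        double user_hz = 1e9 / (double)(1 << 23);
        double user_days = (double)INT32_MAX / user_hz / 60 / 60 / 24;

        printf("100 Hz tick: overflow after %.1f days\n", kernel_days);
        printf("gethrtime() >> 23: %.1f Hz, overflow after %.1f days\n",
            user_hz, user_days);
        return (0);
}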

-- next part --
An embedded and charset-unspecified text was scrubbed...
Name: timer.patch
URL: 



[zfs-code] Race condition in arc.c

2007-02-20 Thread Ricardo Correia
Hi,

I believe I have found a race condition in the ZFS ARC code.

The problem manifests itself only when debugging is turned on and 
arc_mru->arcs_size is very close to arc_mru->arcs_lsize.

It causes these assertions in arc.c to fail:
1) In remove_reference(): ASSERT3U(state->arcs_size, >=, 
state->arcs_lsize);
2) In arc_change_state(): ASSERT3U(new_state->arcs_size + to_delta, >=, 
new_state->arcs_lsize);

Steps to reproduce:
Well, in zfs-fuse it's just a matter of compiling in debug mode and 
running 'bonnie++ -f -d /pool'. It fails a couple of minutes into the rewrite 
test with the arc_change_state() assertion.

Currently, in zfs-fuse, there is a lot of context switching, which may trigger 
this bug more frequently than in Solaris. I have found that, with the 
bonnie++ workload, as many as 6 threads enter arc_change_state() at the same 
time, just before it fails.

Also, I have a relatively low (64 MB) arc_c_max value, which might be 
relevant.
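
The general shape of the problem is easy to demonstrate outside ZFS: an 
invariant maintained by two separate atomic updates can look violated to a 
thread that samples both values in between. A small standalone analogy (my 
own illustration, not the ARC code):

#include <pthread.h>
#include <stdint.h>
#include <stdio.h>

static uint64_t size;
static uint64_t lsize;

static void *
updater(void *arg)
{
        (void) arg;
        for (;;) {
                /* lsize is bumped before size, so for a moment the pair
                 * violates "size >= lsize" even though each add is atomic */
                __sync_fetch_and_add(&lsize, 4096);
                __sync_fetch_and_add(&size, 4096);
        }
        return (NULL);
}

int
main(void)
{
        pthread_t t;
        int i;

        pthread_create(&t, NULL, updater, NULL);

        for (i = 0; i < 100000000; i++) {
                uint64_t s = size;      /* unsynchronized snapshot, like */
                uint64_t l = lsize;     /* the values read by ASSERT3U() */
                if (s < l) {
                        printf("transient violation: size=%llu lsize=%llu\n",
                            (unsigned long long) s, (unsigned long long) l);
                        return (1);
                }
        }
        printf("no violation observed in this run\n");
        return (0);
}

(Compile with -pthread; whether a given run catches the window is timing 
dependent, which matches how hard the real assertion is to trigger.)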

It is much easier to reproduce if you apply this patch (relative to the 
current mercurial tip):

diff -r 773fb303fd36 usr/src/uts/common/fs/zfs/arc.c
--- a/usr/src/uts/common/fs/zfs/arc.c   Mon Feb 19 05:28:47 2007 -0800
+++ b/usr/src/uts/common/fs/zfs/arc.c   Tue Feb 20 05:23:03 2007 +
@@ -834,6 +834,8 @@ arc_change_state(arc_state_t *new_state,
 
if (use_mutex)
mutex_exit(&new_state->arcs_mtx);
+   if (new_state == arc_mru)
+   delay(hz); // sleep 1 second
}
}

With this patch, zfs-fuse will always crash in the remove_reference() 
assertion simply by mounting a filesystem (when compiled in debug mode, of 
course).

Unfortunately, even with that patch, it's a little hard to trigger the bug 
with ztest because arc_mru->arcs_size gets much bigger than 
arc_mru->arcs_lsize after a while. I had moderate success with the following 
steps:

1) Applying the patch
2) Changing the ztest_dmu_write_parallel test frequency from zopt_always to 
zopt_rarely (I'm unsure how much this actually helps)
3) Compiling in debug mode
4) Running ztest with parameters "-T600 -P3"
5) Keep retrying. Usually, if it doesn't fail in the first 10 minutes, I think 
it's better to start ztest from the beginning.

I have fixed this bug with the attached patch; I don't really like it very 
much, but it fixes the race.
-- next part --
A non-text attachment was scrubbed...
Name: arc.patch
Type: text/x-diff
Size: 823 bytes
Desc: not available
URL: 



[zfs-code] Improper use of atomic_add_64().

2007-02-19 Thread Ricardo Correia
On Sunday 18 February 2007 01:41, Pawel Jakub Dawidek wrote:
> I found few places where such situation occurs. I wonder how this got
> unnoticed with ztest, which fails on me within a few seconds (after I
> started to use Solaris atomic operations) on assertions.  Maybe this
> only doesn't work when compiled with gcc? Not sure, but most of the time
> 64bit variables are used properly.

Hi Pawel,

I've been using the Solaris assembly code for the atomic operations since the 
beginning, but lately zfs-fuse has been failing with an assertion in the exact 
same piece of code your patch touches:

lib/libzpool/build-kernel/arc.c:736: arc_change_state: Assertion 
`new_state->size + to_delta >= new_state->lsize (0x1194000 >= 0x11a8000)` 
failed.

Were you hitting this bug?
Unfortunately your patch didn't fix this :(

I'm having trouble figuring out this problem..

Thanks.



[zfs-code] Assertion in arc_change_state

2007-01-24 Thread Ricardo Correia
On Wednesday 24 January 2007 00:04, eric kustarz wrote:
> Right, i would verify your locks are working correctly (especially
> make sure atomic_add_64() is truly atomic).  Note, these locks are in
> the ARC - so they are not in the VFS.

Yes, atomic_add_64() should be truly atomic, since I've taken that (assembly) 
code from OpenSolaris :)

Although I have to ask: the atomic_add_64() itself is atomic, but couldn't the 
ab->b_state->lsize value change between the atomic_add_64() and the 
ASSERT3U()?

Unless the mutex is protecting this value. But then why would atomic_add_64() 
be needed? Now I'm confused. As you can probably see already, I have no clue 
about that piece of code.. :)

The locks I was referring to were the VOP_RWLOCK() locks in the VFS read() and 
write() syscalls, possibly some others as well that I still haven't 
implemented. I have to do a code review to see what's missing.

Thanks.



[zfs-code] Assertion in arc_change_state

2007-01-23 Thread Ricardo Correia
On Tuesday 23 January 2007 19:01, Ricardo Correia wrote:
> My current code is tripping the following assertion:
> lib/libzpool/build-kernel/arc.c:736: arc_change_state: Assertion
> `new_state->size + to_delta >= new_state->lsize (0x2a6 >= 0x2a64000)`
> failed.

(snip)

> (gdb) print new_state->size
> $1 = 44695552
> (gdb) print to_delta
> $2 = 131072
> (gdb) print new_state->lsize
> $3 = 9792

I've just noticed that the "new_state->size + to_delta" value from the crash 
dump is different from the one in the assertion.

I guess it might be a race condition, since I've just implemented 
multithreaded operation, but I think I'm missing some locks that are done by 
the Solaris VFS.



[zfs-code] Assertion in arc_change_state

2007-01-23 Thread Ricardo Correia
Hi,

My current code is tripping the following assertion:
lib/libzpool/build-kernel/arc.c:736: arc_change_state: Assertion 
`new_state->size + to_delta >= new_state->lsize (0x2a6 >= 0x2a64000)` 
failed.

gdb info:

Program terminated with signal 6, Aborted.
#0  0x2afcd767847b in raise () from /lib/libc.so.6
(gdb) bt
#0  0x2afcd767847b in raise () from /lib/libc.so.6
#1  0x2afcd7679da0 in abort () from /lib/libc.so.6
#2  0x00454dff in arc_change_state (new_state=0x591aa0, 
ab=0x2aaabe2930c0, hash_lock=<value optimized out>)
at lib/libzpool/build-kernel/arc.c:735
#3  0x00457f32 in arc_access (buf=0x2aaabe2930c0, hash_lock=0x592c30) 
at lib/libzpool/build-kernel/arc.c:1637
#4  0x00458ff9 in arc_read_done (zio=0x2aaabcfa4ed0) at 
lib/libzpool/build-kernel/arc.c:1850
#5  0x0044fb9f in zio_done (zio=0x2aaabcfa4ed0) at 
lib/libzpool/build-kernel/zio.c:868
#6  0x004527f0 in zio_vdev_io_assess (zio=0x2aaabcfa4ed0) at 
lib/libzpool/build-kernel/zio.c:1491
#7  0x00466ecf in taskq_thread (arg=<value optimized out>) at 
lib/libsolkerncompat/taskq.c:160
#8  0x2afcd74273ca in start_thread () from /lib/libpthread.so.0
#9  0x2afcd771555d in clone () from /lib/libc.so.6
#10 0x in ?? ()

(gdb) frame 2
#2  0x00454dff in arc_change_state (new_state=0x591aa0, 
ab=0x2aaabe2930c0, hash_lock=<value optimized out>)
at lib/libzpool/build-kernel/arc.c:735
735 ASSERT3U(new_state->size + to_delta, >=,
(gdb) print new_state->size
$1 = 44695552
(gdb) print to_delta
$2 = 131072
(gdb) print new_state->lsize
$3 = 9792

My code is synced to the ON Mercurial repository onnv_56 tag, with some minor 
changes in arc.c (diff attached).

Do you have any idea what this might be?

Thanks.
-- next part --
A non-text attachment was scrubbed...
Name: zfs-fuse-arc.diff
Type: text/x-diff
Size: 2559 bytes
Desc: not available
URL: 



[zfs-code] Permission/ACL issues

2007-01-22 Thread Ricardo Correia
On Sunday 21 January 2007 21:17, Mark Shellenbaum wrote:
> ftruncate/truncate on Solaris are implemented via the F_FREESP command
> in fcntl(2). This ultimately calls VOP_SPACE() which doesn't need to do
> any access checks since the file was already opened with the necessary
> "write" permission.

Hi Mark :)

That would explain it. For some reason, I was convinced ftruncate() in 
userspace would lead to a VOP_SETATTR() in the kernel.

That also explains why I wasn't getting DTrace to trace zfs_setattr(), which 
was starting to get a little frustrating :)

Thanks!



[zfs-code] Permission/ACL issues

2007-01-21 Thread Ricardo Correia
Hi,

I'm having a problem with a simple sanity check performed by 'iozone -a'.

Basically, iozone creates a file with a permission mode of 0 and then tries to 
truncate it:

1) fd = open("file", O_WRONLY|O_CREAT, 0)
2) ftruncate(fd, 0)

In zfs-fuse, the ftruncate() call ends up calling zfs_setattr() with AT_SIZE 
set in the attribute mask.

The problem is that one of the first things zfs_setattr() does is validate the 
permissions (by calling zfs_zaccess()), which fails since the file owner 
doesn't have write permission.

What am I doing wrong here? iozone seems to work in Solaris.
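
For reference, the check boils down to these two calls as a standalone 
program (a reconstruction of the steps above, not iozone's actual code):

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int
main(void)
{
        /* Step 1: create the file with a permission mode of 0. */
        int fd = open("file", O_WRONLY | O_CREAT, 0);

        if (fd < 0) {
                perror("open");
                return (1);
        }

        /* Step 2: truncate through the already-open descriptor.  On
         * Solaris this goes through fcntl(F_FREESP)/VOP_SPACE() and needs
         * no access check, because the descriptor was already opened for
         * writing; routing it through a SETATTR-style permission check
         * makes it fail for a mode-0 file even for the owner. */
        if (ftruncate(fd, 0) < 0) {
                perror("ftruncate");
                return (1);
        }

        printf("ftruncate succeeded\n");
        (void) close(fd);
        (void) unlink("file");
        return (0);
}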

Thanks.



[zfs-code] Possible bug

2007-01-18 Thread Ricardo Correia
Oops, I missed the parenthesis.

Sorry, please disregard this post.



[zfs-code] Possible bug

2007-01-18 Thread Ricardo Correia
Hi, please take a look at this piece of code in zfs_vfsops.c:

int
zfs_mount(vfs_t *vfsp, vnode_t *mvp, struct mounta *uap, cred_t *cr)
{
	(..)
	int canwrite;
	(...)
	/*
	 * Refuse to mount a filesystem if we are in a local zone and the
	 * dataset is not visible.
	 */
	if (!INGLOBALZONE(curproc) &&
	    (!zone_dataset_visible(osname, &canwrite) || !canwrite)) {
		error = EPERM;
		goto out;
	}
	(..)
}

This piece of code has been flagged by the Intel C compiler and I think 
there's a bug here, if I'm not mistaken.

Assuming we are in the global zone, an optimizing compiler would never call 
zone_dataset_visible() since the expression "!INGLOBALZONE(curproc)" would 
evaluate to false. But afterwards, we are checking the value of the variable 
canwrite which hasn't been assigned yet.

Is my analysis correct?

Thanks.
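
For the record, with the parentheses in place the && and || short-circuit 
rules guarantee that !canwrite is only evaluated after zone_dataset_visible() 
has assigned it, which is why the report is retracted in the follow-up above. 
A small standalone illustration with stub names (not the real functions):

#include <stdio.h>

static int visible_called;

static int
zone_dataset_visible_stub(int *canwrite)
{
        visible_called = 1;
        *canwrite = 1;          /* always assigned before !canwrite runs */
        return (1);
}

int
main(void)
{
        int in_global_zone;

        for (in_global_zone = 1; in_global_zone >= 0; in_global_zone--) {
                int canwrite;   /* deliberately left uninitialized; a compiler
                                 * may still warn here (presumably what the
                                 * Intel compiler flagged), but the
                                 * uninitialized read never actually happens */

                visible_called = 0;
                /* Same shape as the zfs_mount() check above: !canwrite is
                 * part of the right-hand operand of &&, so it is evaluated
                 * only when the left-hand side is true, and then only after
                 * the stub has assigned canwrite. */
                if (!in_global_zone &&
                    (!zone_dataset_visible_stub(&canwrite) || !canwrite)) {
                        printf("global=%d: EPERM path, stub called: %d\n",
                            in_global_zone, visible_called);
                } else {
                        printf("global=%d: mount allowed, stub called: %d\n",
                            in_global_zone, visible_called);
                }
        }
        return (0);
}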



[zfs-code] Request for help and advice on cache behaviour

2007-01-10 Thread Ricardo Correia
Hi,

I'm not sure how to control the ARC on the ZFS port to FUSE.

In the alpha1 release, for testing, I simply set the zfs_arc_max and 
zfs_arc_min variables to 80 MB and 64 MB (respectively) to prevent the ARC 
from growing without bound.

However, I'm having a problem. A simple run of the following script will cause 
zfs-fuse memory usage to grow almost indefinitely:

for i in `seq 1 10`;
do
 touch /pool/testdir/$i
done

The problem seems to be that vnodes are getting allocated and never freed.


[zfs-code] How does zfs mount at boot? How to let the system not to mount zfs?

2006-12-14 Thread Ricardo Correia
On Thursday 14 December 2006 20:35, Noël Dellofano wrote:
> > Where does Solaris find zfs mount at boot?
>
> ZFS keeps its record of what pools are on the system in /etc/zfs/
> zpool.cache.  That's the file that gets read at boot

I'm also interested to know where in the code the filesystems are mounted at 
boot.

Is there some code that does it when the zfs module loads or does OpenSolaris 
simply call 'zfs mount -a' as part of the boot process?



[zfs-code] Porting ZFS, trouble with nvpair

2006-11-17 Thread Ricardo Correia
Hi,

On Thursday 16 November 2006 19:19, Rick Mann wrote:
> I've noticed there are two nvpair_alloc_system.c files. One, in
> libnvpair, is dated 2004. The other, in nvpair, is dated 2006 and has
> more code in it.
>
> Which one is the right one?

If you search for nvpair_alloc_system.c in OpenGrok, there will be 3 results 
under /onnv/onnv-gate/:

/onnv/onnv-gate/usr/src/lib/libnvpair/ - libnvpair - name-value pair library
/onnv/onnv-gate/usr/src/stand/lib/nvpair/ - Stand-alone (booting) code
/onnv/onnv-gate/usr/src/uts/common/os/ - Core Operating System

The description OpenGrok gives you after the directory should give you some 
hints as to the version you should use.

So, since you are porting the libnvpair userspace library, you should use the 
file in /usr/src/lib/libnvpair.

You will notice that OpenSolaris has more or less the following structure:

/usr/src/lib: Userspace libraries.
/usr/src/cmd: Userspace programs.
/usr/src/uts: Kernel code.
/usr/src/common: Code which is common to userspace and kernel.

In case you still have some doubts, I have put my OpenSolaris file copy 
script at http://www.wizy.org/files/copysolaris.sh

If you take a look at that script, you'll see all the files that you might 
need in porting ZFS (except the kernel part, which I haven't finished porting 
yet).

Have fun ;)



[zfs-code] Porting ZFS, trouble with nvpair

2006-11-16 Thread Ricardo Correia
Hi,

That file is in /on/trunk/usr/src/uts/common/sys/nvpair.h

Check out OpenGrok at http://src.opensolaris.org

If you put sys/nvpair.h in the "File Path" field, it will tell you where you 
can find the file (the correct version is in the /onnv/onnv-gate/ subtree).

That site is immensely useful, I use it all the time when porting ZFS to FUSE. 
You should also check out 
http://www.opensolaris.org/os/community/zfs/porting/ if you haven't seen it 
yet.

Keep us posted with your progress :)

On Thursday 16 November 2006 01:24, Rick Mann wrote:
> Hi. I thought I'd take a stab at the first steps of porting ZFS to Darwin.
> I realize there are rumors that Apple is already doing this, but my contact
> at Apple has yet to get back to me to verify this. In the meantime, I
> wanted to see how hard it would be. I started with libzfs, and promptly ran
> into issues with libnvpair.
>
> It wants sys/nvpair.h, but I can't find that in the
> http://svn.genunix.org/repos/on/trunk/ tree.
>
> Could someone please point me in the right direction? Thanks!



[zfs-code] ZFS delete thread

2006-09-24 Thread Ricardo Correia
Hi,

On Sunday 24 September 2006 19:53, Mark Shellenbaum wrote:
> You can just ignore those for the linux port.  They are part of the
> Solaris power management framework.
>

Ok thanks.



[zfs-code] ZFS delete thread

2006-09-24 Thread Ricardo Correia
Hi,

I was wondering if you could tell me what the purpose of the CALLB_*() macros 
in the ZFS delete thread is (see here: 
http://cvs.opensolaris.org/source/xref/on/usr/src/uts/common/fs/zfs/zfs_dir.c#446
 ).

I've looked at sys/callb.h and callb.c, and the comments say it's an event 
scheduling/echoing mechanism, but I can't figure out how it works.
I even tried looking for it in the Solaris Internals 2nd Edition book, but it 
doesn't mention this mechanism.

Is it still used (or will it be used in the future) in that specific thread?

Thanks.



[zfs-code] ZFS patches

2006-09-16 Thread Ricardo Correia
On Saturday 16 September 2006 20:53, Ricardo Correia wrote:
> On Friday 15 September 2006 20:25, you wrote:
> > I wondered if you could send me a tarball of your changed files for
> > your patches below and also let me know what OpenSolaris build the
> > files are based off of?  It will speed up the port of your patches
> > immensely :)

Upon reading your mail again: did you want the complete changed files? If so, 
you can get the new tarball with everything here: 
http://www.wizy.org/files/zfs-patches/zfsfuse-patches+code.tar.gz

Thanks.



[zfs-code] ZFS patches

2006-09-16 Thread Ricardo Correia
On Friday 15 September 2006 20:25, you wrote:
> I wondered if you could send me a tarball of your changed files for
> your patches below and also let me know what OpenSolaris build the
> files are based off of?  It will speed up the port of your patches
> immensely :)

I've attached a tarball with the patches updated to OpenSolaris build 48 
(mercurial tag onnv_48+putbacks). It's also available at 
http://www.wizy.org/files/zfs-patches/zfsfuse-patches.tar.gz

There's a notes.txt inside describing the patches. There's also the 
copysolaris.sh script that you might find useful in case you can't find out 
what files the patches apply to, since my directory structure is a little 
different.

The line numbers might also be a little different, since I've got some 
additional code in some files.

Let me know if you need any additional help or if I've made a mistake.

Thanks :)
-- next part --
A non-text attachment was scrubbed...
Name: zfsfuse-patches.tar.gz
Type: application/x-tgz
Size: 11105 bytes
Desc: not available
URL: 



[zfs-code] ZFS patches

2006-09-09 Thread Ricardo Correia
Here's one more:

---
http://www.wizy.org/files/zfs-patches/15-printf-gcc41.patch

This fixes 2 printf warnings emitted by gcc 4.1 (the first one was about a 
missing argument that remained undetected in Solaris and in gcc < 4.1 because 
of gettext()).



[zfs-code] ZFS patches

2006-08-26 Thread Ricardo Correia
Since my current code is already pretty stable, I am now releasing the patches 
that I had to apply to the original sources (the ones relevant to OpenSolaris).

The current "linux.patch" file weighs in at about 63 KB; these patches would 
reduce it to roughly 50% of its size (the rest is mostly just commenting out 
code for unimplemented features or some minor Linux-specific changes).

So, here are the patches:

---
http://www.wizy.org/files/zfs-patches/01-printf-casts.patch

gcc warns about printf() argument type mismatches. I had 2 choices: either 
use a -W.. flag to disable the warnings or typecast the arguments. I opted for 
typecasting the arguments. I think it's safer, since I can't guarantee that 
there aren't size mismatches between some Linux and Solaris types (what would 
happen if there was a "%llu" format string and it was given a 32-bit 
variable?).
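
As an aside, the class of bug the casts guard against looks like this (a 
standalone illustration, not one of the patched call sites):

#include <inttypes.h>
#include <stdio.h>

int
main(void)
{
        uint32_t small = 42;
        uint64_t big = 42;

        /* Wrong: the format string promises 64 bits but only 32 are passed
         * (undefined behaviour; it may appear to work on some ABIs):
         *
         *     printf("%llu\n", small);
         */

        /* Safe: make the width explicit with a cast, as the patch does. */
        printf("%llu\n", (unsigned long long)small);
        printf("%" PRIu64 "\n", big);
        return (0);
}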

---
http://www.wizy.org/files/zfs-patches/02-no-trigraph.patch

Remove 1 trigraph. Standard C99 has trigraphs, and AFAIK there's no way to 
disable them in gcc when the -std=c99 flag is given. -Wno-trigraphs only 
disables the warning, if I'm not mistaken.

---
http://www.wizy.org/files/zfs-patches/03-char-ambiguity.patch

gcc complained about signed/unsigned char ambiguity.

---
http://www.wizy.org/files/zfs-patches/04-unused-code.patch

Unused/dead code.

---
http://www.wizy.org/files/zfs-patches/05-boolean-argument.patch

Better pointer conversion (removes a gcc warning/error).

---
http://www.wizy.org/files/zfs-patches/06-zpool-no-memory.patch

The no_memory() function in zpool was conflicting with the no_memory() 
function in libzfs (currently I'm still statically compiling it).

---
http://www.wizy.org/files/zfs-patches/07-ztest-child.patch

POSIX waitpid() doesn't have the WEXITED flag.

---
http://www.wizy.org/files/zfs-patches/08-int-pointers.patch

Integer/pointer NULL mismatches...

---
http://www.wizy.org/files/zfs-patches/09-missing-enums.patch

You should have a good look at this one. I think this gcc warning is very 
helpful, so if handling some enum values isn't really necessary, please add a 
"default: break" case.

---
http://www.wizy.org/files/zfs-patches/10-macro-expansion.patch

The first hunk was giving a warning in gcc. The second one was giving an 
error.

---
http://www.wizy.org/files/zfs-patches/11-arc-userland-bug.patch

This solves bug #6453172.

---
http://www.wizy.org/files/zfs-patches/12-catch-timer-overflow.patch

An assert that was helpful in debugging timer overflows.

---
http://www.wizy.org/files/zfs-patches/13-check-error.patch

Checks for errors in mutexes/condvars.

---
http://www.wizy.org/files/zfs-patches/14-brace-syntax.patch

Use the correct C99 syntax for nested structs.


That's it :)



[zfs-code] ztest_vdev_attach_detach still failing

2006-08-12 Thread Ricardo Correia
Hi,

I've received another bug report related to ztest.
It failed with the following message:

ztest: attach (/tmp/ztest.12a, /tmp/ztest.12b, 1) returned 0, expected 16

(16 == EBUSY)

There was another problem in that function which I've reported before - it 
returned ENOTSUPP instead of the expected EBUSY - which I don't think has been 
fixed yet.

I don't believe this is a serious problem, just following up on the bug report ;)



[zfs-code] Bug in arc.c revision 1.15

2006-08-09 Thread Ricardo Correia
Hi,

I've received a bug report in zfs-fuse that doesn't seem to be specific to the 
port.
The problem was introduced in arc.c revision 1.15. There's a new static 
variable, arc_min_prefetch_lifespan, that is initially 1 second but is 
converted to clock ticks in arc_init():

/* Convert seconds to clock ticks */
arc_min_prefetch_lifespan *= hz;

However, arc_init() is called more than once in ztest, so 
arc_min_prefetch_lifespan keeps getting multiplied by hz (119).

This results in an almost infinite loop in arc_flush(), since arc_evict() 
skips prefetch buffers with a lifespan of less than (in my particular test 
case) 119*119*119 seconds.
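
The pattern is easy to reproduce in isolation (my own illustration, not the 
arc.c code):

#include <stdio.h>

static long long arc_min_prefetch_lifespan = 1;   /* meant to be 1 second */
static const int hz = 119;                        /* userland tick rate   */

static void
arc_init_like(void)
{
        /* "Convert seconds to clock ticks" -- but it runs on every init */
        arc_min_prefetch_lifespan *= hz;
}

int
main(void)
{
        int i;

        for (i = 1; i <= 3; i++) {
                arc_init_like();
                printf("after init #%d: lifespan = %lld ticks\n",
                    i, arc_min_prefetch_lifespan);
        }
        return (0);
}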



[zfs-code] [patch] namecheck fixes

2006-06-20 Thread Ricardo Correia
Ok, that makes sense. Thanks.
By the way, Eric, were you able to reproduce the bug I mentioned earlier?

It has proven itself to be very difficult to trigger.
I received several reports of successful 24-hour ztest runs; only 1 user has 
been affected.

I've changed the ztest attach/detach frequency to "always", and was able to 
reproduce it once in 10 minutes. I think I was lucky, because I tried to 
reproduce it again and even after 10 hours I wasn't able to do it..

I have the core file available, but I haven't analyzed the cause yet.

On Tuesday 20 June 2006 21:16, Eric Schrock wrote:
> Actually, this was intentionally not put in zfs_namecheck() due to
> backwards compatibility issues.  If you already had a pool named
> 'raidz_something', then if we put the check in zfs_namecheck(), you
> wouldn't be able to do _anything_ to the pool.  See the comment in
> zpool_name_valid():
>
> /*
>  * The rules for reserved pool names were extended at a later point.
>  * But we need to support users with existing pools that may now be
>  * invalid.  So we only check for this expanded set of names during a
>  * create (or import), and only in userland.
>  */
>
> - Eric
>
> --
> Eric Schrock, Solaris Kernel Development  
> http://blogs.sun.com/eschrock



[zfs-code] [patch] namecheck fixes

2006-06-20 Thread Ricardo Correia
If I'm not mistaken, there are some new reserved pool names (raidz1, raidz2 
and spare) that weren't being properly checked for (in zfs_namecheck.c).

There were also some errors that weren't being handled in libzfs_dataset.c 
(they were detected by gcc's warnings).
-- next part --
A non-text attachment was scrubbed...
Name: namecheck.diff
Type: text/x-diff
Size: 1721 bytes
Desc: not available
URL: 



[zfs-code] ztest failing in ztest_vdev_attach_detach() (again)

2006-06-19 Thread Ricardo Correia
Hi,

I've received a bug report of ztest failing in the exact same place as before, 
except now it's failing with ENOTSUPP (errno 95) instead of EBUSY (errno 16). 
It seems related to the Hot Spare work, just like before.

Debugging output is in the forwarded message below.

zfs-fuse-0.1.3 is using spa.c revision 1.15 and ztest.c revision 1.14 (line 
numbers may be slightly different, though).

By the way, is it ok to post these kinds of problems here, or is it better to 
use the bug database? What if I'm not sure whether it's really a ZFS bug, as 
opposed to a bug in the Linux port?

--  Forwarded Message  --

Subject: Re: ZFS-On-FUSE SMP Testing
Date: Monday 19 June 2006 15:16
From: Unit3 
To: Ricardo Correia 

Ricardo Correia wrote:
> Version 0.1.3 is released:
> http://developer.berlios.de/project/showfiles.php?group_id=6836

Hmmm... longer test of 0.1.3 over the weekend didn't go as well, here's
the details:


Pass 123,  SIGKILL,   0 ENOSPC, 57.3% of  730M used,  22% done,
18h42m28s to go
Pass 124,  SIGKILL,   0 ENOSPC, 57.2% of  730M used,  22% done,
18h41m34s to go
ztest: attach (/tmp/ztest.22a, /tmp/ztest.22b, 0) returned 95, expected 16
child died with signal 6

$ gdb ./ztest --core core.*
GNU gdb 6.4-debian
Copyright 2005 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain
conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu"...Using host libthread_db
library "/lib/libthread_db.so.1".

Core was generated by `./ztest -V -T 86400'.
Program terminated with signal 6, Aborted.
Reading symbols from /lib/libm.so.6...done.
Loaded symbols for /lib/libm.so.6
Reading symbols from /lib/libdl.so.2...done.
Loaded symbols for /lib/libdl.so.2
Reading symbols from /lib/librt.so.1...done.
Loaded symbols for /lib/librt.so.1
Reading symbols from /lib/libpthread.so.0...done.
Loaded symbols for /lib/libpthread.so.0
Reading symbols from /lib/libc.so.6...done.
Loaded symbols for /lib/libc.so.6
Reading symbols from /lib/ld-linux-x86-64.so.2...done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
Reading symbols from /lib/libgcc_s.so.1...done.
Loaded symbols for /lib/libgcc_s.so.1
#0  0x2b09611d in raise () from /lib/libc.so.6
(gdb) bt
#0  0x2b09611d in raise () from /lib/libc.so.6
#1  0x2b09784e in abort () from /lib/libc.so.6
#2  0x00403027 in fatal (do_perror=0,
message=0x4810d8 "attach (%s, %s, %d) returned %d, expected %d")
at cmd/ztest/ztest.c:286
#3  0x00404ba3 in ztest_vdev_attach_detach (za=0x2d24d748)
at cmd/ztest/ztest.c:1000
#4  0x0040c197 in ztest_thread (arg=0x2d24d748)
at cmd/ztest/ztest.c:2938
#5  0x2af570fa in start_thread () from /lib/libpthread.so.0
#6  0x2b12ece2 in clone () from /lib/libc.so.6
#7  0x in ?? ()
(gdb) bt full
#0  0x2b09611d in raise () from /lib/libc.so.6
No symbol table info available.
#1  0x2b09784e in abort () from /lib/libc.so.6
No symbol table info available.
#2  0x00403027 in fatal (do_perror=0,
message=0x4810d8 "attach (%s, %s, %d) returned %d, expected %d")
at cmd/ztest/ztest.c:286
args = {{gp_offset = 48, fp_offset = 48,
overflow_arg_area = 0x2aaab0e03068, reg_save_area = 0x2aaab0e02fa0}}
save_errno = 0
buf = "ztest: attach (/tmp/ztest.22a, /tmp/ztest.22b, 0)
returned 95, expected 16\000?\000\000\000\000\001", '\0' ,
"0,?*\000\000\205?G\000\000\000\000\000pp???*\000\000\221\001H\001\001\000\00
0\000?Q???*\000\000p?-??*\000\000?\022q??*\000\000\020*\000\000\000\000\0
00\000`Y?*\000\000
 -?*\000\000?D\000\000\000\000\000\001\000\000\000\001", '\0' ...
#3  0x00404ba3 in ztest_vdev_attach_detach (za=0x2d24d748)
at cmd/ztest/ztest.c:1000
spa = (spa_t *) 0x2b703c40
rvd = (vdev_t *) 0x2b70ff80
oldvd = (vdev_t *) 0x2b71e730
newvd = (vdev_t *) 0x2b71e0c0
pvd = (vdev_t *) 0x2b71da70
---Type <return> to continue, or q <return> to quit---
root = (nvlist_t *) 0x2aaab3db0680
file = (nvlist_t *) 0x2aaab3d888e0
leaves = 8
leaf = 6
top = 2
ashift = 9
oldsize = 67108864
newsize = 61008058
oldpath = "/tmp/ztest.22a", '\0' ,
"ztest/ztest_0", '\0' ,
"`P\220??*\000\000p\003??*\000\000`??\000\000\000\000\000\000\000\000\0002\00
0\000\000\003", '\0' ,
"\004\000\000\000\000\006\000\000\000\000\200\001\000\000\000\000\000\000\000
?\025\000\000\000\000\000\000\bD???*\000\\000\000\000\000\000\000\",
 '\0' , "\035\000\000\000\000

[zfs-code] ztest not resilvering with 20060605 sources?

2006-06-10 Thread Ricardo Correia
Never mind, I think the resilvering is working (otherwise I wouldn't have 
files ending in b).

But it sure is slow.. is this normal?



[zfs-code] ztest not resilvering with 20060605 sources?

2006-06-10 Thread Ricardo Correia
Hi again,

I think I found another problem in the current (20060605) sources. I was 
running ztest for about 2 hours and I started checking the pool layout 
with 'zdb -U -s ztest' at the same time. I noticed that none of the replacing 
vdevs were disappearing, even after an hour.

By the way, I believe ztest was running with the default 'vdev_attach_detach = 
rarely'.

Could you try to reproduce this in OpenSolaris? My Nexenta system seems to be 
working correctly, but it's using an old build (OpenSolaris Build #36).

To be sure it wasn't a Linux port specific bug, I reverted the source back to 
20060519 and now it seems to be working correctly. I haven't tried to analyze 
the problem.

Here's an output of the misbehaving test, after about 3 hours of ztest (with 
lots of replacing vdevs):

wizeman at wizy ~ $ /usr/sbin/zdb -U -s ztest
  capacity   operations   bandwidth   errors 
description  used avail  read write  read write  read write cksum
ztest249M  227M29 0 50.5K 0 0 0 0
  mirror 125M  113M 8 0 11.5K 0 0 0 0
raidz1  3 0 6.50K 0 0 0 0
  replacing 1 01K 0 0 0 0
/tmp/ztest.0a/old   0 0 0 0 0 0 0
/tmp/ztest.0a 514 0  625K 0 0 0 0
  /tmp/ztest.1a   258 0  369K 0 0 0 0
  replacing 3 0 2.50K 0 0 0 0
/tmp/ztest.2b 516 0  627K 0 0 0 0
/tmp/ztest.2a 325 0  436K 0 0 0 0
  /tmp/ztest.3a   335 0  446K 0 0 0 0
raidz1  5 0 5.00K 0 0 0 0
  replacing 5 0 3.00K 0 0 0 0
/tmp/ztest.4b 518 0  627K 0 0 0 0
/tmp/ztest.4a 513 0  624K 0 0 0 0
  replacing 0 0 0 0 0 0 0
/tmp/ztest.5a 289 0  400K 0 0 0 0
/tmp/ztest.5b 513 0  624K 0 0 0 0
  /tmp/ztest.6a   503 0  614K 0 0 0 0
  /tmp/ztest.7b   514 0  625K 0 0 0 0
  mirror 125M  113M21 0 39.0K 0 0 0 0
raidz1  9 0 14.0K 0 0 0 0
  replacing 1 01K 0 0 0 0
/tmp/ztest.8b 514 0  625K 0 0 0 0
/tmp/ztest.8a 300 0  411K 0 0 0 0
  replacing 7 0 5.50K 0 0 0 0
/tmp/ztest.9a 408 0  518K 0 0 0 0
/tmp/ztest.9b 513 0  624K 0 0 0 0
  /tmp/ztest.10b  517 0  628K 0 0 0 0
  /tmp/ztest.11a  320 0  430K 0 0 0 0
raidz1 12 0 25.0K 0 0 0 0
  /tmp/ztest.12b  518 0  630K 0 0 0 0
  /tmp/ztest.13b  519 0  629K 0 0 0 0
  /tmp/ztest.14b  521 0  631K 0 0 0 0
  replacing 9 08K 0 0 0 0
/tmp/ztest.15b522 0  632K 0 0 0 0
/tmp/ztest.15a497 0  608K 0 0 0 0



[zfs-code] ztest failing in ztest_vdev_attach_detach()

2006-06-09 Thread Ricardo Correia
Ok, the changes are available here: http://www.wizy.org/files/zfs-linux.patch

A few notes:

1) The patch includes the Solaris/Linux header differences. I'm not sure 
what's relevant, if anything. I have included some Solaris-specific macros 
and typedefs in sol_compat.h, so you will see some #includes removed.

By the way, I'm not exactly a C or UNIX expert (not even close), so if there 
are any glaring errors in the #includes, please let me know :p

2) I don't know if I got the necessary casts right. I had to add them to 
eliminate gcc warnings; I don't know how they'll work with the Sun Studio 
compiler. I have tested them with gcc-3.4.5 in both x86 and x86-64 modes.

3) gcc emits a warning for code like 'struct foo bar = { 0 }', so I had to 
change all such code to 'struct foo bar = {}'. The { 0 } syntax is correct 
standard C, however (see the small example after this list).

4) The directory structure is not the same as in the OpenSolaris sources, 
sorry about that.
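
Regarding item 3, here is a minimal standalone example of the warning (my own 
illustration, with made-up struct names):

#include <stdio.h>

struct inner { int a; int b; };
struct foo   { struct inner in; int c; };

int
main(void)
{
        struct foo x = { 0 };           /* valid C, but older gcc may warn
                                           "missing braces around initializer"
                                           because the first member is itself
                                           a struct */
        struct foo y = { { 0 }, 0 };    /* fully braced form: no warning */

        printf("%d %d %d %d\n", x.in.a, x.c, y.in.b, y.c);
        return (0);
}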

On Friday 09 June 2006 18:50, Eric Schrock wrote:
> Sweet!  Let us know if you had to make any changes to non-zfs_context
> code, and we can integrate the changes upstream to make future
> maintenance/other ports easier.  FYI, the fix I'll putback is slightly
> ...



[zfs-code] ztest failing in ztest_vdev_attach_detach()

2006-06-09 Thread Ricardo Correia
Ok, I'm glad to know my analysis was correct (I wasn't sure about it, though).

This means that the ztest port to Linux now doesn't have any known bugs ;)

Thank you for the fast response.

On Friday 09 June 2006 18:26, you wrote:
> This happens if you have:
>
>   replacing
>  foo
>  bar
>
> And you try to replace 'foo' with 'bar.  Sure enough, the old code used
> to do:
>
>   if (parent is replacing)
>   return ENOTSUP
>   if (initialize new vdev failed)
>   return failure (EBUSY)
>
> Now it switched the order:
>
>   if (initialize new vdev failed)
>   return failure (EBUSY)
>   if (parent is replacing)
>   return ENOTSUP
>
> Both cases are correct, its just a matter of which error condition gets
> triggered first.  You cannot trigger this through the CLI (since it
> refuses to issue the ioctl because said disk is actively in use).  I'll
> think some more about it, but most likely the correct fix is the one
> below.
>
> - Eric
>
> On Fri, Jun 09, 2006 at 10:12:51AM -0700, Eric Schrock wrote:
> > Yes, I'm seeing this unexpectedly fail with EBUSY  (errno 16) instead of
> > ENOTSUP.  This was definitely caused by the hot spare work, and it's my
> > fault for not running ztest more rigorously.  I'll file a bug and take a
> > look.  For now, you can change (in ztest.c):
> >
> > if (error == EOVERFLOW)
> > expected_error = error;
> >
> > To:
> >
> > if (error == EOVERFLOW || error == EBUSY)
> > expected_error = error;
> >
> > I've been able to run this way for 10 minutes, with the 'rarely ->
> > always' change.
> >
> > Sorry about that,
> >
> > - Eric
>
> --
> Eric Schrock, Solaris Kernel Development  
> http://blogs.sun.com/eschrock



[zfs-code] ztest failing in ztest_vdev_attach_detach()

2006-06-09 Thread Ricardo Correia
Hi,

I'm wondering if you could shed some light on an issue I'm having in porting 
libzpool to Linux.

It seems that ztest is failing in ztest_vdev_attach_detach(). 

I think you will be able to reproduce it in OpenSolaris. I think the problem 
is related to the changes made in spa.c between revision 1.14 and revision 
1.15 (committed on 26-May-2006).

Could you please try running ztest for a few minutes with the latest sources 
(>26-May-2006)?

You will find the problem a lot faster if you change line 194 in ztest.c

from: { ztest_vdev_attach_detach,&zopt_rarely},
to:   { ztest_vdev_attach_detach,&zopt_always},

I think about 30-45 minutes should be enough to find the problem (usually it's 
faster).

Thank you.



[zfs-code] Read-write locks in libzpool

2006-06-09 Thread Ricardo Correia
Ok, I took a look at the libc and the kernel implementation of rwlocks.

I'm a little worried if I got this right, because most of the ZFS code can run 
both in userspace and in the kernel, which seem to behave differently in the 
RW_xxx_HELD() macros, and I need to port it correctly.

This is what I understood, please correct me if I got something wrong:

1) Like you said, libc keeps track of which reader locks are held by a given 
thread
2) The kernel only keeps track of either:
2.1) The thread that has the rwlock locked for writing
2.2) Or the number of readers that have the rwlock locked for reading

3) The RW_READ_HELD() macro:
3.1) In userspace, checks if the current thread holds the rwlock locked 
for reading
3.2) In the kernel, only checks if the rwlock is locked for reading (by 
any thread)

4) The RW_LOCK_HELD() macro:
4.1) In userspace, is equivalent to RW_READ_HELD() || RW_WRITE_HELD(), so 
basically it checks if the current thread has the rwlock locked.
4.2) In the kernel, it only checks if the rwlock is locked (!!)

Is this correct?

Thank you for helping.

On another note, I already got zdb to compile in Linux. Cool, heh? ;)

On Tuesday 06 June 2006 19:17, Jonathan Adams wrote:
> Our userland threads library keeps track, of which reader locks are held by
> a given thread.  If you look at the implementation of _rw_read_held():
>
> http://cvs.opensolaris.org/source/xref/on/usr/src/lib/libc/port/threads/rwl
>ock.c#_rw_read_held
>
> it checks the current thread's list of locks to verify that it is not being
> held.



[zfs-code] Read-write locks in libzpool

2006-06-02 Thread Ricardo Correia
On Friday 02 June 2006 05:12, Neil Perrin wrote:
> I believe RW_LOCK_HELD checks it's not held by the calling thread only.
> Note, a thread should not doubly read lock the same lock as
> a write lock from another thread between the 2 would deadlock.

Ok, that makes sense. Thanks :)

I'm assuming RW_WRITE_HELD() and MUTEX_HELD() also only check if the lock is 
held by the calling thread (I couldn't find any documentation on this, and the 
implementations of the RW_xxx_HELD() macros weren't very easy to 
understand :p)

Oh well... now I only have to figure out a way to emulate this with POSIX 
threads.. :)
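
One possible way to get "held by the calling thread" semantics on top of 
POSIX threads - a sketch for illustration only (the names and the 
thread-specific-data approach are my own, not the zfs-fuse or libzpool code):

#include <pthread.h>
#include <stdio.h>

typedef struct krwlock {
        pthread_rwlock_t rw_lock;
        pthread_t        rw_wr_owner;   /* valid only while write-locked */
        int              rw_wr_held;
        pthread_key_t    rw_rd_key;     /* per-thread reader count */
} krwlock_t;

static void
krw_init(krwlock_t *rw)
{
        (void) pthread_rwlock_init(&rw->rw_lock, NULL);
        (void) pthread_key_create(&rw->rw_rd_key, NULL);
        rw->rw_wr_held = 0;
}

static void
krw_rdlock(krwlock_t *rw)
{
        size_t n;

        (void) pthread_rwlock_rdlock(&rw->rw_lock);
        /* the key stores the calling thread's reader count (NULL == 0) */
        n = (size_t) pthread_getspecific(rw->rw_rd_key);
        (void) pthread_setspecific(rw->rw_rd_key, (void *) (n + 1));
}

static void
krw_rdunlock(krwlock_t *rw)
{
        size_t n = (size_t) pthread_getspecific(rw->rw_rd_key);

        (void) pthread_setspecific(rw->rw_rd_key, (void *) (n - 1));
        (void) pthread_rwlock_unlock(&rw->rw_lock);
}

static void
krw_wrlock(krwlock_t *rw)
{
        (void) pthread_rwlock_wrlock(&rw->rw_lock);
        rw->rw_wr_owner = pthread_self();
        rw->rw_wr_held = 1;
}

static void
krw_wrunlock(krwlock_t *rw)
{
        rw->rw_wr_held = 0;
        (void) pthread_rwlock_unlock(&rw->rw_lock);
}

/* "held by the calling thread", in the spirit of RW_READ_HELD() */
static int
krw_read_held(krwlock_t *rw)
{
        return (pthread_getspecific(rw->rw_rd_key) != NULL);
}

/* "held by the calling thread", in the spirit of RW_WRITE_HELD().
 * Note this is only a sketch; a real implementation would need to be
 * more careful about the unsynchronized read of rw_wr_held here. */
static int
krw_write_held(krwlock_t *rw)
{
        return (rw->rw_wr_held &&
            pthread_equal(rw->rw_wr_owner, pthread_self()));
}

int
main(void)
{
        krwlock_t rw;

        krw_init(&rw);

        krw_rdlock(&rw);
        printf("read held: %d  write held: %d\n",
            krw_read_held(&rw), krw_write_held(&rw));
        krw_rdunlock(&rw);

        krw_wrlock(&rw);
        printf("read held: %d  write held: %d\n",
            krw_read_held(&rw), krw_write_held(&rw));
        krw_wrunlock(&rw);

        return (0);
}

The thread-specific reader count answers the RW_READ_HELD()-style question 
for the calling thread, and recording the writer's pthread_t answers the 
RW_WRITE_HELD()-style one; the same owner-recording trick works for 
MUTEX_HELD().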



[zfs-code] Read-write locks in libzpool

2006-06-02 Thread Ricardo Correia
Hi,

I think I found a bug in the rw_enter() implementation (emulation?) in 
libzpool, file /usr/src/lib/libzpool/common/kernel.c:

void
rw_enter(krwlock_t *rwlp, krw_t rw)
{
        ASSERT(!RW_LOCK_HELD(rwlp));
        ASSERT(rwlp->rw_owner != (void *)-1UL);
        ASSERT(rwlp->rw_owner != curthread);

        if (rw == RW_READER)
                (void) rw_rdlock(&rwlp->rw_lock);
        else
                (void) rw_wrlock(&rwlp->rw_lock);

        rwlp->rw_owner = curthread;
}

Doesn't RW_LOCK_HELD() check whether there's any reader or writer holding the 
lock? If it does, then these read-write locks would trip an assertion whenever 
multiple readers tried to lock the same lock.

However, RW_LOCK_HELD() is being applied to "rwlp" instead 
of "&rwlp->rw_lock", which could explain why it's not failing.

Am I understanding this correctly?

Unfortunately POSIX threads don't have an equivalent of rw_lock_held(), 
rw_write_held(), mutex_held(), ..., so I really have to understand this in 
order to somehow emulate their behavior.

Thanks!