[zfs-code] ARC cache reference counting

2009-06-23 Thread Mark Maybee
Jeremy Archer wrote: > Thanks for the explanation. > >> a dirty >> buffer goes onto the >> list corresponding to the txg it belongs to. > > Ok. I see that all dirty buffers are put on a per txg list. > This is for easy synchronization, makes sense. > > The per dmu_buf_impl_t details

[zfs-code] ARC cache reference counting

2009-06-22 Thread Mark Maybee
Hi Jeremy, Jeremy Archer wrote: > Hello, > > I believe the following is true, correct me if it is not: > > If more than one object references a block (e.g. 2 files have the same block > open) > there must be multiple clones of the arc_buf_t ( and associated dmu_impl_t ) > records > present, o

[zfs-code] ARC cache reference counting

2009-06-02 Thread Mark Maybee
Jeremy Archer wrote: > I started to look at ref counting to convince myself that the db_bu field in > a cached dmu_impl_t object > is guaranteed to point at a valid arc_buf_t. > > I have seen a "deadbeef" crash on a busy system when zfs_write() is > pre-pagefaulting in > the file's pages. >

[zfs-code] ARC cache reference counting

2009-06-01 Thread Mark Maybee
Jeremy Archer wrote: > Greets, > > I have read a couple of earlier posts by Jeff and Mark Maybee explaining how > Arc reference counting works. > These posts did help clarifying this piece of code ( a bit complex, to say > the least). > I would like to solicit more com

[zfs-code] PSARC/2009/204 ZFS user/group quotas & space accounting

2009-04-21 Thread Mark Maybee
Pawel Jakub Dawidek wrote: > On Sat, Apr 18, 2009 at 06:05:56PM -0700, Matthew.Ahrens at sun.com wrote: >> Author: Matthew Ahrens >> Repository: /hg/onnv/onnv-gate >> Latest revision: f41cf682d0d3e3cf5c4ec17669b903ae621ef882 >> Total changesets: 1 >> Log message: >> PSARC/2009/204 ZFS user/group q

[zfs-code] 6551866 deadlock between zfs_write(), zfs_freesp(), and zfs_putapage()

2009-02-17 Thread Mark Maybee
Gack! Absolutely correct, Jürgen. I have filed 6806627 to track this. -Mark Jürgen Keil wrote: > It seems there is a bug introduced by the putback for > > author: Mark Maybee > date: Wed Jan 28 11:04:37 2009 -0700 (2 weeks ago) > files:usr/src/uts/comm

[zfs-code] Disk Writes

2009-02-05 Thread Mark Maybee
Ben Rockwood wrote: > Mark Maybee wrote: >> Ben Rockwood wrote: >>> I need some help with clarification. >>> >>> My understanding is that there are 2 instances in which ZFS will write >>> to disk: >>> 1) TXG Sync >>> 2) ZIL >&g

[zfs-code] Disk Writes

2009-02-05 Thread Mark Maybee
Ben Rockwood wrote: > I need some help with clarification. > > My understanding is that there are 2 instances in which ZFS will write > to disk: > 1) TXG Sync > 2) ZIL > > Post-snv_87 a TXG should sync out when the TXG is either over filled or > hits the timeout of 30 seconds. > > First question

[zfs-code] Unexpected b_hdr change.

2008-08-16 Thread Mark Maybee
This is a known bug: 6732083 arc_read() panic: rw_exit: lock not held with a known cause. The fix you suggest works, but it is rather ugly. We are working on a fix now. -Mark Pawel Jakub Dawidek wrote: > On Tue, Jul 29, 2008 at 12:41:16PM +0200, Pawel Jakub Dawidek wrote: >> Hi. >> >> We're test

[zfs-code] truncate(2) not working properly.

2008-08-16 Thread Mark Maybee
The fix for this bug is currently in test and will be pushed shortly. -Mark Pawel Jakub Dawidek wrote: > On Tue, Jul 22, 2008 at 08:56:31AM -0600, Mark Shellenbaum wrote: >> Pawel Jakub Dawidek wrote: >>> On Tue, Jul 22, 2008 at 04:28:45PM +0200, Pawel Jakub Dawidek wrote: Hi. I j

[zfs-code] Peak every 4-5 second

2008-07-22 Thread Mark Maybee
ZFS is designed to "sync" a transaction group about every 5 seconds under normal work loads. So your system looks to be operating as designed. Is there some specific reason why you need to reduce this interval? In general, this is a bad idea, as there is somewhat of a "fixed overhead" associated
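The two conditions that close a transaction group described above (a roughly 5-second timeout under normal load, or the group over-filling) can be sketched as follows. This is a toy model, not ZFS source; the constant names and the 64 MiB threshold are illustrative assumptions only.

```python
# Toy model (not ZFS code) of the txg-sync trigger: sync when the open
# transaction group has been open too long OR has accumulated too much
# dirty data. Both thresholds below are made-up illustrative values.

TXG_TIMEOUT_SECS = 5          # assumed ~5s interval under light load
DIRTY_DATA_LIMIT = 64 << 20   # hypothetical fill threshold (64 MiB)

def should_sync_txg(seconds_open, dirty_bytes):
    """Return True when the open txg should be synced to disk."""
    return seconds_open >= TXG_TIMEOUT_SECS or dirty_bytes >= DIRTY_DATA_LIMIT

# Light load: the timeout fires, so syncs arrive about every 5 seconds.
assert should_sync_txg(5.0, 1 << 20)
# Heavy load: the group over-fills and syncs early.
assert should_sync_txg(1.2, 128 << 20)
# Neither condition met: keep accumulating in the open group.
assert not should_sync_txg(2.0, 1 << 20)
```

The "fixed overhead" point in the message is why shrinking the interval is a bad idea: each sync pays that overhead regardless of how much dirty data it writes.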

[zfs-code] Can Read and Write Share Same arc_buf_hdr_t

2008-06-05 Thread Mark Maybee
J Duff wrote: > I'm trying to understand the arc code. > > Can a read zio and a write zio share the same arc_buf_hdr_t? > No. > If so, do they each have their own arc_buf_t both of which point back to the > same arc_buf_hdr_t? In other words, do they each have their own copy of the > data (arc
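The header/buffer split asked about in this thread can be modeled as below: one arc_buf_hdr_t identifies a cached block, and each consumer gets its own arc_buf_t that points back at the shared header. This is a Python illustration of the relationship, not the actual ZFS C structures; field names are simplified.

```python
# Toy model of the ARC header/buffer relationship: one shared header per
# cached block, with a list of per-consumer arc_buf_t clones that each
# hold a back-pointer to that header. Not the real C structures.

class ArcBufHdr:
    def __init__(self, blkid):
        self.blkid = blkid    # identifies the cached block
        self.bufs = []        # the list of arc_buf_t entries for this block

class ArcBuf:
    def __init__(self, hdr, data):
        self.hdr = hdr        # back-pointer to the shared header
        self.data = data      # this consumer's view of the block data
        hdr.bufs.append(self)

hdr = ArcBufHdr(blkid=7)
a = ArcBuf(hdr, data=b"block contents")
b = ArcBuf(hdr, data=b"block contents")

# Two consumers, two arc_buf_t entries, one shared header.
assert a.hdr is b.hdr
assert len(hdr.bufs) == 2
```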

[zfs-code] Adding to arc_buf_hdr_t

2008-02-28 Thread Mark Maybee
The arc_buf_hdr_t is an in-core-only data structure, so it has no version impact. It does not matter where you add an extra field. However, only add a field to this structure if you really need it... increasing the size of this struct will likely have a negative performance impact on the ARC. -M

[zfs-code] Design for EAs in dnode.

2008-02-09 Thread Mark Maybee
Ricardo M. Correia wrote: > Hi Matthew, > > On Qui, 2008-02-07 at 12:48 -0800, Matthew Ahrens wrote: >> on disk: >> header: >> struct sa_phys { >> uint16_t sa_numattrs; >> struct { >> uint16_t sa_type; /* enum sssa_type */ >> uint16_t sa_leng

[zfs-code] Understanding the ARC - arc_buf_hdr and arc_buf_t

2008-01-16 Thread Mark Maybee
J Duff wrote: > I'm trying to understand the inner workings of the adaptive replacement cache > (arc). I see there are arc_bufs and arc_buf_hdrs. Each arc_buf_hdr points to > an arc_buf_t. The arc_buf_t is really one entry in a list of arc_buf_t > entries. The multiple entries are accessed throu

[zfs-code] R/W lock portability issue

2007-11-27 Thread Mark Maybee
Chris Kirby wrote: > Mark Maybee wrote: >> Chris Kirby wrote: >> >>> Matthew Ahrens wrote: >>> >>>> So, we use RW_LOCK_HELD() to mean, "might this thread hold the >>>> lock?" and it is generally only used in assertions. Eg,

[zfs-code] R/W lock portability issue

2007-11-27 Thread Mark Maybee
Chris Kirby wrote: > Matthew Ahrens wrote: >> So, we use RW_LOCK_HELD() to mean, "might this thread hold the lock?" and >> it >> is generally only used in assertions. Eg, some routine should only be >> called >> with the lock held, so we "ASSERT(RW_LOCK_HELD(lock))". The fact that >> someti

[zfs-code] Recent deadlock.

2007-11-07 Thread Mark Maybee
Pawel Jakub Dawidek wrote: > On Wed, Nov 07, 2007 at 07:41:54AM -0700, Mark Maybee wrote: >> Hmm, seems rather unlikely that these two IOs are related. Thread 1 >> is trying to read a dnode in order to extract the znode data from its >> bonus buffer. Thread 2 is completing a

[zfs-code] Recent deadlock.

2007-11-07 Thread Mark Maybee
2AM -0700, Mark Maybee wrote: >> Pawel, >> >> I'm not quite sure I understand why thread #1 below is stalled. Is >> there only a single thread available for IO completion? > > There are few, but I believe the thread #2 is trying to complete the very > I/O request o

[zfs-code] Recent deadlock.

2007-11-07 Thread Mark Maybee
Pawel, I'm not quite sure I understand why thread #1 below is stalled. Is there only a single thread available for IO completion? -Mark Pawel Jakub Dawidek wrote: > Hi. > > I'm observing the following deadlock. > > One thread holds zfsvfs->z_hold_mtx[i] lock and waits for I/O: > > Tracing pi

[zfs-code] Increasing dnode size

2007-09-14 Thread Mark Maybee
Andreas Dilger wrote: > On Sep 13, 2007 15:27 -0600, Mark Maybee wrote: >> We have explored the idea of increasing the dnode size in the past >> and discovered that a larger dnode size has a significant negative >> performance impact on the ZPL (at least with our current cach

[zfs-code] Increasing dnode size

2007-09-13 Thread Mark Maybee
Andreas, We have explored the idea of increasing the dnode size in the past and discovered that a larger dnode size has a significant negative performance impact on the ZPL (at least with our current caching and read-ahead policies). So we don't have any plans to increase its size generically any

[zfs-code] Setting zio->io_error during zio_write

2007-08-15 Thread Mark Maybee
Darren J Moffat wrote: > For an encrypted dataset it is possible that by the time we arrive in > zio_write() [ zio_write_encrypt() ] that when we lookup which key is > needed to encrypted this data that key isn't available to us. > > Is there some value of zio->io_error I can set that will not r

[zfs-code] IO error on mount for encrypted dataset

2007-08-14 Thread Mark Maybee
Darren J Moffat wrote: > Does the ARC get flushed for a dataset when it is unmounted ? Yes > What does change when a dataset is unmounted ? > Pretty much everything associated with the dataset should be evicted... it's possible that some of the meta-data may hang around I suppose (I don't remembe

[zfs-code] ARC deadlock.

2007-08-09 Thread Mark Maybee
m the community :-). -Mark Pawel Jakub Dawidek wrote: > On Fri, May 18, 2007 at 08:22:26AM -0600, Mark Maybee wrote: >> Yup, Jürgen is correct. The problem here is that we are blocked in >> arc_data_buf_alloc() while holding a hash_lock. This is bug 6457639. >> One possibil

[zfs-code] Re: ARC deadlock.

2007-05-18 Thread Mark Maybee
Yup, Jürgen is correct. The problem here is that we are blocked in arc_data_buf_alloc() while holding a hash_lock. This is bug 6457639. One possibility, for this specific bug might be to drop the lock before the allocate and then redo the read lookup (in case there is a race) with the necessary b

[zfs-code] Refactor zfs_zget()

2007-05-09 Thread Mark Maybee
Ricardo Correia wrote: > On Wednesday 09 May 2007 04:57:53 Ricardo Correia wrote: >> 2) In the end of zfs_zget(), if the requested object number is not found, >> it allocates a new znode with that object number. This shouldn't happen in >> any FUSE operation. > > Apparently, I didn't (and I still

[zfs-code] mappedwrite().

2007-05-01 Thread Mark Maybee
Yes, it's the same in Solaris. It's probably more correct to always do the dmu_write(), as this keeps the page and file in sync. -Mark Pawel Jakub Dawidek wrote: > Hi. > > I'm pondering this piece of mappedwrite(): > > if (pp = page_lookup(vp, start, SE_SHARED)) { > caddr_t

[zfs-code] Scrubbing a zpool built on LUNs

2007-04-27 Thread Mark Maybee
Mike, Please post this sort of query to zfs-discuss (rather than zfs-code). zfs-code is a development discussion forum. Without any form of replication that zfs knows about (RAIDZ or mirrors), there is no way for ZFS to fix up data errors detected in a scrub. RAID5 LUNs just look like normal devi

[zfs-code] Contents of transaction group?

2007-04-09 Thread Mark Maybee
Atul Vidwansa wrote: > Hi, >I have few questions about the way a transaction group is created. > > 1. Is it possible to group transactions related to multiple operations > in same group? For example, an "rmdir foo" followed by "mkdir bar", > can these end up in same transaction group? > Yes.
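The answer above ("Yes") can be illustrated with a small model: operations are assigned to whichever transaction group is currently open, so an "rmdir foo" followed immediately by "mkdir bar" lands in the same group, while operations issued after the group closes go to the next one. This is a sketch of the grouping idea only, not the DMU's actual txg machinery.

```python
# Toy model of transaction-group assignment: every operation joins the
# currently open txg; closing the group advances the txg number. Not
# the real DMU code; names are illustrative.

class TxgManager:
    def __init__(self):
        self.open_txg = 1
        self.ops = {}      # txg number -> list of operations in that group

    def assign(self, op):
        """Record op in the open txg and return the group it joined."""
        self.ops.setdefault(self.open_txg, []).append(op)
        return self.open_txg

    def close_txg(self):
        """Close the open group; later ops go to the next txg."""
        self.open_txg += 1

mgr = TxgManager()
t1 = mgr.assign("rmdir foo")
t2 = mgr.assign("mkdir bar")
assert t1 == t2            # both operations ended up in the same group

mgr.close_txg()
t3 = mgr.assign("touch baz")
assert t3 == t2 + 1        # a later operation joins the next group
```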

[zfs-code] Request for help and advice on cache behaviour

2007-01-10 Thread Mark Maybee
1, VOP_INACTIVE() is > called. > 2) VOP_INACTIVE() calls zfs_inactive() which calls zfs_zinactive(). > 3) zfs_zinactive() calls dmu_buf_rele() > 4) ?? > 5) znode_pageout_func() calls zfs_znode_free() which finally frees the vnode. > > As for step 4, Mark Maybee mentioned: >

[zfs-code] Small nits in zfs_fm.c.

2006-12-07 Thread Mark Maybee
Sorry Pawel, The team is rather slammed with work at the moment, so we may be a bit slow at getting to things like this. We certainly appreciate getting these patches though. -Mark Pawel Jakub Dawidek wrote: > On Mon, Nov 13, 2006 at 03:14:04PM +0100, Pawel Jakub Dawidek wrote: > >>The patch b

[zfs-code] Lock order reversal (harmless?).

2006-11-22 Thread Mark Maybee
Pawel Jakub Dawidek wrote: > I had another one, can you analize it? > > lock order reversal: > 1st 0xc44b9b00 zfs:dbuf (zfs:dbuf) @ > /zoo/pjd/zfstest/sys/modules/zfs/../../contrib/opensolaris/uts/common/fs/zfs/dbuf.c:1644 > 2nd 0xc45be898 zfs:dbufs (zfs:dbufs) @ > /zoo/pjd/zfstest/sys/modules

[zfs-code] Lock order reversal (harmless?).

2006-11-22 Thread Mark Maybee
Pawel Jakub Dawidek wrote: > Hi. > > FreeBSD's WITNESS mechanism for detecting lock order reversals reports > LOR here: > > lock order reversal: > 1st 0xc3f7738c zfs:dbuf (zfs:dbuf) @ > /zoo/pjd/zfstest/sys/modules/zfs/../../contrib/opensolaris/uts/common/fs/zfs/dnode_sync.c:410 > 2nd 0xc3fefc

[zfs-code] ZFS and memory usage.

2006-11-10 Thread Mark Maybee
Pawel Jakub Dawidek wrote: > On Fri, Nov 10, 2006 at 06:36:07AM -0700, Mark Maybee wrote: > >>Pawel Jakub Dawidek wrote: >> >>>On Tue, Nov 07, 2006 at 06:06:48PM -0700, Mark Maybee wrote: >>> >>>>The problem is that in ZFS the vnode holds onto more

[zfs-code] ZFS and memory usage.

2006-11-10 Thread Mark Maybee
Pawel Jakub Dawidek wrote: > On Tue, Nov 07, 2006 at 06:06:48PM -0700, Mark Maybee wrote: > >>The problem is that in ZFS the vnode holds onto more memory than just >>the vnode itself. Its fine to place the vnode on a "free vnodes list" >>after a VOP_INACTIVE(

[zfs-code] ZFS and memory usage.

2006-11-07 Thread Mark Maybee
Pawel Jakub Dawidek wrote: > ZFS works really stable on FreeBSD, but my biggest problem is how to > control ZFS memory usage. I've no idea how to leash that beast. > > FreeBSD has a backpressure mechanism. I can register my function so it > will be called when there are memory problems, which I do

[zfs-code] Current zfetch heuristic

2006-10-11 Thread Mark Maybee
Jeremy Teo wrote: > Heya, > > just a short blurb of what I understand from grokking dmu_zfetch.c > > Basically the current code issues a prefetch (ie. create a new > prefecth stream) whenever a block (level 0, DB_RF_NOPREFETCH is not > set) is is read in dbuf_read. > > Since ZFS is multi-threade
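The heuristic Jeremy describes (create a prefetch stream when a level-0 block is read, then extend it on sequential hits) can be sketched as follows. This is a simplified illustration, not dmu_zfetch.c; the doubling ramp and the cap of 8 blocks are made-up policy values for the example.

```python
# Simplified sketch (not dmu_zfetch.c) of a sequential-prefetch stream:
# a read that lands on the block right after the previous one counts as
# a stream hit and prefetches further ahead; a non-sequential read
# misses the stream. Ramp-up policy below is invented for illustration.

class PrefetchStream:
    def __init__(self, blkid):
        self.next_blkid = blkid + 1   # block the stream expects next
        self.depth = 1                # how far ahead to prefetch

    def access(self, blkid):
        """Return the list of blocks to prefetch, or None on a miss."""
        if blkid != self.next_blkid:
            return None               # not sequential: caller starts a new stream
        self.next_blkid = blkid + 1
        self.depth = min(self.depth * 2, 8)   # ramp up, capped (assumed policy)
        return list(range(blkid + 1, blkid + 1 + self.depth))

s = PrefetchStream(blkid=10)
assert s.access(11) == [12, 13]           # sequential hit: prefetch ahead
assert s.access(12) == [13, 14, 15, 16]   # depth ramps as the stream continues
assert s.access(20) is None               # random read: the stream misses
```

In a multi-threaded reader (the point the message goes on to raise), several streams would exist concurrently and each access has to be matched against all of them.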

[zfs-code] Re: ASSERT failed dn->dn_nlevels > level (0x0 > 0x0) dbuf.c, line: 1523

2006-10-04 Thread Mark Maybee
pipeline rather than the async_stages. It's still not clear to me how this can result in your problems, but then I don't yet understand how the SPA io pipeline works in all circumstances. -Mark Mark Maybee wrote: > Darren, > > I looked a bit at your dumps... in both cases, the probl

[zfs-code] Re: ASSERT failed dn->dn_nlevels > level (0x0 > 0x0) dbuf.c, line: 1523

2006-09-29 Thread Mark Maybee
Darren, I looked a bit at your dumps... in both cases, the problem is that the os_phys block that we read from the disk is garbage: > 0x9377b000::print objset_phys_t { os_meta_dnode = { dn_type = 0 dn_indblkshift = 0 dn_nlevels = 0 dn_nblkptr = 0

[zfs-code] Kernel panic on "zpool import"

2006-09-14 Thread Mark Maybee
Daniel Rock wrote: > Hi, > > I just triggered a kernel panic while trying to import a zpool. > > The disk in the zpool was residing on a Symmetrix and mirrored with SRDF. The > host sees both devices though (one writeable device "R1" on one > Symmetrix box and one write protected device "R2" on

[zfs-code] Opening a snapshot.

2006-08-27 Thread Mark Maybee
Pawel Jakub Dawidek wrote: >Hi. > >I'm currently working on snapshots and can't understand one thing. > >When someone lookups a snapshot directory it gets automatically mounted >from zfsctl_snapdir_lookup() via domount(). > >Ok, domount() ends up in zfs_domount(). zfs_domount() wants to open >data

[zfs-code] Nested Transactions

2006-06-08 Thread Mark Maybee
No, nested transactions are not allowed. (your sense is correct :-)). -Mark Jeremy Teo wrote: > Are nested transactions allowed/supported by the DMU? > > Namely, if I have function Foo and function Bar that both wrap their > own operations using a transaction such that Foo and Bar are > atomic/
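The "no nesting" rule can be made concrete with a small model: if each thread may hold at most one open transaction, then a callee that tries to open its own transaction while the caller's is still open must fail, which is exactly the Foo-calls-Bar situation in the question. This is an illustration of the constraint only, not the DMU transaction API.

```python
# Toy illustration (not the DMU API) of why nested transactions are
# disallowed: with one open transaction per context, a nested begin()
# inside an already-open transaction is an error.

class TxContext:
    def __init__(self):
        self.open = False

    def begin(self):
        if self.open:
            raise RuntimeError("nested transaction not allowed")
        self.open = True

    def commit(self):
        self.open = False

ctx = TxContext()

def bar(ctx):
    ctx.begin()       # fine when called on its own
    ctx.commit()

def foo(ctx):
    ctx.begin()
    try:
        bar(ctx)      # nested begin inside foo's transaction: raises
    finally:
        ctx.commit()

bar(ctx)              # standalone Bar is atomic and succeeds

try:
    foo(ctx)          # Foo wrapping Bar trips the nesting check
    nested_ok = True
except RuntimeError:
    nested_ok = False
assert not nested_ok
```

The practical upshot matches the reply: Foo and Bar must share one transaction (the caller opens it and the callee works within it) rather than each opening their own.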

[zfs-code] zio_push/pop_transform how and when to use it.

2006-06-01 Thread Mark Maybee
Hi Darren, Sorry about the slow response (from me). I was on vacation last week (and am on semi-vacation this week). I can't answer your question about using the zio transform stuff. You will have to get Jeff or Bill's attention for that. As far as the ARC "hook" goes: it doesn't yet exist. Yo