Re: [zfs-discuss] ZFS taking up to 80 seconds to flush a single 8KB O_SYNC block.

2008-02-20 Thread Roch Bourbonnais

On 20 Feb 08 at 23:03, Robert Milkowski wrote:

> Hello Roch,
>
> Friday, February 15, 2008, 10:51:50 AM, you wrote:
>
> RB> On 10 Feb 08 at 12:51, Robert Milkowski wrote:
>
>>> Hello Nathan,
>>>
>>> Thursday, February 7, 2008, 6:54:39 AM, you wrote:
>>>
>>> NK> For kicks, I disabled the ZIL: zil_disable/W0t1, and that made
>>> not a
>>> NK> pinch of difference. :)
>>>
>>> Have you exported and then imported the pool to get zil_disable into
>>> effect?
>>>
>>>
>
> RB> I don't think export/import is required.
>
> You sure? I'm sure it's been required unless something changed.
>


Sorry, my mistake, I was thinking of zfs_nocacheflush.

You're absolutely right.

-r


Nit: a pool export/import will pick up the modified zil_disable value, but
an unmount/mount cycle of an individual filesystem is sufficient to get
the effect.

> -- 
> Best regards,
> Robert Milkowski   mailto:[EMAIL PROTECTED]
>   http://milek.blogspot.com
>

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS taking up to 80 seconds to flush a single 8KB O_SYNC block.

2008-02-20 Thread Robert Milkowski
Hello Roch,

Friday, February 15, 2008, 10:51:50 AM, you wrote:

RB> On 10 Feb 08 at 12:51, Robert Milkowski wrote:

>> Hello Nathan,
>>
>> Thursday, February 7, 2008, 6:54:39 AM, you wrote:
>>
>> NK> For kicks, I disabled the ZIL: zil_disable/W0t1, and that made  
>> not a
>> NK> pinch of difference. :)
>>
>> Have you exported and then imported the pool to get zil_disable into
>> effect?
>>
>>

RB> I don't think export/import is required.

You sure? I'm sure it's been required unless something changed.

-- 
Best regards,
 Robert Milkowski   mailto:[EMAIL PROTECTED]
   http://milek.blogspot.com

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS taking up to 80 seconds to flush a single 8KB O_SYNC block.

2008-02-15 Thread Roch Bourbonnais

On 10 Feb 08 at 12:51, Robert Milkowski wrote:

> Hello Nathan,
>
> Thursday, February 7, 2008, 6:54:39 AM, you wrote:
>
> NK> For kicks, I disabled the ZIL: zil_disable/W0t1, and that made  
> not a
> NK> pinch of difference. :)
>
> Have you exported and then imported the pool to get zil_disable into
> effect?
>
>

I don't think export/import is required.

-r


> -- 
> Best regards,
> Robert Milkowski   mailto:[EMAIL PROTECTED]
>   http://milek.blogspot.com
>

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS taking up to 80 seconds to flush a single 8KB O_SYNC block.

2008-02-10 Thread Robert Milkowski
Hello Nathan,

Thursday, February 7, 2008, 6:54:39 AM, you wrote:

NK> For kicks, I disabled the ZIL: zil_disable/W0t1, and that made not a 
NK> pinch of difference. :)

Have you exported and then imported the pool to get zil_disable into
effect?


-- 
Best regards,
 Robert Milkowski   mailto:[EMAIL PROTECTED]
   http://milek.blogspot.com

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] ZFS taking up to 80 seconds to flush a single 8KB O_SYNC block.

2008-02-06 Thread Nathan Kroenert
Hey all -

I'm working on an interesting issue where I'm seeing ZFS being quite 
cranky about O_SYNC writes.

Bottom line is that I have a small test case that does essentially this:

open file for writing  -- O_SYNC
loop {
    write() 8KB of random data
    print time taken to write data
}
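
In C terms, a minimal standalone version of that loop looks roughly like 
the following sketch (the file path, iteration count and buffer-fill are 
placeholders of mine, not the exact test case):

/*
 * Sketch of the O_SYNC write-timing loop described above.
 * File path and iteration count are arbitrary.
 */
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/time.h>

int
main(void)
{
    char buf[8192];
    struct timeval t0, t1;
    int fd, i, j;

    /* every write() on this fd is synchronous */
    fd = open("/testpool/syncfile", O_WRONLY | O_CREAT | O_SYNC, 0644);
    if (fd == -1) {
        perror("open");
        return (1);
    }

    for (i = 0; i < 1000; i++) {
        /* fill the 8KB buffer with pseudo-random data */
        for (j = 0; j < (int)sizeof (buf); j++)
            buf[j] = (char)rand();

        gettimeofday(&t0, NULL);
        if (write(fd, buf, sizeof (buf)) != (ssize_t)sizeof (buf)) {
            perror("write");
            return (1);
        }
        gettimeofday(&t1, NULL);

        /* print the per-write latency in milliseconds */
        printf("write %d: %.3f ms\n", i,
            (t1.tv_sec - t0.tv_sec) * 1000.0 +
            (t1.tv_usec - t0.tv_usec) / 1000.0);
    }

    (void) close(fd);
    return (0);
}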

It's taking anywhere up to 80 seconds per 8KB block. When the 'problem' 
is not in evidence (and it's not always happening), I can do around 
1200 O_SYNC writes per second...

It seems to be waiting here virtually all of the time:

> 0t11021::pid2proc | ::print proc_t p_tlist | ::findstack -v
stack pointer for thread 30171352960: 2a118052df1
[ 02a118052df1 cv_wait+0x38() ]
  02a118052ea1 zil_commit+0x44(1, 6b50516, 193, 60005db66bc, 6b50570, 60005db6640)
  02a118052f51 zfs_write+0x554(0, 14000, 2a1180539e8, 6000af22840, 2000, 2a1180539d8)
  02a118053071 fop_write+0x20(304898cd100, 2a1180539d8, 10, 300a27a9e48, 0, 7b7462d0)
  02a118053121 write+0x268(4, 8058, 60051a3d738, 2000, 113, 1)
  02a118053221 dtrace_systrace_syscall32+0xac(4, ffbfdaf0, 2000, 21e80, ff3a00c0, ff3a0100)
  02a1180532e1 syscall_trap32+0xcc(4, ffbfdaf0, 2000, 21e80, ff3a00c0, ff3a0100)

And this is also evident in a dtrace of it, following the write in...

<...>
  28-> zil_commit
  28  -> cv_wait
  28-> thread_lock
  28<- thread_lock
  28-> cv_block
  28  -> ts_sleep
  28  <- ts_sleep
  28  -> new_mstate
  28-> cpu_update_pct
  28  -> cpu_grow
  28-> cpu_decay
  28  -> exp_x
  28  <- exp_x
  28<- cpu_decay
  28  <- cpu_grow
  28<- cpu_update_pct
  28  <- new_mstate
  28  -> disp_lock_enter_high
  28  <- disp_lock_enter_high
  28  -> disp_lock_exit_high
  28  <- disp_lock_exit_high
  28<- cv_block
  28-> sleepq_insert
  28<- sleepq_insert
  28-> disp_lock_exit_nopreempt
  28<- disp_lock_exit_nopreempt
  28-> swtch
  28  -> disp
  28-> disp_lock_enter
  28<- disp_lock_enter
  28-> disp_lock_exit
  28<- disp_lock_exit
  28-> disp_getwork
  28<- disp_getwork
  28-> restore_mstate
  28<- restore_mstate
  28  <- disp
  28  -> pg_cmt_load
  28  <- pg_cmt_load
  28<- swtch
  28-> resume
  28  -> savectx
  28-> schedctl_save
  28<- schedctl_save
  28  <- savectx
<...>

At this point, it waits for up to 80 seconds.

I'm also seeing zil_commit() being called around 7-15 times per second.

For kicks, I disabled the ZIL: zil_disable/W0t1, and that made not a 
pinch of difference. :)

For what it's worth, this is a T2000, running Oracle, connected to an 
HDS 9990 (using 2Gb fibre), with 8KB record sizes for the Oracle 
filesystems, and I'm only seeing the issue on the ZFS filesystems that 
have the active Oracle tables on them.

The O_SYNC test case is just trying to help me understand what's 
happening. The *real* problem is that Oracle is running like rubbish 
when it's trying to roll forward archive logs from another server. It's 
an almost 100% write workload. At the moment, it cannot even keep up 
with the other server's log creation rate, and it's barely doing 
anything. (The other box is quite different, so not really valid for 
direct comparison at this point.)

6513020 looked interesting for a while, but I already have 120011-14 and 
127111-03 installed.

I'm looking into the cache flush settings of the 9990 array to see if 
that's what's killing me, but I'm also looking for any other ideas on 
what might be hurting me.

I have also set
zfs:zfs_nocacheflush = 1
in /etc/system

The Oracle logs are on a separate zpool, and I'm not seeing the issue on 
those filesystems.

The lockstats I have run are not yet all that interesting. If anyone has 
ideas on specific incantations I should use or some specific D or 
anything else, I'd be most appreciative.

Cheers!

Nathan.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss