Re: [zfs-discuss] Expected throughput

2010-07-05 Thread Roy Sigurd Karlsbakk

The database is MySQL, it runs on a Linux box that connects to the Nexenta 
server through 10GbE using iSCSI. Just a short question - wouldn't it be 
easier, and perhaps faster, to just have the MySQL DB on an NFS share? iSCSI 
adds complexity, both on the target and the initiator. 

Also, are you using jumbo frames? That can usually help a bit with either 
access protocol.
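
If jumbo frames are not enabled yet, it is roughly this (interface names below 
are placeholders, and the switch ports in between must carry the larger MTU too):

# On the Nexenta/OpenSolaris target, if the NIC and driver support it:
dladm set-linkprop -p mtu=9000 ixgbe0

# On the Linux initiator:
ip link set dev eth2 mtu 9000

The MTU has to match end to end; a mismatch tends to make things worse rather 
than better.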

Vennlige hilsener / Best regards 

roy 
-- 
Roy Sigurd Karlsbakk 
(+47) 97542685 
r...@karlsbakk.net 
http://blogg.karlsbakk.net/ 
-- 
In all pedagogy it is essential that the curriculum is presented intelligibly. It is 
an elementary imperative for all pedagogues to avoid excessive use of idioms of 
foreign origin. In most cases, adequate and relevant synonyms exist in Norwegian. 


Re: [zfs-discuss] Expected throughput

2010-07-05 Thread Ian D

Just a short question - wouldn't it be easier, and perhaps faster, to just 
have the MySQL DB on an NFS share? iSCSI adds 
complexity, both on the target and the initiator.


Yes, we tried both and we didn't notice any difference in terms of 
performance.  I've read conflicting opinions on which is best; the majority 
seems to say that iSCSI is better for databases, but I don't have any strong 
preference myself...  
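
For anyone curious, the NFS route is essentially just the following (pool, 
dataset and host names below are placeholders):

# On the Nexenta side: share the dataset holding the database
zfs set sharenfs=on tank/mysql

# On the Linux side: mount it where MySQL expects its data
mount -t nfs -o rw,hard,intr,rsize=32768,wsize=32768 nexenta:/tank/mysql /var/lib/mysql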

Also, are you using jumbo frames? That can usually help a bit with either 
access protocol.


Yes.  It was off early on and we did notice a significant difference once we 
switched it on.  Turning Nagle off, as suggested by Richard, also seems to have 
made a little difference.  Thanks 



Re: [zfs-discuss] Announce: zfsdump

2010-07-05 Thread Tristram Scott
 At this point, I will repeat my recommendation about using zpool-in-files
 as a backup (staging) target. Depending where you host them, and how you
 combine the files, you can achieve these scenarios without clunkery, and
 with all the benefits a zpool provides.
 
 

This is another good scheme.
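
For readers who have not tried it, a minimal sketch of the zpool-in-files idea 
(file names, sizes and pool names below are only illustrative):

# create backing files on the backup host
mkfile 100g /backup/vdev0 /backup/vdev1
# build a staging pool on top of them
zpool create stagepool /backup/vdev0 /backup/vdev1
# replicate a snapshot tree into the staging pool
zfs snapshot -r tank@backup-20100705
zfs send -R tank@backup-20100705 | zfs receive -d stagepool

The backing files can then be shipped to tape or a remote site individually and 
reassembled into an importable pool when a restore is needed.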

I see a number of points to consider when choosing amongst the various 
suggestions for backing up zfs file systems.  In no particular order, I have 
these:

1. Does it work in place, or need an intermediate copy on disk?
2. Does it respect ACLs?
3. Does it respect zfs snapshots?
4. Does it allow random access to files, or only full file system restore?
5. Can it (mostly) survive partial data corruption?
6. Can it handle file systems larger than a single tape?
7. Can it stream to multiple tapes in parallel?
8. Does it understand the concept of incremental backups?

I still see this as a serious gap in the offering of zfs.  Clearly so do many 
other people, as there are a lot of methods offered to handle at least some of 
the above.


[zfs-discuss] zfs hangs with B141 when filebench runs

2010-07-05 Thread zhihui Chen
I tried to run zfs list on my system, but it looks like the command hangs.
It does not return even if I press Ctrl+C, as shown below:
r...@intel7:/export/bench/io/filebench/results# zfs list
^C^C^C^C
^C^C^C^C
...
When this happens, I am running the filebench benchmark with the oltp
workload, but zpool status shows that all pools are in good state, as
follows:
r...@intel7:~# zpool status
  pool: rpool
 state: ONLINE
status: The pool is formatted using an older on-disk format.  The pool can
still be used, but some features are unavailable.
action: Upgrade the pool using 'zpool upgrade'.  Once this is done, the
pool will no longer be accessible on older software versions.
 scan: none requested
config:

NAMESTATE READ WRITE CKSUM
rpool   ONLINE   0 0 0
  c8t0d0s0  ONLINE   0 0 0

errors: No known data errors

  pool: tpool
 state: ONLINE
 scan: none requested
config:

NAMESTATE READ WRITE CKSUM
tpool   ONLINE   0 0 0
  c10t1d0   ONLINE   0 0 0

errors: No known data errors


My system is running B141 and tpool is using the latest pool version, 26.
I tried truss -p `pgrep zfs`, but it fails as follows:

r...@intel7:~# truss -p `pgrep zfs`
truss: unanticipated system error: 5060

It looks like zfs is in a deadlock state, but I don't know the cause. I have
run the filebench oltp workload several times, and each time it leads to this
state. If I run filebench with other workloads such as fileserver or
webserver, this issue does not happen.
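
A way to see where it is blocked might be to grab stacks with the proc tools
and mdb; something along these lines (run as root, just one way of getting
per-process kernel stacks):

pstack `pgrep -x zfs`                                          # userland stack of the hung process
echo "::pgrep zfs | ::walk thread | ::findstack -v" | mdb -k   # kernel stacks of its threads
echo "::threadlist -v" | mdb -k                                # all kernel threads, verbose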

Thanks
Zhihui


Re: [zfs-discuss] Expected throughput

2010-07-05 Thread Roy Sigurd Karlsbakk

 Just a short question - wouldn't it be easier, and perhaps faster, to just
 have the MySQL DB on an NFS share? iSCSI adds complexity, both on the target
 and the initiator. 


Yes, we tried both and we didn't notice any difference in terms of 
performance. I've read conflicting opinions on which is best; the majority 
seems to say that iSCSI is better for databases, but I don't have any strong 
preference myself... Have you tried monitoring the I/O with vmstat or 
sar/sysstat? That should show the I/O speed as seen from Linux, which should be 
more relevant than the raw I/O speed to/from the drives. 
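
For example (intervals and counts below are arbitrary):

vmstat 5        # CPU, memory, swap and aggregate I/O every 5 seconds
iostat -x 5     # per-device utilisation and await/service times
sar -d 5 12     # one minute of per-device statistics via sysstat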

Vennlige hilsener / Best regards 

roy 
-- 
Roy Sigurd Karlsbakk 
(+47) 97542685 
r...@karlsbakk.net 
http://blogg.karlsbakk.net/ 
-- 
In all pedagogy it is essential that the curriculum is presented intelligibly. It is 
an elementary imperative for all pedagogues to avoid excessive use of idioms of 
foreign origin. In most cases, adequate and relevant synonyms exist in Norwegian. 


[zfs-discuss] never ending resilver

2010-07-05 Thread Francois

Hi list,

Here's my case:

pool: mypool
state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
 scrub: resilver in progress for 147h19m, 100.00% done, 0h0m to go
config:

NAME   STATE READ WRITE CKSUM
filerbackup13  DEGRADED 0 0 0
  raidz2   DEGRADED 0 0 0
c0t8d0 ONLINE   0 0 0
replacing  DEGRADED 0 0 0
  c0t9d0   OFFLINE  0 0 0
  c0t23d0  ONLINE   0 0 0  454G resilvered
c0t10d0ONLINE   0 0 0
c0t11d0ONLINE   0 0 0
c0t12d0ONLINE   0 0 0
c0t13d0ONLINE   0 0 0
c0t14d0ONLINE   0 0 0
c0t15d0ONLINE   0 0 0
c0t16d0ONLINE   0 0 0
c0t17d0ONLINE   0 0 0
c0t18d0ONLINE   0 0 0
c0t19d0ONLINE   0 0 0
c0t20d0ONLINE   0 0 0
c0t21d0ONLINE   0 0 0
c0t22d0ONLINE   0 0 0


After having launched the replace command, I had to offline c0t9d0 because 
it was generating too many warnings and slowing down I/O.


Now the replace seems to be finished, but zpool status still displays 
replacing, and according to the scrub status the resilver seems to continue?


Any idea how to clarify this situation?

Thanks.

--
Francois


Re: [zfs-discuss] never ending resilver

2010-07-05 Thread Roy Sigurd Karlsbakk
 After having launched the replace command, I had to offline c0t9d0
 because it was generating too many warnings and slowing down I/O.
 
 Now the replace seems to be finished, but zpool status still displays
 replacing, and according to the scrub status the resilver seems to
 continue?
 
 Any idea how to clarify this situation?

I've seen this happen before, and then the resilvering (or scrub) finished 
after a while - an hour or so. Watching iostat -xd showed high I/O traffic 
(without much coming from the users).

- What sort of drives are you using?
- For how long has the pool been at '100% done' while still resilvering?
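
To tell whether it is still making progress, it helps to watch both of these
for a while (interval is arbitrary):

iostat -xd 10        # the disks in the raidz2 should still show steady activity
zpool status mypool  # re-run now and then; the 'resilvered' byte count should keep growing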

Vennlige hilsener / Best regards

roy
--
Roy Sigurd Karlsbakk
(+47) 97542685
r...@karlsbakk.net
http://blogg.karlsbakk.net/
--
In all pedagogy it is essential that the curriculum is presented intelligibly. It is 
an elementary imperative for all pedagogues to avoid excessive use of idioms of 
foreign origin. In most cases, adequate and relevant synonyms exist in Norwegian.


Re: [zfs-discuss] Announce: zfsdump

2010-07-05 Thread Joerg Schilling
Tristram Scott tristram.sc...@quantmodels.co.uk wrote:

 I see a number of points to consider when choosing amongst the various 
 suggestions for backing up zfs file systems.  In no particular order, I have 
 these:

Let me fill this out for star ;-)

 1. Does it work in place, or need an intermediate copy on disk?

Yes, it works in place.

 2. Does it respect ACLs?

Not yet (because of missing interest from Sun). If people show interest, a
ZFS ACL implementation would not take much time, as there is already UFS ACL
support in star.

 3. Does it respect zfs snapshots?

Yes
Star recommends running incrementals on snapshots. Star incrementals
will work correctly if the snapshot just creates a new filesystem ID but 
leaves inode numbers identical (this is how it works with UFS snapshots).

 4. Does it allow random access to files, or only full file system restore?

Yes, individual files can be restored without a full file system restore.

 5. Can it (mostly) survive partial data corruption?

Yes for data corruption in the archive; for data corruption in ZFS, see ZFS.

 6. Can it handle file systems larger than a single tape?

Yes

 7. Can it stream to multiple tapes in parallel?

There is hardware for this task (check for tape RAID).

 8. Does it understand the concept of incremental backups?

Yes

And regarding the speed of incrementals:

A scan on a Sun Fire X4540 with a typical mix of small and large files (1.5 TB 
of filesystem data in 7.7 million files) takes 20 minutes. There seems to be a 
performance problem in the ZFS implementation: the data consists of 4 copies 
of an identical file set, each 370 GB in size, and the performance degrades after 
some time. While parsing the first set of files, the performance is 4x higher, 
so this 1.5 TB test could have finished in 5 minutes.
This test was done with an empty cache. With a populated cache, the incremental
scan is much faster and takes only 4 minutes.

It seems that incrementals at the user-space level are still feasible.
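
Following up on the snapshot recommendation above: the snapshot is taken first
and the archiver is then pointed at the snapshot directory. Dataset and
snapshot names below are only examples:

zfs snapshot tank/home@star-level0
# the read-only snapshot is visible to userland tools under .zfs:
cd /tank/home/.zfs/snapshot/star-level0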

Jörg

-- 
 EMail:jo...@schily.isdn.cs.tu-berlin.de (home) Jörg Schilling D-13353 Berlin
   j...@cs.tu-berlin.de(uni)  
   joerg.schill...@fokus.fraunhofer.de (work) Blog: 
http://schily.blogspot.com/
 URL:  http://cdrecord.berlios.de/private/ ftp://ftp.berlios.de/pub/schily


Re: [zfs-discuss] never ending resilver

2010-07-05 Thread Orvar Korvar
If you have one zpool consisting of only one large raidz2, then you have a slow 
raid. To reach high speed, you want at most 8 drives in each raidz2 vdev. So one 
of the reasons it takes so long is that you have too many drives in your raidz2. 
Everything would be much faster if you split your zpool into two raidz2 vdevs, 
each consisting of 7 or 8 drives.
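
For example, the 15 disks from the pool in this thread could be laid out as two
7-disk raidz2 vdevs plus a hot spare (rebuilding like this of course means
recreating the pool and restoring the data from backup):

zpool create mypool \
  raidz2 c0t8d0  c0t9d0  c0t10d0 c0t11d0 c0t12d0 c0t13d0 c0t14d0 \
  raidz2 c0t15d0 c0t16d0 c0t17d0 c0t18d0 c0t19d0 c0t20d0 c0t21d0 \
  spare  c0t22d0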


Re: [zfs-discuss] never ending resilver

2010-07-05 Thread Roy Sigurd Karlsbakk
- Original Message -
 If you have one zpool consisting of only one large raidz2, then you
 have a slow raid. To reach high speed, you want at most 8 drives in
 each raidz2 vdev. So one of the reasons it takes so long is that you have
 too many drives in your raidz2. Everything would be much faster if you
 split your zpool into two raidz2 vdevs, each consisting of 7 or 8 drives.

Keeping the VDEVs small is one thing, but this is about resilvering spending 
far more time than reported. The same applies to scrubbing at times.

Would it be hard to rewrite the reporting mechanisms in ZFS to report something 
more likely than just a first guess? ZFS scrub reports tremendous times at the 
start, but slows down after it has worked its way through the metadata. What ZFS 
is doing when the system still scrubs after 100 hours at 100% is beyond my 
knowledge.

Vennlige hilsener / Best regards

roy
--
Roy Sigurd Karlsbakk
(+47) 97542685
r...@karlsbakk.net
http://blogg.karlsbakk.net/
--
In all pedagogy it is essential that the curriculum is presented intelligibly. It is 
an elementary imperative for all pedagogues to avoid excessive use of idioms of 
foreign origin. In most cases, adequate and relevant synonyms exist in Norwegian.


Re: [zfs-discuss] never ending resilver

2010-07-05 Thread Tomas Ögren
On 05 July, 2010 - Roy Sigurd Karlsbakk sent me these 1,9K bytes:

 - Original Message -
  If you have one zpool consisting of only one large raidz2, then you
  have a slow raid. To reach high speed, you want at most 8 drives in
  each raidz2 vdev. So one of the reasons it takes so long is that you have
  too many drives in your raidz2. Everything would be much faster if you
  split your zpool into two raidz2 vdevs, each consisting of 7 or 8 drives.
 
 Keeping the VDEVs small is one thing, but this is about resilvering spending 
 far more time than reported. The same applies to scrubbing at times.
 
 Would it be hard to rewrite the reporting mechanisms in ZFS to report
 something more likely than just a first guess? ZFS scrub reports
 tremendous times at the start, but slows down after it has worked its way
 through the metadata. What ZFS is doing when the system still scrubs
 after 100 hours at 100% is beyond my knowledge.

I believe it's something like this:
* When starting, it notes the number of blocks to visit.
* .. visiting blocks ...
* .. adding more data (which will then be beyond the original 100%)
  and visiting blocks ...
* .. reaching the block that was the last one at the start, which has since
  gotten lots of new friends.

http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6899970

/Tomas
-- 
Tomas Ögren, st...@acc.umu.se, http://www.acc.umu.se/~stric/
|- Student at Computing Science, University of Umeå
`- Sysadmin at {cs,acc}.umu.se


Re: [zfs-discuss] never ending resilver

2010-07-05 Thread Ian Collins

On 07/ 6/10 02:21 AM, Francois wrote:

Hi list,

Here's my case:

pool: mypool
state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
 scrub: resilver in progress for 147h19m, 100.00% done, 0h0m to go
config:


snip


After having launched the replace command, I had to offline c0t9d0 
because it was generating too many warnings and slowing down I/O.


Now the replace seems to be finished, but zpool status still displays 
replacing, and according to the scrub status the resilver seems to continue?



As others have noted, your wide raidz2 will be slow to resilver.

As for the reported progress, I see this all the time with an x4500.  
The resilver is often 100% done for over half of the real resilver time 
(which is normally 100 hours for a 500G drive in an 8 drive raidz).  
This box is a backup server, so there is a fair amount of churn, which I 
assume confuses the reporting.


--
Ian.



Re: [zfs-discuss] NexentaStor 3.0.3 vs OpenSolaris - Patches more up to date?

2010-07-05 Thread Erast

In 3.0.3+, a new option lists the appliance changelog going forward:

nmc$ show version -c

On 07/04/2010 05:58 PM, Bohdan Tashchuk wrote:

Where can I find a list of these?


This leads to the more generic question of: where are *any* release notes?

I saw on Genunix that Community Edition 3.0.3 was replaced by 3.0.3-1. What changed? 
I went to nexenta.org and looked around, but it wasn't immediately obvious where to 
find release notes. Also, as Tim Cook noted, the Nexenta forums aren't exactly 
lively.

For a simple, easily understood and easily navigated web site, you can't beat 
www.openbsd.org. Both Sun/Oracle and Nexenta could learn a lot from it. And I can also 
follow very clean, simple instructions for running the stable OpenBSD branch 
(which is mostly security fixes).



Re: [zfs-discuss] Expected throughput

2010-07-05 Thread Richard Elling
On Jul 5, 2010, at 4:19 AM, Ian D wrote:
 Also, are you using jumbo frames? That can usually help a bit with either 
 access protocol
 
 
 Yes.  It was off early on and we did notice a significant difference once we 
 switched it on.  Turning Nagle off, as suggested by Richard, also seems to 
 have made a little difference.  Thanks

You need to disable Nagle on both ends: client and server.
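
On the Solaris/Nexenta side, one common way is to lower the Nagle coalescing
limit system-wide (not persistent across reboots unless added to a startup
script):

ndd -set /dev/tcp tcp_naglim_def 1   # a 1-byte limit effectively disables Nagle

On the Linux side there is no global switch; Nagle is a per-socket setting, so
it is up to the iSCSI initiator or the application to set TCP_NODELAY on its
connections.
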
 -- richard

-- 
Richard Elling
rich...@nexenta.com   +1-760-896-4422
ZFS and NexentaStor training, Rotterdam, July 13-15, 2010
http://nexenta-rotterdam.eventbrite.com/



