Re: [zfs-discuss] Expected throughput
> The database is MySQL, it runs on a Linux box that connects to the Nexenta
> server through 10GbE using iSCSI.

Just a short question - wouldn't it be easier, and perhaps faster, to just
have the MySQL DB on an NFS share? iSCSI adds complexity, both on the target
and the initiator.

Also, are you using jumbo frames? That can usually help a bit with either
access protocol.

Vennlige hilsener / Best regards

roy
--
Roy Sigurd Karlsbakk
(+47) 97542685
r...@karlsbakk.net
http://blogg.karlsbakk.net/
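P.S. Enabling jumbo frames on both ends would look roughly like this
(interface names are examples, and the exact link property depends on the
NIC driver):

    # On the Nexenta/OpenSolaris target (here an ixgbe 10GbE port):
    dladm set-linkprop -p mtu=9000 ixgbe0

    # On the Linux initiator (here eth2 is the 10GbE port):
    ip link set dev eth2 mtu 9000

The switch ports in between must allow jumbo frames as well, otherwise you
get drops and retransmits instead of a speedup.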
Re: [zfs-discuss] Expected throughput
> Just a short question - wouldn't it be easier, and perhaps faster, to just
> have the MySQL DB on an NFS share? iSCSI adds complexity, both on the
> target and the initiator.

Yes, we tried both and we didn't notice any difference in terms of
performance. I've read conflicting opinions on which is best, and the
majority seems to say that iSCSI is better for databases, but I don't have
any strong preference myself...

> Also, are you using jumbo frames? That can usually help a bit with either
> access protocol

Yes. It was off early on and we did notice a significant difference once we
switched it on. Turning Nagle off, as suggested by Richard, also seems to
have made a little difference.

Thanks
Re: [zfs-discuss] Announce: zfsdump
> At this point, I will repeat my recommendation about using zpool-in-files
> as a backup (staging) target. Depending on where you host them, and how
> you combine the files, you can achieve these scenarios without clunkery,
> and with all the benefits a zpool provides.

This is another good scheme. I see a number of points to consider when
choosing amongst the various suggestions for backing up zfs file systems.
In no particular order, I have these:

1. Does it work in place, or need an intermediate copy on disk?
2. Does it respect ACLs?
3. Does it respect zfs snapshots?
4. Does it allow random access to files, or only full file system restore?
5. Can it (mostly) survive partial data corruption?
6. Can it handle file systems larger than a single tape?
7. Can it stream to multiple tapes in parallel?
8. Does it understand the concept of incremental backups?

I still see this as a serious gap in the offering of zfs. Clearly so do many
other people, as there are a lot of methods offered to handle at least some
of the above.
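For reference, the zpool-in-files staging scheme quoted above would look
something like this (file paths, sizes and dataset names are examples):

    # Create file-backed vdevs on the backup host
    mkfile 100g /backup/vdev1 /backup/vdev2
    zpool create backuppool /backup/vdev1 /backup/vdev2

    # Replicate a snapshot (with all descendants) into the staging pool
    zfs send -R sourcepool/data@today | zfs receive -d backuppool

    # Export the pool before copying the backing files to tape or elsewhere
    zpool export backuppool

Because the staging target is itself a zpool, ACLs, snapshots and checksums
are preserved, and single files can be restored after importing the pool
again.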
[zfs-discuss] zfs hangs with B141 when filebench runs
I tried to run zfs list on my system, but it looks like this command hangs.
The command does not return even if I press ctrl+c, as follows:

r...@intel7:/export/bench/io/filebench/results# zfs list
^C^C^C^C
^C^C^C^C
..

When this happens, I am running the filebench benchmark with the oltp
workload. But zpool status shows that all pools are in good state, like the
following:

r...@intel7:~# zpool status
  pool: rpool
 state: ONLINE
status: The pool is formatted using an older on-disk format. The pool can
        still be used, but some features are unavailable.
action: Upgrade the pool using 'zpool upgrade'. Once this is done, the pool
        will no longer be accessible on older software versions.
  scan: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        rpool       ONLINE       0     0     0
          c8t0d0s0  ONLINE       0     0     0

errors: No known data errors

  pool: tpool
 state: ONLINE
  scan: none requested
config:

        NAME       STATE     READ WRITE CKSUM
        tpool      ONLINE       0     0     0
          c10t1d0  ONLINE       0     0     0

errors: No known data errors

My system is running B141 and tpool is using the latest version, 26. I tried
the command truss -p `pgrep zfs`, but it fails like the following:

r...@intel7:~# truss -p `pgrep zfs`
truss: unanticipated system error: 5060

It looks like zfs is in a deadlock state, but I don't know what the cause is.
I have run the filebench oltp workload several times, and each time it leads
to this state. But if I run filebench with other workloads such as fileserver
or webserver, this issue does not happen.

Thanks
Zhihui
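In case it helps with the diagnosis, I can also try to dump the kernel stacks
of the stuck process with mdb instead of truss, something like the following
(run as root; I have not captured this output yet):

    # Kernel stacks for all threads of the hung zfs process
    echo "::pgrep zfs | ::walk thread | ::findstack -v" | mdb -k

    # Or a full kernel thread list for deadlock analysis
    echo "::threadlist -v" | mdb -k > /var/tmp/threadlist.txt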
Re: [zfs-discuss] Expected throughput
>> Just a short question - wouldn't it be easier, and perhaps faster, to
>> just have the MySQL DB on an NFS share? iSCSI adds complexity, both on
>> the target and the initiator.
>
> Yes, we tried both and we didn't notice any difference in terms of
> performance. I've read conflicting opinions on which is best, and the
> majority seems to say that iSCSI is better for databases, but I don't
> have any strong preference myself...

Have you tried monitoring the I/O with vmstat or sar/sysstat? That should
show the I/O speed as seen from Linux, and should be more relevant than the
real I/O speed to/from the drives.

Vennlige hilsener / Best regards

roy
--
Roy Sigurd Karlsbakk
(+47) 97542685
r...@karlsbakk.net
http://blogg.karlsbakk.net/
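P.S. On the Linux box that would be something like this (sysstat package;
intervals are just examples):

    # Extended per-device statistics every 5 seconds
    iostat -x 5

    # Block-device activity via sar, 10 samples 5 seconds apart
    sar -d 5 10

    # Quick overview of blocks in/out and I/O wait
    vmstat 5

Comparing those numbers with what the Nexenta box reports should show where
the time is actually being spent.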
[zfs-discuss] never ending resilver
Hi list,

Here's my case :

  pool: mypool
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
 scrub: resilver in progress for 147h19m, 100.00% done, 0h0m to go
config:

        NAME             STATE     READ WRITE CKSUM
        filerbackup13    DEGRADED     0     0     0
          raidz2         DEGRADED     0     0     0
            c0t8d0       ONLINE       0     0     0
            replacing    DEGRADED     0     0     0
              c0t9d0     OFFLINE      0     0     0
              c0t23d0    ONLINE       0     0     0  454G resilvered
            c0t10d0      ONLINE       0     0     0
            c0t11d0      ONLINE       0     0     0
            c0t12d0      ONLINE       0     0     0
            c0t13d0      ONLINE       0     0     0
            c0t14d0      ONLINE       0     0     0
            c0t15d0      ONLINE       0     0     0
            c0t16d0      ONLINE       0     0     0
            c0t17d0      ONLINE       0     0     0
            c0t18d0      ONLINE       0     0     0
            c0t19d0      ONLINE       0     0     0
            c0t20d0      ONLINE       0     0     0
            c0t21d0      ONLINE       0     0     0
            c0t22d0      ONLINE       0     0     0

After launching the replace command, I had to offline c0t9d0 because it was
generating too many warnings and slowing down I/Os. Now the replace seems to
be finished, but zpool status still displays "replacing", and according to
the scrub status the resilver seems to continue?

Any idea how to clarify this situation?

Thanks.

--
Francois
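For reference, the commands I ran were roughly the following (pool and device
names as in the status output above):

    # Replace the failing disk with the new one
    zpool replace mypool c0t9d0 c0t23d0

    # Later, take the old disk out of the I/O path because of its errors
    zpool offline mypool c0t9d0

    # What I keep checking now
    zpool status -v mypool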
Re: [zfs-discuss] never ending resilver
> After launching the replace command, I had to offline c0t9d0 because it
> was generating too many warnings and slowing down I/Os. Now the replace
> seems to be finished, but zpool status still displays "replacing", and
> according to the scrub status the resilver seems to continue? Any idea
> how to clarify this situation?

I've seen this happen earlier, and then the resilvering (or scrub) was
finished after a while - an hour or so. Watching iostat -xd showed high I/O
traffic (without much from the users).

- What sort of drives are you using?
- For how long has the pool been at '100% done' while still resilvering?

Vennlige hilsener / Best regards

roy
--
Roy Sigurd Karlsbakk
(+47) 97542685
r...@karlsbakk.net
http://blogg.karlsbakk.net/
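P.S. Something like this will show whether the pool is still busy resilvering
even though the percentage says 100% (the interval is just an example):

    # Extended per-disk statistics every 10 seconds on the filer
    iostat -xd 10

    # Or with descriptive device names instead of instance numbers
    iostat -xn 10

If the member disks are still doing heavy reads, with the new disk doing
heavy writes, the resilver really is still running.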
Re: [zfs-discuss] Announce: zfsdump
Tristram Scott tristram.sc...@quantmodels.co.uk wrote:

> I see a number of points to consider when choosing amongst the various
> suggestions for backing up zfs file systems. In no particular order, I
> have these:

Let me fill this out for star ;-)

> 1. Does it work in place, or need an intermediate copy on disk?

Yes

> 2. Does it respect ACLs?

Not yet (because of missing interest from Sun). If people show interest, a
ZFS ACL implementation would not take much time, as there is already UFS ACL
support in star.

> 3. Does it respect zfs snapshots?

Yes. Star recommends running incrementals on snapshots. Star incrementals
will work correctly if the snapshot just creates a new filesystem ID but
leaves inode numbers identical (this is how it works with UFS snapshots).

> 4. Does it allow random access to files, or only full file system restore?

Yes

> 5. Can it (mostly) survive partial data corruption?

Yes, for data corruption in the archive; for data corruption in ZFS - see
ZFS.

> 6. Can it handle file systems larger than a single tape?

Yes

> 7. Can it stream to multiple tapes in parallel?

There is hardware for this task (check for TAPE RAID).

> 8. Does it understand the concept of incremental backups?

Yes

And regarding the speed of incrementals: a scan on a Sun Fire X4540 with a
typical mix of small and large files (1.5 TB of filesystem data in 7.7
million files) takes 20 minutes. There seems to be a performance problem in
the ZFS implementation: the data is made from 4 copies of identical file
sets, each 370 GB in size, and the performance degrades after some time.
While parsing the first set of files, the performance is 4x higher, so this
1.5 TB test could have been finished in 5 minutes. This test was done with an
empty cache. With a populated cache, the incremental scan is much faster and
takes only 4 minutes.

It seems that incrementals at user space level are still feasible.

Jörg

--
EMail: jo...@schily.isdn.cs.tu-berlin.de (home) Jörg Schilling D-13353 Berlin
       j...@cs.tu-berlin.de (uni)
       joerg.schill...@fokus.fraunhofer.de (work)
Blog:  http://schily.blogspot.com/
URL:   http://cdrecord.berlios.de/private/ ftp://ftp.berlios.de/pub/schily
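A minimal run against a ZFS snapshot looks roughly like this (pool, dataset,
snapshot and tape device names are examples; see the star man page for the
exact options used for dump levels and incrementals):

    # Freeze a consistent view of the filesystem first
    zfs snapshot tank/data@star-backup

    # Archive the snapshot contents to tape, sparse-file aware
    star -c -sparse f=/dev/rmt/0n -C /tank/data/.zfs/snapshot/star-backup .

Restores of single files are then ordinary star extracts from that archive.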
Re: [zfs-discuss] never ending resilver
If you have one zpool consisting of only one large raidz2, then you have a
slow raid. To reach high speed, you want a maximum of 8 drives in each raidz2
vdev. So one of the reasons it takes time is that you have too many drives in
your raidz2. Everything would be much faster if you split your zpool into two
raidz2 vdevs, each consisting of 7 or 8 drives. Then it would be fast.
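For example, with 16 disks the pool could be created with two vdevs instead
of one (the disk names here are just examples):

    # Two 8-disk raidz2 vdevs in one pool; ZFS stripes across the vdevs
    zpool create tank \
        raidz2 c0t8d0  c0t9d0  c0t10d0 c0t11d0 c0t12d0 c0t13d0 c0t14d0 c0t15d0 \
        raidz2 c0t16d0 c0t17d0 c0t18d0 c0t19d0 c0t20d0 c0t21d0 c0t22d0 c0t23d0

Each vdev resilvers independently, so a single rebuild only has to read from
half of the disks.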
Re: [zfs-discuss] never ending resilver
- Original Message -
> If you have one zpool consisting of only one large raidz2, then you have
> a slow raid. To reach high speed, you want a maximum of 8 drives in each
> raidz2 vdev. So one of the reasons it takes time is that you have too
> many drives in your raidz2. Everything would be much faster if you split
> your zpool into two raidz2 vdevs, each consisting of 7 or 8 drives. Then
> it would be fast.

Keeping the VDEVs small is one thing, but this is about the resilvering
spending far more time than reported. The same applies to scrubbing at
times. Would it be hard to rewrite the reporting mechanisms in ZFS to report
something more likely than just a first guess? ZFS scrub reports tremendous
times at start, but slows down after it's worked its way through the
metadata. What ZFS is doing when the system still scrubs after 100 hours at
100% is beyond my knowledge.

Vennlige hilsener / Best regards

roy
--
Roy Sigurd Karlsbakk
(+47) 97542685
r...@karlsbakk.net
http://blogg.karlsbakk.net/
Re: [zfs-discuss] never ending resilver
On 05 July, 2010 - Roy Sigurd Karlsbakk sent me these 1,9K bytes:

>> If you have one zpool consisting of only one large raidz2, then you
>> have a slow raid. To reach high speed, you want a maximum of 8 drives
>> in each raidz2 vdev. So one of the reasons it takes time is that you
>> have too many drives in your raidz2. Everything would be much faster
>> if you split your zpool into two raidz2 vdevs, each consisting of 7 or
>> 8 drives. Then it would be fast.
>
> Keeping the VDEVs small is one thing, but this is about the resilvering
> spending far more time than reported. The same applies to scrubbing at
> times. Would it be hard to rewrite the reporting mechanisms in ZFS to
> report something more likely than just a first guess? ZFS scrub reports
> tremendous times at start, but slows down after it's worked its way
> through the metadata. What ZFS is doing when the system still scrubs
> after 100 hours at 100% is beyond my knowledge.

I believe it's something like this:

* When starting, it notes the number of blocks to visit
* .. visiting blocks ...
* .. adding more data (which then will be beyond the original 100%) .. and
  visiting blocks ...
* .. reaching the initial last block, which since then has gotten lots of
  new friends afterwards.

http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6899970

/Tomas
--
Tomas Ögren, st...@acc.umu.se, http://www.acc.umu.se/~stric/
|- Student at Computing Science, University of Umeå
`- Sysadmin at {cs,acc}.umu.se
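So on a pool that keeps taking writes, the percentage can reach 100% long
before the scan has caught up with the blocks written since it started.
Something like this makes the effect easy to see (pool name and interval are
examples):

    # Log the reported progress every 10 minutes while the resilver runs
    while true; do
        date
        zpool status mypool | egrep 'scrub|resilver'
        sleep 600
    done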
Re: [zfs-discuss] never ending resilver
On 07/ 6/10 02:21 AM, Francois wrote:
> Hi list,
>
> Here's my case :
>
>   pool: mypool
>  state: DEGRADED
> status: One or more devices is currently being resilvered.  The pool
>         will continue to function, possibly in a degraded state.
> action: Wait for the resilver to complete.
>  scrub: resilver in progress for 147h19m, 100.00% done, 0h0m to go
> config:
<snip>
> After launching the replace command, I had to offline c0t9d0 because it
> was generating too many warnings and slowing down I/Os. Now the replace
> seems to be finished, but zpool status still displays "replacing", and
> according to the scrub status the resilver seems to continue?

As others have noted, your wide raidz2 will be slow to resilver.

As for the reported progress, I see this all the time with an x4500. The
resilver is often 100% done for over half of the real resilver time (which
is normally 100 hours for a 500G drive in an 8 drive raidz). This box is a
backup server, so there is a fair amount of churn, which I assume confuses
the reporting.

--
Ian.
Re: [zfs-discuss] NexentaStor 3.0.3 vs OpenSolaris - Patches more up to date?
In 3.0.3+ a new option lists the appliance changelog going forward:

    nmc$ show version -c

On 07/04/2010 05:58 PM, Bohdan Tashchuk wrote:
> Where can I find a list of these?
>
> This leads to the more generic question of: where are *any* release
> notes? I saw on Genunix that Community Edition 3.0.3 was replaced by
> 3.0.3-1. What changed?
>
> I went to nexenta.org and looked around. But it wasn't immediately
> obvious where to find release notes. Also, as Tim Cook noted, the Nexenta
> forums aren't exactly lively.
>
> For a simple, easily understood and easily navigated web site, you can't
> beat www.openbsd.org. Both Sun/Oracle and Nexenta could learn a lot from
> it. And I can also follow very clean, simple instructions for running the
> stable OpenBSD branch (which is mostly security fixes).
Re: [zfs-discuss] Expected throughput
On Jul 5, 2010, at 4:19 AM, Ian D wrote:
>> Also, are you using jumbo frames? That can usually help a bit with
>> either access protocol
>
> Yes. It was off early on and we did notice a significant difference once
> we switched it on. Turning Nagle off, as suggested by Richard, also seems
> to have made a little difference. Thanks

You need to disable Nagle on both ends: client and server.
 -- richard

--
Richard Elling
rich...@nexenta.com   +1-760-896-4422
ZFS and NexentaStor training, Rotterdam, July 13-15, 2010
http://nexenta-rotterdam.eventbrite.com/
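On the Solaris/Nexenta side, one quick way is the following (it is not
persistent across reboots, so add it to a startup script if it helps):

    # Effectively disables Nagle for new TCP connections
    ndd -set /dev/tcp tcp_naglim_def 1

On the Linux side there is no global switch; Nagle is per-socket, so the
initiator or application has to set TCP_NODELAY on its connections.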