Re: [zfs-discuss] Re: Recommendation ZFS on StorEdge 3320 - offtopic
On Thu, Sep 07, 2006 at 12:14:20PM -0700, Richard Elling - PAE wrote:
> [EMAIL PROTECTED] wrote:
>> This is the case where I don't understand Sun's politics at all: Sun doesn't offer a really cheap JBOD which can be bought just for ZFS. And don't even tell me about 3310/3320 JBODs - they are horribly expensive :-(
>
> Yep, multipacks have been EOL for some time now -- killed by big disks. Back when disks were small, people would buy multipacks to attach to their workstations. There was a time when none of the workstations had internal disks, but I'd be dating myself :-) For datacenter-class storage, multipacks were not appropriate. They only had single-ended SCSI interfaces, which have a limited cable budget, which limited their use in racks. Also, they weren't designed to be used in a rack environment, so they weren't mechanically appropriate either. I suppose you can still find them on eBay.
>
>> If Sun wants ZFS to be absorbed quicker it should have such a _really_ cheap JBOD.
>
> I don't quite see this in my crystal ball. Rather, I see all of the SAS/SATA chipset vendors putting RAID in the chipset. Basically, you can't get a dumb interface anymore, except for fibre channel :-). In other words, if we were to design a system in a chassis with perhaps 8 disks, then we would also use a controller which does RAID. So, we're right back to square 1.

Richard, when I talk about cheap JBODs I am thinking about home users, small servers and small companies. I guess you can sell 100 X4500s and at the same time 1000 (or even more) cheap JBODs to the small companies which will certainly never buy the big boxes. Yes, I know, you earn more selling the X4500. But how do you think Linux found its way into data centers and became an important player in the OS space? Through home users and enthusiasts who became familiar with it and then started using the familiar things in their jobs. A proven way to achieve world domination. ;-))

przemol
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] Re: Re: Recommendation ZFS on StorEdge 3320
Roch - PAE wrote:
> The hard part is getting a set of simple requirements. As you go into more complex data center environments you get hit with older Solaris revs, other OSs, SOX compliance issues, etc. etc. etc. The world where most of us seem to be playing with ZFS is on the lower end of the complexity scale.

I've been watching this thread and unfortunately fit this model. I'd hoped that ZFS might scale enough to solve my problem, but you seem to be saying that it's mostly untested in large scale environments.

About 7 years ago we ran out of inodes on our UFS file systems. We used bFile as middleware for a while to distribute the files across multiple disks, and then switched to VFS on SAN about 5 years ago. Distribution across file systems and inode depletion continued to be a problem, so we switched middleware to another vendor that essentially compresses about 200 files into a single 10Mb archive and uses a DB to find the file within the archive on the correct disk. An expensive, complex and slow but effective solution, until the latest license renewal, when we got hit with a huge bill.

I'd love to go back to a pure file system model and have looked at Reiser4, JFS, NTFS and now ZFS for a way to support over 100 million small documents and 16Tb. We average 2 file reads and 1 file write per second 24/7, with expected growth to 24Tb. I'd be willing to scrap everything we have to find a non-proprietary long term solution. ZFS looked like it might provide an answer. Are you saying it's not really suitable for this type of application?

This message posted from opensolaris.org
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Re: Recommendation ZFS on StorEdge 3320 - offtopic
On Fri, Sep 08, 2006 at 09:41:58AM +0100, Darren J Moffat wrote:
> [EMAIL PROTECTED] wrote:
>> Richard, when I talk about cheap JBODs I am thinking about home users, small servers and small companies. I guess you can sell 100 X4500s and at the same time 1000 (or even more) cheap JBODs to the small companies which will certainly never buy the big boxes. Yes, I know, you earn more selling the X4500. But how do you think Linux found its way into data centers and became an important player in the OS space? Through home users and enthusiasts who became familiar with it and then started using the familiar things in their jobs.
>
> But Linux isn't a hardware vendor and doesn't make cheap JBOD or multipack for the home user.

Linux is used as a symbol.

> So I don't see how we get from "Sun should make cheap home user JBOD" (which BTW we don't really have the channel to sell for anyway) to "but Linux dominated this way".

Home user = the tech/geek/enthusiast who is an admin in his or her job.

[ Linux ] The home user uses Linux at home and is satisfied with it. He/she then goes to work and says "Let's install/use it on the less important servers." He/she (and management) is again satisfied with it. "So let's use it on more important servers"... etc.

[ ZFS ] The home user uses ZFS (Solaris) at home (remember how easy it is, and even the web interface to ZFS operations!) to keep photos, music, etc. and is satisfied with it. He/she then goes to work and says "I have been using a fantastic filesystem for a while. Let's use it on the less important servers." Ok. Later on: "Works ok. Let's use it on more important ones." Etc...

Yes, I know, a bit naive. But remember that not only Linux spreads this way but Solaris as well. I guess most of the downloaded Solaris CDs/DVDs are for x86. You as a company attack at the high end/midrange level. Let users/admins/fans attack at the lower end.

przemol
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re[2]: [zfs-discuss] Re: Recommendation ZFS on StorEdge 3320
Hello James,

Thursday, September 7, 2006, 8:58:10 PM, you wrote:

JD> with ZFS I have found that memory is a much greater limitation, even
JD> my dual 300mhz u2 has no problem filling 2x 20MB/s scsi channels, even
JD> with compression enabled, using raidz and 10k rpm 9GB drives, thanks
JD> to its 2GB of ram it does great at everything I throw at it. On the
JD> other hand my blade 1500 ram 512MB with 3x 18GB 10k rpm drives using
JD> 2x 40MB/s scsi channels, os is on a 80GB ide drive, has problems
JD> interactively because as soon as you push zfs hard it hogs all the ram
JD> and may take 5 or 10 seconds to get response on xterms while the
JD> machine clears out ram and loads its applications/data back into ram.

IIRC there is a bug in the SPARC ata driver which expresses itself when combined with ZFS. Unless you use only ZFS on those SCSI drives...?

--
Best regards,
Robert  mailto:[EMAIL PROTECTED]
http://milek.blogspot.com
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: Re[2]: [zfs-discuss] Re: Recommendation ZFS on StorEdge 3320
zfs hogs all the ram under a sustained heavy write load. This is being tracked by:

6429205 each zpool needs to monitor its throughput and throttle heavy writers

-r
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
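A quick way to see the effect described above is to watch memory while generating a sustained write stream. This is only a sketch; it assumes a pool mounted at /tank and a machine you don't mind slowing down for a minute:

  # vmstat 5 &                                              # watch the "free" and "sr" columns
  # dd if=/dev/zero of=/tank/bigfile bs=1024k count=4096    # roughly 4 GB of sustained writes

Free memory drops steadily while the write is in flight and only recovers some time after it completes, which is the behaviour 6429205 is about.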
[zfs-discuss] zfs assertion failure
Hi,

My desktop paniced last night during a zfs receive operation. This is a dual opteron system running snv_47 and bfu'd to DEBUG project bits that are in sync with the onnv gate as of two days ago. The project bits are for Opteron FMA and don't appear at all active in the panic. I'll log a bug unless someone recognises this as a known issue:

> ::status
debugging crash dump vmcore.0 (64-bit) from enogas
operating system: 5.11 onnv-dev (i86pc)
panic message:
assertion failed: ((dnp->dn_blkptr[0])->blk_birth == 0) || list_head(&dn->dn_dirty_dbufs[txgoff]) != 0L || dn->dn_next_blksz[txgoff] >> 9 == dnp->dn_datablkszsec, file: ../../common/fs/zfs/dnode_syn
dump content: kernel pages only

> $c
vpanic()
assfail+0x7e(f06daa80, f06daa58, 220)
dnode_sync+0x5ef(8e0ce3f8, 0, 8e0c81c0, 8adde1c0)
dmu_objset_sync_dnodes+0xa4(8be25340, 8be25480, 8adde1c0)
dmu_objset_sync+0xfd(8be25340, 8adde1c0)
dsl_dataset_sync+0x4a(8e2286c0, 8adde1c0)
dsl_pool_sync+0xa7(89ef3900, 248bbb)
spa_sync+0x1d5(82ea2700, 248bbb)
txg_sync_thread+0x221(89ef3900)
thread_start+8()

The assertion comes from dnode_sync():

dnode_sync(dnode_t *dn, int level, zio_t *zio, dmu_tx_t *tx)
{
	free_range_t *rp;
	int txgoff = tx->tx_txg & TXG_MASK;
	dnode_phys_t *dnp = dn->dn_phys;
	...
	if (dn->dn_next_blksz[txgoff]) {
		ASSERT(P2PHASE(dn->dn_next_blksz[txgoff],
		    SPA_MINBLOCKSIZE) == 0);
		ASSERT(BP_IS_HOLE(dnp->dn_blkptr[0]) ||
		    list_head(&dn->dn_dirty_dbufs[txgoff]) != NULL ||
		    dn->dn_next_blksz[txgoff] >> SPA_MINBLOCKSHIFT ==
		    dnp->dn_datablkszsec);
		...
	}
	...
}

We get txgoff = 0x248bbb & 0x3 = 0x3 and dnp = 0xfe80e648b400.

> 0xfe80e648b400::print dnode_phys_t
{
    dn_type = 0x16
    dn_indblkshift = 0xe
    dn_nlevels = 0x1
    dn_nblkptr = 0x3
    dn_bonustype = 0
    dn_checksum = 0
    dn_compress = 0
    dn_flags = 0x1
    dn_datablkszsec = 0x1c
    dn_bonuslen = 0
    dn_pad2 = [ 0, 0, 0, 0 ]
    dn_maxblkid = 0
    dn_used = 0x800
    dn_pad3 = [ 0, 0, 0, 0 ]
    dn_blkptr = [
        {
            blk_dva = [
                { dva_word = [ 0x2, 0x3015472 ] }
                { dva_word = [ 0x2, 0x4613b32 ] }
                { dva_word = [ 0, 0 ] }
            ]
            blk_prop = 0x801607030001001b
            blk_pad = [ 0, 0, 0 ]
            blk_birth = 0x221478
            blk_fill = 0x1
            blk_cksum = {
                zc_word = [ 0x4b4b88c4e6, 0x39c18ca2a5a1, 0x16ea3555d00431, 0x640a1f2b2c8b322 ]
            }
        }
    ]
    dn_bonus = [ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... ]
}

So regarding the assertion we have (dnp->dn_blkptr[0])->blk_birth == 0x221478, i.e. the first condition is false.

> 8e0ce3f8::print -at dnode_t dn_dirty_dbufs[3]
{
    8e0ce510 size_t dn_dirty_dbufs[3].list_size = 0x198
    8e0ce518 size_t dn_dirty_dbufs[3].list_offset = 0x120
    8e0ce520 struct list_node dn_dirty_dbufs[3].list_head = {
        8e0ce520 struct list_node *list_next = 0x8e0ce520
        8e0ce528 struct list_node *list_prev = 0x8e0ce520
    }
}

So we have list_empty() for that list (list_next above points to list_head) and list_head() will have returned NULL. So we're relying on the 3rd component of the assertion to pass:

> 8e0ce3f8::print dnode_t dn_next_blksz
dn_next_blksz = [ 0, 0, 0, 0x4a00 ]

We're using the 0x4a00 from that. 0x4a00 >> 9 = 0x25; from the dnode_phys_t above we have dnp->dn_datablkszsec of 0x1c. Boom.

Sun folks can login to enogas.uk and /var/crash/enogas/*,0 is accessible.

Gavin
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Re: Re: Recommendation ZFS on StorEdge 3320
On Fri, 8 Sep 2006, Jim Sloey wrote:
> Roch - PAE wrote:
>> The hard part is getting a set of simple requirements. As you go into more complex data center environments you get hit with older Solaris revs, other OSs, SOX compliance issues, etc. etc. etc. The world where most of us seem to be playing with ZFS is on the lower end of the complexity scale.
>
> ... reformatted ...
>
> I've been watching this thread and unfortunately fit this model. I'd hoped that ZFS might scale enough to solve my problem but you seem to be saying that it's mostly untested in large scale environments. About 7 years ago we ran out of inodes on our UFS file systems. We used bFile as middleware for a while to distribute the files across multiple disks and then switched to VFS on SAN about 5 years ago. Distribution across file systems and inode depletion continued to be a problem so we switched middleware to another vendor that essentially compresses about 200 files into a single 10Mb archive and uses a DB to find the file within the archive on the correct disk. Expensive, complex and slow but effective solution until the latest license renewal when we got hit with a huge bill. I'd love to go back to a pure file system model and looked at Reiser4, JFS, NTFS and now ZFS for a way to support over 100 million small documents and 16Tb. We average 2 file reads and 1 file write per second 24/7 with expected growth to 24Tb. I'd be willing to scrap everything we have to find a non-proprietary long term solution. ZFS looked like it might provide an answer. Are you saying it's not really suitable for this type of application?

No - that's not what he is saying. Personally I think (from the info presented) that ZFS would be a viable long term solution to this storage headache. But the neat thing about ZFS is that, with a spare AMD based box and as few as 5 low-cost SATA drives, you can actually try it[1]. Think about this for a second. You can put together a test ZFS box for less money than you would spend, in man-hours, talking about it as a _possible_ solution.

[1] 5 to 10 SATA drives won't get you 16Tb - but it'll get you close enough to model the system with a substantial portion of your dataset.

Regards,

Al Hopper  Logical Approach Inc, Plano, TX.  [EMAIL PROTECTED]
Voice: 972.379.2133 Fax: 972.379.2134  Timezone: US CDT
OpenSolaris.Org Community Advisory Board (CAB) Member - Apr 2005
OpenSolaris Governing Board (OGB) Member - Feb 2006
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
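A minimal sketch of such a test box, assuming the five SATA drives show up as c2t0d0 through c2t4d0 (the device, pool and dataset names here are only examples):

  # zpool create tank raidz c2t0d0 c2t1d0 c2t2d0 c2t3d0 c2t4d0
  # zfs create tank/docs                 # one filesystem per document tree, as needed
  # zfs set compression=on tank/docs     # optional; worth measuring for small documents
  # zpool status tank                    # confirm the raidz vdev is online

With that in place you can copy in a slice of the real document set and replay the 2-reads/1-write-per-second workload against it before committing to anything.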
[zfs-discuss] ?: ZFS and jumpstart export race condition
I have a jumpstart server where the install images are on a ZFS pool. For PXE boot, several lofs mounts are created and configured in /etc/vfstab. My system does not boot properly anymore because the mounts referring to jumpstart files haven't been mounted yet via ZFS.

What is the best way of working around this? Can I just create the necessary mounts of pool1/jumpstart in /etc/vfstab, or is ZFS just not running yet when these mounts get attempted? A lot of network services, including ssh, are not running because fs-local did not come up clean.

Is this a known problem that is being addressed? This is S10 6/06.

Thanks
Steffen

# cat /etc/vfstab
...
/export/jumpstart/s10/x86/boot - /tftpboot/I86PC.Solaris_10-1 lofs - yes ro
/export/jumpstart/nv/x86/latest/boot - /tftpboot/I86PC.Solaris_11-1 lofs - yes ro
/export/jumpstart/s10u3/x86/latest/boot - /tftpboot/I86PC.Solaris_10-2 lofs - yes ro

# zfs get all pool1/jumpstart
NAME             PROPERTY       VALUE                  SOURCE
pool1/jumpstart  type           filesystem             -
pool1/jumpstart  creation       Mon Jun 12  8:26 2006  -
pool1/jumpstart  used           39.9G                  -
pool1/jumpstart  available      17.7G                  -
pool1/jumpstart  referenced     39.9G                  -
pool1/jumpstart  compressratio  1.00x                  -
pool1/jumpstart  mounted        yes                    -
pool1/jumpstart  quota          none                   default
pool1/jumpstart  reservation    none                   default
pool1/jumpstart  recordsize     128K                   default
pool1/jumpstart  mountpoint     /export/jumpstart      local
pool1/jumpstart  sharenfs       ro,anon=0              local
pool1/jumpstart  checksum       on                     default
pool1/jumpstart  compression    off                    default
pool1/jumpstart  atime          on                     default
pool1/jumpstart  devices        on                     default
pool1/jumpstart  exec           on                     default
pool1/jumpstart  setuid         on                     default
pool1/jumpstart  readonly       off                    default
pool1/jumpstart  zoned          off                    default
pool1/jumpstart  snapdir        hidden                 default
pool1/jumpstart  aclmode        groupmask              default
pool1/jumpstart  aclinherit     secure                 default
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] ZFS on production servers with SLA
Hi,

I'm currently doing some tests on a SF15K domain with Solaris 10 installed. The target is to convince my cu to use Solaris 10 for this domain AND to establish a list of recommendations. The ZFS perimeter is really an issue for me. For now, I'm waiting for fresh information from the backup software vendor about ZFS support. No ZFS ACL support could be annoying.

Regarding system partitions (/var, /opt, all mirrored + alternate disk), what would be YOUR recommendations? ZFS or not?

TIA
Nicolas

This message posted from opensolaris.org
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ?: ZFS and jumpstart export race condition
Steffen,

I have the same issue with my home install server. As a dirty solution I set mount-at-boot to "no" for the lofs filesystems, to get the system up. But with every new OS added by JET the mount-at-boot entry reappears.

It seems to me the question is when a lofs filesystem should be mounted at boot. When does a zfs filesystem get mounted? Probably a zfs legacy mount together with a lower-priority lofs mount would do it.

Regards,
Thomas

On Fri, Sep 08, 2006 at 08:18:06AM -0400, Steffen Weiberle wrote:
> I have a jumpstart server where the install images are on a ZFS pool. For PXE boot, several lofs mounts are created and configured in /etc/vfstab. My system does not boot properly anymore because the mounts referring to jumpstart files haven't been mounted yet via ZFS.
>
> What is the best way of working around this? Can I just create the necessary mounts of pool1/jumpstart in /etc/vfstab, or is ZFS just not running yet when these mounts get attempted? A lot of network services, including ssh, are not running because fs-local did not come up clean.
>
> Is this a known problem that is being addressed? This is S10 6/06.
>
> Thanks
> Steffen
>
> # cat /etc/vfstab
> ...
> /export/jumpstart/s10/x86/boot - /tftpboot/I86PC.Solaris_10-1 lofs - yes ro
> /export/jumpstart/nv/x86/latest/boot - /tftpboot/I86PC.Solaris_11-1 lofs - yes ro
> /export/jumpstart/s10u3/x86/latest/boot - /tftpboot/I86PC.Solaris_10-2 lofs - yes ro
>
> # zfs get all pool1/jumpstart
> NAME             PROPERTY       VALUE                  SOURCE
> pool1/jumpstart  type           filesystem             -
> pool1/jumpstart  creation       Mon Jun 12  8:26 2006  -
> pool1/jumpstart  used           39.9G                  -
> pool1/jumpstart  available      17.7G                  -
> pool1/jumpstart  referenced     39.9G                  -
> pool1/jumpstart  compressratio  1.00x                  -
> pool1/jumpstart  mounted        yes                    -
> pool1/jumpstart  quota          none                   default
> pool1/jumpstart  reservation    none                   default
> pool1/jumpstart  recordsize     128K                   default
> pool1/jumpstart  mountpoint     /export/jumpstart      local
> pool1/jumpstart  sharenfs       ro,anon=0              local
> pool1/jumpstart  checksum       on                     default
> pool1/jumpstart  compression    off                    default
> pool1/jumpstart  atime          on                     default
> pool1/jumpstart  devices        on                     default
> pool1/jumpstart  exec           on                     default
> pool1/jumpstart  setuid         on                     default
> pool1/jumpstart  readonly       off                    default
> pool1/jumpstart  zoned          off                    default
> pool1/jumpstart  snapdir        hidden                 default
> pool1/jumpstart  aclmode        groupmask              default
> pool1/jumpstart  aclinherit     secure                 default

--
Mit freundlichen Gruessen,
Thomas Wagner

--
* Thomas Wagner                    Tel: +49-(0)-711-720 98-131
Strategic Support Engineer         Fax: +49-(0)-711-720 98-443
Global Customer Services           Cell: +49-(0)-175-292 60 64
Sun Microsystems GmbH              E-Mail: [EMAIL PROTECTED]
Zettachring 10A, D-70567 Stuttgart http://www.sun.de
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS on production servers with SLA
Nicolas Dorfsman wrote:
> Hi, I'm currently doing some tests on a SF15K domain with Solaris 10 installed. The target is to convince my cu to use Solaris 10 for this domain AND to establish a list of recommendations. The ZFS perimeter is really an issue for me. For now, I'm waiting for fresh information from the backup software vendor about ZFS support. No ZFS ACL support could be annoying. Regarding system partitions (/var, /opt, all mirrored + alternate disk), what would be YOUR recommendations? ZFS or not?

/var for now must be UFS, since Solaris 10 doesn't have ZFS root support, and that means /, /etc, /var, /usr.

I've run systems with /opt as a ZFS filesystem and it works just fine. However, note that the Solaris installer puts stuff in /opt (for backwards compat reasons; ideally it wouldn't) and that may cause issues with live upgrade or require you to move that stuff onto your ZFS /opt datasets.

--
Darren J Moffat
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
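To make the /opt-on-ZFS approach concrete, here is a rough sketch, assuming a separate data pool called "syspool" on spare disks (the pool name, device names and copy step are illustrative rather than a tested recipe; do it in single-user mode so nothing in /opt is busy):

  # zpool create syspool mirror c0t2d0 c0t3d0          # example devices
  # zfs create syspool/opt
  # zfs set mountpoint=/opt.new syspool/opt            # stage it away from the live /opt
  # cd /opt && find . -print | cpio -pdum /opt.new     # copy the installer-delivered contents over
  # mv /opt /opt.orig && zfs set mountpoint=/opt syspool/opt

Keeping /opt.orig around for a while makes it easy to back out if live upgrade complains.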
Re: [zfs-discuss] ?: ZFS and jumpstart export race condition
> I have the same issue with my home install server. As a dirty solution I set mount-at-boot to "no" for the lofs filesystems, to get the system up. But with every new OS added by JET the mount-at-boot entry reappears.
>
> It seems to me the question is when a lofs filesystem should be mounted at boot. When does a zfs filesystem get mounted? Probably a zfs legacy mount together with a lower-priority lofs mount would do it.

JET needs to be taught about ZFS; there does not seem to be any other way.

(JET/setup_install_server creates the loopback mounts; without making the ZFS mounts into legacy mounts or creating them differently it will not work; personally I use auto_direct for the /tftpboot sub mounts; works for anything)

Casper
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
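A hedged sketch of the legacy-mount workaround Casper mentions, using the dataset from Steffen's message (the exact vfstab layout is illustrative; note that with a legacy mountpoint the NFS share also has to move into /etc/dfs/dfstab):

  # zfs set mountpoint=legacy pool1/jumpstart
  # grep jumpstart /etc/vfstab
  pool1/jumpstart  -  /export/jumpstart  zfs  -  yes  -
  /export/jumpstart/s10/x86/boot  -  /tftpboot/I86PC.Solaris_10-1  lofs  -  yes  ro

Since mountall(1m) walks vfstab in order, putting the zfs line ahead of the lofs lines means /export/jumpstart is in place before the loopback mounts are attempted.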
Re: [zfs-discuss] ?: ZFS and jumpstart export race condition
[EMAIL PROTECTED] wrote On 09/08/06 09:06,:
>> I have the same issue with my home install server. As a dirty solution I set mount-at-boot to "no" for the lofs filesystems, to get the system up. But with every new OS added by JET the mount-at-boot entry reappears.
>>
>> It seems to me the question is when a lofs filesystem should be mounted at boot. When does a zfs filesystem get mounted? Probably a zfs legacy mount together with a lower-priority lofs mount would do it.
>
> JET needs to be taught about ZFS; there does not seem to be any other way.

Maybe. However, I did not use JET. I set up ZFS using default (AFAIK at this point) parameters.

> (JET/setup_install_server creates the loopback mounts; without making the ZFS mounts into legacy mounts or creating them differently it will not work; personally I use auto_direct for the /tftpboot sub mounts; works for anything)
>
> Casper

I believe that add_install_client [with a -b option?] is what is creating my vfstab entries. I haven't had reboot issues until overnight (a system move), and I have been doing PXE boot of some x64 systems only recently, i.e. since the most recent power failure.

Install images are being put down via getimage, so it is possible that setup_install_server would do the same. (Not sure whether getimage does a setup_install_server at its completion.)

Steffen
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ?: ZFS and jumpstart export race condition
> I believe that add_install_client [with a -b option?] is what is creating my vfstab entries. I haven't had reboot issues until overnight (a system move), and I have been doing PXE boot of some x64 systems only recently, i.e. since the most recent power failure.
>
> Install images are being put down via getimage, so it is possible that setup_install_server would do the same. (Not sure whether getimage does a setup_install_server at its completion.)

Either setup_install_server or add_install_client does this (or perhaps both).

Casper
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] zfs assertion failure
On 09/08/06 15:20, Mark Maybee wrote:
> Gavin, Please file a bug on this.

I filed 6468748. I'll attach the core now.

Cheers
Gavin
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Re: Recommendation ZFS on StorEdge 3320 - offtopic
[EMAIL PROTECTED] wrote:
>> I don't quite see this in my crystal ball. Rather, I see all of the SAS/SATA chipset vendors putting RAID in the chipset. Basically, you can't get a dumb interface anymore, except for fibre channel :-). In other words, if we were to design a system in a chassis with perhaps 8 disks, then we would also use a controller which does RAID. So, we're right back to square 1.
>
> Richard, when I talk about cheap JBODs I am thinking about home users, small servers and small companies. I guess you can sell 100 X4500s and at the same time 1000 (or even more) cheap JBODs to the small companies which will certainly never buy the big boxes. Yes, I know, you earn more selling the X4500. But how do you think Linux found its way into data centers and became an important player in the OS space? Through home users and enthusiasts who became familiar with it and then started using the familiar things in their jobs.

I was looking for a new AM2 socket motherboard a few weeks ago. All of the ones I looked at had 2xIDE and 4xSATA with onboard (SATA) RAID. All were less than $150. In other words, the days of having a JBOD-only solution are over except for single disk systems. 4x750 GBytes is a *lot* of data (and video).

There has been some recent discussion about eSATA JBODs in the press. I'm not sure they will gain much market share. iPods and flash drives have a much larger market share.

> A proven way to achieve world domination. ;-))

Dang! I was planning to steal a cobalt bomb and hold the world hostage while I relax in my space station... zero-G whee! :-)
-- richard
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Problem with ZFS's performance
Josip Gracin wrote:
> Hello!
>
> Could somebody please explain the following bad performance of a machine running ZFS. I have a feeling it has something to do with the way ZFS uses memory, because I've checked with ::kmastat and it shows that ZFS uses huge amounts of memory, which I think is killing the performance of the machine.
>
> This is the test program:
>
> #include <malloc.h>
> #include <stdio.h>
>
> int main()
> {
>     char *buf = calloc(512<<20, 1);
>     if ( buf == NULL ) {
>         printf("Allocation failed.\n");
>     }
>     return 0;
> }
>
> I've run the test program on the following two different machines, both under light load:
>
> Machine A is AMD64 3000+ (2.0GHz), 1GB RAM running snv_46.
> Machine B is Pentium4, 2.4GHz, 512MB RAM running Linux.
>
> Execution times on several consecutive runs are:
>
> Machine A:
> time ./a.out
> ./a.out  0.49s user 1.39s system 2% cpu 1:03.25 total
> ./a.out  0.48s user 1.28s system 3% cpu 50.691 total
> ./a.out  0.48s user 1.27s system 4% cpu 38.225 total
> ./a.out  0.48s user 1.24s system 5% cpu 30.694 total
> ./a.out  0.47s user 1.20s system 5% cpu 28.640 total
> ./a.out  0.47s user 1.23s system 6% cpu 28.210 total
> ./a.out  0.47s user 1.21s system 6% cpu 27.700 total
> ./a.out  0.47s user 1.19s system 9% cpu 17.875 total
> ./a.out  0.46s user 1.15s system 12% cpu 12.784 total
>
> On machine B [the first run took approx. 10 seconds, I forgot to paste it]:
> ./a.out  0.14s user 0.89s system 27% cpu 3.711 total
> ./a.out  0.13s user 0.87s system 25% cpu 3.926 total
> ./a.out  0.11s user 0.90s system 29% cpu 3.456 total
> ./a.out  0.11s user 0.91s system 29% cpu 3.435 total
> ./a.out  0.10s user 0.91s system 38% cpu 2.597 total
> ./a.out  0.11s user 0.93s system 35% cpu 2.913 total

There are several things going on here, and part of that may well be the memory utilization of ZFS. Have you tried the same experiment when not using ZFS?

Keep in mind that Solaris doesn't always use the most efficient strategies for paging applications... this is something we're actively working on fixing as part of the VM work going on...

-Bart

--
Bart Smaalders              Solaris Kernel Performance
[EMAIL PROTECTED]           http://blogs.sun.com/barts
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
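Bart's question about ZFS memory utilization can be checked directly. A sketch of what to look at while the test program runs (::memstat and vmstat are stock tools here; nothing ZFS-specific is assumed beyond a pool being actively used):

  # echo "::memstat" | mdb -k    # breakdown of kernel vs. anon vs. page cache vs. free pages
  # vmstat 5                     # watch the "free" column and the scan rate "sr" during the run

If the kernel line in ::memstat is holding most of RAM after heavy ZFS activity, the long calloc times on machine A are largely the allocator waiting for pages to be reclaimed.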
Re: [zfs-discuss] Re: Recommendation ZFS on StorEdge 3320 - offtopic
On Sep 8, 2006, at 9:33, Richard Elling - PAE wrote:
> I was looking for a new AM2 socket motherboard a few weeks ago. All of the ones I looked at had 2xIDE and 4xSATA with onboard (SATA) RAID. All were less than $150. In other words, the days of having a JBOD-only solution are over except for single disk systems. 4x750 GBytes is a *lot* of data (and video).

It's not clear to me that JBOD is dead. The (S)ATA RAID cards I've seen are really software RAID solutions that know just enough in the controller to let the BIOS boot off a RAID volume. None of the expensive RAID stuff is in the controller.

--Ed
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Re: Recommendation ZFS on StorEdge 3320 - offtopic
Ed Gould wrote:
> On Sep 8, 2006, at 9:33, Richard Elling - PAE wrote:
>> I was looking for a new AM2 socket motherboard a few weeks ago. All of the ones I looked at had 2xIDE and 4xSATA with onboard (SATA) RAID. All were less than $150. In other words, the days of having a JBOD-only solution are over except for single disk systems. 4x750 GBytes is a *lot* of data (and video).
>
> It's not clear to me that JBOD is dead. The (S)ATA RAID cards I've seen are really software RAID solutions that know just enough in the controller to let the BIOS boot off a RAID volume. None of the expensive RAID stuff is in the controller.

If I read between the lines here I think you're saying that the RAID functionality is in the chipset but the management can only be done by software running on the outside. (Right?)

A1000 anyone? :)
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
RE: [zfs-discuss] Re: Recommendation ZFS on StorEdge 3320 - offtopic
Dunno about eSATA jbods, but eSATA host ports have appeared on at least two HDTV-capable DVRs for storage expansion (looks like one model of the Scientific Atlanta cable box DVR's as well as on the shipping-any-day-now Tivo Series 3). It's strange that they didn't go with firewire since it's already widely used for digital video. Cost? If you use eSata it's pretty much just a physical connector onto the board, whereas I guess firewire needs a 1394 interface (couple of dollars?) plus a royalty to all the patent holders. It's probably not much, but I can't see how there can be *any* margin in consumer electronics these days... Steve. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Re: Recommendation ZFS on StorEdge 3320 - offtopic
Ed Gould wrote:
> On Sep 8, 2006, at 11:35, Torrey McMahon wrote:
>> If I read between the lines here I think you're saying that the RAID functionality is in the chipset but the management can only be done by software running on the outside. (Right?)
>
> No. All that's in the chipset is enough to read a RAID volume for boot. Block layout, RAID-5 parity calculations, and the rest are all done in the software. I wouldn't be surprised if RAID-5 parity checking was absent on read for boot, but I don't actually know.

At Sun, we often use the LSI Logic LSISAS1064 series of SAS RAID controllers on motherboards for many products. [LSI claims support for Solaris 2.6!] These controllers have a builtin microcontroller (ARM 926, IIRC), firmware, and nonvolatile memory (NVSRAM) for implementing the RAID features. We manage them through BIOS, OBP, or raidctl(1m). As Torrey says, very much like the A1000. Some of the fancier LSI products offer RAID 5, too.
-- richard
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
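For anyone who hasn't used it, raidctl(1m) is small. A sketch of the typical mirror setup on one of those LSI1064 controllers (the target names c0t0d0/c0t1d0 are examples, and the create overwrites the contents of the secondary disk):

  # raidctl                      # list any existing hardware RAID volumes
  # raidctl -c c0t0d0 c0t1d0     # build a hardware mirror; c0t1d0 becomes the submirror
  # raidctl c0t0d0               # show the status of the new volume

The resulting volume shows up to Solaris as a single disk, so ZFS or SVM layered on top only ever sees one device.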
[zfs-discuss] Hotswap not working
My first real-hardware Solaris install. I've installed S10 u2 on a system with an Asus M2n-SLI Deluxe nForce 570-SLI motherboard, Athlon 64 X2 dual core CPU. It's in a Chenbro SR107 case with two Chenbro 4-drive SATA hot-swap bays.

C1D0 is in the first hot-swap bay, and is the boot drive (an 80GB). C2D0 is in the second bay, and is not used (eventually this will be a mirror of the boot drive, via the controller hardware, or Solaris software; possibly I'll even go to a ZFS boot system, when that's available). Another 80GB drive. C3D0 and C4D0 are 400GB drives in the third and fourth hot-swap bays. They're in a ZFS mirror vdev, and are the only thing in ZFS pool zp1.

Everything works fine; I can create ZFS filesystems on the ZFS pool, put files on them, read them back, etc. I can run a scrub and it all checks out. ZFS status reports it healthy, online, all there, etc.

So, having gotten this far, and it being a scratch install and all, I reached over and pulled out C3D0. I then typed a zpool status command. This hung after the first line of output. And I started getting messages on the console, saying things like (retyped; the system never really unhung, and isn't on a network yet anyway):

gda: Warning: /[EMAIL PROTECTED],0/[EMAIL PROTECTED],1/[EMAIL PROTECTED]/[EMAIL PROTECTED],0 (disk 3)
Error for command: write sector  Error level: informational
gda: sense key: aborted command
gda: vendor Gen-ATA error code: 0x3
illegible: ata-disk start: select failed

Eventually I have to hard-reset the box. It comes up again fine, and the pool is okay (I pushed the drive back in), and scrub doesn't find any errors.

So what's going on? Does there have to be some special driver to communicate with the hot-swap hardware? I didn't think one was needed. Also, shouldn't some of these error messages end up in some kind of log file on disk somewhere? I found /var/log/syslog, and some other log files nearby, and none of them had any disk-related issues at all. Are those log files kept somewhere else entirely?

--
David Dyer-Bennet, mailto:[EMAIL PROTECTED], http://www.dd-b.net/dd-b/
RKBA: http://www.dd-b.net/carry/
Pics: http://www.dd-b.net/dd-b/SnapshotAlbum/
Dragaera/Steven Brust: http://dragaera.info/
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
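For reference, the usual sequence for pulling a disk out of a pool without surprising the driver stack, and where the console messages normally land, looks roughly like this (pool and device names are the ones from the message above; whether the nForce controller really supports hot-plug in IDE compatibility mode is a separate question):

  # zpool offline zp1 c3d0      # stop ZFS using the disk before pulling it
  (swap the drive)
  # zpool online zp1 c3d0       # same disk back, or "zpool replace zp1 c3d0" for a new one
  # tail -50 /var/adm/messages  # kernel and driver errors are logged here by syslogd, not in /var/log/syslog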
Re: [zfs-discuss] Re: Re: Recommendation ZFS on StorEdge 3320 - offtopic
Anton B. Rang wrote:
> JBOD probably isn't dead, simply because motherboard manufacturers are unlikely to pay the extra $10 it might cost to use a RAID-enabled chip rather than a plain chip (and the cost is more if you add cache RAM); but basic RAID is at least cheap.

NVidia MCPs (later NForce chipsets) also do RAID. The NForce 5x0 systems even do RAID-5 and sparing (with 6 SATA ports). Using special-purpose RAID chips won't be necessary for desktops or low-end systems. Moore's law says that we can continue to integrate more and more functions onto fewer parts.

> Of course, having RAID in the HBA is a single point of failure!

At this level, and price point, there are many SPOFs. Indeed there is always at least one SPOF.
-- richard
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss