Re: [zfs-discuss] ZFS Hard disk buffer at 100%
The drive (c7t2d0) is bad and should be replaced. The second drive (c7t5d0) is either bad or going bad. This is exactly the kind of problem that can force a Thumper to its knees: ZFS performance is horrific, and as soon as you drop the bad disks things magically return to normal.

My first recommendation is to pull the SMART data from the disks if you can. I wrote a blog entry about SMART back in 2008 to address exactly the behavior you're seeing: http://www.cuddletech.com/blog/pivot/entry.php?id=993 Yes, people will claim that SMART data is useless for predicting failures, but in a case like yours you are just looking for data to corroborate a hypothesis.

To test this condition, 'zpool offline ... c7t2d0', which emulates removal, and see if performance improves. On Thumpers I'd build a list of suspect disks based on iostat output like you show, correlate the SMART data, and then systematically offline disks to see whether a given disk really was the problem.

In my experience the only other reason you'll legitimately see really weird bottoming out of I/O like this is if you hit the maximum concurrent I/O limit in ZFS (until recently that limit was 35), so you'd see actv=35, and then when the device finally processed the I/Os the thing would snap back to life. But even in those cases you shouldn't see request times (asvc_t) rise above 200ms.

All that to say: replace those disks, or at least test them. SSDs won't help; one or more drives are toast.

benr.

On 5/8/10 9:30 PM, Emily Grettel wrote:

Hi Giovanni,

Thanks for the reply. Here's a bit of iostat after uncompressing a 2.4Gb RAR file that has 1 DWF file that we use.

                    extended device statistics
    r/s   w/s   kr/s   kw/s  wait  actv  wsvc_t  asvc_t  %w   %b  device
    1.0  13.0   26.0   18.0   0.0   0.0     0.0     0.8   0    1  c7t1d0
    2.0   5.0   77.0   12.0   2.4   1.0   343.8   142.8 100  100  c7t2d0
    1.0  16.0   25.5   15.5   0.0   0.0     0.0     0.3   0    0  c7t3d0
    0.0  10.0    0.0   17.0   0.0   0.0     3.2     1.2   1    1  c7t4d0
    1.0  12.0   25.5   15.5   0.4   0.1    32.4    10.9  14   14  c7t5d0
    1.0  15.0   25.5   18.0   0.0   0.0     0.1     0.1   0    0  c0t1d0
                    extended device statistics
    r/s   w/s   kr/s   kw/s  wait  actv  wsvc_t  asvc_t  %w   %b  device
    0.0   0.0    0.0    0.0   2.0   1.0     0.0     0.0 100  100  c7t2d0
    1.0   0.0    0.5    0.0   0.0   0.0     0.0     0.1   0    0  c7t0d0
                    extended device statistics
    r/s   w/s   kr/s   kw/s  wait  actv  wsvc_t  asvc_t  %w   %b  device
    5.0  15.0  128.0   18.0   0.0   0.0     0.0     1.8   0    3  c7t1d0
    1.0   9.0   25.5   18.0   2.0   1.8   199.7   179.4 100  100  c7t2d0
    3.0  13.0  102.5   14.5   0.0   0.1     0.0     5.2   0    5  c7t3d0
    3.0  11.0  102.0   16.5   0.0   0.1     2.3     4.2   1    6  c7t4d0
    1.0   4.0   25.5    2.0   0.4   0.8    71.3   158.9  12   79  c7t5d0
    5.0  16.0  128.5   19.0   0.0   0.1     0.1     2.6   0    5  c0t1d0
                    extended device statistics
    r/s   w/s   kr/s   kw/s  wait  actv  wsvc_t  asvc_t  %w   %b  device
    0.0   4.0    0.0    2.0   2.0   2.0   496.1   498.0  99  100  c7t2d0
    0.0   0.0    0.0    0.0   0.0   1.0     0.0     0.0   0  100  c7t5d0
                    extended device statistics
    r/s   w/s   kr/s   kw/s  wait  actv  wsvc_t  asvc_t  %w   %b  device
    7.0   0.0  204.5    0.0   0.0   0.0     0.0     0.2   0    0  c7t1d0
    1.0   0.0   25.5    0.0   3.0   1.0  2961.6  1000.0  99  100  c7t2d0
    8.0   0.0  282.0    0.0   0.0   0.0     0.0     0.3   0    0  c7t3d0
    6.0   0.0  282.5    0.0   0.0   0.0     6.1     2.3   1    1  c7t4d0
    0.0   3.0    0.0    5.0   0.5   1.0   165.4   333.3  18  100  c7t5d0
    7.0   0.0  204.5    0.0   0.0   0.0     0.0     1.6   0    1  c0t1d0
    2.0   2.0   89.0   12.0   0.0   0.0     3.1     6.1   1    2  c3t0d0
    0.0   2.0    0.0   12.0   0.0   0.0     0.0     0.2   0    0  c3t1d0

Sometimes two or more disks are going at 100. How does one solve this issue if it's a firmware bug? I tried looking around for Western Digital firmware for the WD10EADS but couldn't find any available. Would adding an SSD or two help here?

Thanks,
Em

Date: Fri, 7 May 2010 14:38:25 -0300
Subject: Re: [zfs-discuss] ZFS Hard disk buffer at 100%
From: gtirl...@sysdroid.com
To: emilygrettelis...@hotmail.com
CC: zfs-discuss@opensolaris.org

On Fri, May 7, 2010 at 8:07 AM, Emily Grettel emilygrettelis...@hotmail.com wrote:

Hi,

I've had my RAIDZ volume working well on SNV_131 but it has come to my attention that there have been some read issues with the drives. Previously I thought this was a CIFS problem, but I'm noticing that when transferring files or uncompressing some fairly large 7z (1-2Gb) files (or even smaller 200-300Mb RARs), occasionally running iostat will give the %b as 100 for a drive or two. That's
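For reference, Ben's offline test above boils down to something like the following. The pool name tank is a placeholder, the device name comes from the iostat output above, and smartctl belongs to the third-party smartmontools package, which may not be installed; the exact device path and -d option vary by controller.

    zpool offline tank c7t2d0      # take the suspect disk out of service; the raidz keeps running, degraded
    iostat -xn 5                   # watch whether asvc_t and %b settle down without that disk
    zpool online tank c7t2d0       # put it back if it turns out to be innocent

    # SMART counters, if smartmontools is available:
    smartctl -a -d sat /dev/rdsk/c7t2d0s0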
Re: [zfs-discuss] ZFS Hard disk buffer at 100%
Hi Ben,

> The drive (c7t2d0) is bad and should be replaced. The second drive (c7t5d0)
> is either bad or going bad.

Dagnabbit. I'm glad you told me this, but I would have thought that running a scrub would have alerted me to some fault?

> and as soon as you drop the bad disks things magically return to normal.

Being a raidz, is it OK for me to actually do zpool offline for one drive without degrading the entire pool?

I'm wondering whether I should keep using the WD10EADS or ask the business to invest in the Black versions. I was thinking of the WD1002FAEX (which is SATA-III, but my cards only do SATA-II), which seems better suited to NAS use. What are other people's thoughts on this?

Here's my current layout - 1, 2 and 3 are 320Gb drives.

    0. c0t1d0 ATA-WDC WD10EADS-00P-0A01-931.51GB /p...@0,0/pci1002,5...@4/pci1458,b...@0/d...@1,0
    4. c7t1d0 ATA-WDC WD10EADS-00L-1A01-931.51GB /p...@0,0/pci1458,b...@11/d...@1,0
    5. c7t2d0 ATA-WDC WD10EADS-00P-0A01-931.51GB /p...@0,0/pci1458,b...@11/d...@2,0
    6. c7t3d0 ATA-WDC WD10EADS-00P-0A01-931.51GB /p...@0,0/pci1458,b...@11/d...@3,0
    7. c7t4d0 ATA-WDC WD10EADS-00P-0A01-931.51GB /p...@0,0/pci1458,b...@11/d...@4,0
    8. c7t5d0 ATA-WDC WD10EADS-00P-0A01-931.51GB /p...@0,0/pci1458,b...@11/d...@5,0

The other thing I was thinking of was redoing the way the pool is set up: instead of a straight raidz layout, adopting stripe and mirror - so 3 disks in RAID-0, then mirror them to the other three?

> http://www.cuddletech.com/blog/pivot/entry.php?id=993

Great blog entry! Unfortunately the SUNWhd package isn't available in the repo and I haven't been able to locate a similar SMART reader :( But your explanations are very valuable.

> In my experience the only other reason you'll legitimately see really weird
> bottoming out of I/O like this is if you hit the maximum concurrent I/O limit
> in ZFS (until recently that limit was 35), so you'd see actv=35, and then
> when the device finally processed the I/Os the thing would snap back to life.
> But even in those cases you shouldn't see request times (asvc_t) rise above
> 200ms.

Hmmm, I did remember another admin tweaking the ZFS configuration. Are these to blame by chance?

    /etc/system:
    set pcplusmp:apic_intr_policy=1
    set zfs:zfs_txg_synctime=1
    set zfs:zfs_vdev_max_pending=10

I've tried to avoid tweaking anything in the ZFS configuration for fear it may give worse performance.

> All that to say: replace those disks, or at least test them. SSDs won't
> help; one or more drives are toast.

Thanks mate, I really appreciate some backing about this :-)

Cheers,
Em
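On the stripe-and-mirror question above: ZFS doesn't mirror a stripe as such; the usual equivalent is a pool built from mirror vdevs, which ZFS then stripes across automatically. A sketch using the six 1TB drives from the layout above (the pool name is a placeholder, and this assumes empty disks, i.e. a rebuild and restore from backup):

    zpool create newtank \
        mirror c7t1d0 c7t2d0 \
        mirror c7t3d0 c7t4d0 \
        mirror c7t5d0 c0t1d0

That gives roughly three disks of usable capacity instead of five, but random I/O and resilver times are generally much better than a single 6-disk raidz.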
Re: [zfs-discuss] Is it safe to disable the swap partition?
- Bob Friesenhahn bfrie...@simple.dallas.tx.us wrote:

> On Sat, 8 May 2010, Edward Ned Harvey wrote:
>> A vast majority of the time, the opposite is true. Most of the time, having
>> swap available increases performance. Because the kernel is able to choose:
>> Should I swap out this idle process, or should I dump files out of cache?
>> With swap enabled, the kernel is given another degree of freedom, to choose
>> which is colder: idle process memory, or cold cached files.
>
> Are you sure about this? It is always good to be sure ...

This is the case with most OSes now. Swap out stuff early, perhaps keep it both in RAM and in swap at the same time, and the kernel can choose what to do later. In Linux you can set it in /proc/sys/vm/swappiness. Does anyone know how this is tuned in OSOL, by the way?

Best regards

roy
--
Roy Sigurd Karlsbakk
(+47) 97542685
r...@karlsbakk.net
http://blogg.karlsbakk.net/
--
In all pedagogy it is essential that the curriculum be presented intelligibly. It is an elementary imperative for all pedagogues to avoid excessive use of idioms of foreign origin. In most cases adequate and relevant synonyms exist in Norwegian.
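For the Linux knob Roy mentions, checking and changing it looks like this (the values are examples; 0 means avoid swapping application pages, 100 means swap them out eagerly):

    cat /proc/sys/vm/swappiness            # current value, typically 60
    echo 10 > /proc/sys/vm/swappiness      # takes effect immediately, lost at reboot
    # persistent across reboots: add  vm.swappiness = 10  to /etc/sysctl.conf

I'm not aware of a single equivalent tunable on OpenSolaris.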
Re: [zfs-discuss] Supermicro AOC-SAT2-MV8 -- cfgadm won't create attach point (dsk/xxxx)
- Giovanni giof...@gmail.com wrote:

> Hi,
>
> Were you ever able to solve this problem on your AOC-SAT2-MV8 card? I am in
> need of purchasing it to add more drives to my server.

What problem was this? I have two servers with these cards and they work well.

Best regards

roy
--
Roy Sigurd Karlsbakk
(+47) 97542685
r...@karlsbakk.net
http://blogg.karlsbakk.net/
--
In all pedagogy it is essential that the curriculum be presented intelligibly. It is an elementary imperative for all pedagogues to avoid excessive use of idioms of foreign origin. In most cases adequate and relevant synonyms exist in Norwegian.
Re: [zfs-discuss] Is it safe to disable the swap partition?
> From: Bob Friesenhahn [mailto:bfrie...@simple.dallas.tx.us]
>
> On Sat, 8 May 2010, Edward Ned Harvey wrote:
>> A vast majority of the time, the opposite is true. Most of the time, having
>> swap available increases performance. Because the kernel is able to choose:
>> Should I swap out this idle process, or should I dump files out of cache?
>> With swap enabled, the kernel is given another degree of freedom, to choose
>> which is colder: idle process memory, or cold cached files.
>
> Are you sure about this? It is always good to be sure ...

Hehheheeh ... I am sure of it in Linux. I am only assuming solaris/opensolaris are as good. So I could be wrong. ;-)
Re: [zfs-discuss] Is it safe to disable the swap partition?
> From: Bob Friesenhahn [mailto:bfrie...@simple.dallas.tx.us]
>
> On Sat, 8 May 2010, Edward Ned Harvey wrote:
>> A vast majority of the time, the opposite is true. Most of the time, having
>> swap available increases performance. Because the kernel is able to choose:
>> Should I swap out this idle process, or should I dump files out of cache?
>> With swap enabled, the kernel is given another degree of freedom, to choose
>> which is colder: idle process memory, or cold cached files.
>
> Are you sure about this? It is always good to be sure ...

This is the easiest way I know to show this in Linux: after the machine has been on and doing things for a while (maybe hours, maybe days), run top or free. It is natural for the free memory to decrease to near zero, while the buffers and cache climb to huge numbers. The buffers and cache are memory allocated to the kernel. It is also normal to see plenty of free memory, plenty of buffers and cache, and then see the swap usage increase to something nonzero. This is evidence that the Linux kernel is sometimes choosing to swap out idle processes instead of dropping the buffer or cache usage.

I don't really know how to do the same in solaris/opensolaris, but I haven't tried either. I only know that top doesn't show the same info in Solaris 10. And then I moved on.
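For the Solaris side of the comparison, these are the commands I'd reach for instead of free/top (all in the base install, though the output fields don't map one-to-one onto Linux):

    swap -l                  # swap devices and how many blocks of each are in use
    swap -s                  # reserved / allocated / available swap summary
    vmstat 5                 # free memory plus paging activity (pi/po) and scan rate (sr)
    echo ::memstat | mdb -k  # kernel / anon / page cache / free breakdown (run as root)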
Re: [zfs-discuss] ZFS loses configuration
I'm answering my own question, having just decided to try it. Yes, anything you want to persist beyond a reboot with EON that's not in the ZFS pools has to have an image update done before shutdown. I had this "Doh!" moment after I did the trial: of course all the system configuration has to live in the system directories, which exist only in the boot image on EON. This realization let me fix quite a number of things I was doing wrong, but it was not obvious to me as a raw beginner with EON.
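For other EON beginners: the update is done with the updimg.sh script that ships with EON. The exact script location and image path differ between installs and releases, so treat the line below as a guess to check against your own boot media rather than a recipe:

    /usr/bin/updimg.sh /mnt/eon0/boot/x86.eon   # rebuild the boot image so config changes survive a reboot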
Re: [zfs-discuss] zpool import hanging
Additionally, I would like to mention that the only ZFS filesystem that does not mount -- causing the entire 'zpool import backup' command to hang -- is the only filesystem configured to be exported via NFS:

    backup/insightiq  sharenfs  root=*  local

Is there any chance the NFS share is the culprit here? If so, how do I avoid it?

Thanks,
Eduardo Bragatto
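A couple of things that might help narrow it down, assuming the pool itself is healthy (dataset and pool names taken from the post above):

    svcs -xv network/nfs/server            # is the NFS server service online and healthy?
    zfs set sharenfs=off backup/insightiq  # once it is imported, rules the share out of the next import
    zpool import -N backup                 # some builds support -N = import without mounting; check zpool(1M)

If the hang only happens while the filesystem is being shared, the sharenfs=off test should make the next import sail through.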
Re: [zfs-discuss] Is it safe to disable the swap partition?
On Sun, 9 May 2010, Roy Sigurd Karlsbakk wrote:

>> Are you sure about this? It is always good to be sure ...
>
> This is the case with most OSes now. Swap out stuff early, perhaps keep it
> both in RAM and in swap at the same time, and the kernel can choose what to
> do later. In Linux you can set it in /proc/sys/vm/swappiness. Does anyone
> know how this is tuned in OSOL, by the way?

While this is the zfs-discuss list, usually we are talking about Solaris/OpenSolaris here rather than "most OSes". No?

Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
Re: [zfs-discuss] Mirrored Servers
On May 8, 2010, at 7:01 PM, Tony wrote:

> Ok, this is definitely the kind of feedback I was looking for. I'll have to
> check out the docs on these technologies it looks like. Appreciate it. I
> figured I would load balance the hosts with a Cisco device, since I can get
> around the IOS ok. I want to offer an online backup service that provides
> high availability.

Yep, for this sort of business, a replication scheme works well. Note that the replication is better when it is done closer to the application. For a cloud storage company, it is relatively easy to modify the app to perform the redundancy and keep the storage simple. By contrast, if you are running a legacy app that you have no control over, you are stuck with their architecture, and adding redundancy lower in the software stack is a necessary evil.

> Check out the following concept:
> http://blog.backblaze.com/2009/09/01/petabytes-on-a-budget-how-to-build-cheap-cloud-storage/

Syncing a box of this size could take a week or two. Consider triple redundancy.

> I like their basic idea, but they are in the market of $5/month unlimited. I
> want to take this to the next level. While this solution provides fault
> tolerance for drive failures, it does not seem to have a safeguard for if
> terrorists bomb one of your servers. So I figured ZFS would take care of the
> soft RAID, encryption (for compliance), dedup, and that kind of neat stuff.

Yep.

> I'm still not 100% on how we're going to give access to the storage. I had
> thought about using rsync or DFS/RDC for Win hosts. VPN for encrypted
> transfer if needed. Would avoid CIFS probably. I guess a couple of minutes'
> delay in the replication is ok, if the backup management software is smart
> enough to say "wtf, get over it" and retransmit the file again without user
> intervention.

If you put redundancy in the infeed, then you won't have a delay like this.
 -- richard

--
ZFS storage and performance consulting at http://www.RichardElling.com
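In case a concrete example of storage-level replication helps the comparison: a minimal zfs send/receive scheme looks like the sketch below. The host and dataset names are made up, and a real service needs snapshot rotation, monitoring and retry logic around it.

    # initial full copy to the standby box
    zfs snapshot tank/backups@base
    zfs send tank/backups@base | ssh standby zfs receive -F tank/backups

    # afterwards, on a schedule: send only the changes since the previous snapshot
    zfs snapshot tank/backups@2010-05-10
    zfs send -i @base tank/backups@2010-05-10 | ssh standby zfs receive -F tank/backups

The replication delay Tony mentions is then bounded by how often the incremental job runs.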
[zfs-discuss] How can I be sure the zfs send | zfs received is correct?
Okay, so after some tests with dedup on snv_134 I have decided we cannot use the dedup feature for the time being. Since I was unable to destroy a dedupped file system, I decided to migrate the file system to another pool and then destroy the pool (see below):

http://opensolaris.org/jive/thread.jspa?threadID=128532&tstart=75
http://opensolaris.org/jive/thread.jspa?threadID=128620&tstart=60

Now here is my problem. I took a snapshot of the file system I want to migrate and did a send and receive of the file system:

    zfs send tank/export/projects/project1...@today | zfs receive -d mpool

but the received file system ended up smaller than the original, even though dedup is not turned on on the destination. How is this possible? Can someone explain? I am not able to trust the data until I can verify the copies are identical.

SunOS filearch1 5.11 snv_134 i86pc i386 i86xpv Solaris

r...@filearch1:/var/adm# zpool status
  pool: mpool
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        mpool       ONLINE       0     0     0
          c7t7d0    ONLINE       0     0     0

errors: No known data errors

  pool: rpool
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        rpool       ONLINE       0     0     0
          c7t0d0s0  ONLINE       0     0     0

errors: No known data errors

  pool: tank
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       0     0     0
          raidz1-0  ONLINE       0     0     0
            c7t1d0  ONLINE       0     0     0
            c7t2d0  ONLINE       0     0     0
            c7t3d0  ONLINE       0     0     0
            c7t4d0  ONLINE       0     0     0
            c7t5d0  ONLINE       0     0     0
            c7t6d0  ONLINE       0     0     0

errors: No known data errors

r...@filearch1:/var/adm# zfs list
NAME                                  USED  AVAIL  REFER  MOUNTPOINT
mpool                                 407G   278G    22K  /mpool
mpool/export                          407G   278G    22K  /mpool/export
mpool/export/projects                 407G   278G    23K  /mpool/export/projects
mpool/export/projects/bali_nobackup   407G   278G   407G  /mpool/export/projects/project1_nb
...
tank                                  520G  4.11T  34.9K  /tank
tank/export/projects                  515G  4.11T  41.5K  /export/projects
tank/export/projects/bali_nobackup    427G  4.11T   424G  /export/projects/project1_nb

r...@filearch1:/var/adm# zfs get compressratio
NAME                                     PROPERTY       VALUE  SOURCE
mpool                                    compressratio  2.43x  -
mpool/export                             compressratio  2.43x  -
mpool/export/projects                    compressratio  2.43x  -
mpool/export/projects/project1_nb        compressratio  2.43x  -
mpool/export/projects/project1...@today  compressratio  2.43x  -
tank                                     compressratio  2.34x  -
tank/export                              compressratio  2.34x  -
tank/export/projects                     compressratio  2.34x  -
tank/export/projects/project1_nb         compressratio  2.44x  -
tank/export/projects/project1...@today   compressratio  2.44x  -
tank/export/projects/project1_nb_2       compressratio  1.00x  -
tank/export/projects/project1_nb_3       compressratio  1.90x  -

r...@filearch1:/var/adm# zpool list
NAME    SIZE  ALLOC   FREE  CAP  DEDUP  HEALTH  ALTROOT
mpool   696G   407G   289G  58%  1.00x  ONLINE  -
rpool  19.9G  9.50G  10.4G  47%  1.00x  ONLINE  -
tank   5.44T   403G  5.04T   7%  2.53x  ONLINE  -
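One low-tech way to verify that the two copies match file-for-file (mountpoints taken from the zfs list output above; this reads every file, so it is slow, and the xargs form assumes no whitespace in file names):

    cd /export/projects/project1_nb
    find . -type f | sort | xargs digest -v -a sha256 > /tmp/tank.sums
    cd /mpool/export/projects/project1_nb
    find . -type f | sort | xargs digest -v -a sha256 > /tmp/mpool.sums
    diff /tmp/tank.sums /tmp/mpool.sums && echo "file contents identical"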
Re: [zfs-discuss] Is it safe to disable the swap partition?
On May 9, 2010, at 6:30 AM, Roy Sigurd Karlsbakk wrote:

> This is the case with most OSes now. Swap out stuff early, perhaps keep it
> both in RAM and in swap at the same time, and the kernel can choose what to
> do later. In Linux you can set it in /proc/sys/vm/swappiness. Does anyone
> know how this is tuned in OSOL, by the way?

This is a better question for perf-discuss. For a storage server, swap is not needed. If you notice swap being used then your storage server is undersized.
 -- richard

--
ZFS storage and performance consulting at http://www.RichardElling.com
Re: [zfs-discuss] How can I be sure the zfs send | zfs received is correct?
On May 9, 2010, at 11:16 AM, Jim Horng wrote:

> Now here is my problem. I took a snapshot of the file system I want to
> migrate and did a send and receive of the file system:
>
>     zfs send tank/export/projects/project1...@today | zfs receive -d mpool
>
> but the received file system ended up smaller than the original, even though
> dedup is not turned on on the destination. How is this possible?

What you think you are measuring is not what you are measuring. Compare the size of the snapshots.
 -- richard

--
ZFS storage and performance consulting at http://www.RichardElling.com
Re: [zfs-discuss] How can I be sure the zfs send | zfs received is correct?
The sizes of the snapshots:

r...@filearch1:/var/adm# zfs list mpool/export/projects/project1...@today
NAME                                     USED  AVAIL  REFER  MOUNTPOINT
mpool/export/projects/project1...@today     0      -   407G  -

r...@filearch1:/var/adm# zfs list tank/export/projects/project1...@today
NAME                                     USED  AVAIL  REFER  MOUNTPOINT
tank/export/projects/project1...@today  2.44G      -   424G  -
[zfs-discuss] ZFS Disk Drive Qualification
Hello,

I see strange behaviour when qualifying disk drives for ZFS. The tests I want to run should make sure that the drives honour the cache flush command. For this I do the following:

1) Create single-disk pools (only one disk in the pool).
2) Perform I/O on the pools. This is done via SQLite and transactions. As soon as a transaction is committed to the calling application, I record the number of the transaction (an increasing number).
3) Then I pull the disk.
4) I remember the last committed number.
5) I power off the server.
6) I plug the disk back in.
7) I power on the server.
8) I verify that the last committed number is on disk.

Here I find that this always fails by one transaction: one transaction is reported as committed to the application but is not on disk. The strange thing is that if I wait 10 seconds after the disk pull and only then do the reboot, all transactions are on disk. To me it looks like the in-flight I/O is acknowledged as committed although it never reaches the disk.

Any tips on how to investigate this further?

Regards,
Robert
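For anyone wanting to reproduce the test, the transaction counter can be stripped down to the sqlite3 command-line tool alone. The sketch below is only an approximation of the described setup: the pool path is a placeholder, sqlite3 may need to be installed separately, and the loop should run from a terminal that survives the disk pull.

    #!/bin/ksh
    # commit one transaction per iteration and report each number the application saw committed
    DB=/testpool/flushtest.db
    sqlite3 $DB 'CREATE TABLE IF NOT EXISTS t (n INTEGER);'
    i=0
    while :; do
        i=$((i + 1))
        # sqlite3 only returns success after the transaction (and its sync) completes
        sqlite3 $DB "INSERT INTO t VALUES ($i);" || break
        echo "committed $i"
    done

    # after the pull / power-cycle / re-import, compare against what really hit the disk:
    # sqlite3 /testpool/flushtest.db 'SELECT MAX(n) FROM t;'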
Re: [zfs-discuss] Is it safe to disable the swap partition?
I know that according to the documentation Solaris is supposed to be fully operational in the absence of swap devices. However, I've experienced cases, whose root cause I have not yet been able to trace, where disk access increased drastically and caused the system to hang, though it may be more of a performance issue.

One concern is that I have applications that create a lot of /tmp files, and they may end up consuming all RAM. I assume that without a swap device /tmp files cannot be swapped out to make room for new processes, so malloc failures in the applications will come much sooner. I wonder whether cached files or process pages have the higher priority of not being swapped out under the Solaris paging policy?

/Karl D
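On the /tmp concern: /tmp is tmpfs, so its contents live in anonymous memory and can only be pushed out of RAM if there is swap behind it. You can at least cap how much RAM it may consume with a size option in /etc/vfstab; the 2g figure below is an arbitrary example.

    # /etc/vfstab entry capping tmpfs-backed /tmp at 2 GB
    swap    -    /tmp    tmpfs    -    yes    size=2g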
Re: [zfs-discuss] ZFS Hard disk buffer at 100%
On Sat, May 8 at 23:39, Ben Rockwood wrote:

> The drive (c7t2d0) is bad and should be replaced. The second drive (c7t5d0)
> is either bad or going bad. This is exactly the kind of problem that can
> force a Thumper to its knees: ZFS performance is horrific, and as soon as
> you drop the bad disks things magically return to normal.

Problem is, the OP is mixing client 4K drives with 512B drives. They may not actually be bad, but they appear to be getting misused in this application. I doubt they're broken per se; they're just dramatically slower than their peers in this workload.

As a replacement recommendation, we've been beating on the WD 1TB RE3 drives for 18 months or so, and we're happy with both the performance and the price for what we get: $160/ea with a 5-year warranty.

--eric

--
Eric D. Mudama
edmud...@mail.bounceswoosh.org
Re: [zfs-discuss] Is it safe to disable the swap partition?
On Sun, May 9, 2010 at 7:40 PM, Edward Ned Harvey solar...@nedharvey.com wrote:

>> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
>> boun...@opensolaris.org] On Behalf Of Richard Elling
>>
>> For a storage server, swap is not needed. If you notice swap being used
>> then your storage server is undersized.
>
> Indeed, I have two solaris 10 fileservers that have uptime in the range of
> a few months. I just checked swap usage, and they're both zero.
>
> So, Bob, rub it in if you wish. ;-) I was wrong. I knew the behavior in
> Linux, which Roy seconded as "most OSes," and apparently we both assumed
> the same here, but that was wrong.
>
> I don't know if solaris and opensolaris both have the same swap behavior. I
> don't know if there's *ever* a situation where solaris/opensolaris would
> swap idle processes. But there's at least evidence that my two servers have
> not, or do not.

If Solaris is under memory pressure, pages may be paged to swap. Under severe memory pressure, entire processes may be swapped. This will happen after freeing up the memory used for file system buffers, ARC, etc. If the processes never page in the pages that have been paged out (or the processes that have been swapped out are never scheduled) then those pages will not consume RAM.

The best thing to do with processes that can be swapped out forever is to not run them.

--
Mike Gerdts
http://mgerdts.blogspot.com/
Re: [zfs-discuss] Is it safe to disable the swap partition?
On Sun, May 09, 2010 at 09:24:38PM -0500, Mike Gerdts wrote:

> The best thing to do with processes that can be swapped out forever is to
> not run them.

Agreed, however:

#1 Shorter values of "forever" (like, say, daily) may still be useful.
#2 This relies on knowing in advance what these processes will be.
#3 Where are the JeOS builds without all the gnome-infested likely suspects?

--
Dan.
Re: [zfs-discuss] Is it safe to disable the swap partition?
On Sun, 9 May 2010, Edward Ned Harvey wrote:

> So, Bob, rub it in if you wish. ;-) I was wrong. I knew the behavior in
> Linux, which Roy seconded as "most OSes," and apparently we both assumed
> the same here, but that was wrong.
>
> I don't know if solaris and opensolaris both have the same swap behavior. I
> don't know if there's *ever* a situation where solaris/opensolaris would
> swap idle processes. But there's at least evidence that my two servers have
> not, or do not.

Solaris and Linux are different in many ways since they are completely different operating systems. Solaris 2.X has never swapped processes. It only sends dirty pages to the paging device if there is a shortage of pages when more are requested, or if there are not enough free; but first it will purge seldom-accessed read-only pages, which can easily be restored.

ZFS has changed things up again by not caching file data via the unified page cache and using a specialized ARC instead. It seems that simple paging and MMU control were found not to be smart enough.

Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
[zfs-discuss] ZFS and Comstar iSCSI BLK size
I am using ZFS as the backing store for an iSCSI target running a virtual machine. I am looking at using an 8K block size on the ZFS volume. I was looking at the COMSTAR iSCSI settings and there is also a blk size configuration, which defaults to 512 bytes. That would make me believe that all of the I/O will be broken down into 512-byte chunks, which seems very inefficient. It seems this value should match the file system allocation/cluster size in the VM, maybe 4K if you are using an NTFS file system.

Does anyone have any input on this?

Thanks,
Geoff
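A sketch of how I'd line the sizes up for an NTFS guest, assuming a zvol-backed logical unit; the dataset name and sizes are placeholders, and check stmfadm(1M) on your build to confirm it accepts the blk property for the LU block size.

    # backing zvol with an 8K volume block size (-b sets volblocksize at creation time)
    zfs create -b 8k -V 100g tank/vm01

    # COMSTAR logical unit advertising a 4K block size to match the NTFS cluster size
    stmfadm create-lu -p blk=4096 /dev/zvol/rdsk/tank/vm01

The volblocksize cannot be changed after the zvol is created, so it is worth deciding on it before copying the VM image in.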