Re: [zfs-discuss] size of slog device
On 06/14/10 19:35, Erik Trimble wrote:
> On 6/14/2010 12:10 PM, Neil Perrin wrote:
>> On 06/14/10 12:29, Bob Friesenhahn wrote:
>>> On Mon, 14 Jun 2010, Roy Sigurd Karlsbakk wrote:
>>>>> It is good to keep in mind that only small writes go to the dedicated slog. Large writes go to main store. A succession of that many small writes (to fill RAM/2) is highly unlikely. Also, the zil is not read back unless the system is improperly shut down.
>>>> I thought all sync writes, meaning everything NFS and iSCSI, went into the slog - IIRC the docs say so.
>>> Check a month or two back in the archives for a post by Matt Ahrens. It seems that larger writes (>32k?) are written directly to main store. This is probably a change from the original zfs design.
>>> Bob
>>
>> If there's a slog then the data, regardless of size, gets written to the slog.
>>
>> If there's no slog and if the data size is greater than zfs_immediate_write_sz/zvol_immediate_write_sz (both default to 32K) then the data is written as a block into the pool and the block pointer is written into the log record. This is the WR_INDIRECT write type.
>>
>> So Matt and Roy are both correct.
>>
>> But wait, there's more complexity!:
>>
>> If logbias=throughput is set we always use WR_INDIRECT.
>>
>> If we just wrote more than 1MB for a single zil commit and there's more than 2MB waiting, then we start using the main pool.
>>
>> Clear as mud? This is likely to change again...
>>
>> Neil.
>
> How do I monitor the amount of live (i.e. non-committed) data in the slog? I'd like to spend some time with my setup, seeing exactly how much I tend to use.

I think monitoring the capacity when running "zpool iostat -v 1" should be fairly accurate. A simple DTrace script can be written to determine how often the ZIL (code) fails to get a slog block and has to resort to allocation in the main pool. One recent change reduced the amount of data written and possibly the slog block fragmentation. This is zpool version 23: "Slim ZIL". So be sure to experiment with that.
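Tracking slog usage over time can be scripted around "zpool iostat -v" output. A minimal sketch: the pool layout, the log device name c4t0d0, and the sample text below are hypothetical stand-ins for live output of `zpool iostat -v tank 1`.

```shell
# Pull the slog device's "alloc" column out of zpool-iostat-style output.
# The here-string stands in for live output; device name and figures are
# made up for illustration.
sample='              capacity     operations    bandwidth
pool        alloc   free   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
tank         412G  1.86T     12    340   1.2M  3.9M
logs            -      -      -      -      -      -
  c4t0d0     132K  1.98G      0    215      0   2.1M'

alloc=$(printf '%s\n' "$sample" | awk '$1 == "c4t0d0" { print $2 }')
echo "slog allocated: $alloc"
```

In a real loop you would pipe `zpool iostat -v tank 1` through the same awk and log the column once per second.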
I'd suspect that very few use cases call for more than a couple (2-4) GB of slog...

I agree this is typically true. Of course it depends on your workload. The amount of slog data will reflect the uncommitted synchronous txg data, and the size of each txg will depend on memory size. This area is also undergoing tuning.

I'm trying to get hard numbers as I'm working on building a DRAM/battery/flash slog device in one of my friend's electronics prototyping shops. It would be really nice if I could solve 99% of the need with 1 or 2 2GB SODIMMs and the chips from a cheap 4GB USB thumb drive...

Sounds like fun. Good luck.

Neil.

___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] size of slog device
On Jun 14, 2010, at 6:35 PM, Erik Trimble wrote:
> On 6/14/2010 12:10 PM, Neil Perrin wrote:
>> On 06/14/10 12:29, Bob Friesenhahn wrote:
>>> On Mon, 14 Jun 2010, Roy Sigurd Karlsbakk wrote:
>>>>> It is good to keep in mind that only small writes go to the dedicated slog. Large writes go to main store. A succession of that many small writes (to fill RAM/2) is highly unlikely. Also, the zil is not read back unless the system is improperly shut down.
>>>> I thought all sync writes, meaning everything NFS and iSCSI, went into the slog - IIRC the docs say so.
>>> Check a month or two back in the archives for a post by Matt Ahrens. It seems that larger writes (>32k?) are written directly to main store. This is probably a change from the original zfs design.
>>> Bob
>>
>> If there's a slog then the data, regardless of size, gets written to the slog.
>>
>> If there's no slog and if the data size is greater than zfs_immediate_write_sz/zvol_immediate_write_sz (both default to 32K) then the data is written as a block into the pool and the block pointer is written into the log record. This is the WR_INDIRECT write type.
>>
>> So Matt and Roy are both correct.
>>
>> But wait, there's more complexity!:
>>
>> If logbias=throughput is set we always use WR_INDIRECT.
>>
>> If we just wrote more than 1MB for a single zil commit and there's more than 2MB waiting, then we start using the main pool.
>>
>> Clear as mud? This is likely to change again...
>>
>> Neil.
>
> How do I monitor the amount of live (i.e. non-committed) data in the slog? I'd like to spend some time with my setup, seeing exactly how much I tend to use.

zilstat
http://www.richardelling.com/Home/scripts-and-programs-1/zilstat

> I'd suspect that very few use cases call for more than a couple (2-4) GB of slog...

I'd suspect few real cases need more than 1GB.
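A back-of-the-envelope check on that 1GB figure. The numbers are illustrative assumptions, not measurements: sync writes arriving over a single saturated gigabit link, and the slog holding at most one txg interval of data.

```shell
# Worst-case slog footprint for a GbE sync workload (illustrative).
wire_mb_per_s=110    # ~wire speed of gigabit Ethernet, in MB/s
txg_seconds=30       # a txg commits within (at most) ~30 seconds
slog_mb=$(( wire_mb_per_s * txg_seconds ))
echo "worst-case slog usage: ${slog_mb} MB"
```

That gives ~3.3 GB at sustained wire speed, which matches the "couple (2-4) GB" suspicion above; real workloads rarely sustain wire speed for a whole txg interval, hence the 1GB estimate.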
-- richard -- Richard Elling rich...@nexenta.com +1-760-896-4422 ZFS and NexentaStor training, Rotterdam, July 13-15, 2010 http://nexenta-rotterdam.eventbrite.com/
Re: [zfs-discuss] Scrub issues
Richard Elling wrote:
> On Jun 14, 2010, at 2:12 PM, Roy Sigurd Karlsbakk wrote:
>> Hi all
>> It seems zfs scrub is taking a big bite out of I/O when running. During a scrub, sync I/O, such as NFS and iSCSI, is mostly useless. Attaching an SLOG and some L2ARC helps this, but still, the problem remains in that the scrub is given full priority.
>
> Scrub always runs at the lowest priority. However, priority scheduling only works before the I/Os enter the disk queue. If you are running Solaris 10 or older releases with HDD JBODs, then the default zfs_vdev_max_pending is 35. This means that your slow disk will have 35 I/Os queued to it before priority scheduling makes any difference. Since it is a slow disk, that could mean 250 to 1500 ms before the high-priority I/O reaches the disk.
>
>> Is this problem known to the developers? Will it be addressed?
>
> In later OpenSolaris releases, zfs_vdev_max_pending defaults to 10, which helps. You can tune it lower as described in the Evil Tuning Guide. Also, as Robert pointed out, CR 6494473 offers a more resource-management-friendly way to limit scrub traffic (b143). Everyone can buy George a beer for implementing this change :-)

I'll gladly accept any beer donations, and others on the ZFS team are happy to help consume them. :-) I look forward to hearing people's experience with the new changes.

- George

> Of course, this could mean that on a busy system a scrub that formerly took a week might now take a month. And the fix does not directly address the tuning of the queue depth issue with HDDs. TANSTAAFL.
> -- richard
Re: [zfs-discuss] size of slog device
On 6/14/2010 12:10 PM, Neil Perrin wrote:
> On 06/14/10 12:29, Bob Friesenhahn wrote:
>> On Mon, 14 Jun 2010, Roy Sigurd Karlsbakk wrote:
>>>> It is good to keep in mind that only small writes go to the dedicated slog. Large writes go to main store. A succession of that many small writes (to fill RAM/2) is highly unlikely. Also, the zil is not read back unless the system is improperly shut down.
>>> I thought all sync writes, meaning everything NFS and iSCSI, went into the slog - IIRC the docs say so.
>> Check a month or two back in the archives for a post by Matt Ahrens. It seems that larger writes (>32k?) are written directly to main store. This is probably a change from the original zfs design.
>> Bob
>
> If there's a slog then the data, regardless of size, gets written to the slog.
>
> If there's no slog and if the data size is greater than zfs_immediate_write_sz/zvol_immediate_write_sz (both default to 32K) then the data is written as a block into the pool and the block pointer is written into the log record. This is the WR_INDIRECT write type.
>
> So Matt and Roy are both correct.
>
> But wait, there's more complexity!:
>
> If logbias=throughput is set we always use WR_INDIRECT.
>
> If we just wrote more than 1MB for a single zil commit and there's more than 2MB waiting, then we start using the main pool.
>
> Clear as mud? This is likely to change again...
>
> Neil.

How do I monitor the amount of live (i.e. non-committed) data in the slog? I'd like to spend some time with my setup, seeing exactly how much I tend to use.

I'd suspect that very few use cases call for more than a couple (2-4) GB of slog...

I'm trying to get hard numbers as I'm working on building a DRAM/battery/flash slog device in one of my friend's electronics prototyping shops. It would be really nice if I could solve 99% of the need with 1 or 2 2GB SODIMMs and the chips from a cheap 4GB USB thumb drive...
-- Erik Trimble Java System Support Mailstop: usca22-123 Phone: x17195 Santa Clara, CA
Re: [zfs-discuss] Native ZFS for Linux
On 2010-Jun-11 17:41:38 +0800, Joerg Schilling wrote:
>PP.S.: Did you know that FreeBSD _includes_ the GPLd Reiserfs in the FreeBSD kernel, and has for a while, and that nobody has complained about this? See e.g.:
>
>http://svn.freebsd.org/base/stable/8/sys/gnu/fs/reiserfs/

That is completely irrelevant and somewhat misleading. FreeBSD has never prohibited non-BSD-licensed code in its kernel or userland; however, such code has always been optional and, AFAIR, the GENERIC kernel has always defaulted to containing only BSD code. Non-BSD code (whether GPL or CDDL) is carefully segregated (note the 'gnu' in the above URI).

-- Peter Jeremy
Re: [zfs-discuss] Scrub issues
On Jun 14, 2010, at 2:12 PM, Roy Sigurd Karlsbakk wrote:
> Hi all
>
> It seems zfs scrub is taking a big bite out of I/O when running. During a scrub, sync I/O, such as NFS and iSCSI, is mostly useless. Attaching an SLOG and some L2ARC helps this, but still, the problem remains in that the scrub is given full priority.

Scrub always runs at the lowest priority. However, priority scheduling only works before the I/Os enter the disk queue. If you are running Solaris 10 or older releases with HDD JBODs, then the default zfs_vdev_max_pending is 35. This means that your slow disk will have 35 I/Os queued to it before priority scheduling makes any difference. Since it is a slow disk, that could mean 250 to 1500 ms before the high-priority I/O reaches the disk.

> Is this problem known to the developers? Will it be addressed?

In later OpenSolaris releases, zfs_vdev_max_pending defaults to 10, which helps. You can tune it lower as described in the Evil Tuning Guide. Also, as Robert pointed out, CR 6494473 offers a more resource-management-friendly way to limit scrub traffic (b143). Everyone can buy George a beer for implementing this change :-)

Of course, this could mean that on a busy system a scrub that formerly took a week might now take a month. And the fix does not directly address the tuning of the queue depth issue with HDDs. TANSTAAFL.

-- richard -- Richard Elling rich...@nexenta.com +1-760-896-4422 ZFS and NexentaStor training, Rotterdam, July 13-15, 2010 http://nexenta-rotterdam.eventbrite.com/
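The 250-1500 ms figure above is just queue depth times per-I/O service time. The service times below are illustrative assumptions for a 7200rpm HDD doing random I/O, not measured values.

```shell
# Latency a high-priority I/O can see behind a full vdev queue.
queue_depth=35     # Solaris 10 default zfs_vdev_max_pending
svc_fast_ms=7      # ~7 ms per random I/O on a healthy 7200rpm disk (assumed)
svc_slow_ms=43     # a heavily loaded or slow disk (assumed)
best=$(( queue_depth * svc_fast_ms ))
worst=$(( queue_depth * svc_slow_ms ))
echo "best case:  ${best} ms"
echo "worst case: ${worst} ms"

# Lowering the tunable on a live system is done via mdb (see the Evil
# Tuning Guide for the syntax current for your release), roughly:
#   echo zfs_vdev_max_pending/W0t10 | mdb -kw
```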
Re: [zfs-discuss] ZFS Data Loss on system crash/upgrade
> Hello all, > > I've been running OpenSolaris on my personal > fileserver for about a year and a half, and it's been > rock solid except for having to upgrade from 2009.06 > to a dev version to fix some network driver issues. > About a month ago, the motherboard on this computer > died, and I upgraded to a better motherboard and > processor. This move broke the OS install, and > instead of bothering to try to figure out how to fix > it, I decided on a reinstall. All my important data > (including all my virtual hard drives) are stored on > a separate 3 disk raidz pool. > > In attempting to import the pool, I realized that I > had upgraded the zpool to a newer version than is > supported in the live CD, so I installed the latest > dev release to allow the filesystem to mount. After > mounting the drives (with a zpool import -f), I > noticed that some files might be missing. After > installing virtualbox and booting up a WinXP VM, this > issue was confirmed. > > Files before 2/10/2010 seem to be unharmed, but the > next file I have logged on 2/19/2010 is missing. > Every file created after this date is also missing. > The machine had been rebooted several times before > the crash with no issues. For the week or so prior > to the machine finally dying for good, it would > boot, last a few hours, and then crash. These files > were fine during that period. > > One more thing of note: when the machine suffered > critical hardware failure, the zpool in issue was at > about 95% full. When I upgraded to new hardware > (after updating the machine), I added two mirrored > disks to the pool to alleviate the space issue until > I could back everything up, destroy the pool, and > recreate it with six disks instead of three. > > Is this a known bug with a fix, or am I out of luck > with these files? 
> Thanks,
> Austin

Austin,

If your raidz pool with important data was damaged in some way by the hardware failures, then a recovery mechanism in recent builds is to discard the last few transactions to get the pool back to a known good state. You would have seen messages regarding this recovery. We might be able to see if this recovery happened if you could provide your zpool history output for this pool. Generally, a few seconds of data transactions are lost, not all of the data after a certain date.

Another issue is that VirtualBox doesn't honor cache flushes by default, so if the system is crashing with data in play, your data might not be safely written to disk.

Thanks,

Cindy
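For reference, the flush-ignoring behavior Cindy mentions can be turned off per virtual disk. A sketch only: the VM name "WinXP" and the IDE controller/LUN path are placeholders, and the exact extradata key depends on the controller type (ahci for SATA); check the VirtualBox manual for your setup.

```shell
# Tell VirtualBox to honor the guest's cache flush requests.
# "WinXP" and the piix3ide/LUN#0 path are placeholders - adjust for
# your VM name and disk attachment.
VBoxManage setextradata "WinXP" \
  "VBoxInternal/Devices/piix3ide/0/LUN#0/Config/IgnoreFlush" 0
```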
Re: [zfs-discuss] Scrub issues
On 14/06/2010 22:12, Roy Sigurd Karlsbakk wrote:
> Hi all
> It seems zfs scrub is taking a big bite out of I/O when running. During a scrub, sync I/O, such as NFS and iSCSI, is mostly useless. Attaching an SLOG and some L2ARC helps this, but still, the problem remains in that the scrub is given full priority.
> Is this problem known to the developers? Will it be addressed?

http://sparcv9.blogspot.com/2010/06/slower-zfs-scrubsresilver-on-way.html
http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6494473

-- Robert Milkowski http://milek.blogspot.com
[zfs-discuss] Scrub issues
Hi all

It seems zfs scrub is taking a big bite out of I/O when running. During a scrub, sync I/O, such as NFS and iSCSI, is mostly useless. Attaching an SLOG and some L2ARC helps this, but still, the problem remains in that the scrub is given full priority.

Is this problem known to the developers? Will it be addressed?

Vennlige hilsener / Best regards

roy
-- Roy Sigurd Karlsbakk (+47) 97542685 r...@karlsbakk.net http://blogg.karlsbakk.net/ -- In all pedagogy it is essential that the curriculum be presented intelligibly. It is an elementary imperative for all pedagogues to avoid excessive use of idioms of foreign origin. In most cases, adequate and relevant synonyms exist in Norwegian.
Re: [zfs-discuss] COMSTAR dropouts with dedup enabled
On Mon, Jun 14, 2010 at 1:35 PM, Brandon High wrote:
> How much memory do you have, and how big is the DDT? You can get the DDT size with 'zdb -DD'. The total count is the sum of duplicate and unique entries. Each entry uses ~250 bytes, so the count divided by 4 is a (very rough) estimate of the memory size of the DDT in kilobytes.

One more thing: The default block size is 8k for zvols, which means that the DDT will grow much faster than for filesystem datasets.

-B

-- Brandon High : bh...@freaks.com
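To see how much faster: DDT entries per terabyte at an 8K volblocksize versus the 128K filesystem default, at roughly 250 bytes per entry. This is illustrative arithmetic only, assuming fully unique data.

```shell
# DDT footprint for 1 TiB of data, 8K zvol blocks vs 128K fs records.
tb=1099511627776                  # bytes in 1 TiB
entries_8k=$(( tb / 8192 ))       # one DDT entry per 8K block
entries_128k=$(( tb / 131072 ))   # one DDT entry per 128K record
mb_8k=$(( entries_8k * 250 / 1024 / 1024 ))
mb_128k=$(( entries_128k * 250 / 1024 / 1024 ))
echo "DDT for 1 TiB of 8K blocks:   ${mb_8k} MB"
echo "DDT for 1 TiB of 128K blocks: ${mb_128k} MB"
```

Sixteen times the entries, so roughly 32 GB of DDT per TiB at 8K against 2 GB at 128K.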
Re: [zfs-discuss] COMSTAR dropouts with dedup enabled
On Sun, Jun 13, 2010 at 6:58 PM, Matthew Anderson wrote:
> The problem didn't seem to occur with only a small amount of data on the LUN (<50GB) and happened more frequently as the LUN filled up. I've since moved all data to non-dedup LUNs and I haven't seen a dropout for over a month.

How much memory do you have, and how big is the DDT? You can get the DDT size with 'zdb -DD'. The total count is the sum of duplicate and unique entries. Each entry uses ~250 bytes, so the count divided by 4 is a (very rough) estimate of the memory size of the DDT in kilobytes.

The most likely case is that you don't have enough memory to hold the entire dedup table in the ARC. When this happens, the host has to read entries from the main pool, which is slow. If you want to continue running with dedup, adding an L2ARC may help since the DDT can be held in the faster cache. Disabling dedup for the dataset will give you good write performance too.

Be aware that destroying snapshots from this dataset (or destroying the dataset itself) is likely to create dropouts as well, since the DDT needs to be scanned to see if a block can be dereferenced. Again, adding L2ARC may help.

-B

-- Brandon High : bh...@freaks.com
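Brandon's rule of thumb in script form. The entry count below is a made-up example; read the real total off `zdb -DD <pool>` on your system.

```shell
# ~250 bytes per DDT entry, so entries/4 approximates the DDT size in KB.
ddt_entries=4194304                 # hypothetical total from 'zdb -DD'
ddt_kb=$(( ddt_entries / 4 ))
ddt_mb=$(( ddt_kb / 1024 ))
echo "approx DDT size: ${ddt_kb} KB (~${ddt_mb} MB)"
```

Compare that figure against installed RAM (minus what the rest of the ARC needs) to judge whether the DDT can stay resident.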
Re: [zfs-discuss] size of slog device
On 06/14/10 12:29, Bob Friesenhahn wrote:
> On Mon, 14 Jun 2010, Roy Sigurd Karlsbakk wrote:
>>> It is good to keep in mind that only small writes go to the dedicated slog. Large writes go to main store. A succession of that many small writes (to fill RAM/2) is highly unlikely. Also, the zil is not read back unless the system is improperly shut down.
>> I thought all sync writes, meaning everything NFS and iSCSI, went into the slog - IIRC the docs say so.
> Check a month or two back in the archives for a post by Matt Ahrens. It seems that larger writes (>32k?) are written directly to main store. This is probably a change from the original zfs design.
> Bob

If there's a slog then the data, regardless of size, gets written to the slog.

If there's no slog and if the data size is greater than zfs_immediate_write_sz/zvol_immediate_write_sz (both default to 32K) then the data is written as a block into the pool and the block pointer is written into the log record. This is the WR_INDIRECT write type.

So Matt and Roy are both correct.

But wait, there's more complexity!:

If logbias=throughput is set we always use WR_INDIRECT.

If we just wrote more than 1MB for a single zil commit and there's more than 2MB waiting, then we start using the main pool.

Clear as mud? This is likely to change again...

Neil.
Re: [zfs-discuss] Sync Write - ZIL log performance - Feedback for ZFS developers?
On 04/10/10 09:28, Edward Ned Harvey wrote:
> - If synchronous writes are large (>32K) and block aligned then the blocks are written directly to the pool and a small record written to the log. Later when the txg commits then the blocks are just linked into the txg. However, this processing is not done if there are any slogs because I found it didn't perform as well. Probably ought to be re-evaluated.

Won't this affect NFS/iSCSI performance pretty badly where the ZIL is crucial?

Vennlige hilsener / Best regards

roy
-- Roy Sigurd Karlsbakk (+47) 97542685 r...@karlsbakk.net http://blogg.karlsbakk.net/ -- In all pedagogy it is essential that the curriculum be presented intelligibly. It is an elementary imperative for all pedagogues to avoid excessive use of idioms of foreign origin. In most cases, adequate and relevant synonyms exist in Norwegian.
[zfs-discuss] Permament errors in "files" <0x0>
I've been referred to here from the zfs-fuse newsgroup. I have a (non-redundant) pool which is reporting errors that I don't quite understand:

# zpool status -v
  pool: green
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
        entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: scrub in progress for 1h12m, 2.96% done, 39h44m to go
config:

        NAME                      STATE   READ WRITE CKSUM
        green                     ONLINE     0     0     2
        disk/by-id/dm-name-green  ONLINE     0     0     4

errors: Permanent errors have been detected in the following files:

        :<0x0>
        green:<0x0>

I read the explanations at http://dlc.sun.com/osol/docs/content/ZFSADMIN/gbbwl.html#gbcuz that the 0x0 is output when a file path is not available, but I'm still unsure how to proceed (of course, I'd also like to know why these errors occurred in the first place - after just a couple of days of using zfs-fuse, but that's another story). It has been suggested to me to copy out all data from the pool and/or recreate it from backup, but do I really have to (hours of recovery), or is there a faster way to correct the problem? Apart from these alarming messages, the pool seems to be in working order, e.g. all files that I tried could be read. I guess I'd just like to know *what* the corrupted data is and the implications.
Re: [zfs-discuss] size of slog device
On Mon, 14 Jun 2010, Roy Sigurd Karlsbakk wrote:
>> It is good to keep in mind that only small writes go to the dedicated slog. Large writes go to main store. A succession of that many small writes (to fill RAM/2) is highly unlikely. Also, the zil is not read back unless the system is improperly shut down.
> I thought all sync writes, meaning everything NFS and iSCSI, went into the slog - IIRC the docs say so.

Check a month or two back in the archives for a post by Matt Ahrens. It seems that larger writes (>32k?) are written directly to main store. This is probably a change from the original zfs design.

Bob

-- Bob Friesenhahn bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
[zfs-discuss] COMSTAR dropouts with dedup enabled
Hi All,

I currently use b134 and COMSTAR to deploy SRP targets for virtual machine storage (VMware ESXi4) and have run into some unusual behaviour when dedup is enabled for a particular LUN. The target seems to lock up (ESX reports it as unavailable) when writing large amounts of data or overwriting data; reads are unaffected. The easiest way for me to replicate the problem was to restore a 2GB SQL database inside a VM. The dropouts lasted anywhere from 3 seconds to a few minutes, and when connectivity is restored the other LUNs (without dedup) drop out for a few seconds.

The problem didn't seem to occur with only a small amount of data on the LUN (<50GB) and happened more frequently as the LUN filled up. I've since moved all data to non-dedup LUNs and I haven't seen a dropout for over a month.

Does anyone know why this is happening? I've also seen the behaviour when exporting iSCSI targets with COMSTAR. I haven't had a chance to install the SSDs for L2ARC and SLOG yet, so I'm unsure if that will help the issue.

System specs are:
Single Xeon 5620
24GB DDR3
24x 1.5TB 7200rpm
LSI RAID card

Thanks
-Matt
Re: [zfs-discuss] size of slog device
- Original Message -
> On Mon, 14 Jun 2010, Roy Sigurd Karlsbakk wrote:
>>> There is absolutely no sense in having slog devices larger than the main memory, because it will never be used, right? ZFS will rather flush the txg to disk than read back from zil? So there is a guideline to have enough slog to hold about 10 seconds of zil, but the absolute maximum value is the size of main memory. Is this correct?
>> ZFS uses at most RAM/2 for ZIL
>
> It is good to keep in mind that only small writes go to the dedicated slog. Large writes go to main store. A succession of that many small writes (to fill RAM/2) is highly unlikely. Also, the zil is not read back unless the system is improperly shut down.

I thought all sync writes, meaning everything NFS and iSCSI, went into the slog - IIRC the docs say so.

Vennlige hilsener / Best regards

roy
-- Roy Sigurd Karlsbakk (+47) 97542685 r...@karlsbakk.net http://blogg.karlsbakk.net/ -- In all pedagogy it is essential that the curriculum be presented intelligibly. It is an elementary imperative for all pedagogues to avoid excessive use of idioms of foreign origin. In most cases, adequate and relevant synonyms exist in Norwegian.
Re: [zfs-discuss] Unable to Install 2009.06 on BigAdmin Approved MOBO - FILE SYSTEM FULL
Hi Giovanni,

My Monday morning guess is that the disk/partition/slices are not optimal for the installation. Can you provide the partition table on the disk on which you are attempting to install? Use format-->disk-->partition-->print. You want to put all the disk space in c*t*d*s0.

See this section of the ZFS troubleshooting guide for an example of fixing the disk/partition/slice issues:

http://www.solarisinternals.com/wiki/index.php/ZFS_Troubleshooting_Guide

Replacing/Relabeling the Root Pool Disk

Thanks,

Cindy

On 06/13/10 17:42, Giovanni wrote:
> Hi Guys
> I am having trouble installing OpenSolaris 2009.06 on my Biostar Tpower I45 motherboard, approved on the BigAdmin HCL here: http://www.sun.com/bigadmin/hcl/data/systems/details/26409.html -- why is it not working?
> My setup:
> 3x 1TB SATA hard drives
> 1x 500GB hard drive (I have left only this hdd connected to try to isolate the issue; it still happens)
> 4GB DDR2 PC2-6400 RAM (tested GOOD!)
> ATI Radeon 4650 512MB DDR2 PCI-E 16x
> Motherboard default settings/CMOS cleared
> Here's what happens: the OpenSolaris boot options come up, and I choose the first default, "OpenSolaris 2009.06" -- I HAVE ALSO TRIED the VESA driver and command line; all of these fail.
> After "Select desktop language", configuring devices, mounting cdroms, and "Reading ZFS config: done.", I get "opensolaris console login:" (the cdrom is still being accessed at this time). A few seconds later:
> opensolaris ufs: NOTICE: alloc: /: file system full
> opensolaris last message repeated 1 time
> opensolaris syslogd: /var/adm/messages: No space left on device
> opensolaris in.routed[537]: route 0.0.0.0/8 -> 0.0.0.0 nexthop is not directly connected
> I logged in as jack / jack on the console and did a df -h:
> /devices/ramdisk:a  size 164M  100% used  mounted on /
> swap  3.3GB  used 860K  1%  /mnt/misc/opt
> 210MB  used 210M  100%  /mnt/misc
> /usr/lib/libc/libc_hwcap1.so.1  2.3G  used 2.3G  100%  /lib/libc.so.1
> /dev/dsk/c7t0d0s2  677M  used 677M  100%  /media/OpenSolaris
> Thanks for any help!
Re: [zfs-discuss] size of slog device
On Mon, 14 Jun 2010, Roy Sigurd Karlsbakk wrote:
>> There is absolutely no sense in having slog devices larger than the main memory, because it will never be used, right? ZFS will rather flush the txg to disk than read back from zil? So there is a guideline to have enough slog to hold about 10 seconds of zil, but the absolute maximum value is the size of main memory. Is this correct?
> ZFS uses at most RAM/2 for ZIL

It is good to keep in mind that only small writes go to the dedicated slog. Large writes go to main store. A succession of that many small writes (to fill RAM/2) is highly unlikely. Also, the zil is not read back unless the system is improperly shut down.

Bob

-- Bob Friesenhahn bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
Re: [zfs-discuss] Moved disks to new controller - cannot import pool even after moving ba
On Jun 13, 2010, at 2:14 PM, Jan Hellevik wrote:
> Well, for me it was a cure. Nothing else I tried got the pool back. As far as I can tell, the way to get it back should be to use symlinks to the fdisk partitions on my SSD, but that did not work for me. Using -V got the pool back. What is wrong with that? If you have a better suggestion as to how I should have recovered my pool, I am certainly interested in hearing it.

I would take this time to offline one disk at a time, wipe all its tables/labels, and re-attach it as an EFI whole disk to avoid hitting this same problem again in the future.

-Ross
Re: [zfs-discuss] size of slog device
Roy Sigurd Karlsbakk wrote:
>> There is absolutely no sense in having slog devices larger than the main memory, because it will never be used, right? ZFS will rather flush the txg to disk than read back from zil? So there is a guideline to have enough slog to hold about 10 seconds of zil, but the absolute maximum value is the size of main memory. Is this correct?
>
> ZFS uses at most RAM/2 for ZIL

Thanks!
Re: [zfs-discuss] size of slog device
Edward Ned Harvey wrote:
>> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Arne Jansen
>>
>> There is absolutely no sense in having slog devices larger than the main memory, because it will never be used, right?
>
> Also: A TXG is guaranteed to flush within 30 sec. Let's suppose you have a super fast device, which is able to log 8Gbit/sec (which is unrealistic). That's 1Gbyte/sec, unrealistically theoretically possible, at best. You do the math. ;-)
>
> That being said, it's difficult to buy an SSD smaller than 32G. So what are you going to do?

I'm still building my rotational-write-delay-eliminating driver and am trying to figure out how much space I can waste on the underlying device without ever running into problems. I need half the physical memory, or, under the assumption that it might be tunable, at most my physical memory. It's good to know a hard upper limit. The more I can waste, the faster the device will be.

Also, to stay in your line of argument, this super-fast slog is most probably a DRAM-based, battery-backed solution. In that case it will make a difference whether you buy 8 or 32GB ;)

--Arne
Re: [zfs-discuss] size of slog device
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Arne Jansen
>
> There is absolutely no sense in having slog devices larger than the main memory, because it will never be used, right?

Also: A TXG is guaranteed to flush within 30 sec. Let's suppose you have a super fast device, which is able to log 8Gbit/sec (which is unrealistic). That's 1Gbyte/sec, unrealistically theoretically possible, at best. You do the math. ;-)

That being said, it's difficult to buy an SSD smaller than 32G. So what are you going to do? Slice it and use the remaining space for cache? Some people do. Some people may even get a performance benefit by doing so. But if you do, now you've got a cache and a log both competing for I/O on the same device. The performance benefit degrades for sure.

My advice is to simply accept the wasted space on your log device, forget about it and move on. Same as you did with all the wasted space on your mirrored OS boot device, which can't (or shouldn't) be used by your data pool.
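Doing Edward's math explicitly; both inputs are his deliberately unrealistic best case, not real device specs.

```shell
# Hard ceiling on useful slog size: device write speed times the
# maximum time data can sit uncommitted in the log.
dev_gb_per_s=1     # "super fast" 8 Gbit/s log device ~= 1 GB/s (unrealistic)
txg_seconds=30     # a txg is guaranteed to flush within 30 seconds
max_log_gb=$(( dev_gb_per_s * txg_seconds ))
echo "absolute ceiling on useful slog size: ${max_log_gb} GB"
```

So even the impossible best case never needs more than ~30 GB of log; realistic devices and workloads need far less.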
Re: [zfs-discuss] Dedup performance hit
>> > To add such a device, you would do: > 'zpool add tank mycachedevice' > > Hi Correct me if I'm wrong, but I believe the correct command should be: 'zpool add tank cache mycachedevice' If you don't use the "cache" keyword, the device will be added as an ordinary top-level vdev. Remi * This message and any attachments (the "message") are confidential and intended solely for the addressees. Any unauthorised use or dissemination is prohibited. Messages are susceptible to alteration. France Telecom Group shall not be liable for the message if altered, changed or falsified. If you are not the intended addressee of this message, please cancel it immediately and inform the sender.
Re: [zfs-discuss] Dedup performance hit
>> > You are severely RAM limited. In order to do dedup, ZFS has to maintain > a catalog of every single block it writes and the checksum for that > block. This is called the Dedup Table (DDT for short). > > So, during the copy, ZFS has to (a) read a block from the old > filesystem, (b) check the current DDT to see if that block exists and > (c) either write the block to the new filesystem (and add an appropriate > DDT entry for it), or write a metadata update with a reference to the > existing deduplicated block. > > Likely, you have two problems: > > (1) I suspect your source filesystem has lots of blocks (that is, it's > likely made up of smaller-sized files). Lots of blocks means lots of > seeking back and forth to read all those blocks. > > (2) Lots of blocks also means lots of entries in the DDT. It's trivial > to overwhelm a 4GB system with a large DDT. If the DDT can't fit in > RAM, then it has to get partially refreshed from disk. > > Thus, here's what's likely going on: > > (1) ZFS reads a block and its checksum from the old filesystem > (2) it checks the DDT to see if that checksum exists > (3) finding that the entire DDT isn't resident in RAM, it starts a cycle > to read the rest of the (potential) entries from the new filesystem's > metadata. That is, it tries to reconstruct the DDT from disk. Which > involves a HUGE amount of random seek reads on the new filesystem. > > In essence, since you likely can't fit the DDT in RAM, each block read > from the old filesystem forces a flurry of reads from the new > filesystem. Which eats up the IOPS that your single pool can provide. > It thrashes the disks. Your solution is to either buy more RAM, or find > something you can use as an L2ARC cache device for your pool. Ideally, > it would be an SSD. However, in this case, a plain hard drive would do > OK (NOT one already in a pool). To add such a device, you would do: > 'zpool add tank mycachedevice' > > That was an awesome response! 
Thank you for that :-) I tend to config my servers with 16G of ram minimum these days and now I know why. -- Dennis Clarke dcla...@opensolaris.ca <- Email related to the open source Solaris dcla...@blastwave.org <- Email related to open source for Solaris
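Erik's point about the DDT outgrowing a 4GB box can be made concrete with a back-of-the-envelope estimate. The per-entry size below (~320 bytes of RAM per unique block, a ballpark figure often quoted on this list) and the example pool numbers are assumptions, not measured values:

```python
# Back-of-the-envelope DDT memory estimate.
# Assumption: roughly 320 bytes of RAM per unique block in the dedup
# table -- a commonly quoted ballpark, not an exact on-disk figure.
DDT_BYTES_PER_ENTRY = 320

def ddt_ram_bytes(pool_data_bytes, avg_block_bytes):
    # One DDT entry per unique block written to the pool.
    unique_blocks = pool_data_bytes // avg_block_bytes
    return unique_blocks * DDT_BYTES_PER_ENTRY

GiB = 1024 ** 3
TiB = 1024 * GiB
# 1 TiB of small files averaging 8 KiB per block:
print(ddt_ram_bytes(1 * TiB, 8 * 1024) / GiB)  # 40.0 GiB -- far beyond 4 GB
```

With small average block sizes the table dwarfs RAM, which is exactly the thrashing scenario Erik describes; larger (128K) blocks cut the estimate by a factor of 16.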
Re: [zfs-discuss] size of slog device
> There is absolutely no sense in having slog devices larger than > the main memory, because it will never be used, right? > ZFS will rather flush the txg to disk than read back from > the zil? So there is a guideline to have enough slog to hold about 10 > seconds of zil, but the absolute maximum value is the size of > main memory. Is this correct? ZFS uses at most RAM/2 for the ZIL. Vennlige hilsener / Best regards roy -- Roy Sigurd Karlsbakk (+47) 97542685 r...@karlsbakk.net http://blogg.karlsbakk.net/ -- [In Norwegian: In all pedagogy it is essential that the curriculum be presented intelligibly. It is an elementary imperative for all pedagogues to avoid excessive use of idioms of foreign origin. In most cases, adequate and relevant synonyms exist in Norwegian.]
Re: [zfs-discuss] size of slog device
On Mon, Jun 14, 2010 at 4:41 AM, Arne Jansen wrote: > Hi, > > I know it's been discussed here more than once, and I read the > Evil tuning guide, but I didn't find a definitive statement: > > There is absolutely no sense in having slog devices larger than > the main memory, because it will never be used, right? > ZFS will rather flush the txg to disk than read back from > the zil? > So there is a guideline to have enough slog to hold about 10 > seconds of zil, but the absolute maximum value is the size of > main memory. Is this correct? > > I thought it was half the size of memory.
[zfs-discuss] size of slog device
Hi, I know it's been discussed here more than once, and I read the Evil tuning guide, but I didn't find a definitive statement: There is absolutely no sense in having slog devices larger than the main memory, because it will never be used, right? ZFS will rather flush the txg to disk than read back from the zil? So there is a guideline to have enough slog to hold about 10 seconds of zil, but the absolute maximum value is the size of main memory. Is this correct? Thanks, Arne
Re: [zfs-discuss] What happens when unmirrored ZIL log device is removed ungracefully
Hello, I have this problem on my system as well. I lost my backup server when the system HD and the ZIL device crashed. After setting up a new system (osol 2009.06, updated to the latest osol/dev version with zpool dedup) I tried to import my backup pool, but I can't. The system tells me there isn't any zpool tank1 when I try to replace / detach / attach / add any kind of device, or it answers this: zpool import -f tank1 cannot import 'tank1': one or more devices is currently unavailable Destroy and re-create the pool from a backup source. With the options -F, -X, -V, -C and -D, in any combination, the system reacts the same way. There are some solutions for cases in which the old cachefile is still available or the ZIL device isn't destroyed, but neither applies to my case. I need a way to import the zpool while ignoring the ZIL device. I spent a week searching the net, but didn't find anything. I would be very glad for some help. regards Ronny P.S. I hope you'll excuse my lousy English.
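For what it's worth, newer zpool versions grew an import flag for exactly this situation: importing a pool whose separate log device is missing. Whether it is available depends on the zpool/pool version on the rescue system (it is not in all 2009-era builds), so treat this as a hedged sketch, not a guaranteed fix:

```shell
# Assumes a zpool binary new enough to support -m
# (import despite a missing log device); check 'man zpool' first.
zpool import -f -m tank1
```

Any synchronous writes that were only in the lost ZIL are gone, but the rest of the pool should come back.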
Re: [zfs-discuss] [zfs/zpool] hang at boot
Just FYI. The error was that I created the ZFS filesystem in the wrong pool. rpool/a/b/c rpool/new I mounted "new" in a directory of rpool's "c". Seems like this hierarchical mounting doesn't work the way I thought. ;) -- This message posted from opensolaris.org
Re: [zfs-discuss] Snapshots, txgs and performance
Marcelo Leal wrote: > Hello there, > I think you should share it with the list, if you can; it seems like interesting work. ZFS has some issues with snapshots and spa_sync performance for snapshot deletion. I'm a bit reluctant to post it to the list, where it can still be found years from now. Because the module is not compiled directly into ZFS but is a separate module that makes heavy use of internal structures of ZFS, it is designed for a specific version of ZFS (Solaris U8). It might still load without problems for years, but already in the next Solaris version it might wreak havoc because of a changed kernel structure. A much better way would be to have a similar operation integrated into the official source tree. I could try to build a patch if it has a chance of getting accepted. Until then, I have no problem with sharing it off-list. --Arne > > Thanks > > Leal > [ http://www.eall.com.br/blog ]