Re: [zfs-discuss] Persistent errors - do I believe?
Cheers, I did try that, but still got the same total on import - 2.73TB. I even thought I might have just made a mistake with the numbers, so I made a sort of 'quarter scale model' in VMware and OSOL 2009.06, with 3x250G and 1x187G. That gave me a size of 744GB, which is *approx* 1/4 of what I get in the physical machine. That makes sense. I then replaced the 187G with another 250G - still 744GB total, as expected. Exported & imported - now 996GB. So the export and import process seems to be the thing to do, but why it's not working on my physical machine (SXCE 119) is a mystery. I even contemplated that there might still have been a 750GB drive left in the setup, but they're all 1TB (well, 931.51GB). Any ideas what else it could be?

For anyone interested in the checksum/permanent error thing, I'm running a scrub now. 59% done and not one error.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
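One way to rule out a smaller device still capping a vdev is to compare what each disk reports against what the pool reports, then re-import (a sketch only; 'tank' is a placeholder pool name and your device names will differ):

# zpool status tank                  (list the member devices of each vdev)
# iostat -En | egrep 'Errors|Size'   (each disk's reported capacity; look for anything under ~931GB)
# zpool export tank
# zpool import tank
# zpool list tank                    (after re-import, capacity is bounded by the smallest disk in each vdev)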
Re: [zfs-discuss] Help! System panic when pool imported
On Fri, Sep 25, 2009 at 05:21:23AM +, Albert Chin wrote:
> [[ snip snip ]]
>
> We really need to import this pool. Is there a way around this? We do
> have snv_114 source on the system if we need to make changes to
> usr/src/uts/common/fs/zfs/dsl_dataset.c. It seems like the "zfs
> destroy" transaction never completed and it is being replayed, causing
> the panic. This cycle continues endlessly.

What are the implications of adding the following to /etc/system:

  set zfs:zfs_recover=1
  set aok=1

And importing the pool with:

  # zpool import -o ro

-- 
albert chin (ch...@thewrittenword.com)
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
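For anyone following along, the sequence being proposed looks roughly like the following (a sketch: zfs_recover and aok are unsupported debugging switches, 'tank' is a placeholder pool name, and whether "-o ro" keeps the import truly read-only on this build should be verified before relying on it):

  * additions to /etc/system (remove again after recovery):
  * zfs_recover relaxes some ZFS error checks; aok turns failed ASSERTs into warnings instead of panics
  set zfs:zfs_recover=1
  set aok=1

  # reboot
  # zpool import -o ro tank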
[zfs-discuss] Help! System panic when pool imported
Running snv_114 on an X4100M2 connected to a 6140. Made a clone of a snapshot a few days ago:

# zfs snapshot a...@b
# zfs clone a...@b tank/a
# zfs clone a...@b tank/b

The system started panicking after I tried:

# zfs snapshot tank/b...@backup

So, I destroyed tank/b:

# zfs destroy tank/b

then tried to destroy tank/a:

# zfs destroy tank/a

Now, the system is in an endless panic loop, unable to import the pool at system startup or with "zpool import". The panic dump is:

panic[cpu1]/thread=ff0010246c60: assertion failed: 0 == zap_remove_int(mos, ds_prev->ds_phys->ds_next_clones_obj, obj, tx) (0x0 == 0x2), file: ../../common/fs/zfs/dsl_dataset.c, line: 1512
ff00102468d0 genunix:assfail3+c1 ()
ff0010246a50 zfs:dsl_dataset_destroy_sync+85a ()
ff0010246aa0 zfs:dsl_sync_task_group_sync+eb ()
ff0010246b10 zfs:dsl_pool_sync+196 ()
ff0010246ba0 zfs:spa_sync+32a ()
ff0010246c40 zfs:txg_sync_thread+265 ()
ff0010246c50 unix:thread_start+8 ()

We really need to import this pool. Is there a way around this? We do have snv_114 source on the system if we need to make changes to usr/src/uts/common/fs/zfs/dsl_dataset.c. It seems like the "zfs destroy" transaction never completed and it is being replayed, causing the panic. This cycle continues endlessly.

-- 
albert chin (ch...@thewrittenword.com)
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Persistent errors - do I believe?
Try exporting and reimporting the pool. That has done the trick for me in the past.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
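For reference, the suggested sequence is just the following (a sketch assuming a pool named 'tank' with nothing using its filesystems during the export):

# zpool export tank
# zpool import tank
# zpool list tank     (the expanded capacity should show up after the re-import)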
Re: [zfs-discuss] periodic slow responsiveness
I thought I would try the same test using dd bs=131072 if=source of=/ path/to/nfs to see what the results looked liked… It is very similar to before, about 2x slog usage and same timing and write totals. Friday, 25 September 2009 1:49:48 PM EST extended device statistics errors --- r/sw/s kr/s kw/s wait actv wsvc_t asvc_t %w %b s/w h/ w trn tot device 0.0 1538.70.0 196834.0 0.0 23.10.0 15.0 2 67 0 0 0 0 c7t2d0 0.0 562.00.0 71942.3 0.0 35.00.0 62.3 1 100 0 0 0 0 c7t2d0 0.0 590.70.0 75614.4 0.0 35.00.0 59.2 1 100 0 0 0 0 c7t2d0 0.0 600.90.0 76920.0 0.0 35.00.0 58.2 1 100 0 0 0 0 c7t2d0 0.0 546.00.0 69887.9 0.0 35.00.0 64.1 1 100 0 0 0 0 c7t2d0 0.0 554.00.0 70913.9 0.0 35.00.0 63.2 1 100 0 0 0 0 c7t2d0 0.0 598.00.0 76549.2 0.0 35.00.0 58.5 1 100 0 0 0 0 c7t2d0 0.0 563.00.0 72065.1 0.0 35.00.0 62.1 1 100 0 0 0 0 c7t2d0 0.0 588.10.0 75282.6 0.0 31.50.0 53.5 1 100 0 0 0 0 c7t2d0 0.0 564.00.0 72195.7 0.0 34.80.0 61.7 1 100 0 0 0 0 c7t2d0 0.0 582.80.0 74599.8 0.0 35.00.0 60.0 1 100 0 0 0 0 c7t2d0 0.0 544.00.0 69633.3 0.0 35.00.0 64.3 1 100 0 0 0 0 c7t2d0 0.0 530.00.0 67191.5 0.0 30.60.0 57.7 0 90 0 0 0 0 c7t2d0 And then the write to primary storage a few seconds later: Friday, 25 September 2009 1:50:14 PM EST extended device statistics errors --- r/sw/s kr/s kw/s wait actv wsvc_t asvc_t %w %b s/w h/w trn tot device 0.0 426.30.0 32196.3 0.0 12.70.0 29.8 1 45 0 0 0 0 c11t0d0 0.0 410.40.0 31857.1 0.0 12.40.0 30.3 1 45 0 0 0 0 c11t1d0 0.0 426.30.0 30698.1 0.0 13.00.0 30.5 1 45 0 0 0 0 c11t2d0 0.0 429.30.0 31392.3 0.0 12.60.0 29.4 1 45 0 0 0 0 c11t3d0 0.0 443.20.0 33280.8 0.0 12.90.0 29.1 1 45 0 0 0 0 c11t4d0 0.0 424.30.0 33872.4 0.0 12.70.0 30.0 1 45 0 0 0 0 c11t5d0 0.0 432.30.0 32903.2 0.0 12.60.0 29.2 1 45 0 0 0 0 c11t6d0 0.0 418.30.0 32562.0 0.0 12.50.0 29.9 1 45 0 0 0 0 c11t7d0 0.0 417.30.0 31746.2 0.0 12.40.0 29.8 1 44 0 0 0 0 c11t8d0 0.0 424.30.0 31270.6 0.0 12.70.0 29.9 1 45 0 0 0 0 c11t9d0 Friday, 25 September 2009 1:50:15 PM EST extended device statistics errors --- r/sw/s kr/s kw/s wait actv wsvc_t asvc_t %w %b s/w h/w trn tot device 0.0 434.90.0 37028.5 0.0 17.30.0 39.7 1 52 0 0 0 0 c11t0d0 1.0 436.9 64.3 37372.1 0.0 17.10.0 39.0 1 51 0 0 0 0 c11t1d0 1.0 442.9 64.3 38543.2 0.0 17.20.0 38.7 1 52 0 0 0 0 c11t2d0 1.0 436.9 64.3 37834.2 0.0 17.30.0 39.6 1 52 0 0 0 0 c11t3d0 1.0 412.8 64.3 35935.0 0.0 16.80.0 40.7 0 52 0 0 0 0 c11t4d0 1.0 413.8 64.3 35342.5 0.0 16.60.0 40.1 0 51 0 0 0 0 c11t5d0 2.0 418.8 128.6 36321.3 0.0 16.50.0 39.3 0 52 0 0 0 0 c11t6d0 1.0 425.8 64.3 36660.4 0.0 16.60.0 39.0 1 51 0 0 0 0 c11t7d0 1.0 437.9 64.3 37484.0 0.0 17.20.0 39.2 1 52 0 0 0 0 c11t8d0 0.0 437.90.0 37968.1 0.0 17.20.0 39.2 1 52 0 0 0 0 c11t9d0 So, 533MB source file, 13 seconds to write to the slog (14 before, no appreciable change), 1071.5MB written to the slog, 692.3MB written to primary storage. Just another data point. cheers, James ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
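For anyone wanting to reproduce this, the test boils down to something like the following (a sketch; the file name, NFS mount point and device names are placeholders for this particular setup):

$ dd if=bigfile.tar of=/mnt/nfs/bigfile.tar bs=131072    (run on the NFS client)

# iostat -xne 1 | egrep 'device|c7t2d0|c11t'             (run on the server to watch slog vs. primary-storage devices)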
Re: [zfs-discuss] periodic slow responsiveness
On 25/09/2009, at 11:49 AM, Bob Friesenhahn wrote: The commentary says that normally the COMMIT operations occur during close(2) or fsync(2) system call, or when encountering memory pressure. If the problem is slow copying of many small files, this COMMIT approach does not help very much since very little data is sent per file and most time is spent creating directories and files. The problem appears to be slog bandwidth exhaustion due to all data being sent via the slog creating a contention for all following NFS or locally synchronous writes. The NFS writes do not appear to be synchronous in nature - there is only a COMMIT being issued at the very end, however, all of that data appears to be going via the slog and it appears to be inflating to twice its original size. For a test, I just copied a relatively small file (8.4MB in size). Looking at a tcpdump analysis using wireshark, there is a SETATTR which ends with a V3 COMMIT and no COMMIT messages during the transfer. iostat output that matches looks like this: slog write of the data (17MB appears to hit the slog) Friday, 25 September 2009 1:01:00 PM EST extended device statistics errors --- r/sw/s kr/s kw/s wait actv wsvc_t asvc_t %w %b s/w h/w trn tot device 0.0 135.00.0 17154.5 0.0 0.80.06.0 0 3 0 0 0 0 c7t2d0 then a few seconds later, the transaction group gets flushed to primary storage writing nearly 11.4MB which is inline with raid Z2 (expect around 10.5MB; 8.4/8*10): Friday, 25 September 2009 1:01:13 PM EST extended device statistics errors --- r/sw/s kr/s kw/s wait actv wsvc_t asvc_t %w %b s/w h/w trn tot device 0.0 91.00.0 1170.4 0.0 0.10.01.3 0 2 0 0 0 0 c11t0d0 0.0 84.00.0 1171.4 0.0 0.10.01.2 0 2 0 0 0 0 c11t1d0 0.0 92.00.0 1172.4 0.0 0.10.01.2 0 2 0 0 0 0 c11t2d0 0.0 84.00.0 1172.4 0.0 0.10.01.3 0 2 0 0 0 0 c11t3d0 0.0 81.00.0 1176.4 0.0 0.10.01.4 0 2 0 0 0 0 c11t4d0 0.0 86.00.0 1176.4 0.0 0.10.01.4 0 2 0 0 0 0 c11t5d0 0.0 89.00.0 1175.4 0.0 0.10.01.4 0 2 0 0 0 0 c11t6d0 0.0 84.00.0 1175.4 0.0 0.10.01.3 0 2 0 0 0 0 c11t7d0 0.0 91.00.0 1168.9 0.0 0.10.01.3 0 2 0 0 0 0 c11t8d0 0.0 89.00.0 1170.9 0.0 0.10.01.4 0 2 0 0 0 0 c11t9d0 So I performed the same test with a much larger file (533MB) to see what it would do, being larger than the NVRAM cache in front of the SSD. Note that after the second second of activity the NVRAM is full and only allowing in about the sequential write speed of the SSD (~70MB/s). 
Friday, 25 September 2009 1:13:14 PM EST extended device statistics errors --- r/sw/s kr/skw/s wait actv wsvc_t asvc_t %w %b s/w h/w trn tot device 0.0 640.90.0 81782.9 0.0 4.20.06.5 1 14 0 0 0 0 c7t2d0 0.0 1065.70.0 136408.1 0.0 18.60.0 17.5 1 78 0 0 0 0 c7t2d0 0.0 579.00.0 74113.3 0.0 30.70.0 53.1 1 100 0 0 0 0 c7t2d0 0.0 588.70.0 75357.0 0.0 33.20.0 56.3 1 100 0 0 0 0 c7t2d0 0.0 532.00.0 68096.3 0.0 31.50.0 59.1 1 100 0 0 0 0 c7t2d0 0.0 559.00.0 71428.0 0.0 32.50.0 58.1 1 100 0 0 0 0 c7t2d0 0.0 542.00.0 68755.9 0.0 25.10.0 46.4 1 100 0 0 0 0 c7t2d0 0.0 542.00.0 69376.4 0.0 35.00.0 64.6 1 100 0 0 0 0 c7t2d0 0.0 581.00.0 74368.0 0.0 30.60.0 52.6 1 100 0 0 0 0 c7t2d0 0.0 567.00.0 72574.1 0.0 33.20.0 58.6 1 100 0 0 0 0 c7t2d0 0.0 564.00.0 72194.1 0.0 31.10.0 55.2 1 100 0 0 0 0 c7t2d0 0.0 573.00.0 73343.5 0.0 33.20.0 57.9 1 100 0 0 0 0 c7t2d0 0.0 536.30.0 68640.5 0.0 33.10.0 61.7 1 100 0 0 0 0 c7t2d0 0.0 121.90.0 15608.9 0.0 2.70.0 22.1 0 22 0 0 0 0 c7t2d0 Again, the slog wrote about double the file size (1022.6MB) and a few seconds later, the data was pushed to the primary storage (684.9MB with an expectation of 666MB = 533MB/8*10) so again about the right number hit the spinning platters. Friday, 25 September 2009 1:13:43 PM EST extended device statistics errors --- r/sw/s kr/s kw/s wait actv wsvc_t asvc_t %w %b s/w h/w trn tot device 0.0 338.30.0 32794.4 0.0 13.70.0 40.6 1 47 0 0 0 0 c11t0d0 0.0 325.30.0 31399.8 0.0 13.70.0 42.0 1
Re: [zfs-discuss] periodic slow responsiveness
On Fri, 25 Sep 2009, James Lever wrote:
> NFS Version 3 introduces the concept of "safe asynchronous writes."

Being "safe" then requires a responsibility level on the client which is often not present. For example, if the server crashes, and then the client crashes, how does the client resend the uncommitted data? If the client had a non-volatile storage cache, then it would be able to responsibly finish the writes that failed.

The commentary says that normally the COMMIT operations occur during close(2) or fsync(2) system call, or when encountering memory pressure. If the problem is slow copying of many small files, this COMMIT approach does not help very much since very little data is sent per file and most time is spent creating directories and files.

Bob
-- 
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] zfs copy performance
I'm measuring time by using the time command in the command arguments, i.e. time cp * ... etc.

For copy I just used: time cp /pool/fs1/* /newpool/fs1
For cpio I used: time find /pool/fs1 | cpio -pdmv /newpool/fs1
For zfs I ran a snapshot first, then: time zfs send -R snapshot | zfs receive -F -d newpool

Nothing else is running on the pool; I'm testing it by just copying multiple 100MB tar files. I also ran zfs send -R snapshot > /sec/back, and this command took the same time as the cp and cpio; it's only when combining the receive that I get the slow response.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
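One hedged way to separate the send cost from the receive cost (a sketch; 'pool/fs1@now' and 'newpool' are placeholder names for this setup):

# zfs snapshot -r pool/fs1@now
# time zfs send -R pool/fs1@now > /dev/null                   (send side only)
# time zfs send -R pool/fs1@now | zfs receive -F -d newpool   (send plus receive)

If the first command stays fast and only the second is slow, the bottleneck is on the receiving pool rather than in the send itself.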
Re: [zfs-discuss] periodic slow responsiveness
On 25/09/2009, at 1:24 AM, Bob Friesenhahn wrote: On Thu, 24 Sep 2009, James Lever wrote: Is there a way to tune this on the NFS server or clients such that when I perform a large synchronous write, the data does not go via the slog device? Synchronous writes are needed by NFS to support its atomic write requirement. It sounds like your SSD is write-bandwidth bottlenecked rather than IOPS bottlenecked. Replacing your SSD with a more performant one seems like the first step. NFS client tunings can make a big difference when it comes to performance. Check the nfs(5) manual page for your Linux systems to see what options are available. An obvious tunable is 'wsize' which should ideally match (or be a multiple of) the zfs filesystem block size. The /proc/mounts file for my Debian install shows that 1048576 is being used. This is quite large and perhaps a smaller value would help. If you are willing to accept the risk, using the Linux 'async' mount option may make things seem better. From the Linux NFS FAQ. http://nfs.sourceforge.net/ NFS Version 3 introduces the concept of "safe asynchronous writes.” And it continues. My rsize and wsize are negotiating to 1MB. James ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
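For completeness, pinning a Linux client to a smaller write size looks roughly like this (a sketch; the server name, export path and the 32k value are placeholders, chosen here only because of the 32 kByte slog threshold discussed elsewhere in this thread):

# mount -t nfs -o vers=3,rsize=32768,wsize=32768 server:/export/build /mnt/build
# grep /mnt/build /proc/mounts    (confirm the rsize/wsize actually negotiated)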
Re: [zfs-discuss] periodic slow responsiveness
On 25/09/2009, at 2:58 AM, Richard Elling wrote: On Sep 23, 2009, at 10:00 PM, James Lever wrote: So it turns out that the problem is that all writes coming via NFS are going through the slog. When that happens, the transfer speed to the device drops to ~70MB/s (the write speed of his SLC SSD) and until the load drops all new write requests are blocked causing a noticeable delay (which has been observed to be up to 20s, but generally only 2-4s). Thank you sir, can I have another? If you add (not attach) more slogs, the workload will be spread across them. But... My log configurations is : logs c7t2d0s0 ONLINE 0 0 0 c7t3d0s0 OFFLINE 0 0 0 I’m going to test the now removed SSD and see if I can get it to perform significantly worse than the first one, but my memory of testing these at pre-production testing was that they were both equally slow but not significantly different. On a related note, I had 2 of these devices (both using just 10GB partitions) connected as log devices (so the pool had 2 separate log devices) and the second one was consistently running significantly slower than the first. Removing the second device made an improvement on performance, but did not remove the occasional observed pauses. ...this is not surprising, when you add a slow slog device. This is the weakest link rule. So, in theory, even if one of the two SSDs was even slightly slower than the other, it would just appear that it would be more heavily effected? Here is part of what I’m not understanding - unless one SSD is significantly worse than the other, how can the following scenario be true? Here is some iostat output from the two slog devices at 1s intervals when it gets a large series of write requests. Idle at start. 0.0 1462.00.0 187010.2 0.0 28.60.0 19.6 2 83 0 0 0 0 c7t2d0 0.0 233.00.0 29823.7 0.0 28.70.0 123.3 0 83 0 0 0 0 c7t3d0 NVRAM cache close to full. (256MB BBC) 0.0 84.00.0 10622.0 0.0 3.50.0 41.2 0 12 0 0 0 0 c7t2d0 0.00.00.0 0.0 0.0 35.00.00.0 0 100 0 0 0 0 c7t3d0 0.00.00.0 0.0 0.0 0.00.00.0 0 0 0 0 0 0 c7t2d0 0.0 305.00.0 39039.3 0.0 35.00.0 114.7 0 100 0 0 0 0 c7t3d0 0.00.00.0 0.0 0.0 0.00.00.0 0 0 0 0 0 0 c7t2d0 0.0 361.00.0 46208.1 0.0 35.00.0 96.8 0 100 0 0 0 0 c7t3d0 0.00.00.0 0.0 0.0 0.00.00.0 0 0 0 0 0 0 c7t2d0 0.0 329.00.0 42114.0 0.0 35.00.0 106.3 0 100 0 0 0 0 c7t3d0 0.00.00.0 0.0 0.0 0.00.00.0 0 0 0 0 0 0 c7t2d0 0.0 317.00.0 40449.6 0.0 27.40.0 86.5 0 85 0 0 0 0 c7t3d0 0.04.00.0 263.8 0.0 0.00.00.2 0 0 0 0 0 0 c7t2d0 0.04.00.0 367.8 0.0 0.00.00.3 0 0 0 0 0 0 c7t3d0 What determines the size of the writes or distribution between slog devices? It looks like ZFS decided to send a large chunk to one slog which nearly filled the NVRAM, and then continue writing to the other one, which meant that it had to go at device speed (whatever that is for the data size/write size). Is there a way to tune the writes to multiple slogs to be (for arguments sake) 10MB slices? I was of the (mis)understanding that only metadata and writes smaller than 64k went via the slog device in the event of an O_SYNC write request? The threshold is 32 kBytes, which is unfortunately the same as the default NFS write size. See CR6686887 http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6686887 If you have a slog and logbias=latency (default) then the writes go to the slog. So there is some interaction here that can affect NFS workloads in particular. Interesting CR. 
nfsstat -m output on one of the linux hosts (ubuntu):

Flags: rw,vers=3,rsize=1048576,wsize=1048576,namlen=255,hard,nointr,noacl,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=10.1.0.17,mountvers=3,mountproto=tcp,addr=10.1.0.17

rsize and wsize auto tuned to 1MB. How does this affect the sync request threshold?

>> The clients are (mostly) RHEL5. Is there a way to tune this on the NFS server or clients such that when I perform a large synchronous write, the data does not go via the slog device?
>
> You can change the IOP size on the client.

You're suggesting modifying rsize/wsize? Or something else?

cheers,
James
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
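If the second SSD checks out, re-adding it as an additional (not mirrored) log device and watching how the load spreads would look roughly like this (a sketch; 'tank' is a placeholder pool name and the slog device name is taken from the config above, so adjust to your box):

# zpool add tank log c7t3d0s0     (a second independent slog; writes should spread across both)
# zpool iostat -v tank 1          (per-vdev view of where the write IOPS are actually landing)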
Re: [zfs-discuss] Cloning Systems using zpool
On 09/24/09 15:54, Peter Pickford wrote: Hi Cindy, Wouldn't touch /reconfigure mv /etc/path_to_inst* /var/tmp/ regenerate all device information? It might, but it's hard to say whether that would accomplish everything needed to move a root file system from one system to another. I just got done modifying flash archive support to work with zfs root on Solaris 10 Update 8. For those not familiar with it, "flash archives" are a way to clone full boot environments across multiple machines. The S10 Solaris installer knows how to install one of these flash archives on a system and then do all the customizations to adapt it to the local hardware and local network environment. I'm pretty sure there's more to the customization than just a device reconfiguration. So feel free to hack together your own solution. It might work for you, but don't assume that you've come up with a completely general way to clone root pools. lori AFIK zfs doesn't care about the device names it scans for them it would only affect things like vfstab. I did a restore from a E2900 to V890 and is seemed to work Created the pool and zfs recieve. I would like to be able to have a zfs send of a minimal build and install it in an abe and activate it. I tried that is test and it seems to work. It seems to work but IM just wondering what I may have missed. I saw someone else has done this on the list and was going to write a blog. It seems like a good way to get a minimal install on a server with reduced downtime. Now if I just knew how to run the installer in and abe without there being an OS there already that would be cool too. Thanks Peter 2009/9/24 Cindy Swearingen : Hi Peter, I can't provide it because I don't know what it is. Even if we could provide a list of items, tweaking the device informaton if the systems are not identical would be too difficult. cs On 09/24/09 12:04, Peter Pickford wrote: Hi Cindy, Could you provide a list of system specific info stored in the root pool? Thanks Peter 2009/9/24 Cindy Swearingen : Hi Karl, Manually cloning the root pool is difficult. We have a root pool recovery procedure that you might be able to apply as long as the systems are identical. I would not attempt this with LiveUpgrade and manually tweaking. http://www.solarisinternals.com/wiki/index.php/ZFS_Troubleshooting_Guide#Complete_Solaris_ZFS_Root_Pool_Recovery The problem is that the amount system-specific info stored in the root pool and any kind of device differences might be insurmountable. Solaris 10 ZFS/flash archive support is available with patches but not for the Nevada release. The ZFS team is working on a split-mirrored-pool feature and that might be an option for future root pool cloning. If you're still interested in a manual process, see the steps below attempted by another community member who moved his root pool to a larger disk on the same system. This is probably more than you wanted to know... Cindy # zpool create -f altrpool c1t1d0s0 # zpool set listsnapshots=on rpool # SNAPNAME=`date +%Y%m%d` # zfs snapshot -r rpool/r...@$snapname # zfs list -t snapshot # zfs send -R rp...@$snapname | zfs recv -vFd altrpool # installboot -F zfs /usr/platform/`uname -i`/lib/fs/zfs/bootblk /dev/rdsk/c1t1d0s0 for x86 do # installgrub /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c1t1d0s0 Set the bootfs property on the root pool BE. 
# zpool set bootfs=altrpool/ROOT/zfsBE altrpool # zpool export altrpool # init 5 remove source disk (c1t0d0s0) and move target disk (c1t1d0s0) to slot0 -insert solaris10 dvd ok boot cdrom -s # zpool import altrpool rpool # init 0 ok boot disk1 On 09/24/09 10:06, Karl Rossing wrote: I would like to clone the configuration on a v210 with snv_115. The current pool looks like this: -bash-3.2$ /usr/sbin/zpool statuspool: rpool state: ONLINE scrub: none requested config: NAME STATE READ WRITE CKSUM rpool ONLINE 0 0 0 mirror ONLINE 0 0 0 c1t0d0s0 ONLINE 0 0 0 c1t1d0s0 ONLINE 0 0 0 errors: No known data errors After I run zpool detach rpool c1t1d0s0, how can I remount c1t1d0s0 to /tmp/a so that I can make the changes I need prior to removing the drive and putting it into the new v210. I supose I could lucreate -n new_v210, lumount new_v210, edit what I need to, luumount new_v210, luactivate new_v210, zpool detach rpool c1t1d0s0 and then luactivate the original boot environment. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensola
Re: [zfs-discuss] Cloning Systems using zpool
Hi Cindy, Wouldn't touch /reconfigure mv /etc/path_to_inst* /var/tmp/ regenerate all device information? AFIK zfs doesn't care about the device names it scans for them it would only affect things like vfstab. I did a restore from a E2900 to V890 and is seemed to work Created the pool and zfs recieve. I would like to be able to have a zfs send of a minimal build and install it in an abe and activate it. I tried that is test and it seems to work. It seems to work but IM just wondering what I may have missed. I saw someone else has done this on the list and was going to write a blog. It seems like a good way to get a minimal install on a server with reduced downtime. Now if I just knew how to run the installer in and abe without there being an OS there already that would be cool too. Thanks Peter 2009/9/24 Cindy Swearingen : > Hi Peter, > > I can't provide it because I don't know what it is. > > Even if we could provide a list of items, tweaking > the device informaton if the systems are not identical > would be too difficult. > > cs > > On 09/24/09 12:04, Peter Pickford wrote: >> >> Hi Cindy, >> >> Could you provide a list of system specific info stored in the root pool? >> >> Thanks >> >> Peter >> >> 2009/9/24 Cindy Swearingen : >>> >>> Hi Karl, >>> >>> Manually cloning the root pool is difficult. We have a root pool recovery >>> procedure that you might be able to apply as long as the >>> systems are identical. I would not attempt this with LiveUpgrade >>> and manually tweaking. >>> >>> >>> http://www.solarisinternals.com/wiki/index.php/ZFS_Troubleshooting_Guide#Complete_Solaris_ZFS_Root_Pool_Recovery >>> >>> The problem is that the amount system-specific info stored in the root >>> pool and any kind of device differences might be insurmountable. >>> >>> Solaris 10 ZFS/flash archive support is available with patches but not >>> for the Nevada release. >>> >>> The ZFS team is working on a split-mirrored-pool feature and that might >>> be an option for future root pool cloning. >>> >>> If you're still interested in a manual process, see the steps below >>> attempted by another community member who moved his root pool to a >>> larger disk on the same system. >>> >>> This is probably more than you wanted to know... >>> >>> Cindy >>> >>> >>> >>> # zpool create -f altrpool c1t1d0s0 >>> # zpool set listsnapshots=on rpool >>> # SNAPNAME=`date +%Y%m%d` >>> # zfs snapshot -r rpool/r...@$snapname >>> # zfs list -t snapshot >>> # zfs send -R rp...@$snapname | zfs recv -vFd altrpool >>> # installboot -F zfs /usr/platform/`uname -i`/lib/fs/zfs/bootblk >>> /dev/rdsk/c1t1d0s0 >>> for x86 do >>> # installgrub /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c1t1d0s0 >>> Set the bootfs property on the root pool BE. >>> # zpool set bootfs=altrpool/ROOT/zfsBE altrpool >>> # zpool export altrpool >>> # init 5 >>> remove source disk (c1t0d0s0) and move target disk (c1t1d0s0) to slot0 >>> -insert solaris10 dvd >>> ok boot cdrom -s >>> # zpool import altrpool rpool >>> # init 0 >>> ok boot disk1 >>> >>> On 09/24/09 10:06, Karl Rossing wrote: I would like to clone the configuration on a v210 with snv_115. 
The current pool looks like this: -bash-3.2$ /usr/sbin/zpool status pool: rpool state: ONLINE scrub: none requested config: NAME STATE READ WRITE CKSUM rpool ONLINE 0 0 0 mirror ONLINE 0 0 0 c1t0d0s0 ONLINE 0 0 0 c1t1d0s0 ONLINE 0 0 0 errors: No known data errors After I run zpool detach rpool c1t1d0s0, how can I remount c1t1d0s0 to /tmp/a so that I can make the changes I need prior to removing the drive and putting it into the new v210. I supose I could lucreate -n new_v210, lumount new_v210, edit what I need to, luumount new_v210, luactivate new_v210, zpool detach rpool c1t1d0s0 and then luactivate the original boot environment. >>> >>> ___ >>> zfs-discuss mailing list >>> zfs-discuss@opensolaris.org >>> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss >>> > ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Sun Flash Accelerator F20
Oracle use Linux :-( But on the positive note have a look at this:- http://www.youtube.com/watch?v=rmrxN3GWHpM It's Ed Zander talking to Larry and asking some great questions. 29:45 Ed asks what parts of Sun are you going to keep - all of it! 45:00 Larry's rant on Cloud Computing "the cloud is water vapour!" 20:00 Talks about Russell Coutts (a good kiwi bloke) and the America's cup if you don't care about anything else. Although they seem confused about who should own it, Team New Zealand are only letting the Swiss borrow it for a while until they loose all our top sailors, like Russell and we win it back, once the trimaran side show is over :-) Oh and back on topic. Anybody found any info on the F20. I've a customer who wants to buy one and on the partner portal I can't find any real details (Just the Facts, or SunIntro, onestop for partner page would be nice) Trevor Enda O'Connor wrote: Richard Elling wrote: On Sep 24, 2009, at 12:20 AM, James Andrewartha wrote: I'm surprised no-one else has posted about this - part of the Sun Oracle Exadata v2 is the Sun Flash Accelerator F20 PCIe card, with 48 or 96 GB of SLC, a built-in SAS controller and a super-capacitor for cache protection. http://www.sun.com/storage/disk_systems/sss/f20/specs.xml At the Exadata-2 announcement, Larry kept saying that it wasn't a disk. But there was little else of a technical nature said, though John did have one to show. RAC doesn't work with ZFS directly, so the details of the configuration should prove interesting. isn't exadata based on linux, so not clear where zfs comes into play, but I didn't see any of this oracle preso, so could be confused by all this. Enda -- richard ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss www.eagle.co.nz This email is confidential and may be legally privileged. If received in error please destroy and immediately notify us. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Cloning Systems using zpool
Karl, I'm not sure I'm following everything. If you can't swap the drives, the which pool would you import? If you install the new v210 with snv_115, then you would have a bootable root pool. You could then receive the snapshots from the old root pool into the root pool on the new v210. I would practice the snapshot/send/recv'ing process if you are not familiar with it before you attempt the migration. Cindy On 09/24/09 12:39, Karl Rossing wrote: Thanks for the help. Since the v210's in question are at a remote site. It might be a bit of a pain getting the drives swapped by end users. So I thought of something else. Could I netboot the new v210 with snv_115, use zfs send/receive with ssh to grab the data on the old server, install the boot block, import the pool, make the changes I need and reboot the system? ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
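A sketch of what that send/receive over ssh might look like (host names, the snapshot name and the receive options are assumptions, and as noted above this should be practiced on a scratch pool first since -F will overwrite datasets on the receiving side):

old-v210# zfs snapshot -r rpool/ROOT@migrate
old-v210# zfs send -R rpool/ROOT@migrate | ssh new-v210 /usr/sbin/zfs receive -u -d -F rpool

With -d the received datasets land under the new host's rpool with their original names, and -u keeps them unmounted until the bootfs property is set and installboot has been run on the new disk.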
Re: [zfs-discuss] Sun Flash Accelerator F20
Roland Rambau wrote: Richard, Tim, yes, one might envision the X4275 as OpenStorage appliances, but they are not. Exadata 2 is - *all* Sun hardware - *all* Oracle software (*) and that combination is now an Oracle product: a database appliance. Is there any reason the X4275 couldn't be an OpenStorage appliance? It seems like it would be a good fit. It doesn't seem specific to Exadata2. The F20 accelerator card isn't something specific to Exadata2 either is it? It looks like something that would benefit any kind of storage server. When I saw the F20 on the Sun site the other day, my first thought was "Oh cool, they reinvented Prestoserve!" -Brian ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Cloning Systems using zpool
Thanks for the help.

Since the v210's in question are at a remote site, it might be a bit of a pain getting the drives swapped by end users. So I thought of something else: could I netboot the new v210 with snv_115, use zfs send/receive with ssh to grab the data on the old server, install the boot block, import the pool, make the changes I need and reboot the system?
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Cloning Systems using zpool
Hi Peter, I can't provide it because I don't know what it is. Even if we could provide a list of items, tweaking the device informaton if the systems are not identical would be too difficult. cs On 09/24/09 12:04, Peter Pickford wrote: Hi Cindy, Could you provide a list of system specific info stored in the root pool? Thanks Peter 2009/9/24 Cindy Swearingen : Hi Karl, Manually cloning the root pool is difficult. We have a root pool recovery procedure that you might be able to apply as long as the systems are identical. I would not attempt this with LiveUpgrade and manually tweaking. http://www.solarisinternals.com/wiki/index.php/ZFS_Troubleshooting_Guide#Complete_Solaris_ZFS_Root_Pool_Recovery The problem is that the amount system-specific info stored in the root pool and any kind of device differences might be insurmountable. Solaris 10 ZFS/flash archive support is available with patches but not for the Nevada release. The ZFS team is working on a split-mirrored-pool feature and that might be an option for future root pool cloning. If you're still interested in a manual process, see the steps below attempted by another community member who moved his root pool to a larger disk on the same system. This is probably more than you wanted to know... Cindy # zpool create -f altrpool c1t1d0s0 # zpool set listsnapshots=on rpool # SNAPNAME=`date +%Y%m%d` # zfs snapshot -r rpool/r...@$snapname # zfs list -t snapshot # zfs send -R rp...@$snapname | zfs recv -vFd altrpool # installboot -F zfs /usr/platform/`uname -i`/lib/fs/zfs/bootblk /dev/rdsk/c1t1d0s0 for x86 do # installgrub /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c1t1d0s0 Set the bootfs property on the root pool BE. # zpool set bootfs=altrpool/ROOT/zfsBE altrpool # zpool export altrpool # init 5 remove source disk (c1t0d0s0) and move target disk (c1t1d0s0) to slot0 -insert solaris10 dvd ok boot cdrom -s # zpool import altrpool rpool # init 0 ok boot disk1 On 09/24/09 10:06, Karl Rossing wrote: I would like to clone the configuration on a v210 with snv_115. The current pool looks like this: -bash-3.2$ /usr/sbin/zpool statuspool: rpool state: ONLINE scrub: none requested config: NAME STATE READ WRITE CKSUM rpool ONLINE 0 0 0 mirror ONLINE 0 0 0 c1t0d0s0 ONLINE 0 0 0 c1t1d0s0 ONLINE 0 0 0 errors: No known data errors After I run zpool detach rpool c1t1d0s0, how can I remount c1t1d0s0 to /tmp/a so that I can make the changes I need prior to removing the drive and putting it into the new v210. I supose I could lucreate -n new_v210, lumount new_v210, edit what I need to, luumount new_v210, luactivate new_v210, zpool detach rpool c1t1d0s0 and then luactivate the original boot environment. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Sun Flash Accelerator F20
Richard Elling wrote: On Sep 24, 2009, at 10:17 AM, Tim Cook wrote: On Thu, Sep 24, 2009 at 12:10 PM, Richard Elling wrote: On Sep 24, 2009, at 12:20 AM, James Andrewartha wrote: I'm surprised no-one else has posted about this - part of the Sun Oracle Exadata v2 is the Sun Flash Accelerator F20 PCIe card, with 48 or 96 GB of SLC, a built-in SAS controller and a super-capacitor for cache protection. http://www.sun.com/storage/disk_systems/sss/f20/specs.xml At the Exadata-2 announcement, Larry kept saying that it wasn't a disk. But there was little else of a technical nature said, though John did have one to show. RAC doesn't work with ZFS directly, so the details of the configuration should prove interesting. -- richard Exadata 2 is built on Linux from what I read, so I'm not entirely sure how it would leverage ZFS, period. I hope I heard wrong or the whole announcement feels like a bit of a joke to me. It is not clear to me. They speak of "storage servers" which would be needed to implement the shared storage. These are described as Sun Fire X4275 loaded with the FlashFire cards. I am not aware of a production-ready Linux file system which implements a hybrid storage pool. I could easily envision these as being OpenStorage appliances. -- richard Well, I'm not an expert on this at all, but what was said IIRC is that it is using ASM with the whole lot running on OEL. These aren't just plain storage servers either. The storage servers are provided with enough details of the DB search being performed to do an initial filtering of the data so the data returned to the DB servers for them to work on is only typically 10% of the raw data they would conventionally have to process (and that's before taking compression into account). I haven't seen anything which says exactly how the flash cache is used (as in, is it ASM or the database which decides what goes in flash?). ASM certainly has the smarts to do this level of tuning for conventional disk layout, and just like ZFS, it puts hot data on the outer edge of a disk and uses slower parts of disks for less performant data (things like backups), so it certainly could decide what goes into flash. -- Andrew ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Cloning Systems using zpool
As Cindy said, this isn't trivial right now. Personally, I'd do it this way:

ASSUMPTIONS:
* both v210 machines are reasonably identical (may differ in RAM or CPU speed, but nothing much else).
* Call the original machine A and the new machine B
* machine B has no current drives in it.

METHOD:
1) In A, install the boot block on c1t1 as Cindy detailed below (installboot . )
2) shutdown A
3) remove c1t0 from A (that is, the original boot drive)
4) boot A from c1t1 (you will likely have to do this at the boot prom, via something like 'boot disk2')
5) once A is back up, make the changes you need to make A look like what B should be. Note that ZFS will mark c1t0 as Failed.
6) shutdown A, remove c1t1, and move it to B, putting it in the c1t1 disk slot (i.e. the 2nd slot)
7) boot B, in the same manner you did A a minute ago (boot disk2)
8) when B is up, insert a new drive into the c1t0 slot, and do a 'zpool replace rpool c1t0d0 c1t0d0'
9) after the resilver completes, do an 'installboot' on c1t0
10) reboot B, and everything should be set.
11) on A, re-insert the original c1t0 into its standard place (i.e. it should remain c1t0)
12) boot A
13) insert a fresh drive into the c1t1 slot
14) zpool replace rpool c1t1d0 c1t1d0
15) installboot after resilver

Note that I've not specifically tried the above, but I can't see any reason why it shouldn't work.

-Erik

Cindy Swearingen wrote:

Hi Karl,

Manually cloning the root pool is difficult. We have a root pool recovery procedure that you might be able to apply as long as the systems are identical. I would not attempt this with LiveUpgrade and manually tweaking.

http://www.solarisinternals.com/wiki/index.php/ZFS_Troubleshooting_Guide#Complete_Solaris_ZFS_Root_Pool_Recovery

The problem is that the amount of system-specific info stored in the root pool and any kind of device differences might be insurmountable.

Solaris 10 ZFS/flash archive support is available with patches but not for the Nevada release.

The ZFS team is working on a split-mirrored-pool feature and that might be an option for future root pool cloning.

If you're still interested in a manual process, see the steps below attempted by another community member who moved his root pool to a larger disk on the same system.

This is probably more than you wanted to know...

Cindy

# zpool create -f altrpool c1t1d0s0
# zpool set listsnapshots=on rpool
# SNAPNAME=`date +%Y%m%d`
# zfs snapshot -r rpool/r...@$snapname
# zfs list -t snapshot
# zfs send -R rp...@$snapname | zfs recv -vFd altrpool
# installboot -F zfs /usr/platform/`uname -i`/lib/fs/zfs/bootblk /dev/rdsk/c1t1d0s0
for x86 do
# installgrub /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c1t1d0s0
Set the bootfs property on the root pool BE.
# zpool set bootfs=altrpool/ROOT/zfsBE altrpool
# zpool export altrpool
# init 5
remove source disk (c1t0d0s0) and move target disk (c1t1d0s0) to slot0
-insert solaris10 dvd
ok boot cdrom -s
# zpool import altrpool rpool
# init 0
ok boot disk1

On 09/24/09 10:06, Karl Rossing wrote:

I would like to clone the configuration on a v210 with snv_115.

The current pool looks like this:

-bash-3.2$ /usr/sbin/zpool status
  pool: rpool
 state: ONLINE
 scrub: none requested
config:

        NAME          STATE     READ WRITE CKSUM
        rpool         ONLINE       0     0     0
          mirror      ONLINE       0     0     0
            c1t0d0s0  ONLINE       0     0     0
            c1t1d0s0  ONLINE       0     0     0

errors: No known data errors

After I run zpool detach rpool c1t1d0s0, how can I remount c1t1d0s0 to /tmp/a so that I can make the changes I need prior to removing the drive and putting it into the new v210.
I supose I could lucreate -n new_v210, lumount new_v210, edit what I need to, luumount new_v210, luactivate new_v210, zpool detach rpool c1t1d0s0 and then luactivate the original boot environment. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss -- Erik Trimble Java System Support Mailstop: usca22-123 Phone: x17195 Santa Clara, CA ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
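For reference, steps 8-9 (and equally 14-15) of the method above in command form (a sketch: the SPARC bootblk path is shown since these are v210s, and the s0 slice names assume the root pool layout from the original post):

# zpool replace rpool c1t0d0s0 c1t0d0s0
# zpool status rpool      (wait until the resilver completes)
# installboot -F zfs /usr/platform/`uname -i`/lib/fs/zfs/bootblk /dev/rdsk/c1t0d0s0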
Re: [zfs-discuss] Sun Flash Accelerator F20
Richard, Tim, yes, one might envision the X4275 as OpenStorage appliances, but they are not. Exadata 2 is - *all* Sun hardware - *all* Oracle software (*) and that combination is now an Oracle product: a database appliance. All nodes run Oracles Linux; as far as I understand - and that is not sooo much - Oracle has offloaded certain database functionality into the storage nodes. I would not assume that there is a hybrid storage pool with a file system - it is a distributed data base that knows to utilize flash storage. I see it as a first quick step. hth -- Roland PS: (*) disregarding firmware-like software components like Service Processor code or IB subnet managers in the IB switches, which are provided by Sun Richard Elling schrieb: On Sep 24, 2009, at 10:17 AM, Tim Cook wrote: On Thu, Sep 24, 2009 at 12:10 PM, Richard Elling wrote: On Sep 24, 2009, at 12:20 AM, James Andrewartha wrote: I'm surprised no-one else has posted about this - part of the Sun Oracle Exadata v2 is the Sun Flash Accelerator F20 PCIe card, with 48 or 96 GB of SLC, a built-in SAS controller and a super-capacitor for cache protection. http://www.sun.com/storage/disk_systems/sss/f20/specs.xml At the Exadata-2 announcement, Larry kept saying that it wasn't a disk. But there was little else of a technical nature said, though John did have one to show. RAC doesn't work with ZFS directly, so the details of the configuration should prove interesting. -- richard Exadata 2 is built on Linux from what I read, so I'm not entirely sure how it would leverage ZFS, period. I hope I heard wrong or the whole announcement feels like a bit of a joke to me. It is not clear to me. They speak of "storage servers" which would be needed to implement the shared storage. These are described as Sun Fire X4275 loaded with the FlashFire cards. I am not aware of a production-ready Linux file system which implements a hybrid storage pool. I could easily envision these as being OpenStorage appliances. -- richard ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss -- ** Roland Rambau Platform Technology Team Principal Field Technologist Global Systems Engineering Phone: +49-89-46008-2520 Mobile:+49-172-84 58 129 Fax: +49-89-46008- mailto:roland.ram...@sun.com ** Sitz der Gesellschaft: Sun Microsystems GmbH, Sonnenallee 1, D-85551 Kirchheim-Heimstetten Amtsgericht München: HRB 161028; Geschäftsführer: Thomas Schröder, Wolfgang Engels, Wolf Frenkel Vorsitzender des Aufsichtsrates: Martin Häring *** UNIX * /bin/sh FORTRAN ** ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Cloning Systems using zpool
Hi Cindy, Could you provide a list of system specific info stored in the root pool? Thanks Peter 2009/9/24 Cindy Swearingen : > Hi Karl, > > Manually cloning the root pool is difficult. We have a root pool recovery > procedure that you might be able to apply as long as the > systems are identical. I would not attempt this with LiveUpgrade > and manually tweaking. > > http://www.solarisinternals.com/wiki/index.php/ZFS_Troubleshooting_Guide#Complete_Solaris_ZFS_Root_Pool_Recovery > > The problem is that the amount system-specific info stored in the root > pool and any kind of device differences might be insurmountable. > > Solaris 10 ZFS/flash archive support is available with patches but not > for the Nevada release. > > The ZFS team is working on a split-mirrored-pool feature and that might > be an option for future root pool cloning. > > If you're still interested in a manual process, see the steps below > attempted by another community member who moved his root pool to a > larger disk on the same system. > > This is probably more than you wanted to know... > > Cindy > > > > # zpool create -f altrpool c1t1d0s0 > # zpool set listsnapshots=on rpool > # SNAPNAME=`date +%Y%m%d` > # zfs snapshot -r rpool/r...@$snapname > # zfs list -t snapshot > # zfs send -R rp...@$snapname | zfs recv -vFd altrpool > # installboot -F zfs /usr/platform/`uname -i`/lib/fs/zfs/bootblk > /dev/rdsk/c1t1d0s0 > for x86 do > # installgrub /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c1t1d0s0 > Set the bootfs property on the root pool BE. > # zpool set bootfs=altrpool/ROOT/zfsBE altrpool > # zpool export altrpool > # init 5 > remove source disk (c1t0d0s0) and move target disk (c1t1d0s0) to slot0 > -insert solaris10 dvd > ok boot cdrom -s > # zpool import altrpool rpool > # init 0 > ok boot disk1 > > On 09/24/09 10:06, Karl Rossing wrote: >> >> I would like to clone the configuration on a v210 with snv_115. >> >> The current pool looks like this: >> >> -bash-3.2$ /usr/sbin/zpool status pool: rpool >> state: ONLINE >> scrub: none requested >> config: >> >> NAME STATE READ WRITE CKSUM >> rpool ONLINE 0 0 0 >> mirror ONLINE 0 0 0 >> c1t0d0s0 ONLINE 0 0 0 >> c1t1d0s0 ONLINE 0 0 0 >> >> errors: No known data errors >> >> After I run zpool detach rpool c1t1d0s0, how can I remount c1t1d0s0 to >> /tmp/a so that I can make the changes I need prior to removing the drive and >> putting it into the new v210. >> >> I supose I could lucreate -n new_v210, lumount new_v210, edit what I need >> to, luumount new_v210, luactivate new_v210, zpool detach rpool c1t1d0s0 and >> then luactivate the original boot environment. > > ___ > zfs-discuss mailing list > zfs-discuss@opensolaris.org > http://mail.opensolaris.org/mailman/listinfo/zfs-discuss > ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS ARC vs Oracle cache
Richard Elling wrote:
On Sep 24, 2009, at 10:30 AM, Javier Conde wrote:

Hello,

Given the following configuration:

* Server with 12 SPARCVII CPUs and 96 GB of RAM
* ZFS used as file system for Oracle data
* Oracle 10.2.0.4 with 1.7TB of data and indexes
* 1800 concurrents users with PeopleSoft Financial
* 2 PeopleSoft transactions per day
* HDS USP1100 with LUNs stripped on 6 parity groups (450xRAID7+1), total 48 disks
* 2x 4Gbps FC with MPxIO

Which is the best Oracle SGA size to avoid cache duplication between Oracle and ZFS? Is it better to have a "small SGA + big ZFS ARC" or "large SGA + small ZFS ARC"? Who does a better cache for overall performance?

In general, it is better to cache closer to the consumer (application). You don't mention what version of Solaris or ZFS you are using. For later versions, the primarycache property allows you to control the ARC usage on a per-dataset basis.
-- richard
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

Hi, adding oracle-interest.

I would suggest some testing, but the standard recommendations to start with are: keep the zfs recordsize equal to the db block size, and keep the Oracle log writer on its own pool (128k recordsize is recommended, I believe, for that one), since the log writer is an IO-limiting factor as such. Use the latest KUs for Solaris as they contain some critical fixes for zfs/oracle, e.g. 6775697 for instance. A small SGA is not usually recommended, but of course a lot depends on the application layer as well; I can only say test with the recommendations above and then deviate from there. Perhaps keeping the ZIL on a separate low-latency device might help (again, only analysis can determine all that). Then remember that even after that, with a large SGA etc., sometimes perf can degrade, i.e. you might need to instruct Oracle to actually cache, via the alter table cache command etc. Getting familiar with statspack/AWR will be a must here :-) as only an analysis of Oracle from an Oracle point of view can really tell what is working as such.

Enda
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
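A minimal sketch of those recommendations as dataset properties (assuming an 8 KB db_block_size and a pool named 'dbpool'; the names, the 8k value and the separate log filesystem are illustrative only, not a prescription for this system):

# zfs create -o recordsize=8k dbpool/oradata      (match recordsize to db_block_size for datafiles)
# zfs create -o recordsize=128k dbpool/oralog     (redo logs on their own filesystem/pool)
# zfs get recordsize dbpool/oradata dbpool/oralog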
Re: [zfs-discuss] ZFS ARC vs Oracle cache
Hi Richard, Thanks for your reply. We are using Solaris 10 u6 and ZFS version 10. Regards, Javi Richard Elling wrote: On Sep 24, 2009, at 10:30 AM, Javier Conde wrote: Hello, Given the following configuration: * Server with 12 SPARCVII CPUs and 96 GB of RAM * ZFS used as file system for Oracle data * Oracle 10.2.0.4 with 1.7TB of data and indexes * 1800 concurrents users with PeopleSoft Financial * 2 PeopleSoft transactions per day * HDS USP1100 with LUNs stripped on 6 parity groups (450xRAID7+1), total 48 disks * 2x 4Gbps FC with MPxIO Which is the best Oracle SGA size to avoid cache duplication between Oracle and ZFS? Is it better to have a "small SGA + big ZFS ARC" or "large SGA + small ZFS ARC"? Who does a better cache for overall performance? In general, it is better to cache closer to the consumer (application). You don't mention what version of Solaris or ZFS you are using. For later versions, the primarycache property allows you to control the ARC usage on a per-dataset basis. -- richard ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Cloning Systems using zpool
Hi Karl, Manually cloning the root pool is difficult. We have a root pool recovery procedure that you might be able to apply as long as the systems are identical. I would not attempt this with LiveUpgrade and manually tweaking. http://www.solarisinternals.com/wiki/index.php/ZFS_Troubleshooting_Guide#Complete_Solaris_ZFS_Root_Pool_Recovery The problem is that the amount system-specific info stored in the root pool and any kind of device differences might be insurmountable. Solaris 10 ZFS/flash archive support is available with patches but not for the Nevada release. The ZFS team is working on a split-mirrored-pool feature and that might be an option for future root pool cloning. If you're still interested in a manual process, see the steps below attempted by another community member who moved his root pool to a larger disk on the same system. This is probably more than you wanted to know... Cindy # zpool create -f altrpool c1t1d0s0 # zpool set listsnapshots=on rpool # SNAPNAME=`date +%Y%m%d` # zfs snapshot -r rpool/r...@$snapname # zfs list -t snapshot # zfs send -R rp...@$snapname | zfs recv -vFd altrpool # installboot -F zfs /usr/platform/`uname -i`/lib/fs/zfs/bootblk /dev/rdsk/c1t1d0s0 for x86 do # installgrub /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c1t1d0s0 Set the bootfs property on the root pool BE. # zpool set bootfs=altrpool/ROOT/zfsBE altrpool # zpool export altrpool # init 5 remove source disk (c1t0d0s0) and move target disk (c1t1d0s0) to slot0 -insert solaris10 dvd ok boot cdrom -s # zpool import altrpool rpool # init 0 ok boot disk1 On 09/24/09 10:06, Karl Rossing wrote: I would like to clone the configuration on a v210 with snv_115. The current pool looks like this: -bash-3.2$ /usr/sbin/zpool status pool: rpool state: ONLINE scrub: none requested config: NAME STATE READ WRITE CKSUM rpool ONLINE 0 0 0 mirror ONLINE 0 0 0 c1t0d0s0 ONLINE 0 0 0 c1t1d0s0 ONLINE 0 0 0 errors: No known data errors After I run zpool detach rpool c1t1d0s0, how can I remount c1t1d0s0 to /tmp/a so that I can make the changes I need prior to removing the drive and putting it into the new v210. I supose I could lucreate -n new_v210, lumount new_v210, edit what I need to, luumount new_v210, luactivate new_v210, zpool detach rpool c1t1d0s0 and then luactivate the original boot environment. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] moving files from one fs to another, splittin/merging
> Thanks for the info. Glad to hear it's in the works, too. It is not in the works. If you look at the bug IDs in the bug database you will find no indication of work done on them. > > Paul > > > 1:21pm, Mark J Musante wrote: > >> On Thu, 24 Sep 2009, Paul Archer wrote: >> >>> I may have missed something in the docs, but if I have a file in one FS, >>> and want to move it to another FS (assuming both filesystems are on the same >>> ZFS pool), is there a way to do it outside of the standard mv/cp/rsync >>> commands? >> >> Not yet. CR 6483179 covers this. >> >>> On a related(?) note, is there a way to split an existing filesystem? >> >> Not yet. CR 6400399 covers this. >> >> >> Regards, >> markm >> > ___ > zfs-discuss mailing list > zfs-discuss@opensolaris.org > http://mail.opensolaris.org/mailman/listinfo/zfs-discuss > ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Sun Flash Accelerator F20
On Sep 24, 2009, at 10:17 AM, Tim Cook wrote: On Thu, Sep 24, 2009 at 12:10 PM, Richard Elling > wrote: On Sep 24, 2009, at 12:20 AM, James Andrewartha wrote: I'm surprised no-one else has posted about this - part of the Sun Oracle Exadata v2 is the Sun Flash Accelerator F20 PCIe card, with 48 or 96 GB of SLC, a built-in SAS controller and a super-capacitor for cache protection. http://www.sun.com/storage/disk_systems/sss/f20/specs.xml At the Exadata-2 announcement, Larry kept saying that it wasn't a disk. But there was little else of a technical nature said, though John did have one to show. RAC doesn't work with ZFS directly, so the details of the configuration should prove interesting. -- richard Exadata 2 is built on Linux from what I read, so I'm not entirely sure how it would leverage ZFS, period. I hope I heard wrong or the whole announcement feels like a bit of a joke to me. It is not clear to me. They speak of "storage servers" which would be needed to implement the shared storage. These are described as Sun Fire X4275 loaded with the FlashFire cards. I am not aware of a production-ready Linux file system which implements a hybrid storage pool. I could easily envision these as being OpenStorage appliances. -- richard ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS ARC vs Oracle cache
On Sep 24, 2009, at 10:30 AM, Javier Conde wrote: Hello, Given the following configuration: * Server with 12 SPARCVII CPUs and 96 GB of RAM * ZFS used as file system for Oracle data * Oracle 10.2.0.4 with 1.7TB of data and indexes * 1800 concurrents users with PeopleSoft Financial * 2 PeopleSoft transactions per day * HDS USP1100 with LUNs stripped on 6 parity groups (450xRAID7+1), total 48 disks * 2x 4Gbps FC with MPxIO Which is the best Oracle SGA size to avoid cache duplication between Oracle and ZFS? Is it better to have a "small SGA + big ZFS ARC" or "large SGA + small ZFS ARC"? Who does a better cache for overall performance? In general, it is better to cache closer to the consumer (application). You don't mention what version of Solaris or ZFS you are using. For later versions, the primarycache property allows you to control the ARC usage on a per-dataset basis. -- richard ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
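For builds that have it, the per-dataset control Richard mentions looks like this (a sketch; the dataset name is a placeholder):

# zfs set primarycache=metadata tank/oracle/data   (keep only metadata for this dataset in the ARC, leaving data caching to the SGA)
# zfs get primarycache tank/oracle/data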
Re: [zfs-discuss] moving files from one fs to another, splittin/merging
Thanks for the info. Glad to hear it's in the works, too. Paul 1:21pm, Mark J Musante wrote: On Thu, 24 Sep 2009, Paul Archer wrote: I may have missed something in the docs, but if I have a file in one FS, and want to move it to another FS (assuming both filesystems are on the same ZFS pool), is there a way to do it outside of the standard mv/cp/rsync commands? Not yet. CR 6483179 covers this. On a related(?) note, is there a way to split an existing filesystem? Not yet. CR 6400399 covers this. Regards, markm ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] ZFS ARC vs Oracle cache
Hello,

Given the following configuration:

* Server with 12 SPARCVII CPUs and 96 GB of RAM
* ZFS used as file system for Oracle data
* Oracle 10.2.0.4 with 1.7TB of data and indexes
* 1800 concurrent users with PeopleSoft Financial
* 2 PeopleSoft transactions per day
* HDS USP1100 with LUNs striped on 6 parity groups (450xRAID7+1), total 48 disks
* 2x 4Gbps FC with MPxIO

Which is the best Oracle SGA size to avoid cache duplication between Oracle and ZFS? Is it better to have a "small SGA + big ZFS ARC" or "large SGA + small ZFS ARC"? Who does a better cache for overall performance?

Thanks in advance and best regards,

Javi
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] moving files from one fs to another, splittin/merging
On Thu, 24 Sep 2009, Paul Archer wrote:
> I may have missed something in the docs, but if I have a file in one FS, and want to move it to another FS (assuming both filesystems are on the same ZFS pool), is there a way to do it outside of the standard mv/cp/rsync commands?

Not yet. CR 6483179 covers this.

> On a related(?) note, is there a way to split an existing filesystem?

Not yet. CR 6400399 covers this.

Regards, markm
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Sun Flash Accelerator F20
Richard Elling wrote:
> On Sep 24, 2009, at 12:20 AM, James Andrewartha wrote:
>> I'm surprised no-one else has posted about this - part of the Sun Oracle Exadata v2 is the Sun Flash Accelerator F20 PCIe card, with 48 or 96 GB of SLC, a built-in SAS controller and a super-capacitor for cache protection. http://www.sun.com/storage/disk_systems/sss/f20/specs.xml
> At the Exadata-2 announcement, Larry kept saying that it wasn't a disk. But there was little else of a technical nature said, though John did have one to show. RAC doesn't work with ZFS directly, so the details of the configuration should prove interesting.
> -- richard

Isn't Exadata based on Linux? So it's not clear where ZFS comes into play - but I didn't see any of the Oracle preso, so I could be confused by all this.

Enda
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Sun Flash Accelerator F20
On Thu, Sep 24, 2009 at 12:10 PM, Richard Elling wrote:
> On Sep 24, 2009, at 12:20 AM, James Andrewartha wrote:
>> I'm surprised no-one else has posted about this - part of the Sun Oracle Exadata v2 is the Sun Flash Accelerator F20 PCIe card, with 48 or 96 GB of SLC, a built-in SAS controller and a super-capacitor for cache protection.
>> http://www.sun.com/storage/disk_systems/sss/f20/specs.xml
>
> At the Exadata-2 announcement, Larry kept saying that it wasn't a disk. But there was little else of a technical nature said, though John did have one to show.
>
> RAC doesn't work with ZFS directly, so the details of the configuration should prove interesting.
> -- richard

Exadata 2 is built on Linux from what I read, so I'm not entirely sure how it would leverage ZFS, period. I hope I heard wrong, because otherwise the whole announcement feels like a bit of a joke to me.

--Tim
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Sun Flash Accelerator F20
On Sep 24, 2009, at 12:20 AM, James Andrewartha wrote: I'm surprised no-one else has posted about this - part of the Sun Oracle Exadata v2 is the Sun Flash Accelerator F20 PCIe card, with 48 or 96 GB of SLC, a built-in SAS controller and a super-capacitor for cache protection. http://www.sun.com/storage/disk_systems/sss/f20/specs.xml At the Exadata-2 announcement, Larry kept saying that it wasn't a disk. But there was little else of a technical nature said, though John did have one to show. RAC doesn't work with ZFS directly, so the details of the configuration should prove interesting. -- richard ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] periodic slow responsiveness
comment below...

On Sep 23, 2009, at 10:00 PM, James Lever wrote:
> On 08/09/2009, at 2:01 AM, Ross Walker wrote:
>> On Sep 7, 2009, at 1:32 AM, James Lever wrote:
>> Well a MD1000 holds 15 drives a good compromise might be 2 7 drive RAIDZ2s with a hotspare... That should provide 320 IOPS instead of 160, big difference.
> The issue is interactive responsiveness and if there is a way to tune the system to give that while still having good performance for builds when they are run.
>> Look at the write IOPS of the pool with the zpool iostat -v and look at how many are happening on the RAIDZ2 vdev. I was suggesting that slog writes were possibly starving reads from the l2arc as they were on the same device.
> This appears not to have been the issue as the problem has persisted even with the l2arc devices removed from the pool.
>> The SSD will handle a lot more IOPS then the pool and L2ARC is a lazy reader, it mostly just holds on to read cache data. It just may be that the pool configuration just can't handle the write IOPS needed and reads are starving.
> Possible, but hard to tell. Have a look at the iostat results I’ve posted.
>> The busy times of the disks while the issue is occurring should let you know.
> So it turns out that the problem is that all writes coming via NFS are going through the slog. When that happens, the transfer speed to the device drops to ~70MB/s (the write speed of his SLC SSD) and until the load drops all new write requests are blocked causing a noticeable delay (which has been observed to be up to 20s, but generally only 2-4s). Thank you sir, can I have another?

If you add (not attach) more slogs, the workload will be spread across them. But...

> I can reproduce this behaviour by copying a large file (hundreds of MB in size) using 'cp src dst’ on an NFS (still currently v3) client and observe that all data is pushed through the slog device (10GB partition of a Samsung 50GB SSD behind a PERC 6/i w/256MB BBC) rather than going direct to the primary storage disks.
> On a related note, I had 2 of these devices (both using just 10GB partitions) connected as log devices (so the pool had 2 separate log devices) and the second one was consistently running significantly slower than the first. Removing the second device made an improvement on performance, but did not remove the occasional observed pauses.

...this is not surprising, when you add a slow slog device. This is the weakest link rule.

> I was of the (mis)understanding that only metadata and writes smaller than 64k went via the slog device in the event of an O_SYNC write request?

The threshold is 32 kBytes, which is unfortunately the same as the default NFS write size. See CR 6686887 http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6686887
If you have a slog and logbias=latency (default) then the writes go to the slog. So there is some interaction here that can affect NFS workloads in particular.

> The clients are (mostly) RHEL5. Is there a way to tune this on the NFS server or clients such that when I perform a large synchronous write, the data does not go via the slog device?

You can change the IOP size on the client.
-- richard

> I have investigated using the logbias setting, but that will just kill small file performance also on any filesystem using it and defeat the purpose of having a slog device at all.
> cheers, James
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
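For what it's worth, on builds that have the logbias property (I believe snv_122 and later), it can be set per dataset rather than pool-wide, so only the datasets taking large streaming NFS writes bypass the slog; the dataset names below are only placeholders:

# zfs set logbias=throughput tank/builds   (large synchronous writes skip the slog for this dataset)
# zfs set logbias=latency tank/home        (small synchronous writes elsewhere still use the slog)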
Re: [zfs-discuss] You're Invited: OpenSolaris Security Summit
On Sep 24, 2009, at 2:19 AM, Darren J Moffat wrote:
> Jennifer Bauer Scarpino wrote:
>> To: Developers and Students
>> You are invited to participate in the first OpenSolaris Security Summit
>> OpenSolaris Security Summit
>> Tuesday, November 3rd, 2009
>> Baltimore Marriott Waterfront
>> 700 Aliceanna Street
>> Baltimore, Maryland 21202
> I will be giving a talk and live demo of ZFS Crypto at this event.

Other, related tutorials at the same conference are:
+ Sunday: Jim Mauro's Solaris Dynamic Tracing (DTrace): Finding the Light Where There Was Only Darkness
+ Monday: Jim Mauro's Solaris 10 Performance, Observability, and Debugging
+ Monday: Richard Elling's ZFS: A Filesystem for Modern Hardware
+ Tuesday: Peter Galvin & Marc Stavely have two half-day tutorials (conflict with the summit)
  + Solaris 10 Administration Workshop 1: Administration (Hands-on)
  + Solaris 10 Administration Workshop 2: Virtualization (Hands-on)
+ Wednesday: Peter Galvin & Marc Stavely have two more half-day tutorials
  + Solaris 10 Administration Workshop 3: File Systems (Hands-on)
  + Solaris 10 Administration Workshop 4: Security (Hands-on)
+ Thursday: Jeff Victor's Resource Management with Solaris Containers

And, of course, there are always lots of good technical papers at LISA. http://www.usenix.org/events/lisa09
-- richard
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] Cloning Systems using zpool
I would like to clone the configuration on a v210 with snv_115.

The current pool looks like this:

-bash-3.2$ /usr/sbin/zpool status
  pool: rpool
 state: ONLINE
 scrub: none requested
config:

        NAME          STATE     READ WRITE CKSUM
        rpool         ONLINE       0     0     0
          mirror      ONLINE       0     0     0
            c1t0d0s0  ONLINE       0     0     0
            c1t1d0s0  ONLINE       0     0     0

errors: No known data errors

After I run "zpool detach rpool c1t1d0s0", how can I remount c1t1d0s0 to /tmp/a so that I can make the changes I need prior to removing the drive and putting it into the new v210?

I suppose I could lucreate -n new_v210, lumount new_v210, edit what I need to, luumount new_v210, luactivate new_v210, zpool detach rpool c1t1d0s0, and then luactivate the original boot environment.
-- This message posted from opensolaris.org
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
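A sketch of that Live Upgrade sequence (the boot environment and mount point names are only placeholders, and this assumes all of the edits can be done inside the mounted BE):

# lucreate -n new_v210              (clone the current boot environment)
# lumount new_v210 /tmp/a           (mount the clone for editing)
  ... edit /tmp/a/etc/nodename, /tmp/a/etc/hosts, etc. ...
# luumount new_v210                 (unmount the clone)
# luactivate new_v210               (make the clone the active BE)
# zpool detach rpool c1t1d0s0       (pull one half of the mirror for the new v210)
# luactivate <original_BE>          (switch this system back to its original BE)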
[zfs-discuss] moving files from one fs to another, splittin/merging
I may have missed something in the docs, but if I have a file in one FS, and want to move it to another FS (assuming both filesystems are on the same ZFS pool), is there a way to do it outside of the standard mv/cp/rsync commands?

For example, I have a pool with my home directory as a FS, and I have another FS with ISOs. I download an ISO of an OpenSolaris DVD (say, 3GB), but it goes into my home directory. Since ZFS is all about pools and shared storage, it seems like it would be natural to move the file via a 'zfs' command, rather than mv/cp/etc...

On a related(?) note, is there a way to split an existing filesystem? To use the example above, let's say I have an ISO directory in my home directory, but it's getting big, plus I'd like to share it out on my network. Is there a way to split my home directory's FS, so that the ISO directory becomes its own FS?

Paul Archer
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
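Until such a feature exists, the usual manual route is to create the new filesystem and copy the data with ordinary tools (the pool name tank and the paths below are only examples):

# zfs create tank/isos                          (new filesystem for the ISO images)
# rsync -a /export/home/paul/isos/ /tank/isos/  (copy the data; cp or mv across filesystems works too)
# rm -rf /export/home/paul/isos                 (remove the old copy once it has been verified)
# zfs set sharenfs=on tank/isos                 (share the new filesystem on the network)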
Re: [zfs-discuss] periodic slow responsiveness
On Thu, 24 Sep 2009, James Lever wrote:
> I was of the (mis)understanding that only metadata and writes smaller than 64k went via the slog device in the event of an O_SYNC write request?

What would cause you to understand that?

> Is there a way to tune this on the NFS server or clients such that when I perform a large synchronous write, the data does not go via the slog device?

Synchronous writes are needed by NFS to support its atomic write requirement. It sounds like your SSD is write-bandwidth bottlenecked rather than IOPS bottlenecked. Replacing your SSD with a more performant one seems like the first step.

NFS client tunings can make a big difference when it comes to performance. Check the nfs(5) manual page for your Linux systems to see what options are available. An obvious tunable is 'wsize' which should ideally match (or be a multiple of) the zfs filesystem block size. The /proc/mounts file for my Debian install shows that 1048576 is being used. This is quite large and perhaps a smaller value would help. If you are willing to accept the risk, using the Linux 'async' mount option may make things seem better.

Bob
-- Bob Friesenhahn bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
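As an example of that client-side tuning, a Linux NFSv3 mount with a smaller write size (the server name and paths are only placeholders) might look like:

# mount -t nfs -o vers=3,hard,intr,rsize=32768,wsize=32768 server:/export/build /mnt/build

or the equivalent /etc/fstab line:

server:/export/build  /mnt/build  nfs  vers=3,hard,intr,rsize=32768,wsize=32768  0 0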
Re: [zfs-discuss] Checksum property change does not change pre-existing data - right?
On 24 Sep 2009, at 03:09, Mark J Musante wrote:
> On 23 Sep, 2009, at 21.54, Ray Clark wrote:
>> My understanding is that if I "zfs set checksum=" to change the algorithm that this will change the checksum algorithm for all FUTURE data blocks written, but does not in any way change the checksum for previously written data blocks. I need to corroborate this understanding. Could someone please point me to a document that states this? I have searched and searched and cannot find this.
> I haven't googled for a specific doc, but I can at least tell you that your understanding is correct. If you change the checksum algorithm, that checksum is applied only to future writes. Other properties work similarly, such as compression or copies.
> I see that the zfs manpage (viewable here: http://docs.sun.com/app/docs/doc/816-5166/zfs-1m?a=view ) only indicates that this is true for the copies property. I guess we'll have to update that doc.

It mentions something similar for the recordsize property too:
---
Changing the file system's recordsize affects only files created afterward; existing files are unaffected.
---

Cheers, Chris
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Checksum property change does not change pre-existing data - right?
On 23 Sep, 2009, at 21.54, Ray Clark wrote:
> My understanding is that if I "zfs set checksum=" to change the algorithm that this will change the checksum algorithm for all FUTURE data blocks written, but does not in any way change the checksum for previously written data blocks. I need to corroborate this understanding. Could someone please point me to a document that states this? I have searched and searched and cannot find this.

I haven't googled for a specific doc, but I can at least tell you that your understanding is correct. If you change the checksum algorithm, that checksum is applied only to future writes. Other properties work similarly, such as compression or copies.

I see that the zfs manpage (viewable here: http://docs.sun.com/app/docs/doc/816-5166/zfs-1m?a=view ) only indicates that this is true for the copies property. I guess we'll have to update that doc.

Is the word of a zfs developer sufficient? Or do you need to see it in an official piece of documentation?

Regards, markm
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
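A quick way to see the behaviour for yourself (the dataset name tank/data is only a placeholder): change the property, then note that only rewritten files pick up the new algorithm.

# zfs set checksum=sha256 tank/data        (newly written blocks use sha256; existing blocks keep their old checksum)
# cp /tank/data/file /tank/data/file.new   (the copy is written, and checksummed, with the new setting)
# mv /tank/data/file.new /tank/data/file   (replace the original if a full rewrite is wanted)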
Re: [zfs-discuss] Changing the recordsize while replacing a disk
Javier Conde wrote:
> Hello,
> Quick question: I have changed the recordsize of an existing file system and I would like to do the conversion while the file system is online. Will a disk replacement change the recordsize of the existing blocks? My idea is to issue a "zpool replace
> Will this work?

No, it won't. However, all new files which are created will use the new recordsize. If you want the old files' recordsize to be changed, you will have to copy them and remove the old versions.
-- Robert Milkowski http://milek.blogspot.com
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
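A sketch of that copy-and-replace approach (the paths and the 16K value are only examples); the rewritten copy is stored with the dataset's current recordsize, and the file should not be in use while it is swapped:

# zfs set recordsize=16K tank/db                (affects only files created from now on)
# cp /tank/db/datafile /tank/db/datafile.new    (the copy is written with 16K records)
# mv /tank/db/datafile.new /tank/db/datafile    (replace the original once the copy is verified)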
[zfs-discuss] Changing the recordsize while replacing a disk
Hello, Quick question: I have changed the recordsize of an existing file system and I would like to do the conversion while the file system is online. Will a disk replacement change the recordsize of the existing blocks? My idea is to issue a "zpool replace Will this work? Thanks in advance, Javi ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] New to ZFS: One LUN, multiple zones
bertram fukuda wrote:
> Would I just do the following then:
> zpool create -f zone1 c1t1d0s0
> zfs create zone1/test1
> zfs create zone1/test2
> Would I then use zfs set quota=xxxG to handle disk usage?

yes
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
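For example (the sizes are only placeholders):

# zfs set quota=20G zone1/test1          (caps how much zone1/test1 can consume)
# zfs set reservation=10G zone1/test2    (optionally guarantees space to the other zone)
# zfs get quota,reservation zone1/test1 zone1/test2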
Re: [zfs-discuss] zfs copy performance
chris bannayan wrote:
> I've been comparing zfs send and receive to cp, cpio etc.. for a customer data migration and have found send and receive to be twice as slow as cp or cpio.

Did you run sync after the cp/cpio finished to ensure the data really is on disk? cp and cpio do not do synchronous writes. A zfs recv isn't strictly speaking a synchronous write either, but it is much closer to one in how some of the data is written out (note I'm purposely being vague here so I don't have to go into the details of how zfs recv actually works).

What else is happening on the recv pool? What was the exact command line used in all three cases? How was the time measured?

Were you sending a lot of snapshots as well? cp/cpio don't know anything about ZFS snapshots (and shouldn't).
-- Darren J Moffat
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
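For reference, one common way to do this kind of pool-to-pool migration (the pool and snapshot names are only placeholders) is a recursive replication stream followed by an incremental catch-up:

# zfs snapshot -r tank@migrate1
# zfs send -R tank@migrate1 | zfs recv -Fd temparray                    (replicates datasets, snapshots and properties)
# zfs snapshot -r tank@migrate2
# zfs send -R -I tank@migrate1 tank@migrate2 | zfs recv -d temparray    (catches up on changes since the first pass)

When timing cp or cpio against this, include a final sync so the buffered writes are actually on disk, e.g. "time ( cp -rp /tank/data /temparray/data ; sync )".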
[zfs-discuss] zfs copy performance
I've been comparing zfs send and receive to cp, cpio etc. for a customer data migration and have found send and receive to be twice as slow as cp or cpio. I'm migrating ZFS data from one array to a temporary array on the same server; it's 2.3TB in total, and I was looking for the fastest way to do this. Is there a recommended way to do this?
-- This message posted from opensolaris.org
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] You're Invited: OpenSolaris Security Summit
Jennifer Bauer Scarpino wrote: To: Developers and Students You are invited to participate in the first OpenSolaris Security Summit OpenSolaris Security Summit Tuesday, November 3rd, 2009 Baltimore Marriott Waterfront 700 Aliceanna Street Baltimore, Maryland 21202 I will be giving a talk and live demo of ZFS Crypto at this event. -- Darren J Moffat ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Checksum property change does not change pre-existing data - right?
Roch wrote:
> Bob Friesenhahn writes:
>> On Wed, 23 Sep 2009, Ray Clark wrote:
>>> My understanding is that if I "zfs set checksum=" to change the algorithm that this will change the checksum algorithm for all FUTURE data blocks written, but does not in any way change the checksum for previously written data blocks.
>> This is correct. The same applies to blocksize and compression.
> With an important distinction. For compression and checksum, the new setting takes effect on the next update to any file block. For the dataset recordsize property, a block rewrite on an existing multi-block file will not change the file's block size. For multi-record files, the recordsize is immutable and dissociated from the dataset recordsize setting.
>>> I need to corroborate this understanding. Could someone please point me to a document that states this? I have searched and searched and cannot find this.
> Me neither, although it's easy to verify that setting the checksum property on a dataset does not induce the I/O that would be required for a rewrite of the bp.

It is mentioned in zfs(1) for the copies property but not for checksum and compression:

  Changing this property only affects newly-written data. Therefore, set this property at file system creation time by using the -o copies=N option.

I've filed a man page bug 6885203 to have similar text added for checksum and compression.
-- Darren J Moffat
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Checksum property change does not change pre-existing data - right?
Bob Friesenhahn writes:
> On Wed, 23 Sep 2009, Ray Clark wrote:
>> My understanding is that if I "zfs set checksum=" to change the algorithm that this will change the checksum algorithm for all FUTURE data blocks written, but does not in any way change the checksum for previously written data blocks.
> This is correct. The same applies to blocksize and compression.

With an important distinction. For compression and checksum, the new setting takes effect on the next update to any file block. For the dataset recordsize property, a block rewrite on an existing multi-block file will not change the file's block size. For multi-record files, the recordsize is immutable and dissociated from the dataset recordsize setting.

>> I need to corroborate this understanding. Could someone please point me to a document that states this? I have searched and searched and cannot find this.

Me neither, although it's easy to verify that setting the checksum property on a dataset does not induce the I/O that would be required for a rewrite of the bp.

-r

> Sorry, I am not aware of a document and don't have time to look.
> Bob
> -- Bob Friesenhahn bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] Sun Flash Accelerator F20
I'm surprised no-one else has posted about this - part of the Sun Oracle Exadata v2 is the Sun Flash Accelerator F20 PCIe card, with 48 or 96 GB of SLC, a built-in SAS controller and a super-capacitor for cache protection. http://www.sun.com/storage/disk_systems/sss/f20/specs.xml There's no pricing on the webpage though - does anyone know how it compares in price to a logzilla? -- James Andrewartha ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss