Re: [zfs-discuss] Persistent errors - do I believe?

2009-09-24 Thread Chris Murray
Cheers, I did try that, but still got the same total on import - 2.73TB

I even thought I might have just made a mistake with the numbers, so I made a 
sort of 'quarter scale model' in VMware and OSOL 2009.06, with 3x250G and 
1x187G. That gave me a size of 744GB, which is *approx* 1/4 of what I get in 
the physical machine. That makes sense. I then replaced the 187 with another 
250, still 744GB total, as expected. Exported & imported - now 996GB. So, the 
export and import process seems to be the thing to do, but why it's not working 
on my physical machine (SXCE119) is a mystery. I even contemplated that there 
might have still been a 750GB drive left in the setup, but they're all 1TB 
(well, 931.51GB).

Any ideas what else it could be?

For anyone interested in the checksum/permanent error thing, I'm running a 
scrub now. 59% done and not one error.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Help! System panic when pool imported

2009-09-24 Thread Albert Chin
On Fri, Sep 25, 2009 at 05:21:23AM +, Albert Chin wrote:
> [[ snip snip ]]
> 
> We really need to import this pool. Is there a way around this? We do
> have snv_114 source on the system if we need to make changes to
> usr/src/uts/common/fs/zfs/dsl_dataset.c. It seems like the "zfs
> destroy" transaction never completed and it is being replayed, causing
> the panic. This cycle continues endlessly.

What are the implications of adding the following to /etc/system:
  set zfs:zfs_recover=1
  set aok=1

And importing the pool with:
  # zpool import -o ro

-- 
albert chin (ch...@thewrittenword.com)
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Help! System panic when pool imported

2009-09-24 Thread Albert Chin
Running snv_114 on an X4100M2 connected to a 6140. Made a clone of a
snapshot a few days ago:
  # zfs snapshot a...@b
  # zfs clone a...@b tank/a
  # zfs clone a...@b tank/b

The system started panicing after I tried:
  # zfs snapshot tank/b...@backup

So, I destroyed tank/b:
  # zfs destroy tank/b
then tried to destroy tank/a
  # zfs destroy tank/a

Now, the system is in an endless panic loop, unable to import the pool
at system startup or with "zpool import". The panic dump is:
  panic[cpu1]/thread=ff0010246c60: assertion failed: 0 == 
zap_remove_int(mos, ds_prev->ds_phys->ds_next_clones_obj, obj, tx) (0x0 == 
0x2), file: ../../common/fs/zfs/dsl_dataset.c, line: 1512

  ff00102468d0 genunix:assfail3+c1 ()
  ff0010246a50 zfs:dsl_dataset_destroy_sync+85a ()
  ff0010246aa0 zfs:dsl_sync_task_group_sync+eb ()
  ff0010246b10 zfs:dsl_pool_sync+196 ()
  ff0010246ba0 zfs:spa_sync+32a ()
  ff0010246c40 zfs:txg_sync_thread+265 ()
  ff0010246c50 unix:thread_start+8 ()

We really need to import this pool. Is there a way around this? We do
have snv_114 source on the system if we need to make changes to
usr/src/uts/common/fs/zfs/dsl_dataset.c. It seems like the "zfs
destroy" transaction never completed and it is being replayed, causing
the panic. This cycle continues endlessly.

-- 
albert chin (ch...@thewrittenword.com)
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Persistent errors - do I believe?

2009-09-24 Thread Chris Borrell
Try exporting and reimporting the pool. That has done the trick for me in the past.
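i.e. something along these lines (substitute your own pool name; tank is 
just a placeholder):

# zpool export tank
# zpool import tank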
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] periodic slow responsiveness

2009-09-24 Thread James Lever
I thought I would try the same test using dd bs=131072 if=source of=/path/to/nfs 
to see what the results looked like…


It is very similar to before, about 2x slog usage and same timing and  
write totals.


Friday, 25 September 2009  1:49:48 PM EST
                    extended device statistics              ---- errors ---
    r/s    w/s    kr/s     kw/s wait actv wsvc_t asvc_t  %w  %b s/w h/w trn tot device
    0.0 1538.7    0.0 196834.0  0.0 23.1    0.0   15.0   2  67   0   0   0   0 c7t2d0
    0.0  562.0    0.0  71942.3  0.0 35.0    0.0   62.3   1 100   0   0   0   0 c7t2d0
    0.0  590.7    0.0  75614.4  0.0 35.0    0.0   59.2   1 100   0   0   0   0 c7t2d0
    0.0  600.9    0.0  76920.0  0.0 35.0    0.0   58.2   1 100   0   0   0   0 c7t2d0
    0.0  546.0    0.0  69887.9  0.0 35.0    0.0   64.1   1 100   0   0   0   0 c7t2d0
    0.0  554.0    0.0  70913.9  0.0 35.0    0.0   63.2   1 100   0   0   0   0 c7t2d0
    0.0  598.0    0.0  76549.2  0.0 35.0    0.0   58.5   1 100   0   0   0   0 c7t2d0
    0.0  563.0    0.0  72065.1  0.0 35.0    0.0   62.1   1 100   0   0   0   0 c7t2d0
    0.0  588.1    0.0  75282.6  0.0 31.5    0.0   53.5   1 100   0   0   0   0 c7t2d0
    0.0  564.0    0.0  72195.7  0.0 34.8    0.0   61.7   1 100   0   0   0   0 c7t2d0
    0.0  582.8    0.0  74599.8  0.0 35.0    0.0   60.0   1 100   0   0   0   0 c7t2d0
    0.0  544.0    0.0  69633.3  0.0 35.0    0.0   64.3   1 100   0   0   0   0 c7t2d0
    0.0  530.0    0.0  67191.5  0.0 30.6    0.0   57.7   0  90   0   0   0   0 c7t2d0


And then the write to primary storage a few seconds later:

Friday, 25 September 2009  1:50:14 PM EST
                    extended device statistics              ---- errors ---
    r/s    w/s    kr/s     kw/s wait actv wsvc_t asvc_t  %w  %b s/w h/w trn tot device
    0.0  426.3    0.0  32196.3  0.0 12.7    0.0   29.8   1  45   0   0   0   0 c11t0d0
    0.0  410.4    0.0  31857.1  0.0 12.4    0.0   30.3   1  45   0   0   0   0 c11t1d0
    0.0  426.3    0.0  30698.1  0.0 13.0    0.0   30.5   1  45   0   0   0   0 c11t2d0
    0.0  429.3    0.0  31392.3  0.0 12.6    0.0   29.4   1  45   0   0   0   0 c11t3d0
    0.0  443.2    0.0  33280.8  0.0 12.9    0.0   29.1   1  45   0   0   0   0 c11t4d0
    0.0  424.3    0.0  33872.4  0.0 12.7    0.0   30.0   1  45   0   0   0   0 c11t5d0
    0.0  432.3    0.0  32903.2  0.0 12.6    0.0   29.2   1  45   0   0   0   0 c11t6d0
    0.0  418.3    0.0  32562.0  0.0 12.5    0.0   29.9   1  45   0   0   0   0 c11t7d0
    0.0  417.3    0.0  31746.2  0.0 12.4    0.0   29.8   1  44   0   0   0   0 c11t8d0
    0.0  424.3    0.0  31270.6  0.0 12.7    0.0   29.9   1  45   0   0   0   0 c11t9d0

Friday, 25 September 2009  1:50:15 PM EST
                    extended device statistics              ---- errors ---
    r/s    w/s    kr/s     kw/s wait actv wsvc_t asvc_t  %w  %b s/w h/w trn tot device
    0.0  434.9    0.0  37028.5  0.0 17.3    0.0   39.7   1  52   0   0   0   0 c11t0d0
    1.0  436.9   64.3  37372.1  0.0 17.1    0.0   39.0   1  51   0   0   0   0 c11t1d0
    1.0  442.9   64.3  38543.2  0.0 17.2    0.0   38.7   1  52   0   0   0   0 c11t2d0
    1.0  436.9   64.3  37834.2  0.0 17.3    0.0   39.6   1  52   0   0   0   0 c11t3d0
    1.0  412.8   64.3  35935.0  0.0 16.8    0.0   40.7   0  52   0   0   0   0 c11t4d0
    1.0  413.8   64.3  35342.5  0.0 16.6    0.0   40.1   0  51   0   0   0   0 c11t5d0
    2.0  418.8  128.6  36321.3  0.0 16.5    0.0   39.3   0  52   0   0   0   0 c11t6d0
    1.0  425.8   64.3  36660.4  0.0 16.6    0.0   39.0   1  51   0   0   0   0 c11t7d0
    1.0  437.9   64.3  37484.0  0.0 17.2    0.0   39.2   1  52   0   0   0   0 c11t8d0
    0.0  437.9    0.0  37968.1  0.0 17.2    0.0   39.2   1  52   0   0   0   0 c11t9d0


So, 533MB source file, 13 seconds to write to the slog (14 before, no  
appreciable change), 1071.5MB written to the slog, 692.3MB written to  
primary storage.


Just another data point.

cheers,
James

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] periodic slow responsiveness

2009-09-24 Thread James Lever


On 25/09/2009, at 11:49 AM, Bob Friesenhahn wrote:

The commentary says that normally the COMMIT operations occur during  
close(2) or fsync(2) system call, or when encountering memory  
pressure.  If the problem is slow copying of many small files, this  
COMMIT approach does not help very much since very little data is  
sent per file and most time is spent creating directories and files.


The problem appears to be slog bandwidth exhaustion: all data is being 
sent via the slog, which creates contention for all subsequent NFS or 
locally synchronous writes.  The NFS writes do not appear to be 
synchronous in nature - there is only a COMMIT issued at the very end - 
yet all of that data appears to be going via the slog, and it appears 
to be inflating to twice its original size.


For a test, I just copied a relatively small file (8.4MB in size).   
Looking at a tcpdump analysis using wireshark, there is a SETATTR  
which ends with a V3 COMMIT and no COMMIT messages during the transfer.


iostat output that matches looks like this:

slog write of the data (17MB appears to hit the slog)

Friday, 25 September 2009  1:01:00 PM EST
                    extended device statistics              ---- errors ---
    r/s    w/s    kr/s     kw/s wait actv wsvc_t asvc_t  %w  %b s/w h/w trn tot device
    0.0  135.0    0.0  17154.5  0.0  0.8    0.0    6.0   0   3   0   0   0   0 c7t2d0


then a few seconds later, the transaction group gets flushed to 
primary storage, writing nearly 11.4MB, which is in line with raidz2 
(expect around 10.5MB; 8.4/8*10):


Friday, 25 September 2009  1:01:13 PM EST
                    extended device statistics              ---- errors ---
    r/s    w/s    kr/s     kw/s wait actv wsvc_t asvc_t  %w  %b s/w h/w trn tot device
    0.0   91.0    0.0   1170.4  0.0  0.1    0.0    1.3   0   2   0   0   0   0 c11t0d0
    0.0   84.0    0.0   1171.4  0.0  0.1    0.0    1.2   0   2   0   0   0   0 c11t1d0
    0.0   92.0    0.0   1172.4  0.0  0.1    0.0    1.2   0   2   0   0   0   0 c11t2d0
    0.0   84.0    0.0   1172.4  0.0  0.1    0.0    1.3   0   2   0   0   0   0 c11t3d0
    0.0   81.0    0.0   1176.4  0.0  0.1    0.0    1.4   0   2   0   0   0   0 c11t4d0
    0.0   86.0    0.0   1176.4  0.0  0.1    0.0    1.4   0   2   0   0   0   0 c11t5d0
    0.0   89.0    0.0   1175.4  0.0  0.1    0.0    1.4   0   2   0   0   0   0 c11t6d0
    0.0   84.0    0.0   1175.4  0.0  0.1    0.0    1.3   0   2   0   0   0   0 c11t7d0
    0.0   91.0    0.0   1168.9  0.0  0.1    0.0    1.3   0   2   0   0   0   0 c11t8d0
    0.0   89.0    0.0   1170.9  0.0  0.1    0.0    1.4   0   2   0   0   0   0 c11t9d0


So I performed the same test with a much larger file (533MB), larger 
than the NVRAM cache in front of the SSD, to see what it would do.  
Note that after the second second of activity the NVRAM is full and 
only lets writes in at about the sequential write speed of the SSD 
(~70MB/s).


Friday, 25 September 2009  1:13:14 PM EST
                    extended device statistics              ---- errors ---
    r/s    w/s    kr/s     kw/s wait actv wsvc_t asvc_t  %w  %b s/w h/w trn tot device
    0.0  640.9    0.0  81782.9  0.0  4.2    0.0    6.5   1  14   0   0   0   0 c7t2d0
    0.0 1065.7    0.0 136408.1  0.0 18.6    0.0   17.5   1  78   0   0   0   0 c7t2d0
    0.0  579.0    0.0  74113.3  0.0 30.7    0.0   53.1   1 100   0   0   0   0 c7t2d0
    0.0  588.7    0.0  75357.0  0.0 33.2    0.0   56.3   1 100   0   0   0   0 c7t2d0
    0.0  532.0    0.0  68096.3  0.0 31.5    0.0   59.1   1 100   0   0   0   0 c7t2d0
    0.0  559.0    0.0  71428.0  0.0 32.5    0.0   58.1   1 100   0   0   0   0 c7t2d0
    0.0  542.0    0.0  68755.9  0.0 25.1    0.0   46.4   1 100   0   0   0   0 c7t2d0
    0.0  542.0    0.0  69376.4  0.0 35.0    0.0   64.6   1 100   0   0   0   0 c7t2d0
    0.0  581.0    0.0  74368.0  0.0 30.6    0.0   52.6   1 100   0   0   0   0 c7t2d0
    0.0  567.0    0.0  72574.1  0.0 33.2    0.0   58.6   1 100   0   0   0   0 c7t2d0
    0.0  564.0    0.0  72194.1  0.0 31.1    0.0   55.2   1 100   0   0   0   0 c7t2d0
    0.0  573.0    0.0  73343.5  0.0 33.2    0.0   57.9   1 100   0   0   0   0 c7t2d0
    0.0  536.3    0.0  68640.5  0.0 33.1    0.0   61.7   1 100   0   0   0   0 c7t2d0
    0.0  121.9    0.0  15608.9  0.0  2.7    0.0   22.1   0  22   0   0   0   0 c7t2d0


Again, the slog wrote about double the file size (1022.6MB) and a few  
seconds later, the data was pushed to the primary storage (684.9MB  
with an expectation of 666MB = 533MB/8*10) so again about the right  
number hit the spinning platters.


Friday, 25 September 2009  1:13:43 PM EST
                    extended device statistics              ---- errors ---
    r/s    w/s    kr/s     kw/s wait actv wsvc_t asvc_t  %w  %b s/w h/w trn tot device
    0.0  338.3    0.0  32794.4  0.0 13.7    0.0   40.6   1  47   0   0   0   0 c11t0d0
    0.0  325.3    0.0  31399.8  0.0 13.7    0.0   42.0   1

Re: [zfs-discuss] periodic slow responsiveness

2009-09-24 Thread Bob Friesenhahn

On Fri, 25 Sep 2009, James Lever wrote:


NFS Version 3 introduces the concept of "safe asynchronous writes."


Being "safe" then requires a responsibility level on the client which 
is often not present.  For example, if the server crashes, and then 
the client crashes, how does the client resend the uncommitted data? 
If the client had a non-volatile storage cache, then it would be able 
to responsibly finish the writes that failed.


The commentary says that normally the COMMIT operations occur during 
close(2) or fsync(2) system call, or when encountering memory 
pressure.  If the problem is slow copying of many small files, this 
COMMIT approach does not help very much since very little data is sent 
per file and most time is spent creating directories and files.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs copy performance

2009-09-24 Thread chris bannayan
I'm measuring time by using the time command in the command arguments,

ie: time cp * / etc.

For copy I just used: time cp pool/fs1/* /newpool/fs1 etc.

For cpio I used: time find /pool/fs1 | cpio -pdmv /newpool/fs1

For zfs I ran a snapshot first, then: time zfs send -R snapshot | zfs receive -F 
-d newpool

Nothing else is running on the pool; I'm testing it by just copying multiple 
100MB tar files.

I also ran zfs send -R snapshot > /sec/back, and this command took the same 
time as the cp and cpio; it's only when combining the receive that I get the 
slow response.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] periodic slow responsiveness

2009-09-24 Thread James Lever


On 25/09/2009, at 1:24 AM, Bob Friesenhahn wrote:


On Thu, 24 Sep 2009, James Lever wrote:


Is there a way to tune this on the NFS server or clients such that  
when I perform a large synchronous write, the data does not go via  
the slog device?


Synchronous writes are needed by NFS to support its atomic write  
requirement.  It sounds like your SSD is write-bandwidth  
bottlenecked rather than IOPS bottlenecked.  Replacing your SSD with  
a more performant one seems like the first step.


NFS client tunings can make a big difference when it comes to  
performance.  Check the nfs(5) manual page for your Linux systems to  
see what options are available.  An obvious tunable is 'wsize' which  
should ideally match (or be a multiple of) the zfs filesystem block  
size.  The /proc/mounts file for my Debian install shows that  
1048576 is being used.  This is quite large and perhaps a smaller  
value would help.  If you are willing to accept the risk, using the  
Linux 'async' mount option may make things seem better.


From the Linux NFS FAQ.  http://nfs.sourceforge.net/

NFS Version 3 introduces the concept of "safe asynchronous writes."

And it continues.

My rsize and wsize are negotiating to 1MB.
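If I do end up forcing a smaller wsize, I gather it would be something 
like this hypothetical fstab entry on the client (server and export 
names are placeholders):

server:/export/data  /mnt/data  nfs  vers=3,rsize=131072,wsize=131072,hard,tcp  0  0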

James

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] periodic slow responsiveness

2009-09-24 Thread James Lever


On 25/09/2009, at 2:58 AM, Richard Elling wrote:


On Sep 23, 2009, at 10:00 PM, James Lever wrote:

So it turns out that the problem is that all writes coming via NFS  
are going through the slog.  When that happens, the transfer speed  
to the device drops to ~70MB/s (the write speed of his SLC SSD) and  
until the load drops all new write requests are blocked causing a  
noticeable delay (which has been observed to be up to 20s, but  
generally only 2-4s).


Thank you sir, can I have another?
If you add (not attach) more slogs, the workload will be spread  
across them.  But...


My log configuration is:

logs
  c7t2d0s0   ONLINE   0 0 0
  c7t3d0s0   OFFLINE  0 0 0

I’m going to test the now-removed SSD and see if I can get it to 
perform significantly worse than the first one, but my memory from 
pre-production testing is that they were both equally slow, with no 
significant difference between them.


On a related note, I had 2 of these devices (both using just 10GB  
partitions) connected as log devices (so the pool had 2 separate  
log devices) and the second one was consistently running  
significantly slower than the first.  Removing the second device  
made an improvement on performance, but did not remove the  
occasional observed pauses.


...this is not surprising, when you add a slow slog device.  This is  
the weakest link rule.


So, in theory, even if one of the two SSDs was only slightly slower 
than the other, it would just appear to be more heavily affected?


Here is part of what I’m not understanding - unless one SSD is  
significantly worse than the other, how can the following scenario be  
true?  Here is some iostat output from the two slog devices at 1s  
intervals when it gets a large series of write requests.


Idle at start.

    0.0 1462.0    0.0 187010.2  0.0 28.6    0.0   19.6   2  83   0   0   0   0 c7t2d0
    0.0  233.0    0.0  29823.7  0.0 28.7    0.0  123.3   0  83   0   0   0   0 c7t3d0


NVRAM cache close to full. (256MB BBC)

    0.0   84.0    0.0  10622.0  0.0  3.5    0.0   41.2   0  12   0   0   0   0 c7t2d0
    0.0    0.0    0.0      0.0  0.0 35.0    0.0    0.0   0 100   0   0   0   0 c7t3d0


    0.0    0.0    0.0      0.0  0.0  0.0    0.0    0.0   0   0   0   0   0   0 c7t2d0
    0.0  305.0    0.0  39039.3  0.0 35.0    0.0  114.7   0 100   0   0   0   0 c7t3d0



    0.0    0.0    0.0      0.0  0.0  0.0    0.0    0.0   0   0   0   0   0   0 c7t2d0
    0.0  361.0    0.0  46208.1  0.0 35.0    0.0   96.8   0 100   0   0   0   0 c7t3d0



    0.0    0.0    0.0      0.0  0.0  0.0    0.0    0.0   0   0   0   0   0   0 c7t2d0
    0.0  329.0    0.0  42114.0  0.0 35.0    0.0  106.3   0 100   0   0   0   0 c7t3d0


    0.0    0.0    0.0      0.0  0.0  0.0    0.0    0.0   0   0   0   0   0   0 c7t2d0
    0.0  317.0    0.0  40449.6  0.0 27.4    0.0   86.5   0  85   0   0   0   0 c7t3d0


    0.0    4.0    0.0    263.8  0.0  0.0    0.0    0.2   0   0   0   0   0   0 c7t2d0
    0.0    4.0    0.0    367.8  0.0  0.0    0.0    0.3   0   0   0   0   0   0 c7t3d0


What determines the size of the writes or the distribution between slog 
devices?  It looks like ZFS decided to send a large chunk to one slog, 
which nearly filled the NVRAM, and then continued writing to the other 
one, which meant that it had to go at device speed (whatever that is 
for the data size/write size).  Is there a way to tune the writes to 
multiple slogs to be (for argument's sake) 10MB slices?


I was of the (mis)understanding that only metadata and writes  
smaller than 64k went via the slog device in the event of an O_SYNC  
write request?


The threshold is 32 kBytes, which is unfortunately the same as the  
default

NFS write size. See CR6686887
http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6686887

If you have a slog and logbias=latency (default) then the writes go  
to the slog.
So there is some interaction here that can affect NFS workloads in  
particular.


Interesting CR.
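If I'm reading that right, the logbias property mentioned above should 
let me steer these large writes away from the slog on a per-dataset 
basis - something like the following, if the build supports it (dataset 
name is a placeholder):

# zfs get logbias tank/nfs
# zfs set logbias=throughput tank/nfs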

nfsstat -m output on one of the linux hosts (ubuntu)

 Flags: rw,vers=3,rsize=1048576,wsize=1048576,namlen=255,hard,nointr,noacl,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=10.1.0.17,mountvers=3,mountproto=tcp,addr=10.1.0.17


rsize and wsize auto-tuned to 1MB.  How does this affect the sync 
request threshold?



The clients are (mostly) RHEL5.

Is there a way to tune this on the NFS server or clients such that  
when I perform a large synchronous write, the data does not go via  
the slog device?


You can change the IOP size on the client.



You’re suggesting modifying rsize/wsize?  or something else?

cheers,
James

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Cloning Systems using zpool

2009-09-24 Thread Lori Alt

On 09/24/09 15:54, Peter Pickford wrote:

Hi Cindy,

Wouldn't

touch /reconfigure
mv /etc/path_to_inst* /var/tmp/

regenerate all device information?
  
It might, but it's hard to say whether that would accomplish everything 
needed to move a root file system from one system to another.


I just got done modifying flash archive support to work with zfs root on 
Solaris 10 Update 8.  For those not familiar with it, "flash archives" 
are a way to clone full boot environments across multiple machines.  The 
S10 Solaris installer knows how to install one of these flash archives 
on a system and then do all the customizations to adapt it to the  local 
hardware and local network environment.  I'm pretty sure there's more to 
the customization than just a device reconfiguration. 

So feel free to hack together your own solution.  It might work for you, 
but don't assume that you've come up with a completely general way to 
clone root pools.


lori


AFAIK zfs doesn't care about the device names, it scans for them;
it would only affect things like vfstab.

I did a restore from an E2900 to a V890 and it seemed to work.

Created the pool and did a zfs receive.

I would like to be able to have a zfs send of a minimal build and
install it in an ABE and activate it.
I tried that in test and it seems to work.

It seems to work, but I'm just wondering what I may have missed.

I saw someone else has done this on the list and was going to write a blog.

It seems like a good way to get a minimal install on a server with
reduced downtime.

Now if I just knew how to run the installer in an ABE without there
being an OS there already, that would be cool too.

Thanks

Peter

2009/9/24 Cindy Swearingen :
  

Hi Peter,

I can't provide it because I don't know what it is.

Even if we could provide a list of items, tweaking
the device information if the systems are not identical
would be too difficult.

cs

On 09/24/09 12:04, Peter Pickford wrote:


Hi Cindy,

Could you provide a list of system specific info stored in the root pool?

Thanks

Peter

2009/9/24 Cindy Swearingen :
  

Hi Karl,

Manually cloning the root pool is difficult. We have a root pool recovery
procedure that you might be able to apply as long as the
systems are identical. I would not attempt this with LiveUpgrade
and manually tweaking.


http://www.solarisinternals.com/wiki/index.php/ZFS_Troubleshooting_Guide#Complete_Solaris_ZFS_Root_Pool_Recovery

The problem is that the amount of system-specific info stored in the root
pool and any kind of device differences might be insurmountable.

Solaris 10 ZFS/flash archive support is available with patches but not
for the Nevada release.

The ZFS team is working on a split-mirrored-pool feature and that might
be an option for future root pool cloning.

If you're still interested in a manual process, see the steps below
attempted by another community member who moved his root pool to a
larger disk on the same system.

This is probably more than you wanted to know...

Cindy



# zpool create -f altrpool c1t1d0s0
# zpool set listsnapshots=on rpool
# SNAPNAME=`date +%Y%m%d`
# zfs snapshot -r rpool/r...@$snapname
# zfs list -t snapshot
# zfs send -R rp...@$snapname | zfs recv -vFd altrpool
# installboot -F zfs /usr/platform/`uname -i`/lib/fs/zfs/bootblk
/dev/rdsk/c1t1d0s0
for x86 do
# installgrub /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c1t1d0s0
Set the bootfs property on the root pool BE.
# zpool set bootfs=altrpool/ROOT/zfsBE altrpool
# zpool export altrpool
# init 5
remove source disk (c1t0d0s0) and move target disk (c1t1d0s0) to slot0
-insert solaris10 dvd
ok boot cdrom -s
# zpool import altrpool rpool
# init 0
ok boot disk1

On 09/24/09 10:06, Karl Rossing wrote:


I would like to clone the configuration on a v210 with snv_115.

The current pool looks like this:

-bash-3.2$ /usr/sbin/zpool status
  pool: rpool
 state: ONLINE
 scrub: none requested
config:

  NAME  STATE READ WRITE CKSUM
  rpool ONLINE   0 0 0
mirror  ONLINE   0 0 0
  c1t0d0s0  ONLINE   0 0 0
  c1t1d0s0  ONLINE   0 0 0

errors: No known data errors

After I run zpool detach rpool c1t1d0s0, how can I remount c1t1d0s0 to
/tmp/a so that I can make the changes I need prior to removing the drive
and
putting it into the new v210.

I suppose I could lucreate -n new_v210, lumount new_v210, edit what I
need
to, luumount new_v210, luactivate new_v210, zpool detach rpool c1t1d0s0
and
then luactivate the original boot environment.
  

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
  


___
zfs-discuss mailing list
zfs-discuss@opensola

Re: [zfs-discuss] Cloning Systems using zpool

2009-09-24 Thread Peter Pickford
Hi Cindy,

Wouldn't

touch /reconfigure
mv /etc/path_to_inst* /var/tmp/

regenerate all device information?

AFAIK zfs doesn't care about the device names, it scans for them;
it would only affect things like vfstab.

I did a restore from an E2900 to a V890 and it seemed to work.

Created the pool and did a zfs receive.

I would like to be able to have a zfs send of a minimal build and
install it in an ABE and activate it.
I tried that in test and it seems to work.

It seems to work, but I'm just wondering what I may have missed.

I saw someone else has done this on the list and was going to write a blog.

It seems like a good way to get a minimal install on a server with
reduced downtime.

Now if I just knew how to run the installer in an ABE without there
being an OS there already, that would be cool too.

Thanks

Peter

2009/9/24 Cindy Swearingen :
> Hi Peter,
>
> I can't provide it because I don't know what it is.
>
> Even if we could provide a list of items, tweaking
> the device information if the systems are not identical
> would be too difficult.
>
> cs
>
> On 09/24/09 12:04, Peter Pickford wrote:
>>
>> Hi Cindy,
>>
>> Could you provide a list of system specific info stored in the root pool?
>>
>> Thanks
>>
>> Peter
>>
>> 2009/9/24 Cindy Swearingen :
>>>
>>> Hi Karl,
>>>
>>> Manually cloning the root pool is difficult. We have a root pool recovery
>>> procedure that you might be able to apply as long as the
>>> systems are identical. I would not attempt this with LiveUpgrade
>>> and manually tweaking.
>>>
>>>
>>> http://www.solarisinternals.com/wiki/index.php/ZFS_Troubleshooting_Guide#Complete_Solaris_ZFS_Root_Pool_Recovery
>>>
>>> The problem is that the amount of system-specific info stored in the root
>>> pool and any kind of device differences might be insurmountable.
>>>
>>> Solaris 10 ZFS/flash archive support is available with patches but not
>>> for the Nevada release.
>>>
>>> The ZFS team is working on a split-mirrored-pool feature and that might
>>> be an option for future root pool cloning.
>>>
>>> If you're still interested in a manual process, see the steps below
>>> attempted by another community member who moved his root pool to a
>>> larger disk on the same system.
>>>
>>> This is probably more than you wanted to know...
>>>
>>> Cindy
>>>
>>>
>>>
>>> # zpool create -f altrpool c1t1d0s0
>>> # zpool set listsnapshots=on rpool
>>> # SNAPNAME=`date +%Y%m%d`
>>> # zfs snapshot -r rpool/r...@$snapname
>>> # zfs list -t snapshot
>>> # zfs send -R rp...@$snapname | zfs recv -vFd altrpool
>>> # installboot -F zfs /usr/platform/`uname -i`/lib/fs/zfs/bootblk
>>> /dev/rdsk/c1t1d0s0
>>> for x86 do
>>> # installgrub /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c1t1d0s0
>>> Set the bootfs property on the root pool BE.
>>> # zpool set bootfs=altrpool/ROOT/zfsBE altrpool
>>> # zpool export altrpool
>>> # init 5
>>> remove source disk (c1t0d0s0) and move target disk (c1t1d0s0) to slot0
>>> -insert solaris10 dvd
>>> ok boot cdrom -s
>>> # zpool import altrpool rpool
>>> # init 0
>>> ok boot disk1
>>>
>>> On 09/24/09 10:06, Karl Rossing wrote:

 I would like to clone the configuration on a v210 with snv_115.

 The current pool looks like this:

 -bash-3.2$ /usr/sbin/zpool status    pool: rpool
  state: ONLINE
  scrub: none requested
 config:

       NAME          STATE     READ WRITE CKSUM
       rpool         ONLINE       0     0     0
         mirror      ONLINE       0     0     0
           c1t0d0s0  ONLINE       0     0     0
           c1t1d0s0  ONLINE       0     0     0

 errors: No known data errors

 After I run zpool detach rpool c1t1d0s0, how can I remount c1t1d0s0 to
 /tmp/a so that I can make the changes I need prior to removing the drive
 and
 putting it into the new v210.

 I suppose I could lucreate -n new_v210, lumount new_v210, edit what I
 need
 to, luumount new_v210, luactivate new_v210, zpool detach rpool c1t1d0s0
 and
 then luactivate the original boot environment.
>>>
>>> ___
>>> zfs-discuss mailing list
>>> zfs-discuss@opensolaris.org
>>> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
>>>
>
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Sun Flash Accelerator F20

2009-09-24 Thread Trevor Pretty
Oracle use Linux :-( 

But on the positive note have a look at this:-  http://www.youtube.com/watch?v=rmrxN3GWHpM

It's Ed Zander talking to Larry and asking some great questions.

29:45 Ed asks what parts of Sun are you going to keep - all of it!

45:00 Larry's rant on Cloud Computing: "the cloud is water vapour!"

20:00 Talks about Russell Coutts (a good kiwi bloke) and the America's
Cup, if you don't care about anything else. Although they seem confused
about who should own it, Team New Zealand are only letting the Swiss
borrow it for a while until they lose all our top sailors, like Russell,
and we win it back once the trimaran side show is over :-)


Oh, and back on topic: has anybody found any info on the F20? I've a
customer who wants to buy one, and on the partner portal I can't find
any real details (a Just the Facts, SunIntro, or onestop for partners
page would be nice).

Trevor


Enda O'Connor wrote:

  Richard Elling wrote:
  
  
On Sep 24, 2009, at 12:20 AM, James Andrewartha wrote:



  I'm surprised no-one else has posted about this - part of the Sun 
Oracle Exadata v2 is the Sun Flash Accelerator F20 PCIe card, with 48 
or 96 GB of SLC, a built-in SAS controller and a super-capacitor for 
cache protection. 
http://www.sun.com/storage/disk_systems/sss/f20/specs.xml
  

At the Exadata-2 announcement, Larry kept saying that it wasn't a disk.  
But there
was little else of a technical nature said, though John did have one to 
show.

RAC doesn't work with ZFS directly, so the details of the configuration 
should prove
interesting.

  
  
isn't exadata based on linux, so not clear where zfs comes into play, 
but I didn't see any of this oracle preso, so could be confused by all this.

Enda
  
  
 -- richard

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

  
  
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
  

www.eagle.co.nz 
This email is confidential and may be legally 
privileged. If received in error please destroy and immediately notify 
us.


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Cloning Systems using zpool

2009-09-24 Thread Cindy Swearingen

Karl,

I'm not sure I'm following everything. If you can't swap the drives,
then which pool would you import?

If you install the new v210 with snv_115, then you would have a bootable 
root pool.


You could then receive the snapshots from the old root pool into the
root pool on the new v210.

I would practice the snapshot/send/recv'ing process if you are not
familiar with it before you attempt the migration.

Cindy

On 09/24/09 12:39, Karl Rossing wrote:

Thanks for the help.

Since the v210s in question are at a remote site, it might be a bit of a pain 
getting the drives swapped by end users.

So I thought of something else. Could I netboot the new v210 with snv_115, use 
zfs send/receive with ssh to grab the data on the old server, install the boot 
block, import the pool, make the changes I need and reboot the system?

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Sun Flash Accelerator F20

2009-09-24 Thread Brian H. Nelson

Roland Rambau wrote:

Richard, Tim,

yes, one might envision the X4275 as OpenStorage appliances, but
they are not. Exadata 2 is
 - *all* Sun hardware
 - *all* Oracle software (*)
and that combination is now an Oracle product: a database appliance.


Is there any reason the X4275 couldn't be an OpenStorage appliance? It 
seems like it would be a good fit. It doesn't seem specific to Exadata2.


The F20 accelerator card isn't something specific to Exadata2 either is 
it? It looks like something that would benefit any kind of storage 
server. When I saw the F20 on the Sun site the other day, my first 
thought was "Oh cool, they reinvented Prestoserve!"


-Brian
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Cloning Systems using zpool

2009-09-24 Thread Karl Rossing
Thanks for the help.

Since the v210s in question are at a remote site, it might be a bit of a pain 
getting the drives swapped by end users.

So I thought of something else. Could I netboot the new v210 with snv_115, use 
zfs send/receive with ssh to grab the data on the old server, install the boot 
block, import the pool, make the changes I need and reboot the system?
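Roughly what I have in mind, with a made-up snapshot name and borrowing 
the send/recv and installboot steps from Cindy's procedure, run on the 
netbooted v210 after creating the target root pool:

# ssh oldserver 'zfs snapshot -r rpool@migrate; zfs send -R rpool@migrate' | zfs recv -vFd rpool
# installboot -F zfs /usr/platform/`uname -i`/lib/fs/zfs/bootblk /dev/rdsk/c1t0d0s0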
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Cloning Systems using zpool

2009-09-24 Thread Cindy Swearingen

Hi Peter,

I can't provide it because I don't know what it is.

Even if we could provide a list of items, tweaking
the device information if the systems are not identical
would be too difficult.

cs

On 09/24/09 12:04, Peter Pickford wrote:

Hi Cindy,

Could you provide a list of system specific info stored in the root pool?

Thanks

Peter

2009/9/24 Cindy Swearingen :

Hi Karl,

Manually cloning the root pool is difficult. We have a root pool recovery
procedure that you might be able to apply as long as the
systems are identical. I would not attempt this with LiveUpgrade
and manually tweaking.

http://www.solarisinternals.com/wiki/index.php/ZFS_Troubleshooting_Guide#Complete_Solaris_ZFS_Root_Pool_Recovery

The problem is that the amount of system-specific info stored in the root
pool and any kind of device differences might be insurmountable.

Solaris 10 ZFS/flash archive support is available with patches but not
for the Nevada release.

The ZFS team is working on a split-mirrored-pool feature and that might
be an option for future root pool cloning.

If you're still interested in a manual process, see the steps below
attempted by another community member who moved his root pool to a
larger disk on the same system.

This is probably more than you wanted to know...

Cindy



# zpool create -f altrpool c1t1d0s0
# zpool set listsnapshots=on rpool
# SNAPNAME=`date +%Y%m%d`
# zfs snapshot -r rpool/r...@$snapname
# zfs list -t snapshot
# zfs send -R rp...@$snapname | zfs recv -vFd altrpool
# installboot -F zfs /usr/platform/`uname -i`/lib/fs/zfs/bootblk
/dev/rdsk/c1t1d0s0
for x86 do
# installgrub /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c1t1d0s0
Set the bootfs property on the root pool BE.
# zpool set bootfs=altrpool/ROOT/zfsBE altrpool
# zpool export altrpool
# init 5
remove source disk (c1t0d0s0) and move target disk (c1t1d0s0) to slot0
-insert solaris10 dvd
ok boot cdrom -s
# zpool import altrpool rpool
# init 0
ok boot disk1

On 09/24/09 10:06, Karl Rossing wrote:

I would like to clone the configuration on a v210 with snv_115.

The current pool looks like this:

-bash-3.2$ /usr/sbin/zpool status
  pool: rpool
 state: ONLINE
 scrub: none requested
config:

   NAME  STATE READ WRITE CKSUM
   rpool ONLINE   0 0 0
 mirror  ONLINE   0 0 0
   c1t0d0s0  ONLINE   0 0 0
   c1t1d0s0  ONLINE   0 0 0

errors: No known data errors

After I run zpool detach rpool c1t1d0s0, how can I remount c1t1d0s0 to
/tmp/a so that I can make the changes I need prior to removing the drive and
putting it into the new v210.

I suppose I could lucreate -n new_v210, lumount new_v210, edit what I need
to, luumount new_v210, luactivate new_v210, zpool detach rpool c1t1d0s0 and
then luactivate the original boot environment.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Sun Flash Accelerator F20

2009-09-24 Thread Andrew Gabriel

Richard Elling wrote:


On Sep 24, 2009, at 10:17 AM, Tim Cook wrote:




On Thu, Sep 24, 2009 at 12:10 PM, Richard Elling 
 wrote:

On Sep 24, 2009, at 12:20 AM, James Andrewartha wrote:

I'm surprised no-one else has posted about this - part of the Sun 
Oracle Exadata v2 is the Sun Flash Accelerator F20 PCIe card, with 48 
or 96 GB of SLC, a built-in SAS controller and a super-capacitor for 
cache protection. 
http://www.sun.com/storage/disk_systems/sss/f20/specs.xml


At the Exadata-2 announcement, Larry kept saying that it wasn't a 
disk. But there
was little else of a technical nature said, though John did have one 
to show.


RAC doesn't work with ZFS directly, so the details of the 
configuration should prove

interesting.
-- richard

Exadata 2 is built on Linux from what I read, so I'm not entirely 
sure how it would leverage ZFS, period. I hope I heard wrong or the 
whole announcement feels like a bit of a joke to me.


It is not clear to me. They speak of "storage servers" which would be 
needed to
implement the shared storage. These are described as Sun Fire X4275 
loaded
with the FlashFire cards. I am not aware of a production-ready Linux 
file system
which implements a hybrid storage pool. I could easily envision these 
as being

OpenStorage appliances.
-- richard 


Well, I'm not an expert on this at all, but what was said IIRC is that 
it is using ASM with the whole lot running on OEL.


These aren't just plain storage servers either. The storage servers are 
provided with enough details of the DB search being performed to do an 
initial filtering of the data so the data returned to the DB servers for 
them to work on is only typically 10% of the raw data they would 
conventionally have to process (and that's before taking compression 
into account).


I haven't seen anything which says exactly how the flash cache is used 
(as in, is it ASM or the database which decides what goes in flash?). 
ASM certainly has the smarts to do this level of tuning for conventional 
disk layout, and just like ZFS, it puts hot data on the outer edge of a 
disk and uses slower parts of disks for less performant data (things 
like backups), so it certainly could decide what goes into flash.


--
Andrew
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Cloning Systems using zpool

2009-09-24 Thread Erik Trimble

As Cindy said, this isn't trivial right now.

Personally, I'd do it this way:

ASSUMPTIONS:

*  both v210 machines are reasonably identical (may differ in RAM or CPU 
speed, but nothing much else).

*  Call the original machine A and the new machine B
*  machine B has no current drives in it.

METHOD:

1)  In A, install the boot block on c1t1 as Cindy detailed below 
(installboot ...)

2) shutdown A
3) remove c1t0 from A (that is, the original boot drive)
4) boot A from c1t1   (you will likely have to do this at the boot prom, 
via something like 'boot disk2' )
5) once A is back up, make the changes you need to make A look like what 
B should be.  Note that ZFS will mark c1t0 as Failed.
6) shutdown A, remove c1t1, and move it to B, putting it in the c1t1 
disk slot (i.e. the 2nd slot)

7) boot B, in the same manner you did A a minute ago (boot disk2)
8) when B is up, insert a new drive into the c1t0 slot, and do a 'zpool 
replace rpool c1t0d0 c1t0d0'

9) after the resilver completes, do an 'installboot' on c1t0
10) reboot B, and everything should be set.
11) on A, re-insert the original c1t0 into its standard place (i.e. it 
should remain c1t0)

12) boot A
13) insert a fresh drive into the c1t1 slot
14) zpool replace rpool c1t1d0 c1t1d0
15) installboot   after resilver


Note that I've not specifically tried the above, but I can't see any 
reason why it shouldn't work.


-Erik


Cindy Swearingen wrote:

Hi Karl,

Manually cloning the root pool is difficult. We have a root pool 
recovery procedure that you might be able to apply as long as the

systems are identical. I would not attempt this with LiveUpgrade
and manually tweaking.

http://www.solarisinternals.com/wiki/index.php/ZFS_Troubleshooting_Guide#Complete_Solaris_ZFS_Root_Pool_Recovery 



The problem is that the amount of system-specific info stored in the root
pool and any kind of device differences might be insurmountable.

Solaris 10 ZFS/flash archive support is available with patches but not
for the Nevada release.

The ZFS team is working on a split-mirrored-pool feature and that might
be an option for future root pool cloning.

If you're still interested in a manual process, see the steps below 
attempted by another community member who moved his root pool to a

larger disk on the same system.

This is probably more than you wanted to know...

Cindy



# zpool create -f altrpool c1t1d0s0
# zpool set listsnapshots=on rpool
# SNAPNAME=`date +%Y%m%d`
# zfs snapshot -r rpool/r...@$snapname
# zfs list -t snapshot
# zfs send -R rp...@$snapname | zfs recv -vFd altrpool
# installboot -F zfs /usr/platform/`uname -i`/lib/fs/zfs/bootblk 
/dev/rdsk/c1t1d0s0

for x86 do
# installgrub /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c1t1d0s0
Set the bootfs property on the root pool BE.
# zpool set bootfs=altrpool/ROOT/zfsBE altrpool
# zpool export altrpool
# init 5
remove source disk (c1t0d0s0) and move target disk (c1t1d0s0) to slot0
-insert solaris10 dvd
ok boot cdrom -s
# zpool import altrpool rpool
# init 0
ok boot disk1

On 09/24/09 10:06, Karl Rossing wrote:

I would like to clone the configuration on a v210 with snv_115.

The current pool looks like this:

-bash-3.2$ /usr/sbin/zpool status
  pool: rpool
 state: ONLINE
 scrub: none requested
config:

NAME  STATE READ WRITE CKSUM
rpool ONLINE   0 0 0
  mirror  ONLINE   0 0 0
c1t0d0s0  ONLINE   0 0 0
c1t1d0s0  ONLINE   0 0 0

errors: No known data errors

After I run zpool detach rpool c1t1d0s0, how can I remount c1t1d0s0 
to /tmp/a so that I can make the changes I need prior to removing the 
drive and putting it into the new v210.


 I suppose I could lucreate -n new_v210, lumount new_v210, edit what I 
need to, luumount new_v210, luactivate new_v210, zpool detach rpool 
c1t1d0s0 and then luactivate the original boot environment.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss



--
Erik Trimble
Java System Support
Mailstop:  usca22-123
Phone:  x17195
Santa Clara, CA

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Sun Flash Accelerator F20

2009-09-24 Thread Roland Rambau

Richard, Tim,

yes, one might envision the X4275 as OpenStorage appliances, but
they are not. Exadata 2 is
 - *all* Sun hardware
 - *all* Oracle software (*)
and that combination is now an Oracle product: a database appliance.

All nodes run Oracle's Linux; as far as I understand - and that is not
sooo much - Oracle has offloaded certain database functionality into
the storage nodes. I would not assume that there is a hybrid storage
pool with a file system - it is a distributed database that knows how
to utilize flash storage. I see it as a first quick step.

  hth

  -- Roland

PS: (*) disregarding firmware-like software components like Service
Processor code or IB subnet managers in the IB switches, which are
provided by Sun



Richard Elling schrieb:


On Sep 24, 2009, at 10:17 AM, Tim Cook wrote:




On Thu, Sep 24, 2009 at 12:10 PM, Richard Elling 
 wrote:

On Sep 24, 2009, at 12:20 AM, James Andrewartha wrote:

I'm surprised no-one else has posted about this - part of the Sun 
Oracle Exadata v2 is the Sun Flash Accelerator F20 PCIe card, with 48 
or 96 GB of SLC, a built-in SAS controller and a super-capacitor for 
cache protection. 
http://www.sun.com/storage/disk_systems/sss/f20/specs.xml


At the Exadata-2 announcement, Larry kept saying that it wasn't a 
disk.  But there
was little else of a technical nature said, though John did have one 
to show.


RAC doesn't work with ZFS directly, so the details of the 
configuration should prove

interesting.
 -- richard

Exadata 2 is built on Linux from what I read, so I'm not entirely sure 
how it would leverage ZFS, period.  I hope I heard wrong or the whole 
announcement feels like a bit of a joke to me.


It is not clear to me. They speak of "storage servers" which would be 
needed to

implement the shared storage. These are described as Sun Fire X4275 loaded
with the FlashFire cards. I am not aware of a production-ready Linux 
file system
which implements a hybrid storage pool. I could easily envision these as 
being

OpenStorage appliances.
 -- richard

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


--

**
Roland Rambau Platform Technology Team
Principal Field Technologist  Global Systems Engineering
Phone: +49-89-46008-2520  Mobile:+49-172-84 58 129
Fax:   +49-89-46008-  mailto:roland.ram...@sun.com
**
Sitz der Gesellschaft: Sun Microsystems GmbH,
Sonnenallee 1, D-85551 Kirchheim-Heimstetten
Amtsgericht München: HRB 161028;  Geschäftsführer:
Thomas Schröder, Wolfgang Engels, Wolf Frenkel
Vorsitzender des Aufsichtsrates:   Martin Häring
*** UNIX * /bin/sh  FORTRAN **
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Cloning Systems using zpool

2009-09-24 Thread Peter Pickford
Hi Cindy,

Could you provide a list of system specific info stored in the root pool?

Thanks

Peter

2009/9/24 Cindy Swearingen :
> Hi Karl,
>
> Manually cloning the root pool is difficult. We have a root pool recovery
> procedure that you might be able to apply as long as the
> systems are identical. I would not attempt this with LiveUpgrade
> and manually tweaking.
>
> http://www.solarisinternals.com/wiki/index.php/ZFS_Troubleshooting_Guide#Complete_Solaris_ZFS_Root_Pool_Recovery
>
> The problem is that the amount of system-specific info stored in the root
> pool and any kind of device differences might be insurmountable.
>
> Solaris 10 ZFS/flash archive support is available with patches but not
> for the Nevada release.
>
> The ZFS team is working on a split-mirrored-pool feature and that might
> be an option for future root pool cloning.
>
> If you're still interested in a manual process, see the steps below
> attempted by another community member who moved his root pool to a
> larger disk on the same system.
>
> This is probably more than you wanted to know...
>
> Cindy
>
>
>
> # zpool create -f altrpool c1t1d0s0
> # zpool set listsnapshots=on rpool
> # SNAPNAME=`date +%Y%m%d`
> # zfs snapshot -r rpool/r...@$snapname
> # zfs list -t snapshot
> # zfs send -R rp...@$snapname | zfs recv -vFd altrpool
> # installboot -F zfs /usr/platform/`uname -i`/lib/fs/zfs/bootblk
> /dev/rdsk/c1t1d0s0
> for x86 do
> # installgrub /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c1t1d0s0
> Set the bootfs property on the root pool BE.
> # zpool set bootfs=altrpool/ROOT/zfsBE altrpool
> # zpool export altrpool
> # init 5
> remove source disk (c1t0d0s0) and move target disk (c1t1d0s0) to slot0
> -insert solaris10 dvd
> ok boot cdrom -s
> # zpool import altrpool rpool
> # init 0
> ok boot disk1
>
> On 09/24/09 10:06, Karl Rossing wrote:
>>
>> I would like to clone the configuration on a v210 with snv_115.
>>
>> The current pool looks like this:
>>
>> -bash-3.2$ /usr/sbin/zpool status    pool: rpool
>>  state: ONLINE
>>  scrub: none requested
>> config:
>>
>>        NAME          STATE     READ WRITE CKSUM
>>        rpool         ONLINE       0     0     0
>>          mirror      ONLINE       0     0     0
>>            c1t0d0s0  ONLINE       0     0     0
>>            c1t1d0s0  ONLINE       0     0     0
>>
>> errors: No known data errors
>>
>> After I run zpool detach rpool c1t1d0s0, how can I remount c1t1d0s0 to
>> /tmp/a so that I can make the changes I need prior to removing the drive and
>> putting it into the new v210.
>>
>> I suppose I could lucreate -n new_v210, lumount new_v210, edit what I need
>> to, luumount new_v210, luactivate new_v210, zpool detach rpool c1t1d0s0 and
>> then luactivate the original boot environment.
>
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
>
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS ARC vs Oracle cache

2009-09-24 Thread Enda O'Connor

Richard Elling wrote:

On Sep 24, 2009, at 10:30 AM, Javier Conde wrote:


Hello,

Given the following configuration:

* Server with 12 SPARCVII CPUs  and 96 GB of RAM
* ZFS used as file system for Oracle data
* Oracle 10.2.0.4 with 1.7TB of data and indexes
* 1800 concurrent users with PeopleSoft Financial
* 2 PeopleSoft transactions per day
* HDS USP1100 with LUNs striped across 6 parity groups (450xRAID7+1), 
total 48 disks

* 2x 4Gbps FC with MPxIO

Which is the best Oracle SGA size to avoid cache duplication between 
Oracle and ZFS?


Is it better to have a "small SGA + big ZFS ARC" or "large SGA + small 
ZFS ARC"?


Who does a better cache for overall performance?


In general, it is better to cache closer to the consumer (application).

You don't mention what version of Solaris or ZFS you are using.
For later versions, the primarycache property allows you to control the
ARC usage on a per-dataset basis.
 -- richard

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

Hi
adding oracle-interest.
I would suggest some testing, but the standard recommendations to start 
with are: keep the zfs recordsize equal to the db block size; keep the 
oracle log writer on its own pool (a 128k recordsize is recommended for 
that one, I believe), as the log writer is an IO-limiting factor; and 
use the latest KUs for Solaris, as they contain some critical fixes for 
zfs/oracle, ie 6775697 for instance.  A small SGA is not usually 
recommended, but of course a lot depends on the application layer as 
well. I can only say test with the recommendations above and then 
deviate from there; perhaps keeping the zil on a separate low-latency 
device might help (again, only analysis can determine all that). Then 
remember that even after that, with a large SGA etc., sometimes perf can 
degrade, ie you might need to instruct oracle to actually cache, via the 
alter table cache command etc.
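
eg a rough sketch of the kind of layout I mean (pool/dataset names are 
just placeholders, and the 8k recordsize assumes an 8k db_block_size):

# zfs create -o recordsize=8k dbpool/oradata
# zfs create -o recordsize=128k dbpool/oralog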


getting familiar with statspack/AWR will be a must here :-) as only an 
analysis of Oracle from an Oracle point of view can really tell what is 
working as such.


Enda


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS ARC vs Oracle cache

2009-09-24 Thread Javier Conde


Hi Richard,

Thanks for your reply.

We are using Solaris 10 u6 and ZFS version 10.

Regards,

Javi

Richard Elling wrote:

On Sep 24, 2009, at 10:30 AM, Javier Conde wrote:


Hello,

Given the following configuration:

* Server with 12 SPARCVII CPUs and 96 GB of RAM
* ZFS used as file system for Oracle data
* Oracle 10.2.0.4 with 1.7TB of data and indexes
* 1800 concurrent users with PeopleSoft Financial
* 2 PeopleSoft transactions per day
* HDS USP1100 with LUNs striped across 6 parity groups (450xRAID7+1), 
total 48 disks

* 2x 4Gbps FC with MPxIO

Which is the best Oracle SGA size to avoid cache duplication between 
Oracle and ZFS?


Is it better to have a "small SGA + big ZFS ARC" or "large SGA + 
small ZFS ARC"?


Who does a better cache for overall performance?


In general, it is better to cache closer to the consumer (application).

You don't mention what version of Solaris or ZFS you are using.
For later versions, the primarycache property allows you to control the
ARC usage on a per-dataset basis.
-- richard



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Cloning Systems using zpool

2009-09-24 Thread Cindy Swearingen

Hi Karl,

Manually cloning the root pool is difficult. We have a root pool 
recovery procedure that you might be able to apply as long as the

systems are identical. I would not attempt this with LiveUpgrade
and manually tweaking.

http://www.solarisinternals.com/wiki/index.php/ZFS_Troubleshooting_Guide#Complete_Solaris_ZFS_Root_Pool_Recovery

The problem is that the amount of system-specific info stored in the root
pool and any kind of device differences might be insurmountable.

Solaris 10 ZFS/flash archive support is available with patches but not
for the Nevada release.

The ZFS team is working on a split-mirrored-pool feature and that might
be an option for future root pool cloning.

If you're still interested in a manual process, see the steps below 
attempted by another community member who moved his root pool to a

larger disk on the same system.

This is probably more than you wanted to know...

Cindy



# zpool create -f altrpool c1t1d0s0
# zpool set listsnapshots=on rpool
# SNAPNAME=`date +%Y%m%d`
# zfs snapshot -r rpool/r...@$snapname
# zfs list -t snapshot
# zfs send -R rp...@$snapname | zfs recv -vFd altrpool
# installboot -F zfs /usr/platform/`uname -i`/lib/fs/zfs/bootblk 
/dev/rdsk/c1t1d0s0

for x86 do
# installgrub /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c1t1d0s0
Set the bootfs property on the root pool BE.
# zpool set bootfs=altrpool/ROOT/zfsBE altrpool
# zpool export altrpool
# init 5
remove source disk (c1t0d0s0) and move target disk (c1t1d0s0) to slot0
-insert solaris10 dvd
ok boot cdrom -s
# zpool import altrpool rpool
# init 0
ok boot disk1

On 09/24/09 10:06, Karl Rossing wrote:

I would like to clone the configuration on a v210 with snv_115.

The current pool looks like this:

-bash-3.2$ /usr/sbin/zpool status  
  pool: rpool

 state: ONLINE
 scrub: none requested
config:

NAME  STATE READ WRITE CKSUM
rpool ONLINE   0 0 0
  mirror  ONLINE   0 0 0
c1t0d0s0  ONLINE   0 0 0
c1t1d0s0  ONLINE   0 0 0

errors: No known data errors

After I run zpool detach rpool c1t1d0s0, how can I remount c1t1d0s0 to /tmp/a 
so that I can make the changes I need prior to removing the drive and putting 
it into the new v210.

I suppose I could lucreate -n new_v210, lumount new_v210, edit what I need to, 
luumount new_v210, luactivate new_v210, zpool detach rpool c1t1d0s0 and then 
luactivate the original boot environment.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] moving files from one fs to another, splittin/merging

2009-09-24 Thread Mattias Pantzare
> Thanks for the info. Glad to hear it's in the works, too.

It is not in the works. If you look at the bug IDs in the bug database
you will find no indication of work done on them.


>
> Paul
>
>
> 1:21pm, Mark J Musante wrote:
>
>> On Thu, 24 Sep 2009, Paul Archer wrote:
>>
>>> I may have missed something in the docs, but if I have a file in one FS,
>>> and want to move it to another FS (assuming both filesystems are on the same
>>> ZFS pool), is there a way to do it outside of the standard mv/cp/rsync
>>> commands?
>>
>> Not yet.  CR 6483179 covers this.
>>
>>> On a related(?) note, is there a way to split an existing filesystem?
>>
>> Not yet.  CR 6400399 covers this.
>>
>>
>> Regards,
>> markm
>>
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
>
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Sun Flash Accelerator F20

2009-09-24 Thread Richard Elling


On Sep 24, 2009, at 10:17 AM, Tim Cook wrote:




On Thu, Sep 24, 2009 at 12:10 PM, Richard Elling > wrote:

On Sep 24, 2009, at 12:20 AM, James Andrewartha wrote:

I'm surprised no-one else has posted about this - part of the Sun  
Oracle Exadata v2 is the Sun Flash Accelerator F20 PCIe card, with  
48 or 96 GB of SLC, a built-in SAS controller and a super-capacitor  
for cache protection. http://www.sun.com/storage/disk_systems/sss/f20/specs.xml


At the Exadata-2 announcement, Larry kept saying that it wasn't a  
disk.  But there
was little else of a technical nature said, though John did have one  
to show.


RAC doesn't work with ZFS directly, so the details of the  
configuration should prove

interesting.
 -- richard

Exadata 2 is built on Linux from what I read, so I'm not entirely  
sure how it would leverage ZFS, period.  I hope I heard wrong or the  
whole announcement feels like a bit of a joke to me.


It is not clear to me. They speak of "storage servers" which would be  
needed to
implement the shared storage. These are described as Sun Fire X4275  
loaded
with the FlashFire cards. I am not aware of a production-ready Linux  
file system
which implements a hybrid storage pool. I could easily envision these  
as being

OpenStorage appliances.
 -- richard

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS ARC vs Oracle cache

2009-09-24 Thread Richard Elling

On Sep 24, 2009, at 10:30 AM, Javier Conde wrote:


Hello,

Given the following configuration:

* Server with 12 SPARCVII CPUs  and 96 GB of RAM
* ZFS used as file system for Oracle data
* Oracle 10.2.0.4 with 1.7TB of data and indexes
* 1800 concurrent users with PeopleSoft Financial
* 2 PeopleSoft transactions per day
* HDS USP1100 with LUNs striped on 6 parity groups (450xRAID7+1),  
total 48 disks

* 2x 4Gbps FC with MPxIO

Which is the best Oracle SGA size to avoid cache duplication between  
Oracle and ZFS?


Is it better to have a "small SGA + big ZFS ARC" or "large SGA +  
small ZFS ARC"?


Which does a better job of caching for overall performance?


In general, it is better to cache closer to the consumer (application).

You don't mention what version of Solaris or ZFS you are using.
For later versions, the primarycache property allows you to control the
ARC usage on a per-dataset basis.
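
As a minimal sketch (the dataset name is hypothetical), ZFS caching can be
restricted to metadata for the Oracle data files so a larger SGA does the
data caching without duplication:

# zfs set primarycache=metadata tank/oradata
# zfs get primarycache tank/oradata

With primarycache=metadata the ARC keeps only metadata for that dataset;
leave primarycache=all (the default) on datasets you do want ZFS to cache.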
 -- richard

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] moving files from one fs to another, splittin/merging

2009-09-24 Thread Paul Archer

Thanks for the info. Glad to hear it's in the works, too.

Paul


1:21pm, Mark J Musante wrote:


On Thu, 24 Sep 2009, Paul Archer wrote:

I may have missed something in the docs, but if I have a file in one FS, 
and want to move it to another FS (assuming both filesystems are on the 
same ZFS pool), is there a way to do it outside of the standard mv/cp/rsync 
commands?


Not yet.  CR 6483179 covers this.


On a related(?) note, is there a way to split an existing filesystem?


Not yet.  CR 6400399 covers this.


Regards,
markm


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss



[zfs-discuss] ZFS ARC vs Oracle cache

2009-09-24 Thread Javier Conde


Hello,

Given the following configuration:

* Server with 12 SPARCVII CPUs  and 96 GB of RAM
* ZFS used as file system for Oracle data
* Oracle 10.2.0.4 with 1.7TB of data and indexes
* 1800 concurrent users with PeopleSoft Financial
* 2 PeopleSoft transactions per day
* HDS USP1100 with LUNs striped on 6 parity groups (450xRAID7+1), total 
48 disks

* 2x 4Gbps FC with MPxIO

Which is the best Oracle SGA size to avoid cache duplication between 
Oracle and ZFS?


Is it better to have a "small SGA + big ZFS ARC" or "large SGA + small 
ZFS ARC"?


Which does a better job of caching for overall performance?

Thanks in advance and best regards,

Javi
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] moving files from one fs to another, splittin/merging

2009-09-24 Thread Mark J Musante

On Thu, 24 Sep 2009, Paul Archer wrote:

I may have missed something in the docs, but if I have a file in one FS, 
and want to move it to another FS (assuming both filesystems are on the 
same ZFS pool), is there a way to do it outside of the standard 
mv/cp/rsync commands?


Not yet.  CR 6483179 covers this.


On a related(?) note, is there a way to split an existing filesystem?


Not yet.  CR 6400399 covers this.


Regards,
markm
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Sun Flash Accelerator F20

2009-09-24 Thread Enda O'Connor

Richard Elling wrote:

On Sep 24, 2009, at 12:20 AM, James Andrewartha wrote:

I'm surprised no-one else has posted about this - part of the Sun 
Oracle Exadata v2 is the Sun Flash Accelerator F20 PCIe card, with 48 
or 96 GB of SLC, a built-in SAS controller and a super-capacitor for 
cache protection. 
http://www.sun.com/storage/disk_systems/sss/f20/specs.xml


At the Exadata-2 announcement, Larry kept saying that it wasn't a disk.  
But there
was little else of a technical nature said, though John did have one to 
show.


RAC doesn't work with ZFS directly, so the details of the configuration 
should prove

interesting.


Isn't Exadata based on Linux? So it's not clear where ZFS comes into play, 
but I didn't see any of the Oracle presentation, so I could be confused by all this.


Enda

 -- richard

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Sun Flash Accelerator F20

2009-09-24 Thread Tim Cook
On Thu, Sep 24, 2009 at 12:10 PM, Richard Elling
wrote:

> On Sep 24, 2009, at 12:20 AM, James Andrewartha wrote:
>
>  I'm surprised no-one else has posted about this - part of the Sun Oracle
>> Exadata v2 is the Sun Flash Accelerator F20 PCIe card, with 48 or 96 GB of
>> SLC, a built-in SAS controller and a super-capacitor for cache protection.
>> http://www.sun.com/storage/disk_systems/sss/f20/specs.xml
>>
>
> At the Exadata-2 announcement, Larry kept saying that it wasn't a disk.
>  But there
> was little else of a technical nature said, though John did have one to
> show.
>
> RAC doesn't work with ZFS directly, so the details of the configuration
> should prove
> interesting.
>  -- richard
>

Exadata 2 is built on Linux from what I read, so I'm not entirely sure how
it would leverage ZFS, period.  I hope I heard wrong or the whole
announcement feels like a bit of a joke to me.

--Tim
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Sun Flash Accelerator F20

2009-09-24 Thread Richard Elling

On Sep 24, 2009, at 12:20 AM, James Andrewartha wrote:

I'm surprised no-one else has posted about this - part of the Sun  
Oracle Exadata v2 is the Sun Flash Accelerator F20 PCIe card, with  
48 or 96 GB of SLC, a built-in SAS controller and a super-capacitor  
for cache protection. http://www.sun.com/storage/disk_systems/sss/f20/specs.xml


At the Exadata-2 announcement, Larry kept saying that it wasn't a  
disk.  But there
was little else of a technical nature said, though John did have one  
to show.


RAC doesn't work with ZFS directly, so the details of the  
configuration should prove

interesting.
 -- richard

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] periodic slow responsiveness

2009-09-24 Thread Richard Elling

comment below...

On Sep 23, 2009, at 10:00 PM, James Lever wrote:


On 08/09/2009, at 2:01 AM, Ross Walker wrote:

On Sep 7, 2009, at 1:32 AM, James Lever  wrote:

Well, an MD1000 holds 15 drives; a good compromise might be two 7-drive  
RAIDZ2s with a hot spare... That should provide 320 IOPS instead of  
160, a big difference.


The issue is interactive responsiveness and if there is a way to  
tune the system to give that while still having good performance  
for builds when they are run.


Look at the write IOPS of the pool with the zpool iostat -v and  
look at how many are happening on the RAIDZ2 vdev.


I was suggesting that slog writes were possibly starving reads from  
the l2arc as they were on the same device.  This appears not to  
have been the issue as the problem has persisted even with the  
l2arc devices removed from the pool.


The SSD will handle a lot more IOPS than the pool, and the L2ARC is a  
lazy reader, it mostly just holds on to read cache data.


It just may be that the pool configuration just can't handle the  
write IOPS needed and reads are starving.


Possible, but hard to tell.  Have a look at the iostat results  
I’ve posted.


The busy times of the disks while the issue is occurring should let  
you know.


So it turns out that the problem is that all writes coming via NFS  
are going through the slog.  When that happens, the transfer speed  
to the device drops to ~70MB/s (the write speed of this SLC SSD) and  
until the load drops all new write requests are blocked causing a  
noticeable delay (which has been observed to be up to 20s, but  
generally only 2-4s).


Thank you sir, can I have another?
If you add (not attach) more slogs, the workload will be spread across  
them.  But...




I can reproduce this behaviour by copying a large file (hundreds of  
MB in size) using 'cp src dst' on an NFS (still currently v3) client  
and observe that all data is pushed through the slog device (10GB  
partition of a Samsung 50GB SSD behind a PERC 6/i w/256MB BBC)  
rather than going direct to the primary storage disks.


On a related note, I had 2 of these devices (both using just 10GB  
partitions) connected as log devices (so the pool had 2 separate log  
devices) and the second one was consistently running significantly  
slower than the first.  Removing the second device made an  
improvement on performance, but did not remove the occasional  
observed pauses.


...this is not surprising, when you add a slow slog device.  This is  
the weakest link rule.


I was of the (mis)understanding that only metadata and writes  
smaller than 64k went via the slog device in the event of an O_SYNC  
write request?


The threshold is 32 kBytes, which is unfortunately the same as the  
default

NFS write size. See CR6686887
http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6686887

If you have a slog and logbias=latency (default) then the writes go to  
the slog.
So there is some interaction here that can affect NFS workloads in  
particular.
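
As a rough sketch (pool and dataset names are hypothetical), logbias can be
set per dataset so large streaming synchronous writes bypass the slog and go
to the main pool, at the cost of small-write latency on that dataset:

# zfs set logbias=throughput tank/nfsexport
# zfs get logbias tank/nfsexport
# zpool iostat -v tank 5

Watching zpool iostat -v during the copy shows whether the log device is
still absorbing the writes.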




The clients are (mostly) RHEL5.

Is there a way to tune this on the NFS server or clients such that  
when I perform a large synchronous write, the data does not go via  
the slog device?


You can change the IOP size on the client.
 -- richard



I have investigated using the logbias setting, but that will just  
kill small file performance also on any filesystem using it and  
defeat the purpose of having a slog device at all.


cheers,
James

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] You're Invited: OpenSolaris Security Summit

2009-09-24 Thread Richard Elling

On Sep 24, 2009, at 2:19 AM, Darren J Moffat wrote:


Jennifer Bauer Scarpino wrote:

To: Developers and Students
You are invited to participate in the first OpenSolaris Security  
Summit

OpenSolaris Security Summit
Tuesday, November 3rd, 2009
Baltimore Marriott Waterfront
700 Aliceanna Street
Baltimore, Maryland 21202


I will be giving a talk and live demo of ZFS Crypto at this event.


Other, related tutorials at the same conference are:
	+ Sunday: Jim Mauro's Solaris Dynamic Tracing (DTrace): Finding the Light
	  Where There Was Only Darkness
	+ Monday: Jim Mauro's Solaris 10 Performance, Observability, and Debugging
	+ Monday: Richard Elling's ZFS: A Filesystem for Modern Hardware
	+ Tuesday: Peter Galvin & Marc Stavely have two half-day tutorials
	  (conflict with the summit)
		+ Solaris 10 Administration Workshop 1: Administration (Hands-on)
		+ Solaris 10 Administration Workshop 2: Virtualization (Hands-on)
	+ Wednesday: Peter Galvin & Marc Stavely have two more half-day tutorials
		+ Solaris 10 Administration Workshop 3: File Systems (Hands-on)
		+ Solaris 10 Administration Workshop 4: Security (Hands-on)
	+ Thursday: Jeff Victor's Resource Management with Solaris Containers

And, of course, there are always lots of good technical papers at LISA.
http://www.usenix.org/events/lisa09

 -- richard


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Cloning Systems using zpool

2009-09-24 Thread Karl Rossing
I would like to clone the configuration on a v210 with snv_115.

The current pool looks like this:

-bash-3.2$ /usr/sbin/zpool status  
  pool: rpool
 state: ONLINE
 scrub: none requested
config:

NAME  STATE READ WRITE CKSUM
rpool ONLINE   0 0 0
  mirror  ONLINE   0 0 0
c1t0d0s0  ONLINE   0 0 0
c1t1d0s0  ONLINE   0 0 0

errors: No known data errors

After I run zpool detach rpool c1t1d0s0, how can I remount c1t1d0s0 to /tmp/a 
so that I can make the changes I need prior to removing the drive and putting 
it into the new v210.

I suppose I could lucreate -n new_v210, lumount new_v210, edit what I need to, 
luumount new_v210, luactivate new_v210, zpool detach rpool c1t1d0s0 and then 
luactivate the original boot environment.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] moving files from one fs to another, splittin/merging

2009-09-24 Thread Paul Archer
I may have missed something in the docs, but if I have a file in one FS, 
and want to move it to another FS (assuming both filesystems are on the 
same ZFS pool), is there a way to do it outside of the standard 
mv/cp/rsync commands? For example, I have a pool with my home directory as 
a FS, and I have another FS with ISOs. I download an ISO of an OpenSolaris 
DVD (say, 3GB), but it goes into my home directory. Since ZFS is all about 
pools and shared storage, it seems like it would be natural to move the 
file via a 'zfs' command, rather than mv/cp/etc...


On a related(?) note, is there a way to split an existing filesystem? To 
use the example above, let's say I have an ISO directory in my home 
directory, but it's getting big, plus I'd like to share it out on my 
network. Is there a way to split my home directory's FS, so that the ISO 
directory becomes its own FS?


Paul Archer
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] periodic slow responsiveness

2009-09-24 Thread Bob Friesenhahn

On Thu, 24 Sep 2009, James Lever wrote:


I was of the (mis)understanding that only metadata and writes smaller than 
64k went via the slog device in the event of an O_SYNC write request?


What would cause you to understand that?

Is there a way to tune this on the NFS server or clients such that when I 
perform a large synchronous write, the data does not go via the slog device?


Synchronous writes are needed by NFS to support its atomic write 
requirement.  It sounds like your SSD is write-bandwidth bottlenecked 
rather than IOPS bottlenecked.  Replacing your SSD with a more 
performant one seems like the first step.


NFS client tunings can make a big difference when it comes to 
performance.  Check the nfs(5) manual page for your Linux systems to 
see what options are available.  An obvious tunable is 'wsize' which 
should ideally match (or be a multiple of) the zfs filesystem block 
size.  The /proc/mounts file for my Debian install shows that 1048576 
is being used.  This is quite large and perhaps a smaller value would 
help.  If you are willing to accept the risk, using the Linux 'async' 
mount option may make things seem better.
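
For illustration only (the server name, export path and sizes are
hypothetical), the read and write sizes can be pinned on the Linux client at
mount time:

# mount -t nfs -o rw,hard,intr,rsize=32768,wsize=32768 server:/export/build /mnt/build

The 'async' option mentioned above trades safety for speed, so treat it as a
measurement aid rather than a fix.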


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Checksum property change does not change pre-existing data - right?

2009-09-24 Thread Chris Ridd


On 24 Sep 2009, at 03:09, Mark J Musante wrote:



On 23 Sep, 2009, at 21.54, Ray Clark wrote:

My understanding is that if I "zfs set checksum=" to  
change the algorithm that this will change the checksum algorithm  
for all FUTURE data blocks written, but does not in any way change  
the checksum for previously written data blocks.


I need to corroborate this understanding.  Could someone please  
point me to a document that states this?  I have searched and  
searched and cannot find this.


I haven't googled for a specific doc, but I can at least tell you  
that your understanding is correct.  If you change the checksum  
algorithm, that checksum is applied only to future writes.  Other  
properties work similarly, such as compression or copies.  I see  
that the zfs manpage (viewable here: http://docs.sun.com/app/docs/doc/816-5166/zfs-1m?a=view 
 ) only indicates that this is true for the copies property.  I  
guess we'll have to update that doc.


It mentions something similar for the recordsize property too:

---
 Changing the file system's recordsize affects only files
 created afterward; existing files are unaffected.
---

Cheers,

Chris
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Checksum property change does not change pre-existing data - right?

2009-09-24 Thread Mark J Musante


On 23 Sep, 2009, at 21.54, Ray Clark wrote:

My understanding is that if I "zfs set checksum=" to  
change the algorithm that this will change the checksum algorithm  
for all FUTURE data blocks written, but does not in any way change  
the checksum for previously written data blocks.


I need to corroborate this understanding.  Could someone please  
point me to a document that states this?  I have searched and  
searched and cannot find this.


I haven't googled for a specific doc, but I can at least tell you that  
your understanding is correct.  If you change the checksum algorithm,  
that checksum is applied only to future writes.  Other properties work  
similarly, such as compression or copies.  I see that the zfs manpage  
(viewable here: http://docs.sun.com/app/docs/doc/816-5166/zfs-1m? 
a=view ) only indicates that this is true for the copies property.  I  
guess we'll have to update that doc.
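
As a quick sketch (pool and dataset names are hypothetical), the property
change itself is instantaneous and only tags blocks written afterwards:

# zfs set checksum=sha256 tank/home
# zfs get checksum tank/home

Blocks already on disk keep the checksum they were written with until they
are rewritten.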


Is the word of a zfs developer sufficient?  Or do you need to see it  
in an official piece of documentation?



Regards,
markm


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Changing the recordsize while replacing a disk

2009-09-24 Thread Robert Milkowski

Javier Conde wrote:


Hello,

Quick question:

I have changed the recordsize of an existing file system and I would 
like to do the conversion while the file system is online.


Will a disk replacement change the recordsize of the existing blocks?

My idea is to issue a "zpool replace   

Will this work?




No it won't.

However, all new files that are created will use the new recordsize.
If you want the old files' recordsize to be changed, you will have to 
copy them and remove the old versions.
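
A minimal sketch of rewriting one file so it picks up the new recordsize
(paths are hypothetical, and the file must not be in use while it is copied):

# zfs set recordsize=8K tank/oradata
# cp /tank/oradata/file.dbf /tank/oradata/file.dbf.new
# mv /tank/oradata/file.dbf.new /tank/oradata/file.dbf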




--
Robert Milkowski
http://milek.blogspot.com

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Changing the recordsize while replacing a disk

2009-09-24 Thread Javier Conde


Hello,

Quick question:

I have changed the recordsize of an existing file system and I would 
like to do the conversion while the file system is online.


Will a disk replacement change the recordsize of the existing blocks?

My idea is to issue a "zpool replace   

Will this work?

Thanks in advance,

Javi
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] New to ZFS: One LUN, multiple zones

2009-09-24 Thread Robert Milkowski

bertram fukuda wrote:

Would I just do the following then:

  

zpool create -f zone1 c1t1d0s0
zfs create zone1/test1
zfs create zone1/test2



Would I then use zfs set quota=xxxG to handle disk usage?
  

yes
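
For example (the sizes are just placeholders):

# zfs set quota=20G zone1/test1
# zfs set quota=20G zone1/test2
# zfs get quota zone1/test1 zone1/test2

A reservation can be set the same way if each zone also needs guaranteed space.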

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs copy performance

2009-09-24 Thread Darren J Moffat

chris bannayan wrote:

I've been comparing zfs send and receive to cp, cpio etc.. for a customer data 
migration
and have found send and receive to be twice as slow as cp or cpio.


Did you run sync after the cp/cpio finished to ensure the data really is 
on disk?  cp and cpio do not do synchronous writes.  A zfs recv isn't 
strictly speaking a synchronous write either, but it is much closer to one 
 in how some of the data is written out (note I'm purposely being vague 
here so I don't have to go into the details of how zfs recv actually works).


What else is happening on the recv pool ?

What was the exact command line used in all three cases ?
How was the time measured ?

Were you sending a lot of snapshots as well ?  cp/cpio don't know 
anything about ZFS snapshots (and shouldn't).
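
A rough way to make the timing comparable (paths, pool and snapshot names
are hypothetical) is to include the sync in the measured interval:

# /usr/bin/time sh -c 'cp -rp /tank/data /newpool/data_cp && sync'
# zfs snapshot tank/data@migrate
# /usr/bin/time sh -c 'zfs send tank/data@migrate | zfs receive newpool/data_zfs'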


--
Darren J Moffat
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] zfs copy performance

2009-09-24 Thread chris bannayan
I've been comparing zfs send and receive to cp, cpio etc.. for a customer data 
migration
and have found send and receive to be twice as slow as cp or cpio.

I'm migrating ZFS data from one array to a temporary array on the same server; 
it's 2.3TB in total, and I was looking for the fastest way to do this. Is there a 
recommended way to do it?
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] You're Invited: OpenSolaris Security Summit

2009-09-24 Thread Darren J Moffat

Jennifer Bauer Scarpino wrote:



To: Developers and Students

You are invited to participate in the first OpenSolaris Security Summit


OpenSolaris Security Summit
Tuesday, November 3rd, 2009
Baltimore Marriott Waterfront
700 Aliceanna Street
Baltimore, Maryland 21202


I will be giving a talk and live demo of ZFS Crypto at this event.

--
Darren J Moffat
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Checksum property change does not change pre-existing data - right?

2009-09-24 Thread Darren J Moffat

Roch wrote:

Bob Friesenhahn writes:
 > On Wed, 23 Sep 2009, Ray Clark wrote:
 > 
 > > My understanding is that if I "zfs set checksum=" to 
 > > change the algorithm that this will change the checksum algorithm 
 > > for all FUTURE data blocks written, but does not in any way change 
 > > the checksum for previously written data blocks.
 > 
 > This is correct. The same applies to blocksize and compression.
 > 


With an important distinction: for compression and checksum, a
block rewrite will affect the next update to any file block.

For the dataset recordsize property, a block rewrite on an
existing multi-block file will not change the file's block
size. For multi-record files, the recordsize is immutable
and dissociated from the dataset recordsize setting.

 > > I need to corroborate this understanding.  Could someone please 
 > > point me to a document that states this?  I have searched and 
 > > searched and cannot find this.
 > 


Me neither, although it's easy to verify that setting the
checksum property on a dataset does not induce the I/O that
would be required for a rewrite of the bp.


It is mentioned in zfs(1) for the copies property but not for checksum 
and compression:


 Changing this property only affects newly-written  data.
 Therefore,  set  this  property  at file system creation
 time by using the -o copies=N option.

I've filed a man page bug 6885203 to have similar text added for
checksum and compression.

--
Darren J Moffat
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Checksum property change does not change pre-existing data - right?

2009-09-24 Thread Roch
Bob Friesenhahn writes:
 > On Wed, 23 Sep 2009, Ray Clark wrote:
 > 
 > > My understanding is that if I "zfs set checksum=" to 
 > > change the algorithm that this will change the checksum algorithm 
 > > for all FUTURE data blocks written, but does not in any way change 
 > > the checksum for previously written data blocks.
 > 
 > This is correct. The same applies to blocksize and compression.
 > 

With an important distinction: for compression and checksum, a
block rewrite will affect the next update to any file block.

For the dataset recordsize property, a block rewrite on an
existing multi-block file will not change the file's block
size. For multi-record files, the recordsize is immutable
and dissociated from the dataset recordsize setting.

 > > I need to corroborate this understanding.  Could someone please 
 > > point me to a document that states this?  I have searched and 
 > > searched and cannot find this.
 > 

Me neither, although it's easy to verify that setting the
checksum property on a dataset does not induce the I/O that
would be required for a rewrite of the bp.
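
One crude way to see this (pool and dataset names are hypothetical) is to
watch pool traffic while flipping the property:

# zpool iostat tank 1 &
# zfs set checksum=sha256 tank/fs

There is no surge of writes proportional to the data already in the dataset;
only blocks written afterwards carry the new checksum.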

-r

 > Sorry, I am not aware of a document and don't have time to look.
 > 
 > Bob
 > --
 > Bob Friesenhahn
 > bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
 > GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
 > ___
 > zfs-discuss mailing list
 > zfs-discuss@opensolaris.org
 > http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Sun Flash Accelerator F20

2009-09-24 Thread James Andrewartha
I'm surprised no-one else has posted about this - part of the Sun Oracle 
Exadata v2 is the Sun Flash Accelerator F20 PCIe card, with 48 or 96 GB of 
SLC, a built-in SAS controller and a super-capacitor for cache protection. 
http://www.sun.com/storage/disk_systems/sss/f20/specs.xml


There's no pricing on the webpage though - does anyone know how it compares 
in price to a logzilla?


--
James Andrewartha
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss