Re: [zfs-discuss] recovering from zfs destroy -r

2011-06-27 Thread Roy Sigurd Karlsbakk
- Original Message -
 Hi,
 Is there a simple way of rolling back to a specific TXG of a volume to
 recover from such a situation?
You can't undo a zfs destroy - restore from backup...

--
Vennlige hilsener / Best regards

roy
--
Roy Sigurd Karlsbakk
(+47) 97542685
r...@karlsbakk.net
http://blogg.karlsbakk.net/
--
I all pedagogikk er det essensielt at pensum presenteres intelligibelt. Det er
et elementært imperativ for alle pedagoger å unngå eksessiv anvendelse av
idiomer med fremmed opprinnelse. I de fleste tilfeller eksisterer adekvate og
relevante synonymer på norsk.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Server with 4 drives, how to configure ZFS?

2011-06-27 Thread Jim Klimov


- Original Message -
From: Dave U. Random anonym...@anonymitaet-im-inter.net
Date: Tuesday, June 21, 2011 18:32
Subject: Re: [zfs-discuss] Server with 4 drives, how to configure ZFS?
To: zfs-discuss@opensolaris.org

 Hello Jim! I understood ZFS doesn't like slices but from your 
 reply maybe I
 should reconsider. I have a few older servers with 4 bays x 73G. 
 If I make a
 root mirror pool and swap on the other 2 as you suggest, then I 
 would have
 about 63G x 4 left over.


For the sake of completeness, I should mention that you can also
create a fast and redundant 4-way mirrored root pool ;)

 If so then I am back to wondering what 
 to do about
 4 drives. Is raidz1 worthwhile in this scenario? That is less 
 redundancy than a mirror and much less than a 3-way mirror, isn't 
 it? Is it even
 possible to do raidz2 on 4 slices? Or would 2, 2 way mirrors be 
 better? I
 don't understand what RAID10 is, is it simply a stripe of two 
 mirrors? 
Yes, by that I meant a striping over two mirrors.

 Or would it be best to do a 3 way mirror and a hot spare? I would 
 like to be
 able to tolerate losing one drive without loss of integrity.

Any of the scenarios above lets you lose one drive without 
immediately losing data. The rest is a compromise among 
performance, space and further redundancy (a sketch of the 
zpool commands for each layout follows the list):
* 3- or 4-way mirror: least useable space (25% of total disk capacity),
most redundancy, highest read speeds for concurrent loads
* striping of mirrors (raid10): average useable space (50%), high 
read speeds for concurrent loads, can tolerate loss of up to 2 drives
(slices) in a good scenario (if they are from different mirrors)
* raidz2: average useable space (50%), can tolerate loss of any 2 drives
* raidz1: max useable space (75%), can tolerate loss of any 1 drive
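For illustration, the corresponding zpool commands would look roughly like
this (a sketch with made-up slice names c1t0d0s3..c1t3d0s3 and a pool name
"tank", not taken from the original posts):
* 4-way mirror:  # zpool create tank mirror c1t0d0s3 c1t1d0s3 c1t2d0s3 c1t3d0s3
* raid10:        # zpool create tank mirror c1t0d0s3 c1t1d0s3 mirror c1t2d0s3 c1t3d0s3
* raidz2:        # zpool create tank raidz2 c1t0d0s3 c1t1d0s3 c1t2d0s3 c1t3d0s3
* raidz1:        # zpool create tank raidz1 c1t0d0s3 c1t1d0s3 c1t2d0s3 c1t3d0s3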
 
After all the discussions about performance recently on this forum,
I would not try to guess which performance would be better in 
general - raidz1 or raidz2 (there are reads, writes, scrubs and 
resilvers seemingly all with different preferences toward disk layout),
but with the generic workload we have (i.e. serving up zones with
some development databases and J2SE app servers) it did not seem
to matter much. So for us it was usually raidz2 for tolerance
or raidz1 for space.
 

 I will be doing new installs of Solaris 10. Is there an option 
 in the
 installer for me to issue ZFS commands and set up pools or do I 
 need to
 format the disks before installing and if so how do I do that? 
 
Unfortunately, the last time I installed from scratch was Solaris 10u7 
or so; the others were Live Upgrades of existing systems and OpenSolaris 
machines, so I am not certain. 

From what I gather, the text installer is much more powerful
than the graphical one, and its ZFS root setup might encompass 
creating a root pool in a slice of given size, and possibly mirror 
it right away. Maybe you can do likewise in JumpStart, but we 
did not do that after all.
 
Anyhow, after you install a ZFS root of sufficient size
(our minimalist Solaris 10 installs are often under 1-2 GB 
per boot environment; multiply that for storing several boot 
environments, e.g. via Live Upgrade, and for snapshot history), 
you can create a slice for the data pool component (s3 in our 
setups), and then clone the disk slice layout to the other 3 
drives like this:
#  prtvtoc /dev/rdsk/c1t0d0s2 | fmthard -s - /dev/rdsk/c1t1d0s2
(on x86 you might first need to create a Solaris fdisk partition 
spanning 100% of each drive; a sketch follows).
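For example (a sketch with the same guessed device names; the fdisk step
applies to x86 only):

# fdisk -B /dev/rdsk/c1t1d0p0      (whole-disk Solaris partition, if needed)
# prtvtoc /dev/rdsk/c1t0d0s2 | fmthard -s - /dev/rdsk/c1t1d0s2
# prtvtoc /dev/rdsk/c1t0d0s2 | fmthard -s - /dev/rdsk/c1t2d0s2
# prtvtoc /dev/rdsk/c1t0d0s2 | fmthard -s - /dev/rdsk/c1t3d0s2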

Then you attach one of the slices to the ZFS root pool to make
a mirror, if the installer did not do that:
# zpool attach rpool c1t0d0s0 c1t1d0s0

If you have several controllers (perhaps even on different PCI buses) 
you might want to pick a drive on a different controller than the first 
one in order to have less SPoF's, but make sure that the second 
controller is bootable from BIOS.

And make that drive bootable:
SPARC:
# installboot -F zfs /usr/platform/`uname -i`/lib/fs/zfs/bootblk /dev/rdsk/c1t1d0s0
x86/x86_64:
# installgrub /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c1t1d0s0
 
For the two other drives you just create a new pool in their *s0 slices:
# zpool create swappool mirror c1t2d0s0 c1t3d0s0
# zfs create -V2g swappool/dump
# zfs create -V6g swappool/swap

Sizes are arbitrary here; they depend on your RAM size.
You can later add swap from other pools, including a data pool.
The dump device size can be tested by configuring dumpadm to
use the new device - it will either refuse a device that is too 
small (then you recreate it bigger) or accept it.
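For example, pointing dumpadm at the zvol created above (it will refuse
a device that is too small):

# dumpadm -d /dev/zvol/dsk/swappool/dump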

The installer will probably create dump and swap devices
in your root pool; you may elect to destroy them, since you now
have at least one other swap device.

Make sure to update the /etc/vfstab file to reference the swap 
areas which your system should use further on.
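For instance, the swap zvol above would be referenced with a line like
this (a sketch; adjust to your own pool and volume names):

/dev/zvol/dsk/swappool/swap  -  -  swap  -  no  -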

After this is all completed, you can create a data pool in the
s3 slices with your chosen geometry, e.g.
# zpool create pool raidz2 c1t0d0s3 c1t1d0s3 c1t2d0s3 c1t3d0s3

In our setups 

Re: [zfs-discuss] recovering from zfs destroy -r

2011-06-27 Thread Przemyslaw Ceglowski
Those were backups.
What about 
http://www.solarisinternals.com/wiki/index.php/ZFS_forensics_scrollback_script?
I am willing to give it a go.

Thanks,
P


On 27 Jun 2011, at 09:32, Roy Sigurd Karlsbakk 
r...@karlsbakk.net wrote:



Hi,

Is there a simple way of rolling back to a specific TXG of a volume to recover 
from such a situation?
You can't undo a zfs destroy - restore from backup...

--
Vennlige hilsener / Best regards

roy
--
Roy Sigurd Karlsbakk
(+47) 97542685
r...@karlsbakk.net
http://blogg.karlsbakk.net/
--
I all pedagogikk er det essensielt at pensum presenteres intelligibelt. Det er 
et elementært imperativ for alle pedagoger å unngå eksessiv anvendelse av 
idiomer med fremmed opprinnelse. I de fleste tilfeller eksisterer adekvate og 
relevante synonymer på norsk.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Server with 4 drives, how to configure ZFS?

2011-06-27 Thread Jim Klimov
 In this setup that will install everything on the root mirror so 
 I will
 have to move things around later? Like /var and /usr or whatever 
 I don't
 want on the root mirror?
Actually, you do want /usr and much of /var on the root pool; they
are integral to the svc:/system/filesystem/local service needed to
bring your system up to a usable state (regardless of whether the
other pools are working or not).
 
Depending on the OS versions, you can do manual data migrations
to separate datasets of the root pool, in order to keep some data
common between OE's or to enforce different quotas or compression
rules. For example, on SXCE and Solaris 10 (but not on oi_148a)
we successfully split out many filesystems in such a layout
(the example below also illustrates multiple OEs):
 
# zfs list -o name,refer,quota,compressratio,canmount,mountpoint -t filesystem -r rpool
NAME                     REFER  QUOTA  RATIO  CANMOUNT  MOUNTPOINT
rpool                    7.92M   none  1.45x        on  /rpool
rpool/ROOT                 21K   none  1.38x    noauto  /rpool/ROOT
rpool/ROOT/snv_117        758M   none  1.00x    noauto  /
rpool/ROOT/snv_117/opt   27.1M   none  1.00x    noauto  /opt
rpool/ROOT/snv_117/usr    416M   none  1.00x    noauto  /usr
rpool/ROOT/snv_117/var    122M   none  1.00x    noauto  /var
rpool/ROOT/snv_129        930M   none  1.45x    noauto  /
rpool/ROOT/snv_129/opt    109M   none  2.70x    noauto  /opt
rpool/ROOT/snv_129/usr    509M   none  2.71x    noauto  /usr
rpool/ROOT/snv_129/var    288M   none  2.54x    noauto  /var
rpool/SHARED               18K   none  3.36x    noauto  legacy
rpool/SHARED/var           18K   none  3.36x    noauto  legacy
rpool/SHARED/var/adm     2.97M     5G  4.43x    noauto  legacy
rpool/SHARED/var/cores    118M     5G  3.44x    noauto  legacy
rpool/SHARED/var/crash   1.39G     5G  3.41x    noauto  legacy
rpool/SHARED/var/log      102M     5G  3.43x    noauto  legacy
rpool/SHARED/var/mail    66.4M   none  1.79x    noauto  legacy
rpool/SHARED/var/tmp       20K   none  1.00x    noauto  legacy
rpool/test               50.5K   none  1.00x    noauto  /rpool/test
 
Mounts of /var/* components are done via /etc/vfstab lines like:
rpool/SHARED/var/adm    -   /var/adm    zfs -   yes -
rpool/SHARED/var/log    -   /var/log    zfs -   yes -
rpool/SHARED/var/mail   -   /var/mail   zfs -   yes -
rpool/SHARED/var/crash  -   /var/crash  zfs -   yes -
rpool/SHARED/var/cores  -   /var/cores  zfs -   yes -

The system paths /usr, /var and /opt are mounted directly by SMF
services.
 
 
 And then I just make a RAID10 like Jim 
 was saying
 with the other 4x60 slices? How should I move mountpoints that aren't
 separate ZFS filesystems?
 
 

 
  The only conclusion you can draw from that is:  First 
 take it as a given
  that you can't boot from a raidz volume.  Given, you must 
 have one mirror.
 
 Thanks, I will keep it in mind.
 
  Then you raidz all the remaining space that's capable of being 
 put into a
  raidz...  And what you have left is a pair of unused 
 space, equal to the
  size of your boot volume.  You either waste that space, 
 or you mirror it
  and put it into your tank.
...or use it as swap space :)
 
 I didn't understand what you suggested about appending a 13G 
 mirror to tank. Would that be something like RAID10 without
 actually being RAID10 so I could still boot from it? How would
 the system use it?
No, this would be an uneven striping over a raid10 (or raidzN) 
bank of 60 GB slices plus the 13 GB mirror. ZFS can do that too,
although unbalanced pools are not recommended for performance 
reasons and have to be forced on the command line (see the sketch below).
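For example (made-up names; zpool normally refuses to add a vdev whose
replication level does not match the rest of the pool unless forced):

# zpool add -f tank mirror c1t0d0s4 c1t1d0s4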

And you can not boot from any pool other than a mirror or a
single drive. Rationale: a single BIOS device must be sufficient
to boot the system and contain all the data needed to boot.
 
 So RAID10 sounds like the only reasonable choice since there are 
 an even
 number of slices, I mean is RAIDZ1 even possible with 4 slices?
Yes, it is possible with any number of slices, starting from three.

 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
 
 
-- 

++ 
|| 
| Климов Евгений, Jim Klimov | 
| технический директор   CTO | 
| ЗАО ЦОС и ВТ  JSC COSHT | 
|| 
| +7-903-7705859 (cellular)  mailto:jimkli...@cos.ru | 
|CC:ad...@cos.ru,jimkli...@gmail.com | 
++ 
| ()  ascii ribbon campaign - against html mail  | 
| /\- against microsoft attachments  | 
++

Re: [zfs-discuss] Server with 4 drives, how to configure ZFS?

2011-06-27 Thread Jim Klimov
 Hello Bob! Thanks for the reply. I was thinking about going with 
 a 3 way
 mirror and a hot spare.
Keep in mind that you can have problems in Sol10u8 if you use
a mirror+spare config for the root pool. Should be fixed in u9.
 
 But I don't think I can upgrade to 
 larger drives
 unless I do it all at once, is that correct?
You can replace the drives one by one, but the pool will only
expand once all of the drives have the new, bigger capacity (a sketch
of the procedure is below).
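A sketch of the drive-by-drive procedure with made-up names (on releases
that have the autoexpand pool property, setting it makes the pool grow
automatically once the last drive is replaced):

# zpool set autoexpand=on tank
# zpool replace tank c1t0d0            (after swapping in the bigger drive)
# zpool status tank                    (wait for resilver, then do the next one)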
 
//Jim
 
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] replace zil drive

2011-06-27 Thread Carsten John
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Hello everybody,

some time ago a SSD within a ZIL mirror died. As I had no SSD available
to replace it, I dropped in a normal SAS harddisk to rebuild the mirror.

In the meantime I got the warranty replacement SSD.

Now I'm wondering about the best option to replace the HDD with the SSD:

1. Remove the log mirror, put the new disk in place, add log mirror

2. Pull the HDD, forcing the mirror to fail, replace the HDD with the SSD

Unfortunately I have no free slot in the JBOD available (I want to keep
the ZIL in the same JBOD as the rest of the pool):

3. Put additional temporary SAS HDD in free slot of different JBOD,
replace the HDD in the ZIL mirror with temporary HDD, pull now unused
HDD, use free slot for SSD, replace temporary HDD with SSD.



Any suggestions?


thx



Carsten





- -- 
Max Planck Institut fuer marine Mikrobiologie
- - Network Administration -
Celsiustr. 1
D-28359 Bremen
Tel.: +49 421 2028568
Fax.: +49 421 2028565
PGP public key:http://www.mpi-bremen.de/Carsten_John.html
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.10 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAk4ISy8ACgkQsRCwZeehufs9MQCfetuYQwjbqH2Rb7qyY8G4vxaQ
TvUAoNcHPnHED1Ykat8VHF8EJIRiPmct
=jwZQ
-END PGP SIGNATURE-
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] write cache partial-disk pools (was Server with 4 drives, how to configure ZFS?)

2011-06-27 Thread Jim Klimov

On 2011-06-19 3:47, Richard Elling wrote:

On Jun 16, 2011, at 8:05 PM, Daniel Carosone wrote:


On Thu, Jun 16, 2011 at 10:40:25PM -0400, Edward Ned Harvey wrote:

From: Daniel Carosone [mailto:d...@geek.com.au]
Sent: Thursday, June 16, 2011 10:27 PM

Is it still the case, as it once was, that allocating anything other
than whole disks as vdevs forces NCQ / write cache off on the drive
(either or both, forget which, guess write cache)?

I will only say that, regardless of whether or not that is or ever was true,
I believe it's entirely irrelevant.  Because your system performs read and
write caching and buffering in ram, the tiny little ram on the disk can't
possibly contribute anything.

I disagree.  It can vastly help improve the IOPS of the disk and keep
the channel open for more transactions while one is in progress.
Otherwise, the channel is idle, blocked on command completion, while
the heads seek.

Actually, all of the data I've gathered recently shows that the number of
IOPS does not significantly increase for HDDs running random workloads.
However the response time does :-( My data is leading me to want to restrict
the queue depth to 1 or 2 for HDDs.

SDDs are another story, they scale much better in the response time and
IOPS vs queue depth analysis.


Now, is there going to be a tunable which would allow us to set
queue depths per device? Or are tunables so evil that you'd
rather poke your eye out with a stick? (C) Richard Elling ;)

--


++
||
| Климов Евгений, Jim Klimov |
| технический директор   CTO |
| ЗАО ЦОС и ВТ  JSC COSHT |
||
| +7-903-7705859 (cellular)  mailto:jimkli...@cos.ru |
|  CC:ad...@cos.ru,jimkli...@mail.ru |
++
| ()  ascii ribbon campaign - against html mail  |
| /\- against microsoft attachments  |
++



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Fixing txg commit frequency

2011-06-27 Thread Jim Klimov

 I'd like to ask about whether there is a method to enforce a 
 certain txg
 commit frequency on ZFS. 
 
Well, there is a timer frequency based on TXG age (5 seconds 
by default now), set in /etc/system like this:
 
set zfs:zfs_txg_synctime = 5

 
Also there is a buffer-size limit, like this (384 MB):
set zfs:zfs_write_limit_override = 0x18000000

or on command-line like this:
# echo zfs_write_limit_override/W0t402653184 | mdb -kw
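To read the current values back (a sketch; exact variable names and sizes
can differ between builds):

# echo zfs_txg_synctime/D | mdb -k
# echo zfs_write_limit_override/E | mdb -k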

We had similar spikes with big writes to a Thumper with SXCE in the pre-90's
builds, when the system would stall for seconds while flushing a 30-second TXG
full of data. Adding a reasonable megabyte limit solved the unresponsiveness
problem for us, by making these flush-writes rather small and quick.
 
See also:
http://opensolaris.org/jive/thread.jspa?threadID=106453&start=15&tstart=0
http://opensolaris.org/jive/thread.jspa?messageID=347212
 
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] replace zil drive

2011-06-27 Thread Jim Klimov
I think that the least disruptive way would be to detach the HDD from the ZIL
mirror and offline it, remove and replace with an SSD, and then attach the
SSD to the ZIL to make it a mirror again. 
 
Note that this would create a window of possible ZIL failure (and you had
such a window already when the first SSD died), but the system *should* 
survive that (fall back to on-pool ZIL after a short timeout of the dedicated 
device) unless the power dies during this time.
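In zpool terms this is roughly the following, with made-up names (c2t22d0
being the surviving SSD and c2t23d0 the temporary HDD, whose slot the new
SSD then takes):

# zpool detach tank c2t23d0            (drop the HDD from the log mirror)
  ... physically swap the HDD for the new SSD ...
# zpool attach tank c2t22d0 c2t23d0    (re-mirror the log onto the new SSD)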

- Original Message -
From: Carsten John cj...@mpi-bremen.de
Date: Monday, June 27, 2011 13:21
Subject: [zfs-discuss] replace zil drive
To: zfs-discuss@opensolaris.org

 -BEGIN PGP SIGNED MESSAGE-
 Hash: SHA1
 
 Hello everybody,
 
 some time ago a SSD within a ZIL mirror died. As I had no SSD 
 available to replace it, I dropped in a normal SAS harddisk to 
 rebuild the mirror.
 
 In the meantime I got the warranty replacement SSD.
 
 Now I'm wondering about the best option to replace the HDD with 
 the SSD:
 
 1. Remove the log mirror, put the new disk in place, add log mirror
 
 2. Pull the HDD, forcing the mirror to fail, replace the HDD 
 with the SSD
 
 Unfortunately I have no free slot in the JBOD available (want to keep
 the ZIL in the same JBOD as the rest of the pool):
 
 3. Put additional temporary SAS HDD in free slot of different JBOD,
 replace the HDD in the ZIL mirror with temporary HDD, pull now unused
 HDD, use free slot for SSD, replace temporary HDD with SSD.
 
 
 
 Any suggestions?
 
 
 thx
 
 
 
 Carsten
 
 
 
 
 
 - -- 
 Max Planck Institut fuer marine Mikrobiologie
 - - Network Administration -
 Celsiustr. 1
 D-28359 Bremen
 Tel.: +49 421 2028568
 Fax.: +49 421 2028565
 PGP public key:http://www.mpi-bremen.de/Carsten_John.html
 -BEGIN PGP SIGNATURE-
 Version: GnuPG v1.4.10 (GNU/Linux)
 Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/
 
 iEYEARECAAYFAk4ISy8ACgkQsRCwZeehufs9MQCfetuYQwjbqH2Rb7qyY8G4vxaQ
 TvUAoNcHPnHED1Ykat8VHF8EJIRiPmct
 =jwZQ
 -END PGP SIGNATURE-
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
 
 
-- 

++ 
|| 
| Климов Евгений, Jim Klimov | 
| технический директор   CTO | 
| ЗАО ЦОС и ВТ  JSC COSHT | 
|| 
| +7-903-7705859 (cellular)  mailto:jimkli...@cos.ru | 
|CC:ad...@cos.ru,jimkli...@gmail.com | 
++ 
| ()  ascii ribbon campaign - against html mail  | 
| /\- against microsoft attachments  | 
++
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Encryption accelerator card recommendations.

2011-06-27 Thread Roberto Waltman

I recently bought an HP Proliant Microserver for a home file server.
( pics and more here:
http://arstechnica.com/civis/viewtopic.php?p=20968192 )

I installed 5 1.5TB (5900 RPM) drives, upgraded the memory to 8GB, and  
installed Solaris 11 Express without a hitch.


A few simple tests using dd with 1gb and 2gb files showed excellent  
transfer rates: ~200 MB/sec on a 5 drive raidz2 pool, ~310 MB/sec on a  
five drive pool with no redundancy.


That is, until I enabled encryption, which brought the transfer rates  
down to around 20 MB/sec...


Obviously the CPU is the bottleneck here, and I'm wondering what to do next.
I can split the storage into file systems with and without encryption  
and allocate data accordingly. No need, for example, to encrypt open  
source code, or music. But I would like to have everything encrypted  
by default.
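A sketch of that split with Solaris 11 Express ZFS crypto (dataset names are
examples; note that encryption can only be enabled when a dataset is created,
not afterwards):

# zfs create -o encryption=on -o keysource=passphrase,prompt tank/private
# zfs create tank/media                (inherits the default, unencrypted)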


My concern is not industrial espionage from a hacker in Belarus, but  
having a disk fail and send it for repair with my credit card  
statements easily readable on it, etc.


I am new to (open or closed) Solaris. I found there is something called  
the Cryptographic Framework, and that there is hardware support for  
encryption.
This server has two unused PCI-e slots, so I thought a card could be  
the solution, but the few I found seem to be geared to protect SSH and  
VPN connections, etc., not the file system.


Cost is a factor also. I could build a similar server with a much  
faster processor for a few hundred dollars more, so a $1000  
card for a $1000 file server is not a reasonable option.


Is there anything out there I could use?

Thanks,

Roberto Waltman


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] replace zil drive

2011-06-27 Thread Roy Sigurd Karlsbakk
I'd guess removing the SLOG altogether might be the safest, so for this
configuration:

        logs
          mirror-7   ONLINE   0   0   0
            c2t22d0  ONLINE   0   0   0
            c2t23d0  ONLINE   0   0   0

just `zpool remove zwimming mirror-7`, run `cfgadm -a` to find the full
device path, do `cfgadm -c unconfigure devpath`, replace the drive, run
devfsadm for good measure, check if it's connected with cfgadm -a, and add the
SLOG again.

roy

- Original Message -
 I think that the least disruptive way would be to detach the HDD from
 the ZIL
 mirror and offline it, remove and replace with an SSD, and then attach
 the
 SSD to the ZIL to make it a mirror again.
 Note that this would create a window of possible ZIL failure (and you
 had
 such a window already when the first SSD died), but the system
 *should*
 survive that (fall back to on-pool ZIL after a short timeout of the
 dedicated
 device) unless the power dies during this time.
 - Original Message -
 From: Carsten John cj...@mpi-bremen.de
 Date: Monday, June 27, 2011 13:21
 Subject: [zfs-discuss] replace zil drive
 To: zfs-discuss@opensolaris.org
  -BEGIN PGP SIGNED MESSAGE-
  Hash: SHA1
 
  Hello everybody,
 
  some time ago a SSD within a ZIL mirror died. As I had no SSD
  available to replace it, I dropped in a normal SAS harddisk to
  rebuild the mirror.
 
  In the meantime I got the warranty replacement SSD.
 
  Now I'm wondering about the best option to replace the HDD with
  the SSD:
 
  1. Remove the log mirror, put the new disk in place, add log mirror
 
  2. Pull the HDD, forcing the mirror to fail, replace the HDD
  with the SSD
 
  Unfortunately I have no free slot in the JBOD available (want to
  keep
  the ZIL in the same JBOD as the rest of the pool):
 
  3. Put additional temporary SAS HDD in free slot of different JBOD,
  replace the HDD in the ZIL mirror with temporary HDD, pull now
  unused
  HDD, use free slot for SSD, replace temporary HDD with SSD.
 
 
 
  Any suggestions?
 
 
  thx
 
 
 
  Carsten
 
 
 
 
 
  - --
  Max Planck Institut fuer marine Mikrobiologie
  - - Network Administration -
  Celsiustr. 1
  D-28359 Bremen
  Tel.: +49 421 2028568
  Fax.: +49 421 2028565
  PGP public key:http://www.mpi-bremen.de/Carsten_John.html
  -BEGIN PGP SIGNATURE-
  Version: GnuPG v1.4.10 (GNU/Linux)
  Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/
 
  iEYEARECAAYFAk4ISy8ACgkQsRCwZeehufs9MQCfetuYQwjbqH2Rb7qyY8G4vxaQ
  TvUAoNcHPnHED1Ykat8VHF8EJIRiPmct
  =jwZQ
  -END PGP SIGNATURE-
  ___
  zfs-discuss mailing list
  zfs-discuss@opensolaris.org
  http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
 --
 ++
 | |
 | Климов Евгений, Jim Klimov |
 | технический директор CTO |
 | ЗАО ЦОС и ВТ JSC COSHT |
 | |
 | +7-903-7705859 (cellular) mailto:jimkli...@cos.ru |
 | CC:ad...@cos.ru,jimkli...@gmail.com |
 ++
 | () ascii ribbon campaign - against html mail |
 | /\ - against microsoft attachments |
 ++
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
--
Vennlige hilsener / Best regards

roy
--
Roy Sigurd Karlsbakk
(+47) 97542685
r...@karlsbakk.net
http://blogg.karlsbakk.net/
--
I all pedagogikk er det essensielt at pensum presenteres intelligibelt. Det er
et elementært imperativ for alle pedagoger å unngå eksessiv anvendelse av
idiomer med fremmed opprinnelse. I de fleste tilfeller eksisterer adekvate og
relevante synonymer på norsk.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Encryption accelerator card recommendations.

2011-06-27 Thread Nico Williams
IMO a faster processor with built-in AES and other crypto support is
most likely to give you the most bang for your buck, particularly if
you're using closed Solaris 11, as Solaris engineering is likely to
add support for new crypto instructions faster than Illumos (but I
don't really know enough about Illumos to say for sure).

Nico
--
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] [solarisx86] Encryption accelerator card recommendations.

2011-06-27 Thread Roberto Waltman

John Martin wrote:

rob_waltman wrote:


I installed 5 1.5TB (5900 RPM) drives, upgraded the memory to 8GB, and
installed Solaris 11 Express without a hitch.

A few simple tests using dd with 1gb and 2gb files showed excellent
transfer rates: ~200 MB/sec on a 5 drive raidz2 pool, ~310 MB/sec on a
five drive pool with no redundancy.


No redundancy?  What does zpool status report?


Sorry, I am not near the machine to try it.
What I meant by "no redundancy" is this:

I partitioned the 5 disks identically -
  (a) A 40GB boot partition, (used only in the boot disk, planning to  
mirror it later)

  (b) A 200Gb fast partition
  (c) Two equal size safe partitions filling the rest of the disk  
(~600Gb each?)


Then, (the disks are on c7t0/1/2/3/5, c7t4 is an esata port)

# 1Tb fast pool for temporary storage, work in progress, etc
zpool create ${props} fast2 c7t0d0p2 ... c7t5d0p2

zpool create ${props} safe3 raidz2 c7t0d0p3 ... c7t5d0p3
zpool create ${props} safe4 raidz2 c7t0d0p4 ... c7t5d0p4

Where props contains my chosen property defaults: -O utf8only=on  -O  
mountpoint=none  -O atime=off  -O encryption=on ... etc.


Roberto Waltman





___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Encryption accelerator card recommendations.

2011-06-27 Thread Erik Trimble

On 6/27/2011 9:55 AM, Roberto Waltman wrote:

I recently bought an HP Proliant Microserver for a home file server.
( pics and more here:
http://arstechnica.com/civis/viewtopic.php?p=20968192 )

I installed 5 1.5TB (5900 RPM) drives, upgraded the memory to 8GB, and 
installed Solaris 11 Express without a hitch.


A few simple tests using dd with 1gb and 2gb files showed excellent 
transfer rates: ~200 MB/sec on a 5 drive raidz2 pool, ~310 MB/sec on a 
five drive pool with no redundancy.


That is, until I enabled encryption, which brought the transfer rates 
down to around 20 MB/sec...


Obviously the CPU is the bottleneck here, and I'm wondering what to do 
next.
I can split the storage into file systems with and without encryption 
and allocate data accordingly. No need, for example, to encrypt open 
source code, or music. But I would like to have everything encrypted 
by default.


My concern is not industrial espionage from a hacker in Belarus, but 
having a disk fail and send it for repair with my credit card 
statements easily readable on it, etc.


I am new to (open or closed)Solaris. I found there is something called 
the Encryption Framework, and that there is hardware support for 
encryption.
This server has two unused PCI-e slots, so I thought a card could be 
the solution, but the few I found seem to be geared to protect SSH and 
VPN connections, etc., not the file system.


Cost is a factor also. I could build a similar server with a much 
faster processor for a few hundred dollars more, so a $1000 dollar 
card for a  $1000 file server is not a reasonable option.


Is there anything out there I could use?

Thanks,

Roberto Waltman 


You're out of luck.  The hardware-encryption device is seen as a small 
market by the vendors, and they price accordingly. All the solutions are 
FIPS-compliant, which makes them non-trivially expensive to 
build/test/verify.


I have yet to see the basic crypto accelerator - which should be as 
simple as an FPGA with downloadable (and updateable) firmware.


The other major cost point is the crypto plugins - sadly, there is no 
way to simply have the CPU farm off crypto jobs to a co-processor. That 
is, there's no way for the CPU to go "oh, that looks like I'm running a 
crypto algorithm - I should hand it over to the crypto co-processor."  
Instead, you have to write custom plugins/drivers/libraries for each OS, 
and pray that each OS has some standardized crypto framework. Otherwise, 
you have to link apps with custom libraries.


I'm always kind of surprised that there hasn't been a movement to create 
standardized crypto commands, like the various FP-specific commands that 
are part of MMX/SSE/etc.  That way, most of this could be done in 
hardware seamlessly.




--
Erik Trimble
Java System Support
Mailstop:  usca22-123
Phone:  x17195
Santa Clara, CA
Timezone: US/Pacific (GMT-0800)

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Encryption accelerator card recommendations.

2011-06-27 Thread David Magda
On Mon, June 27, 2011 15:24, Erik Trimble wrote:
[...]
 I'm always kind of surprised that there hasn't been a movement to create
 standardized crypto commands, like the various FP-specific commands that
 are part of MMX/SSE/etc.  That way, most of this could be done in
 hardware seamlessly.

The (Ultra)SPARC T-series processors do, but to a certain extent it goes
against a CPU manufacturer's best (financial) interest to provide this:
crypto is very CPU intensive using 'regular' instructions, so if you need
to do a lot of it, it would force you to purchase a manufacturer's
top-of-the-line CPUs, and to have as many sockets as you can to handle a
load (and presumably you need to do useful work besides just
en/decrypting traffic).

If you have special instructions that do the operations efficiently, it
means that you're not chewing up cycles as much, so a less powerful (and
cheaper) processor can be purchased.

I'm sure all the Web 2.0 companies would love to have these (and have
OpenSSL use the instructions), so they could simply enable HTTPS for
everything. (Of course it'd also be helpful for data-at-rest, on-disk
encryption as well.)

The last benchmarks I saw indicated that the SPARC T-series could do 45
Gb/s AES or some such, with gobs of RSA operations as well.


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Encryption accelerator card recommendations.

2011-06-27 Thread Erik Trimble

On 6/27/2011 1:13 PM, David Magda wrote:

On Mon, June 27, 2011 15:24, Erik Trimble wrote:
[...]

I'm always kind of surprised that there hasn't been a movement to create
standardized crypto commands, like the various FP-specific commands that
are part of MMX/SSE/etc.  That way, most of this could be done in
hardware seamlessly.

The (Ultra)SPARC T-series processors do, but to a certain extent it goes
against a CPU manufacturers best (financial) interest to provide this:
crypto is very CPU intensive using 'regular' instructions, so if you need
to do a lot of it, it would force you to purchase a manufacturer's
top-of-the-line CPUs, and to have as many sockets as you can to handle a
load (and presumably you need to do useful work besides just
en/decrypting traffic).

If you have special instructions that do the operations efficiently, it
means that you're not chewing up cycles as much, so a less powerful (and
cheaper) processor can be purchased.

I'm sure all the Web 2.0 companies would love to have these (and OpenSSL
link use the instructions), so they could simply enable HTTPS for
everything. (Of course it'd also be helpful for data-at-rest, on-disk
encryption as well.)

The last benchmarks I saw indicated that the SPARC T-series could do 45
Gb/s AES or some such, with gobs of RSA operations as well


The T-series crypto isn't what I'm thinking of.  AFAIK, you still need 
to use the Crypto framework in Solaris to access the on-chip 
functionality, which makes the T-series no different from CPUs without a 
crypto module but with a crypto add-in board instead.


What I'm thinking of is something along the lines of what AMD proposed 
a while ago, in combination with how we used to handle hardware where 
the FPU was optional.


That is, you continue to make CPUs without any crypto functionality, 
EXCEPT that they support certain extensions a la MMX.   If no crypto 
accelerator was available, the CPU would trap any crypto calls and 
force them to be done in software.  You could then stick a crypto 
accelerator in a second CPU socket, and the CPU would recognize it 
was there and pipe crypto calls to the dedicated co-processor.


Think about how things were done with the i386 and i387.  That's what 
I'm after.  With modern CPU buses like AMD & Intel support, plopping a 
co-processor into another CPU socket would really, really help.



--
Erik Trimble
Java System Support
Mailstop:  usca22-123
Phone:  x17195
Santa Clara, CA
Timezone: US/Pacific (GMT-0800)

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Encryption accelerator card recommendations.

2011-06-27 Thread David Magda
On Jun 27, 2011, at 17:16, Erik Trimble wrote:

 Think about how things were done with the i386 and i387.  That's what I'm 
 after.  With modern CPU buses like AMD & Intel support, plopping a 
 co-processor into another CPU socket would really, really help.

Given the amount of transistors that are available nowadays I think it'd be 
simpler to just create a series of SIMD instructions right in/on general CPUs, 
and skip the whole  co-processor angle.

There's more and more sensitive data out there, so on-disk crypto could be 
deployed in more places to help prevent data loss (on both servers and 
desktops/laptops), and those systems that don't do disk IO probably do network 
IO, and so would be helped from a TLS/SSL/SSH perspective.

If I were AMD I'd seriously be thinking about this, as it'd help boost volume 
and mindshare for a little while, with all the folks doing any kind of web 
activity picking up kit for HTTPS—at least until Intel brought out a similar 
thing.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Encryption accelerator card recommendations.

2011-06-27 Thread Bill Sommerfeld
On 06/27/11 15:24, David Magda wrote:
 Given the amount of transistors that are available nowadays I think
 it'd be simpler to just create a series of SIMD instructions right
 in/on general CPUs, and skip the whole co-processor angle.

see: http://en.wikipedia.org/wiki/AES_instruction_set

Present in many current Intel CPUs; also expected to be present in AMD's
Bulldozer based CPUs.
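On Solaris you can check whether the running CPU advertises the AES
instructions with something like:

# isainfo -v | grep aes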

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] replace zil drive

2011-06-27 Thread Edward Ned Harvey
 From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
 boun...@opensolaris.org] On Behalf Of Carsten John
 
 Now I'm wondering about the best option to replace the HDD with the SSD:

What version of zpool are you running?  If it's >= 19, then you could
actually survive a complete ZIL device failure.  So you should simply
offline or detach or whatever the HDD and then either attach or add the new
SSD.  Attach would be a mirror; add would be two separate non-mirrored
devices.  Maybe better performance, maybe not.

If it's zpool < 19, then you absolutely do not want to degrade to
non-mirrored status.  First attach the new SSD, then when it's done, detach
the HDD.
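A sketch of that safer order, with made-up device names (c2t22d0 = surviving
log SSD, c2t23d0 = temporary HDD, c2t24d0 = new SSD):

# zpool attach tank c2t22d0 c2t24d0    (three-way log mirror while resilvering)
# zpool status tank                    (wait for the resilver to complete)
# zpool detach tank c2t23d0            (then drop the HDD)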

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Encryption accelerator card recommendations.

2011-06-27 Thread David Magda
On Jun 27, 2011, at 18:32, Bill Sommerfeld wrote:

 On 06/27/11 15:24, David Magda wrote:
 Given the amount of transistors that are available nowadays I think
 it'd be simpler to just create a series of SIMD instructions right
 in/on general CPUs, and skip the whole co-processor angle.
 
 see: http://en.wikipedia.org/wiki/AES_instruction_set
 
 Present in many current Intel CPUs; also expected to be present in AMD's
 Bulldozer based CPUs.

Now compare that with the T-series stuff that also handles 3DES, RC4, RSA2048, 
DSA, DH, ECC, MD5, SHA1, SHA2, as well as a hardware RNG:

http://en.wikipedia.org/wiki/UltraSPARC_T2
http://blogs.oracle.com/BestPerf/entry/20100920_sparc_t3_pk11rsaperf

The initial TLS/SSL setup is actually the expensive part (20-58% of the time 
spent in the 'transaction'), and AES can be performed decently even on 
non-AESNI CPUs: simply adding an RSA accelerator can double performance without 
session caching, and still gain ~20% with it. SSL session caching alone can help 
improve throughput by a factor of more than two.

Performance Analysis of TLS Web Servers
http://www.cs.rice.edu/~dwallach/pub/tls-tocs.pdf
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.16.1403

AESNI is certainly better than nothing, but RSA, SHA, and the RNG would be nice 
as well. It'd also be handy for ZFS crypto in addition to all the network IO 
stuff.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Encryption accelerator card recommendations.[GPU acceleration of ZFS]

2011-06-27 Thread Fred Liu
FYI, there is another thread named "GPU acceleration of ZFS" in this
list discussing the possibility of utilizing the power of GPGPU.
I am reposting it here:

Good day,

I think ZFS can take advantage of using GPU for sha256 calculation, encryption 
and maybe compression. Modern video card, like 5xxx or 6xxx ATI HD Series can 
do calculation of sha256 50-100 times faster than modern 4 cores CPU.

kgpu project for linux shows nice results.

'zfs scrub' would work freely on high performance ZFS pools.

The only problem that there is no AMD/Nvidia drivers for Solaris that support 
hardware-assisted OpenCL.

Is anyone interested in it?

Best regards,
Anatoly Legkodymov.

On Tue, May 10, 2011 at 11:29 AM, Anatoly legko...@fastmail.fm wrote:
 Good day,

 I think ZFS can take advantage of using GPU for sha256 calculation, 
 encryption and maybe compression. Modern video card, like 5xxx or 6xxx 
 ATI HD Series can do calculation of sha256 50-100 times faster than 
 modern 4 cores CPU.
Ignoring optimizations from SIMD extensions like SSE and friends, this is 
probably true. However, the GPU also has to deal with the overhead of data 
transfer to itself before it can even begin crunching data.
Granted, a Gen. 2 x16 link is quite speedy, but is CPU performance really that 
poor where a GPU can still out-perform it? My undergrad thesis dealt with 
computational acceleration utilizing CUDA, and the datasets had to scale quite 
a ways before there was a noticeable advantage in using a Tesla or similar over 
a bog-standard i7-920.

 The only problem that there is no AMD/Nvidia drivers for Solaris that 
 support hardware-assisted OpenCL.
This, and keep in mind that most of the professional users here will likely be 
using professional hardware, where a simple 8MB Rage XL gets the job done 
thanks to the magic of out-of-band management cards and other such facilities. 
Even as a home user, I have not placed a high-end videocard into my machine, I 
use a $5 ATI PCI videocard that saw about a hour of use whilst I installed 
Solaris 11.

--
--khd

IMHO, ZFS needs to run on all kinds of HW.
T-series CMT servers have been able to help with SHA calculation since the T1
days, but I did not see any work in ZFS to take advantage of it.


On 5/10/2011 11:29 AM, Anatoly wrote:
 Good day,

 I think ZFS can take advantage of using GPU for sha256 calculation, 
 encryption and maybe compression. Modern video card, like 5xxx or 6xxx 
 ATI HD Series can do calculation of sha256 50-100 times faster than 
 modern 4 cores CPU.

 kgpu project for linux shows nice results.

 'zfs scrub' would work freely on high performance ZFS pools.

 The only problem that there is no AMD/Nvidia drivers for Solaris that 
 support hardware-assisted OpenCL.

 Is anyone interested in it?

 Best regards,
 Anatoly Legkodymov.

On Tue, May 10, 2011 at 10:29 PM, Anatoly legko...@fastmail.fm wrote:
 Good day,

 I think ZFS can take advantage of using GPU for sha256 calculation, 
 encryption and maybe compression. Modern video card, like 5xxx or 6xxx 
 ATI HD Series can do calculation of sha256 50-100 times faster than 
 modern 4 cores CPU.

 kgpu project for linux shows nice results.

 'zfs scrub' would work freely on high performance ZFS pools.

 The only problem that there is no AMD/Nvidia drivers for Solaris that 
 support hardware-assisted OpenCL.

 Is anyone interested in it?

This isn't technically true.  The NVIDIA drivers support compute, but there 
are other parts of the toolchain missing.  /* I don't know about ATI/AMD, but I'd 
guess they likely don't support compute across platforms */



/* Disclaimer - The company I work for has a working HMPP compiler for 
Solaris/FreeBSD and we may soon support CUDA */ 

On 10 May 2011, at 16:44, Hung-Sheng Tsao (LaoTsao) Ph. D. wrote:

 
 IMHO, zfs need to run in all kind of HW T-series CMT server that can 
 help sha calculation since T1 day, did not see any work in ZFS to take 
 advantage it

That support would be in the crypto framework though, not ZFS per se. So I 
think the OP might consider how best to add GPU support to the crypto framework.

Chris
___


Thanks.

Fred

 -Original Message-
 From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
 boun...@opensolaris.org] On Behalf Of David Magda
 Sent: 星期二, 六月 28, 2011 9:23
 To: Bill Sommerfeld
 Cc: zfs-discuss@opensolaris.org
 Subject: Re: [zfs-discuss] Encryption accelerator card recommendations.
 
 On Jun 27, 2011, at 18:32, Bill Sommerfeld wrote:
 
  On 06/27/11 15:24, David Magda wrote:
  Given the amount of transistors that are available nowadays I think
  it'd be simpler to just create a series of SIMD instructions right
  in/on general CPUs, and skip the whole co-processor angle.
 
  see: http://en.wikipedia.org/wiki/AES_instruction_set
 
  Present in many current Intel CPUs; also expected to be present in
 AMD's
  Bulldozer based CPUs.
 
 Now compare that with the T-series stuff that also handles 3DES, RC4,
 RSA2048, DSA, DH, ECC, MD5, 

Re: [zfs-discuss] Encryption accelerator card recommendations.[GPU acceleration of ZFS]

2011-06-27 Thread David Magda
On Jun 27, 2011, at 22:03, Fred Liu wrote:

 FYI There is another thread named --  GPU acceleration of ZFS in this
 list to discuss the possibility to utilize the power of GPGPU.
 I posted here:

In a similar vein I recently came across SSLShader:

http://shader.kaist.edu/sslshader/
http://www.usenix.org/event/nsdi11/tech/full_papers/Jang.pdf

http://www.google.com/search?q=sslshader

This could be handy for desktops doing ZFS crypto (and even browser SSL and/or 
SSH), but few servers have decent graphics cards (and SPARC systems don't even 
have video ports by default :).

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Encryption accelerator card recommendations.

2011-06-27 Thread Nico Williams
On Jun 27, 2011 9:24 PM, David Magda dma...@ee.ryerson.ca wrote:
 AESNI is certain better than nothing, but RSA, SHA, and the RNG would be
nice as well. It'd also be handy for ZFS crypto in addition to all the
network IO stuff.

The most important reason for AES-NI might be not performance but defeating
side-channel attacks.

Also, really fast AES HW makes AES-based hash functions quite tempting.

No, AES-NI is nothing to sneeze at.

Nico
--
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Encryption accelerator card recommendations.

2011-06-27 Thread Nico Williams
On Jun 27, 2011 4:15 PM, David Magda dma...@ee.ryerson.ca wrote:
 The (Ultra)SPARC T-series processors do, but to a certain extent it goes
 against a CPU manufacturers best (financial) interest to provide this:
 crypto is very CPU intensive using 'regular' instructions, so if you need
 to do a lot of it, it would force you to purchase a manufacturer's
 top-of-the-line CPUs, and to have as many sockets as you can to handle a
 load (and presumably you need to do useful work besides just
 en/decrypting traffic).

I hope no CPU vendor thinks about the economics of chip making that way.  I
actually doubt any do.

Nico
--
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss