Re: [zfs-discuss] System started crashing hard after zpool reconfigure and OI upgrade
How about crash dumps?

michael

On Wed, Mar 20, 2013 at 4:50 PM, Peter Wood wrote:
> I'm sorry. I should have mentioned that I can't find any errors in the
> logs. The last entry in /var/adm/messages is that I removed the keyboard
> after the last reboot, and then it shows the new boot-up messages when I
> boot the system after the crash. The BIOS log is empty. I'm not sure how
> to check the IPMI, but IPMI is not configured and I'm not using it.
>
> Just another observation - the crashes are more intense the more data the
> system serves (NFS).
>
> I'm looking into firmware upgrades for the LSI now.
>
> On Wed, Mar 20, 2013 at 8:40 AM, Will Murnane wrote:
>> Does the Supermicro IPMI show anything when it crashes? Does anything
>> show up in event logs in the BIOS, or in system logs under OI?
>>
>> On Wed, Mar 20, 2013 at 11:34 AM, Peter Wood wrote:
>>> I have two identical Supermicro boxes with 32GB RAM. Hardware details
>>> at the end of the message.
>>>
>>> They were running OI 151.a.5 for months. The zpool configuration was
>>> one storage zpool with 3 vdevs of 8 disks in RAIDZ2.
>>>
>>> The OI installation is absolutely clean. Just next-next-next until
>>> done. All I do is configure the network after install. I don't install
>>> or enable any other services.
>>>
>>> Then I added more disks and rebuilt the systems with OI 151.a.7, and
>>> this time configured the zpool with 6 vdevs of 5 disks in RAIDZ.
>>>
>>> The systems started crashing really badly. They just disappear from
>>> the network: black and unresponsive console, no error lights, but no
>>> activity indication either. The only way out is to power cycle the
>>> system.
>>>
>>> There is no pattern in the crashes. It may crash in 2 days, it may
>>> crash in 2 hours.
>>>
>>> I upgraded the memory on both systems to 128GB, to no avail. This is
>>> the max memory they can take.
>>>
>>> In summary, all I did was upgrade to OI 151.a.7 and reconfigure the
>>> zpool.
>>>
>>> Any idea what could be the problem?
>>>
>>> Thank you
>>>
>>> -- Peter
>>>
>>> Supermicro X9DRH-iF
>>> Xeon E5-2620 @ 2.0 GHz 6-Core
>>> LSI SAS9211-8i HBA
>>> 32x 3TB Hitachi HUS723030ALS640, SAS, 7.2K

-- Michael Schuster
http://recursiveramblings.wordpress.com/

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] System started crashing hard after zpool reconfigure and OI upgrade
Peter, sorry if this is so obvious that you didn't mention it: have you checked /var/adm/messages and other diagnostic tool output?

regards
Michael

On Wed, Mar 20, 2013 at 4:34 PM, Peter Wood wrote:
> I have two identical Supermicro boxes with 32GB RAM. Hardware details at
> the end of the message.
>
> They were running OI 151.a.5 for months. The zpool configuration was one
> storage zpool with 3 vdevs of 8 disks in RAIDZ2.
>
> The OI installation is absolutely clean. Just next-next-next until done.
> All I do is configure the network after install. I don't install or
> enable any other services.
>
> Then I added more disks and rebuilt the systems with OI 151.a.7, and
> this time configured the zpool with 6 vdevs of 5 disks in RAIDZ.
>
> The systems started crashing really badly. They just disappear from the
> network: black and unresponsive console, no error lights, but no
> activity indication either. The only way out is to power cycle the
> system.
>
> There is no pattern in the crashes. It may crash in 2 days, it may crash
> in 2 hours.
>
> I upgraded the memory on both systems to 128GB, to no avail. This is the
> max memory they can take.
>
> In summary, all I did was upgrade to OI 151.a.7 and reconfigure the
> zpool.
>
> Any idea what could be the problem?
>
> Thank you
>
> -- Peter
>
> Supermicro X9DRH-iF
> Xeon E5-2620 @ 2.0 GHz 6-Core
> LSI SAS9211-8i HBA
> 32x 3TB Hitachi HUS723030ALS640, SAS, 7.2K

-- Michael Schuster
http://recursiveramblings.wordpress.com/
Re: [zfs-discuss] help zfs pool with duplicated and missing entry of hdd
On Thu, 10 Jan 2013, Jim Klimov wrote:

> On 2013-01-10 08:51, Jason wrote:
>> Hi,
>>
>> One of my server's zfs faulted and it shows the following:
>>
>>   NAME        STATE    READ WRITE CKSUM
>>   backup      UNAVAIL     0     0     0  insufficient replicas
>>     raidz2-0  UNAVAIL     0     0     0  insufficient replicas
>>       c4t0d0  ONLINE      0     0     0
>>       c4t0d1  ONLINE      0     0     0
>>       c4t0d0  FAULTED     0     0     0  corrupted data
>>       c4t0d3  FAULTED     0     0     0  too many errors
>>       c4t0d4  FAULTED     0     0     0  too many errors
>>   ... (omit the rest)
>>
>> My question is why c4t0d0 appeared twice, and c4t0d2 is missing. I have
>> checked the controller card and hard disks; they are all working fine.
>
> This renaming does seem like an error in detecting (and further naming)
> the disks - i.e. if a connector got loose and one of the disks is not
> seen by the system, the numbering can shift in such a manner. It is
> indeed strange, however, that only "d2" got shifted or went missing, and
> not all the numbers after it.
>
> So, did you verify that the controller sees all the disks in the
> "format" command (and perhaps, after a cold reboot, in the BIOS)? Just
> in case, try to unplug and replug all cables (power, data) in case their
> pins got oxidized over time.

Usually the disk numbering in any Solaris-based OS stays the same if one disk is offline/missing; it's fixed to the controller port, or SCSI target, or WWN. Imho that's a huge advantage of the c0t0d0 pattern over the Linux or FreeBSD numbering. I once had an old Sun 5200 hooked up to a Linux box, and when one of the 22 disks failed, every disk after the bad one shifted - what a mess.

To me the c4t0d0, c4t0d1, ... numbering looks like either a hardware RAID controller not in JBOD mode, or even an external SAN. JBODs normally show up as LUN 0 (d0) with different target numbers (t1, t2, ...). Maybe something is wrong with the LUN numbering on your box?

-- Michael
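The cross-check Jim suggests - comparing what the OS actually sees against what the pool expects - can be sketched as follows (pool name taken from the status output above; `format` fed EOF so it lists disks and exits):

```shell
# List every disk the controller currently presents, non-interactively:
format < /dev/null

# Compare the device names above with the ones the pool is built from:
zpool status backup
```

If a device the pool expects is absent from the format listing, the problem is below ZFS (cabling, controller, or LUN mapping), not in the pool itself.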
Re: [zfs-discuss] Using L2ARC on an AdHoc basis.
Ok, so it is possible to remove. Good to know, thanks. I move the pool maybe once a month for a few days; otherwise it sits in a fixed location in daily use, so I thought the warm-up allowance might be worth it. I guess I just wanted to know whether adding a cache device was a one-way operation, and whether or not it risked integrity.

Sent from my iPhone

On 13 Oct 2012, at 23:02, Ian Collins wrote:
> On 10/14/12 10:02, Michael Armstrong wrote:
>> Hi Guys,
>>
>> I have a "portable pool", i.e. one that I carry around in an enclosure.
>> However, any SSD I add for L2ARC will not be carried around... meaning
>> the cache drive will become unavailable from time to time.
>>
>> My question is: will random removal of the cache drive put the pool
>> into a "degraded" state or affect the integrity of the pool at all?
>> Additionally, how adversely will this affect "warm up"? Or will moving
>> the enclosure between machines with and without cache just
>> automatically work, offering benefits when cache is available and fewer
>> benefits when it isn't?
>
> Why bother with cache devices at all if you are moving the pool around?
> As you hinted above, the cache can take a while to warm up and become
> useful.
>
> You should zpool remove the cache device before exporting the pool.
>
> --
> Ian.
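The round trip discussed above can be sketched as follows - pool and device names here are hypothetical, but adding a cache vdev is indeed reversible and does not affect pool redundancy:

```shell
# Attach an SSD as L2ARC (read cache only; no pool data depends on it):
zpool add tank cache c5t0d0

# Before moving the enclosure, detach the cache device and export:
zpool remove tank c5t0d0
zpool export tank

# On a machine without the SSD, the pool imports normally,
# just without the read-cache benefit:
zpool import tank
```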
[zfs-discuss] Using L2ARC on an AdHoc basis.
Hi Guys,

I have a "portable pool", i.e. one that I carry around in an enclosure. However, any SSD I add for L2ARC will not be carried around... meaning the cache drive will become unavailable from time to time.

My question is: will random removal of the cache drive put the pool into a "degraded" state or affect the integrity of the pool at all? Additionally, how adversely will this affect "warm up"? Or will moving the enclosure between machines with and without cache just automatically work, offering benefits when cache is available and fewer benefits when it isn't?

I hope this question isn't too much of a rambling :) thanks.
Re: [zfs-discuss] what have you been buying for slog and l2arc?
On Mon, 6 Aug 2012, Christopher George wrote:

>> I mean this as constructive criticism, not as angry bickering. I
>> totally respect you guys doing your own thing.
>
> Thanks, I'll try my best to address your comments...
>
>> *) At least update the benchmarks on your site to compare against
>> modern flash-based competition (not the Intel X25-E, which is seriously
>> stone age by now...)
>
> I completely agree we need to refresh the website; not even the photos
> are representative of our shipping product (we now offer VLP DIMMs). We
> are engineers first and foremost, but an updated website is in the
> works. In the meantime, we have benchmarked against both the Intel
> 320/710 in my OpenStorage Summit 2011 presentation, which can be found
> at: http://www.ddrdrive.com/zil_rw_revelation.pdf

Very impressive iops numbers, although I have some thoughts on the benchmarking method itself. Imho the comparison shouldn't be raw iops numbers on the DDRdrive itself as tested with iometer (it's only 4GB), but real-world numbers on a real pool of spinning disks with the DDRdrive acting as ZIL accelerator.

I just introduced an Intel 320 120GB as ZIL accelerator for a simple zpool with two SAS disks in raid0 configuration, and it's not as bad as in your presentation. It shows about 50% of the possible NFS ops with the SSD as ZIL versus no ZIL (sync=disabled on oi151), and about 6x-8x the performance compared to the pool without any accelerator and sync=standard. The no-ZIL case is the upper limit one can achieve on a given pool - in my case, creation of about 750 small files/sec via NFS. With the SSD it's 380 files/sec (the NFS stack is a limiting factor, too). Or about 2400 8k write iops with the SSD vs. 11900 iops with ZIL disabled, and 250 iops without an accelerator (GNU dd with oflag=sync). Not bad at all. This could be just good enough for small businesses and moderately sized pools.

Michael

-- Michael Hase
edition-software GmbH
http://edition-software.de
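The sync-write figure mentioned above comes from a plain GNU dd run with oflag=sync. A minimal sketch against a scratch file (path hypothetical; the resulting records/sec depend entirely on the pool's log device):

```shell
# With oflag=sync, each 8k block is committed to stable storage before
# dd continues, so records/sec approximates synchronous write iops:
dd if=/dev/zero of=/tmp/synctest.dat bs=8k count=1000 oflag=sync

# Clean up the scratch file:
rm -f /tmp/synctest.dat
```

Divide the record count by the elapsed time dd reports to get the sync iops number for the filesystem under test.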
[zfs-discuss] Very poor small-block random write performance
I have an 8-drive ZFS array (RAIDZ2 - 1 spare) using 5900rpm 2TB SATA drives with an hpt27xx controller under FreeBSD 10 (but I've seen the same issue with FreeBSD 9). The system has 8GB of RAM and I'm letting FreeBSD auto-size the ARC.

Running iozone (from ports), everything is fine for file sizes up to 8GB, but when it runs with a 16GB file the random write performance plummets at 64K record sizes:

  8G  - 64K  ->  52 MB/s
  8G  - 128K -> 713 MB/s
  8G  - 256K -> 442 MB/s
  16G - 64K  ->   7 MB/s
  16G - 128K -> 380 MB/s
  16G - 256K -> 392 MB/s

Also, sequential small-block performance doesn't show such a dramatic slowdown:

  16G - 64K  -> 108 MB/s (sequential)

There's nothing else using the zpool at the moment; the system is on a separate SSD. I was expecting performance to drop off at 16GB because that's well above the available ARC, but that dramatic a drop-off - and then the sharp improvement at 128K and 256K - is surprising. Are there any configuration settings I should be looking at?

Mike
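For reference, an iozone invocation that isolates the problematic case above might look like the following - a sketch only, assuming iozone from ports and a scratch file path on the pool under test:

```shell
# -s: file size, -r: record size, -i 0: write/rewrite test,
# -i 2: random read/write test, -f: scratch file on the pool
iozone -s 16g -r 64k -i 0 -i 2 -f /tank/iozone.tmp
```

Running the same command with -r 128k makes it easy to reproduce the 64K-vs-128K gap in isolation, without waiting for a full `iozone -a` sweep.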
Re: [zfs-discuss] zfs sata mirror slower than single disk
On Tue, 17 Jul 2012, Bob Friesenhahn wrote:

> On Tue, 17 Jul 2012, Michael Hase wrote:
>>> If you were to add a second vdev (i.e. stripe) then you should see
>>> very close to 200% due to the default round-robin scheduling of the
>>> writes.
>>
>> My expectation would be > 200%, as 4 disks are involved. It may not be
>> the perfect 4x scaling, but imho it should be (and is for a scsi
>> system) more than half of the theoretical throughput. This is solaris
>> or a solaris derivative, not linux ;-)
>
> Here are some results from my own machine based on the 'virgin mount'
> test approach. The results show less boost than is reported by a
> benchmark tool like 'iozone' which sees benefits from caching. I get an
> initial sequential read speed of 657 MB/s on my new pool, which has
> 1200 MB/s of raw bandwidth (if mirrors could produce 100% boost).
> Reading the file a second time reports 6.9 GB/s. The below is with a
> 2.6 GB test file, but with a 26 GB test file (just add another zero to
> 'count' and wait longer) I see an initial read rate of 618 MB/s and a
> re-read rate of 8.2 GB/s. The raw disk can transfer 150 MB/s.

To work around these caching effects, just use a file > 2 times the size of RAM; iostat then shows the numbers really coming from disk. I always test like this. A re-read rate of 8.2 GB/s is really just memory bandwidth, but quite impressive ;-)

> % pfexec zfs create tank/zfstest/defaults
> % cd /tank/zfstest/defaults
> % pfexec dd if=/dev/urandom of=random.dat bs=128k count=20000
> 20000+0 records in
> 20000+0 records out
> 2621440000 bytes (2.6 GB) copied, 36.8133 s, 71.2 MB/s
> % cd ..
> % pfexec zfs umount tank/zfstest/defaults
> % pfexec zfs mount tank/zfstest/defaults
> % cd defaults
> % dd if=random.dat of=/dev/null bs=128k count=20000
> 20000+0 records in
> 20000+0 records out
> 2621440000 bytes (2.6 GB) copied, 3.99229 s, 657 MB/s
> % pfexec dd if=/dev/rdsk/c7t5393E8CA21FAd0p0 of=/dev/null bs=128k count=2000
> 2000+0 records in
> 2000+0 records out
> 262144000 bytes (262 MB) copied, 1.74532 s, 150 MB/s
> % bc
> scale=8
> 657/150
> 4.38000000
>
> It is very difficult to benchmark with a cache which works so well:
>
> % dd if=random.dat of=/dev/null bs=128k count=20000
> 20000+0 records in
> 20000+0 records out
> 2621440000 bytes (2.6 GB) copied, 0.379147 s, 6.9 GB/s

This is not my point; I'm pretty sure I did not measure any arc effects - maybe with the one exception of the raid0 test on the scsi array. Don't know why the arc had this effect; the file size was 2x RAM. The point is: I'm searching for an explanation for the relative slowness of a mirrored pair of sata disks, or some tuning knobs, or something like "the disks are plain crap", or maybe: zfs throttles sata disks in general (I don't know the internals).

In the range of > 600 MB/s other issues may show up (PCIe bus contention, HBA contention, CPU load). And performance at this level could be just good enough, not requiring any further tuning.

Could you recheck with only 4 disks (2 mirror pairs)? If you just get some 350 MB/s it could be the same problem as with my boxes. All sata disks?

Michael

> Bob
> --
> Bob Friesenhahn
> bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
> GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
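The bc step in the transcript above is just the ratio of mirrored-pool read speed to a single raw disk; the same arithmetic with awk:

```shell
# 657 MB/s from the freshly mounted pool vs. 150 MB/s from one raw disk:
awk 'BEGIN { printf "%.2f\n", 657 / 150 }'
```

A result of 4.38 against 8 data spindles is where the "less than 100% mirror boost" observation in the thread comes from.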
Re: [zfs-discuss] zfs sata mirror slower than single disk
sorry to insist, but still no real answer...

On Mon, 16 Jul 2012, Bob Friesenhahn wrote:

> On Tue, 17 Jul 2012, Michael Hase wrote:
>> So only one thing left: mirror should read 2x
>
> I don't think that a mirror should necessarily read 2x faster, even
> though the potential is there to do so. Last I heard, zfs did not
> include a special read scheduler for sequential reads from a mirrored
> pair. As a result, 50% of the time, a read will be scheduled for a
> device which already has a read scheduled. If this is indeed true, the
> typical performance would be 150%. There may be some other scheduling
> factor (e.g. an estimate of busyness) which might still allow zfs to
> select the right side and do better than that.
>
> If you were to add a second vdev (i.e. stripe) then you should see very
> close to 200% due to the default round-robin scheduling of the writes.

My expectation would be > 200%, as 4 disks are involved. It may not be the perfect 4x scaling, but imho it should be (and is for a scsi system) more than half of the theoretical throughput. This is solaris or a solaris derivative, not linux ;-)

> It is really difficult to measure zfs read performance due to caching
> effects. One way to do it is to write a large file (containing random
> data such as returned from /dev/urandom) to a zfs filesystem, unmount
> the filesystem, remount the filesystem, and then time how long it takes
> to read the file once. The reason why this works is because remounting
> the filesystem restarts the filesystem cache.

Ok, I did a zpool export/import cycle between the dd write and read tests. This really empties the arc; I checked with arc_summary.pl. The test even uses two processes in parallel (doesn't make a difference). The result is still the same:

  dd write: 2x 58 MB/sec --> perfect, each disk does > 110 MB/sec
  dd read:  2x 68 MB/sec --> imho too slow, about 68 MB/sec per disk

For writes each disk gets 900 128k io requests/sec with asvc_t in the 8-9 msec range. For reads each disk only gets 500 io requests/sec, asvc_t 18-20 msec with the default zfs_vdev_max_pending=10. When reducing zfs_vdev_max_pending the asvc_t drops accordingly; the i/o rate remains at 500/sec per disk, and throughput stays the same. I think the iostat values should be reliable here. These high iops numbers make sense as we work on empty pools, so there aren't very high seek times.

All benchmarks (dd, bonnie++, will try iozone) lead to the same result: on the sata mirror pair, read performance is in the range of a single disk. For the sas disks (only two available for testing) and for the scsi system there is quite good throughput scaling. Here, for comparison, a table for 1-4 36GB 15k U320 scsi disks on an old SXDE box (Nevada b130):

              seq write  factor    seq read  factor
              MB/sec               MB/sec
  single          82       1           78      1
  mirror          79       1          137      1.75
  2x mirror      120       1.5        251      3.2

This is exactly what's imho to be expected from mirrors and striped mirrors. It just doesn't happen for my sata pool. I still have no reference numbers for other sata pools, just one with the 4k/512-byte sector problem, which is even slower than mine. It seems the zfs performance people just use sas disks and are done.
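The export/import cycle used above to empty the ARC can be sketched as follows (pool name hypothetical):

```shell
# After the dd write pass, drop every cached block belonging to the pool:
zpool export ptest
zpool import ptest

# Confirm the cache is cold before timing the read pass:
arc_summary.pl | head
```

Unlike a plain unmount/remount of one filesystem, export/import restarts caching for the whole pool, which is why it is a reliable way to measure cold reads.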
Michael Bob -- Bob Friesenhahn bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer,http://www.GraphicsMagick.org/ old ibm dual opteron intellistation with external hp msa30, 36gb 15k u320 scsi disks pool: scsi1 state: ONLINE scrub: none requested config: NAMESTATE READ WRITE CKSUM scsi1 ONLINE 0 0 0 c3t4d0ONLINE 0 0 0 errors: No known data errors Version 1.96 --Sequential Output-- --Sequential Input- --Random- Concurrency 1 -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks-- MachineSize K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP zfssingle 16G 137 99 82739 20 39453 9 314 99 78251 7 856.9 8 Latency 160ms4799ms5292ms 43210us3274ms2069ms Version 1.96 --Sequential Create-- Random Create zfssingle -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete-- files /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP 16 8819 34 + +++ 26318 68 20390 73 + +++ 26846 72 Latency 16413us 108us 231us 12206us 46us 124us 1.96,1.96,zfssingle,1,1342514790,16G,,137,99,82739,20,39453,9,314,99,78251,7,856.9,8,16,8819,34,+,+++,26318,68,20390,73,+,+++,26846,72,160ms,4799ms,5292ms,43210us,3274ms,2069ms,16413us,108us,231us,12206us,46us,124us ## pool: scsi1 state: ONLINE scrub: none requested config: NAMESTATE READ WRITE CKS
Re: [zfs-discuss] zfs sata mirror slower than single disk
On Mon, 16 Jul 2012, Edward Ned Harvey wrote:

>> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
>> boun...@opensolaris.org] On Behalf Of Michael Hase
>>
>> got some strange results, please see attachments for exact numbers and
>> pool config:
>>
>>           seq write  factor    seq read  factor
>>           MB/sec               MB/sec
>>   single     123       1          135      1
>>   raid0      114       1          249      2
>>   mirror      57       0.5        129      1
>
> I agree with you, these look wrong. Here is what you should expect:
>
>           seq W  seq R
>   single   1.0    1.0
>   stripe   2.0    2.0
>   mirror   1.0    2.0
>
> You have three things wrong:
> (a) stripe should write 2x
> (b) mirror should write 1x
> (c) mirror should read 2x
>
> I would have simply said "for some reason your drives are unable to
> operate concurrently," but you have the stripe reading 2x. I cannot
> think of a single reason that the stripe should be able to read 2x, and
> the mirror only 1x.

Yes, I think so too. In the meantime I switched the two disks to another box (HP xw8400, 2 Xeon 5150 CPUs, 16GB RAM). On this machine I did the previous sas tests. The OS is now OpenIndiana 151a (vs. OpenSolaris b130 before); the mirror pool was upgraded from version 22 to 28, and the raid0 pool was newly created. The results look quite different:

          seq write  factor    seq read  factor
          MB/sec               MB/sec
  raid0      236       2          330      2.5
  mirror     111       1          128      1

Now the raid0 case shows excellent performance; the 330 MB/sec are a bit on the optimistic side, maybe some arc cache effects (file size 32GB, 16GB RAM). iostat during a sequential read shows about 115 MB/sec from each disk, which is great. The (really desired) mirror case still has a problem with sequential reads. Sequential writes to the mirror are twice as fast as before and show the expected performance for a single disk. So only one thing is left: mirror should read 2x.

I suspect the difference is not the hardware; both boxes should have enough horsepower to easily do sequential reads at way more than 200 MB/sec. In all tests CPU time (user and system) remained quite low. I think it's an OS issue: OpenSolaris b130 is over 2 years old, OI 151a dates from 11/2011.

Could someone please send me some bonnie++ results for a 2-disk mirror or a 2x2-disk mirror pool with sata disks?

Michael

-- Michael Hase
http://edition-software.de
Re: [zfs-discuss] zfs sata mirror slower than single disk
On Mon, 16 Jul 2012, Bob Friesenhahn wrote:

> On Mon, 16 Jul 2012, Michael Hase wrote:
>> This is my understanding of zfs: it should load balance read requests
>> even for a single sequential reader. zfs_prefetch_disable is at the
>> default of 0. And I can see exactly this scaling behaviour with sas
>> disks and with scsi disks, just not on this sata pool.
>
> Is the BIOS configured to use AHCI mode or is it using IDE mode?

Not relevant here; the disks are connected to an onboard sas hba (LSI 1068, see first post), and the hardware is a Primergy RX330 with 2 quad-core Opterons.

> Are the disks 512 bytes/sector or 4K?

512 bytes/sector, HDS721010CLA330

>> Maybe it's a corner case which doesn't matter in real-world
>> applications? The random seek values in my bonnie output show the
>> expected performance boost when going from one disk to a mirrored
>> configuration. It's just the sequential read/write case that's
>> different for sata and sas disks.
>
> I don't have a whole lot of experience with SATA disks, but it is my
> impression that you might see this sort of performance if the BIOS was
> configured so that the drives were used as IDE disks. If not that, then
> there must be a bottleneck in your hardware somewhere.

With early Nevada releases I had indeed the IDE/AHCI problem, albeit on different hardware. Solaris only ran in IDE mode, and the disks were 4 times slower than on Linux, see http://www.oracle.com/webfolder/technetwork/hcl/data/components/details/intel/sol_10_05_08/2999.html

Wouldn't a hardware bottleneck show up on raw dd tests as well? I can stream > 130 MB/sec from each of the two disks in parallel. dd reading from more than these two disks at the same time results in a slight slowdown, but here we're talking about nearly 400 MB/sec aggregated bandwidth through the onboard hba (the box has 6 disk slots):

                  extended device statistics
    r/s   w/s   Mr/s   Mw/s  wait  actv  wsvc_t  asvc_t  %w   %b  device
   94.5   0.0   94.5    0.0   0.0   1.0     0.0    10.5   0  100  c13t6d0
   94.5   0.0   94.5    0.0   0.0   1.0     0.0    10.6   0  100  c13t1d0
   93.0   0.0   93.0    0.0   0.0   1.0     0.0    10.7   0  100  c13t2d0
   94.5   0.0   94.5    0.0   0.0   1.0     0.0    10.5   0  100  c13t5d0

Don't know why this is a bit slower - maybe some PCIe bottleneck, or something with the mpt driver (intrstat shows only one CPU handles all mpt interrupts), or even the slow CPUs: these are 1.8GHz Opterons. During sequential reads from the zfs mirror I see > 1000 interrupts/sec on one CPU. So it could really be a bottleneck somewhere, triggered by the "smallish" 128k i/o requests from the zfs side. I think I'll benchmark again on a Xeon box with faster CPUs; my tests with sas disks were done on that other box.

Michael

> Bob
> --
> Bob Friesenhahn
> bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
> GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
Re: [zfs-discuss] zfs sata mirror slower than single disk
On Mon, 16 Jul 2012, Bob Friesenhahn wrote:

> On Mon, 16 Jul 2012, Stefan Ring wrote:
>>> It is normal for reads from mirrors to be faster than for a single
>>> disk because reads can be scheduled from either disk, with different
>>> I/Os being handled in parallel.
>>
>> That assumes that there *are* outstanding requests to be scheduled in
>> parallel, which would only happen with multiple readers or a large
>> read-ahead buffer.
>
> That is true. Zfs tries to detect the case of sequential reads and
> requests to read more data than the application has already requested.
> In this case the data may be prefetched from the other disk before the
> application has requested it.

This is my understanding of zfs: it should load balance read requests even for a single sequential reader. zfs_prefetch_disable is at the default of 0. And I can see exactly this scaling behaviour with sas disks and with scsi disks, just not on this sata pool.

zfs_vdev_max_pending is already tuned down to 3 as recommended for sata disks; iostat -Mxnz 2 looks something like this when reading from the zfs mirror:

     r/s   w/s   Mr/s   Mw/s  wait  actv  wsvc_t  asvc_t  %w  %b  device
   507.1   0.0   63.4    0.0   0.0   2.9     0.0     5.8   1  99  c13t5d0
   477.6   0.0   59.7    0.0   0.0   2.8     0.0     5.8   1  94  c13t4d0

The default zfs_vdev_max_pending=10 leads to much higher service times in the 20-30 msec range; throughput remains roughly the same. I can read from the dsk or rdsk devices in parallel with real platter speeds:

  dd if=/dev/dsk/c13t4d0s0 of=/dev/null bs=1024k count=8192 &
  dd if=/dev/dsk/c13t5d0s0 of=/dev/null bs=1024k count=8192 &

                  extended device statistics
     r/s    w/s   Mr/s   Mw/s  wait  actv  wsvc_t  asvc_t  %w  %b  device
   2467.5   0.0  134.9    0.0   0.0   0.9     0.0     0.4   1  87  c13t5d0
   2546.5   0.0  139.3    0.0   0.0   0.8     0.0     0.3   1  84  c13t4d0

So I think there is no problem with the disks. Maybe it's a corner case which doesn't matter in real-world applications? The random seek values in my bonnie output show the expected performance boost when going from one disk to a mirrored configuration. It's just the sequential read/write case that's different for sata and sas disks.

Michael

> Bob
> --
> Bob Friesenhahn
> bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
> GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
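The queue-depth tuning mentioned above uses the zfs_vdev_max_pending kernel tunable; on Solaris-derived systems it can be changed at runtime with mdb or persistently via /etc/system (the value 3 matches what the thread uses):

```shell
# Runtime change, takes effect immediately but is lost on reboot
# (W0t3 writes the decimal value 3 to the 32-bit tunable):
echo zfs_vdev_max_pending/W0t3 | mdb -kw
```

For a persistent setting, add `set zfs:zfs_vdev_max_pending = 3` to /etc/system and reboot. Lower values trade per-disk queueing (and thus asvc_t) against the chance to reorder requests, which is why they tend to help sata disks more than sas ones.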
[zfs-discuss] zfs sata mirror slower than single disk
Hello list,

I did some bonnie++ benchmarks for different zpool configurations consisting of one or two 1TB sata disks (Hitachi HDS721010CLA332, 512 bytes/sector, 7.2k), and got some strange results. Please see the attachments for exact numbers and pool config:

          seq write  factor    seq read  factor
          MB/sec               MB/sec
  single     123       1          135      1
  raid0      114       1          249      2
  mirror      57       0.5        129      1

Each of the disks is capable of about 135 MB/sec sequential reads and about 120 MB/sec sequential writes; iostat -En shows no defects. The disks are 100% busy in all tests and show normal service times. This is on OpenSolaris b130; rebooting with the OpenIndiana 151a live CD gives the same results, and dd tests give the same results, too. The storage controller is an LSI 1068 using the mpt driver. The pools are newly created and empty. atime on/off doesn't make a difference.

Is there an explanation why

1) in the raid0 case the write speed is more or less the same as a single disk, and

2) in the mirror case the write speed is cut in half, and the read speed is the same as a single disk?

I'd expect about twice the performance for both reading and writing - maybe a bit less, but definitely more than measured. For comparison I did the same tests with 2 old 2.5" 36GB sas 10k disks maxing out at about 50-60 MB/sec on the outer tracks:

          seq write  factor    seq read  factor
          MB/sec               MB/sec
  single      38       1           50      1
  raid0       89       2          111      2
  mirror      36       1           92      2

Here we get the expected behaviour: raid0 with about double the performance for reading and writing, and the mirror with about the same performance for writing and double the speed for reading, compared to a single disk. An old scsi system with 4x2 mirror pairs also shows these scaling characteristics: about 450-500 MB/sec seq read and 250 MB/sec write, each disk capable of 80 MB/sec. I don't care about absolute numbers, I just don't get why the sata system is so much slower than expected, especially for a simple mirror. Any ideas?
Thanks, Michael -- Michael Hase http://edition-software.de pool: ptest state: ONLINE scan: none requested config: NAMESTATE READ WRITE CKSUM ptest ONLINE 0 0 0 c13t4d0 ONLINE 0 0 0 Version 1.96 --Sequential Output-- --Sequential Input- --Random- Concurrency 1 -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks-- MachineSize K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP zfssingle 32G79 98 123866 51 63626 35 255 99 135359 25 530.6 13 Latency 333ms 111ms5283ms 73791us 465ms2535ms Version 1.96 --Sequential Create-- Random Create zfssingle -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete-- files /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP 16 4536 40 + +++ 14140 50 10382 69 + +++ 6260 73 Latency 21655us 154us 206us 24539us 46us 405us 1.96,1.96,zfssingle,1,1342165334,32G,,79,98,123866,51,63626,35,255,99,135359,25,530.6,13,16,4536,40,+,+++,14140,50,10382,69,+,+++,6260,73,333ms,111ms,5283ms,73791us,465ms,2535ms,21655us,154us,206us,24539us,46us,405us ### pool: ptest state: ONLINE scan: none requested config: NAMESTATE READ WRITE CKSUM ptest ONLINE 0 0 0 c13t4d0 ONLINE 0 0 0 c13t5d0 ONLINE 0 0 0 Version 1.96 --Sequential Output-- --Sequential Input- --Random- Concurrency 1 -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks-- MachineSize K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP zfsstripe 32G78 98 114243 46 72938 37 192 77 249022 44 815.1 20 Latency 483ms 106ms5179ms3613ms 259ms1567ms Version 1.96 --Sequential Create-- Random Create zfsstripe -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete-- files /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP 16 6474 53 + +++ 15505 47 8562 81 + +++ 10839 65 Latency 21894us 131us 208us 22203us 52us 230us 1.96,1.96,zfsstripe,1,1342172768,32G,,78,98,114243,46,72938,37,192,77,249022,44,815.1,20,16,6474,53,+,+++,15505,47,8562,81,+,+++,10839,65,483ms,106ms,5179ms,3613ms,259ms,1567ms,21894us,131us,208us,22203us,52us,230us pool: ptest state: ONLINE scrub: none requested 
config: NAME STATE READ WRITE CKSUM ptestONLINE 0 0 0 mirror-0 ONLINE 0 0 0 c13t4d0 O
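The attached results come from bonnie++. A comparable run - pool mount point hypothetical, file size set to 2x RAM as discussed in the thread to defeat the ARC - would be something like:

```shell
# -d: test directory on the pool, -s: file size (2x RAM),
# -n: number of files (in multiples of 1024) for the create/delete tests
bonnie++ -d /ptest -s 32g -n 16
```

The "Sequential Output"/"Sequential Input" block columns in the output correspond to the seq write/read MB/sec figures in the tables above.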
Re: [zfs-discuss] Drive upgrades
Yes this Is another thing im weary of... I should have slightly under provisioned at the start or mixed manufacturers... Now i may have to replace 2tb fails with 2.5 for the sake of a block Sent from my iPhone On 13 Apr 2012, at 17:30, Tim Cook wrote: > > > On Fri, Apr 13, 2012 at 9:35 AM, Edward Ned Harvey > wrote: > > From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss- > > boun...@opensolaris.org] On Behalf Of Michael Armstrong > > > > Is there a way to quickly ascertain if my seagate/hitachi drives are as > large as > > the 2.0tb samsungs? I'd like to avoid the situation of replacing all > drives and > > then not being able to grow the pool... > > It doesn't matter. If you have a bunch of drives that are all approx the > same size but vary slightly, and you make (for example) a raidz out of them, > then the raidz will only be limited by the size of the smallest one. So you > will only be wasting 1% of the drives that are slightly larger. > > Also, given that you have a pool currently made up of 13x2T and 5x1T ... I > presume these are separate vdev's. You don't have one huge 18-disk raidz3, > do you? that would be bad. And it would also mean that you're currently > wasting 13x1T. I assume the 5x1T are a single raidzN. You can increase the > size of these disks, without any cares about the size of the other 13. > > Just make sure you have the autoexpand property set. > > But most of all, make sure you do a scrub first, and make sure you complete > the resilver in between each disk swap. Do not pull out more than one disk > (or whatever your redundancy level is) while it's still resilvering from the > previously replaced disk. If you're very thorough, you would also do a > scrub in between each disk swap, but if it's just a bunch of home movies > that are replaceable, you will probably skip that step. > > > You will however have an issue replacing them if one should fail. 
You need > to have the same block count to replace a device, which is why I asked for a > "right-sizing" years ago. Deaf ears :/ > > --Tim > > > ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
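The one-disk-at-a-time workflow Edward describes (autoexpand on, replace, let the resilver finish, scrub between swaps) can be sketched as a dry run. Here `zpool` is stubbed to print each command instead of executing it, and the pool name "tank" and disk names c0t5d0/c0t6d0 are hypothetical placeholders:

```shell
# Dry-run sketch of the one-disk-at-a-time upgrade described above.
# The zpool stub records and prints each command instead of running it;
# "tank", c0t5d0 and c0t6d0 are hypothetical names.
RAN=""
zpool() { RAN="$RAN zpool $*;"; echo "zpool $*"; }

zpool set autoexpand=on tank    # let the pool grow once every member is bigger

for disk in c0t5d0 c0t6d0; do
  zpool replace tank "$disk"    # swap in the new, larger drive for this slot
  # wait until 'zpool status' shows the resilver has finished, then
  # (if the data matters) scrub before touching the next disk
  zpool scrub tank
done
```

Remove the stub and the sequence is exactly what you would type on a live system, one disk per iteration.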
[zfs-discuss] Drive upgrades
Hi Guys, I currently have an 18 drive system built from 13x 2.0TB Samsungs and 5x WD 1TB's... I'm about to swap out all of my 1TB drives with 2TB ones to grow the pool a bit... My question is: the replacement 2TB drives are from various manufacturers (Seagate/Hitachi/Samsung) and I know from previous experience that the geometry/boundaries of each manufacturer's 2TB offerings are different. Is there a way to quickly ascertain if my Seagate/Hitachi drives are as large as the 2.0TB Samsungs? I'd like to avoid the situation of replacing all drives and then not being able to grow the pool... Thanks, Michael
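What decides replaceability is the raw block count, which on Solaris you can read per drive from `prtvtoc /dev/rdsk/...` or `format`. A tiny sketch of the comparison, using made-up sector counts (the real numbers come from your own drives):

```shell
# Compare raw block counts before committing to a replacement drive.
# Both counts below are made-up example values; read the real ones
# from 'prtvtoc /dev/rdsk/...' or 'format' on Solaris.
SAMSUNG_BLOCKS=3907029168
SEAGATE_BLOCKS=3907027055

# A drive can only replace another if its block count is at least as large.
if [ "$SEAGATE_BLOCKS" -ge "$SAMSUNG_BLOCKS" ]; then
  VERDICT="ok to replace"
else
  VERDICT="too small by $((SAMSUNG_BLOCKS - SEAGATE_BLOCKS)) blocks"
fi
echo "$VERDICT"
```

Running this over each candidate drive up front avoids discovering a short drive mid-swap.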
Re: [zfs-discuss] S11 vs illumos zfs compatiblity
On 3 Jan 12, at 04:22 , Darren J Moffat wrote: > On 12/28/11 06:27, Richard Elling wrote: >> On Dec 27, 2011, at 7:46 PM, Tim Cook wrote: >>> On Tue, Dec 27, 2011 at 9:34 PM, Nico Williams >>> wrote: >>> On Tue, Dec 27, 2011 at 8:44 PM, Frank Cusack wrote: >>>> So with a de facto fork (illumos) now in place, is it possible that two >>>> zpools will report the same version yet be incompatible across >>>> implementations? >> >> This was already broken by Sun/Oracle when the deduplication feature was not >> backported to Solaris 10. If you are running Solaris 10, then zpool version >> 29 features >> are not implemented. > > Solaris 10 does have some deduplication support, it can import and read > datasets in a deduped pool just fine. You can't enable dedup on a dataset > and any writes won't dedup they will "rehydrate". > > So it is more like partial dedup support rather than it not being there at > all. "rehydrate"??? Is it instant or freeze dried? Mike --- Michael Sullivan m...@axsh.us http://www.axsh.us/ Phone: +1-662-259- Mobile: +1-662-202-7716
Re: [zfs-discuss] about btrfs and zfs
On Mon, Nov 14, 2011 at 14:40, Paul Kraus wrote: > On Fri, Nov 11, 2011 at 9:25 PM, Edward Ned Harvey > wrote: > >> LOL. Well, for what it's worth, there are three common pronunciations for >> btrfs. Butterfs, Betterfs, and B-Tree FS (because it's based on b-trees.) >> Check wikipedia. (This isn't really true, but I like to joke, after saying >> something like that, I wrote the wikipedia page just now.) ;-) > > Is it really B-Tree based? Apple's HFS+ is B-Tree based and falls > apart (in terms of performance) when you get too many objects in one > FS, which is specifically what drove us to ZFS. We had 4.5 TB of data > in about 60 million files/directories on an Apple X-Serve and X-RAID > and the overall response was terrible. We moved the data to ZFS and > the performance was limited by the Windows client at that point. > >> Speaking of which. zettabyte filesystem. ;-) Is it just a dumb filesystem >> with a lot of address bits? Or is it something that offers functionality >> that other filesystems don't have? ;-) > > The stories I have heard indicate that the name came after the TLA. > "zfs" came first and "zettabyte" later. as Jeff told it (IIRC), the "expanded" version of zfs underwent several changes during the development phase, until it was decided one day to attach none of them to "zfs" and just have it be "the last word in filesystems". (perhaps he even replied to a similar message on this list ... check the archives :-) regards -- Michael Schuster http://recursiveramblings.wordpress.com/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Remove corrupt files from snapshot
Hi, snapshots are read-only by design; you can clone them and manipulate the clone, but the snapshot itself remains r/o. HTH Michael On Thu, Nov 3, 2011 at 13:35, wrote: > > Hello, > > I have got a bunch of corrupted files in various snapshots on my ZFS file > backing store. I was not able to recover them so decided to remove all, > otherwise the continuously make trouble for my incremental backup (rsync, > diff etc. fails). > > However, snapshots seem to be read-only: > > # zpool status -v > pool: backups > state: ONLINE > status: One or more devices has experienced an error resulting in data > corruption. Applications may be affected. > action: Restore the file in question if possible. Otherwise restore the > entire pool from backup. > see: http://www.sun.com/msg/ZFS-8000-8A > scrub: none requested > config: > NAME STATE READ WRITE CKSUM > backups ONLINE 0 0 13 > md0 ONLINE 0 0 13 > errors: Permanent errors have been detected in the following files: > /backups/memory_card/.zfs/snapshot/20110218230726/Backup/Backup.arc > ... > > # rm /backups/memory_card/.zfs/snapshot/20110218230726/Backup/Backup.arc > rm: /backups/memory_card/.zfs/snapshot/20110218230726/Backup/Backup.arc: > Read-only file system > > > Is there any way to force the file removal? > > > Cheers, > B. > ___ > zfs-discuss mailing list > zfs-discuss@opensolaris.org > http://mail.opensolaris.org/mailman/listinfo/zfs-discuss > -- Michael Schuster http://recursiveramblings.wordpress.com/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
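Since the snapshot itself is immutable, the only ways to get rid of the corrupt copies are destroying the snapshots that reference them, or cloning a snapshot and cleaning up the writable clone. A stubbed dry run using the dataset names from the post (`zfs` prints instead of executing):

```shell
# Dry run: the zfs stub prints each command instead of running it.
# Option 1: destroy the snapshot -- the only way to drop a file it holds.
# Option 2: clone it, delete the bad file from the writable clone.
RAN=""
zfs() { RAN="$RAN zfs $*;"; echo "zfs $*"; }

# Option 1:
zfs destroy backups/memory_card@20110218230726

# Option 2 (if the rest of the snapshot's contents must be kept;
# the clone name is a hypothetical placeholder):
zfs clone backups/memory_card@20110218230726 backups/memory_card_fixed
# ...then rm the corrupt file inside the clone, which is read-write.
```

Note the two options conflict on a real system: a snapshot with a live clone cannot be destroyed until the clone is promoted or destroyed.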
Re: [zfs-discuss] about btrfs and zfs
Or, if you absolutely must run linux for the operating system, see: http://zfsonlinux.org/ On Oct 17, 2011, at 8:55 AM, Freddie Cash wrote: > If you absolutely must run Linux on your storage server, for whatever reason, > then you probably won't be running ZFS. For the next year or two, it would > probably be safer to run software RAID (md), with LVM on top, with XFS or > Ext4 on top. It's not the easiest setup to manage, but it would be safer > than btrfs. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] commercial zfs-based storage replication software?
On 1 Oct 11, at 08:01 , Edward Ned Harvey wrote: >> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss- >> boun...@opensolaris.org] On Behalf Of Fajar A. Nugraha >> >> On Sat, Oct 1, 2011 at 5:06 AM, Edward Ned Harvey >> wrote: >>> Have you looked at Sun Unified Storage, AKA the 7000 series? >> >> Thanks, that would be my fallback plan (along with nexentastor and > netapp). > > So you're basically looking for installable 3rd party software that > replicates that functionality? I don't know of any, but that's not saying > much, because when it comes to ZFS, I'm not very platform explorative. As I said before, hack an open source job scheduler, or find one which allows creating jobs with parameters or panels for custom fields, to put together the crontab command and wrap it in something which preserves the output of cron rather than emailing it, but stores it in a database or something, as well as keeps track of success or failure and notifies someone in the event of failure and/or restarts. Which also probably means it needs to do distributed process management to kick off everything it needs to. It should probably be ZFS aware so it can present filesystems and select based on filesystem rather than job. Oracle Enterprise Manager does this. It's commercial, and I'm sure they would negotiate on price for you and give you a good deal if you are good at bargaining with your Oracle Sales Rep. I think his requirements are being driven by a PHB who wants to see a "GUI". crontab, ssh - functionality already there, simple and not many "moving parts", but obviously too obfuscated for the PHB to understand. Good luck. Mike --- Michael Sullivan m...@axsh.us http://www.axsh.us/ Phone: +1-662-259- Mobile: +1-662-202-7716
Re: [zfs-discuss] commercial zfs-based storage replication software?
Maybe I'm missing something here, but Amanda has a whole bunch of bells and whistles, and scans the filesystem to determine what should be backed up. Way overkill for this task I think. Seems to me like zfs send blah | ssh replicatehost zfs receive … more than meets the requirement when combined with just plain old crontab. If it's a graphical interface you're looking for, I'm sure someone has hacked together something in Tcl/Tk or Perl/Tk as an interface to cron which you could probably hack to construct your particular crontab entry. Just a thought, Mike --- Michael Sullivan m...@axsh.us http://www.axsh.us/ Phone: +1-662-259- Mobile: +1-662-202-7716 On 30 Sep 11, at 07:33 , Fajar A. Nugraha wrote: > On Fri, Sep 30, 2011 at 7:22 PM, Edward Ned Harvey > wrote: >>> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss- >>> boun...@opensolaris.org] On Behalf Of Fajar A. Nugraha >>> >>> Does anyone know a good commercial zfs-based storage replication >>> software that runs on Solaris (i.e. not an appliance, not another OS >>> based on solaris)? >>> Kinda like Amanda, but for replication (not backup). >> >> Please define replication, not backup? To me, your question is unclear what >> you want to accomplish. What don't you like about zfs send | zfs receive? > > Basically I need something that does zfs send | zfs receive, plus > GUI/web interface to configure stuff (e.g. which fs to backup, > schedule, etc.), support, and a price tag. > > Believe it or not the last two requirement are actually important > (don't ask :P ), and are the main reasons why I can't use automated > send - receive scripts already available from the internet. > > CMIIW, Amanda can use "zfs send" but it only store the resulting > stream somewhere, while the requirement for this one is that the send > stream must be received on a different server (e.g. DR site) and be > accessible there. 
> > -- > Fajar > ___ > zfs-discuss mailing list > zfs-discuss@opensolaris.org > http://mail.opensolaris.org/mailman/listinfo/zfs-discuss Mike --- Michael Sullivan m...@axsh.us http://www.axsh.us/ Phone: +1-662-259- Mobile: +1-662-202-7716 ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
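The "plain old crontab" pipeline Mike refers to boils down to an incremental send/receive. A stubbed dry-run sketch: `zfs` and `ssh` print instead of executing, and every host, dataset, and snapshot name here is a hypothetical placeholder:

```shell
# Incremental replication sketch: snapshot, then ship the delta since the
# previous snapshot to the DR host. zfs/ssh are stubbed to print; all of
# tank/data, drhost, backup/data and the snapshot names are hypothetical.
RAN=""
zfs() { RAN="$RAN zfs $*;"; echo "zfs $*"; }
ssh() { cat >/dev/null; echo "ssh $*"; }   # stub: swallow stdin, print args

SRC=tank/data DST_HOST=drhost DST=backup/data
PREV=daily-2011-09-29 CUR=daily-2011-09-30

zfs snapshot "$SRC@$CUR"
zfs send -i "$SRC@$PREV" "$SRC@$CUR" | ssh "$DST_HOST" zfs receive -F "$DST"

# Wrapped in a script, the crontab entry is a single line, e.g.:
# 0 2 * * * /usr/local/bin/replicate.sh
```

The real version needs bookkeeping to rotate $PREV/$CUR and to notice a failed send — which is exactly the part the GUI products are selling.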
Re: [zfs-discuss] I'm back!
Warm welcomes back. So whats neXt? - Mike DeMan On Sep 2, 2011, at 6:30 PM, Erik Trimble wrote: > Hi folks. > > I'm now no longer at Oracle, and the past couple of weeks have been a bit of > a mess for me as I disentangle myself from it. > > I apologize to those who may have tried to contact me during August, as my > @oracle.com email is no longer being read by myself, and I didn't have a lot > of extra time to devote to things like making sure my email subscription > lists pointed to my personal email. I've done that now. > > I now have a free(er) hand to do some work in IllumOS (hopefully, in ZFS in > particular), so I'm looking forward to getting back into the swing of things. > And, hopefully, not be too much of a PITA. > > :-) > > -Erik Trimble > tr...@netdemons.com > > > > ___ > zfs-discuss mailing list > zfs-discuss@opensolaris.org > http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Advice with SSD, ZIL and L2ARC
Are you truly new to ZFS? Or do you work for NetApp or EMC or somebody else that is curious? - Mike On Aug 29, 2011, at 9:15 PM, Jesus Cea wrote: > -BEGIN PGP SIGNED MESSAGE- > Hash: SHA1 > > Hi all. Sorry if I am asking a FAQ, but I haven't found a really > authorizative answer to this. Most references are old, incomplete or > of "I have heard of" kind. > > I am running Solaris 10 Update 9, and my pool is v22. > > I recently got two 40GB SSD I plan to add to my pool. My idea is this: > > 1. Format each SSD as 39GB+1GB. > 2. Use the TWO 39GB's as L2ARC, with no redundancy. > 3. Use the TWO 1GB's as mirrored ZIL. > > 1GB of ZIL seems more than enough for my needs. I have synchronous > writes, but they are, 99.9% of the time, <1MB/s, with occasional bursts. > > My main concern here is about pool stability if there have any kind of > problem with the SSD's. Especifically: > > 1. Is the L2ARC data stored in the SSD checksummed?. If so, can I > expect that ZFS goes directly to the disk if the checksum is wrong?. > > 2. Can I import a POOL if one/both L2ARC's are not available?. > > 3. What happend if a L2ARC device, suddenly, "dissappears"?. > > 4. Any idea if L2ARC content will be persistent across system > rebooting "eventually"? > > 5. Can I import a POOL if one/both ZIL devices are not available?. My > pool is v22. I know that I can remove ZIL devices since v19, but I > don't know if I can remove them AFTER they are physically unavailable, > of before importing the pool (after a reboot). > > 6. Can I remove a ZIL device after ZFS consider it "faulty"?. > > 7. What if a ZIL device "dissapears", suddenly?. I know that I could > lose "committed" transactions in-fight, but will the machine crash?. > Will it fallback to ZIL on harddisk?. > > 8. 
Since my ZIL will be mirrored, I assume that the OS will actually > will look for transactions to be replayed in both devices (AFAIK, the > ZIL chain is considered done when the checksum of the last block is > not valid, and I wonder how this interacts with ZIL device mirroring). > > 9. If a ZIL device mirrored goes offline/online, will it resilver from > the other side, or it will simply get new transactions, since old > transactions are irrelevant after ¿30? seconds?. > > 10. What happens if my 1GB of ZIL is too optimistic?. Will ZFS use the > disks or it will stop writers until flushing ZIL to the HDs?. > > Anything else I should consider?. > > As you can see, my concerns concentrate in what happens if the SSDs go > bad or "somebody" unplugs them "live". > > I have backup of (most) of my data, but rebuilding a 12TB pool from > backups, in a production machine, in a remote hosting, would be > something I rather avoid :-p. > > I know that hybrid HD+SSD pools were a bit flacky in the past (you > lost the ZIL device, you kiss goodbye to your ZPOOL, in the pre-v19 > days), and I want to know what terrain I am getting into. > > PS: I plan to upgrade to S10 U10 when available, and I will upgrade > the ZPOOL version after a while. > > - -- > Jesus Cea Avion _/_/ _/_/_/_/_/_/ > j...@jcea.es - http://www.jcea.es/ _/_/_/_/ _/_/_/_/ _/_/ > jabber / xmpp:j...@jabber.org _/_/_/_/ _/_/_/_/_/ > . 
_/_/ _/_/_/_/ _/_/ _/_/ > "Things are not so easy" _/_/ _/_/_/_/ _/_/_/_/ _/_/ > "My name is Dump, Core Dump" _/_/_/_/_/_/ _/_/ _/_/ > "El amor es poner tu felicidad en la felicidad de otro" - Leibniz > -BEGIN PGP SIGNATURE- > Version: GnuPG v1.4.10 (GNU/Linux) > Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ > > iQCVAwUBTlxjxplgi5GaxT1NAQLi9AP/VW2LQqij6y25KQ3c5EDBWvnnL1Z7R65j > BJ0N1EbWW6ZdkQ9uFoLNJBVb8xPgwpTOKuy5g8FTwrjs1Sc5a3E3DbRDUg75faE5 > 4IOgCi0gtIVyrxGEQ2AAhnKHGcto/2gB9Y5KRiibBeysbqNvr0HXQsko7WRauP96 > N1L1TqFsN8E= > =sDRY > -END PGP SIGNATURE- > ___ > zfs-discuss mailing list > zfs-discuss@opensolaris.org > http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
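Jesus's planned layout (two SSDs, each split into a 1GB slice for a mirrored ZIL and a 39GB slice for unmirrored L2ARC) would be attached roughly as below. Stubbed dry run; the pool name "tank" and the slice names are hypothetical:

```shell
# Dry run of the planned layout: mirrored 1GB log slices, two plain
# 39GB cache slices. zpool is stubbed to print; "tank" and the
# c2t*d0s* slice names are hypothetical.
RAN=""
zpool() { RAN="$RAN zpool $*;"; echo "zpool $*"; }

zpool add tank log mirror c2t0d0s0 c2t1d0s0   # mirrored ZIL (slog)
zpool add tank cache c2t0d0s1 c2t1d0s1        # L2ARC needs no redundancy
```

The asymmetry matches the failure modes he asks about: L2ARC is a checksummed read cache, so a dead cache device costs performance only, while the log holds not-yet-committed synchronous writes and so is the part worth mirroring.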
Re: [zfs-discuss] Kernel panic on zpool import. 200G of data inaccessible!
I can not help but agree with Tim's comment below. If you want a free version of ZFS, in which case you are still responsible for things yourself - like having backups, then maybe: www.freenas.org www.linuxonzfs.org www.openindiana.org Meanwhile, it is grossly inappropriate to be complaining about lack of support when using an operating system / file system that you know has no support. Doubly so if your data is important and doubly so again if did not already back it up. - mike On Aug 19, 2011, at 6:54 AM, Tim Cook wrote: > > > You digitally signed a license agreement stating the following: > No Technical Support > Our technical support organization will not provide technical support, phone > support, or updates to you for the Programs licensed under this agreement. > > To turn around and keep repeating that they're "holding your data hostage" is > disingenuous at best. Nobody is holding your data hostage. You voluntarily > put it on an operating system that explicitly states doesn't offer support > from the parent company. Nobody from Oracle is going to show up with a patch > for you on this mailing list because none of the Oracle employees want to > lose their job and subsequently be subjected to a lawsuit. If that's what > you're planning on waiting for, I'd suggest you take a new approach. > > Sorry to be a downer, but that's reality. > > --Tim ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Disable ZIL - persistent
On 5 Aug 11, at 08:14 , Darren J Moffat wrote: > On 08/05/11 13:11, Edward Ned Harvey wrote: >> >> My question: Is there any way to make Disabled ZIL a normal mode of >> operations in solaris 10? Particularly: >> >> If I do this "echo zil_disable/W0t1 | mdb -kw" then I have to remount >> the filesystem. It's kind of difficult to do this automatically at boot >> time, and impossible (as far as I know) for rpool. The only solution I >> see is to write some startup script which applies it to filesystems >> other than rpool. Which feels kludgy. Is there a better way? > > echo "set zfs:zil_disable = 1" > /etc/system

Make that an append rather than an overwrite, so the rest of /etc/system survives:

echo "set zfs:zil_disable = 1" >> /etc/system

Mike --- Michael Sullivan m...@axsh.us http://www.axsh.us/ Phone: +1-662-259- Mobile: +1-662-202-7716
Re: [zfs-discuss] Zil on multiple usb keys
+1 on the below, and in addition... the flash in USB sticks is not designed to survive very many writes. Commonly it is used to store a bootable image that maybe once a year gets an upgrade. Basically, if you try to use those devices for a ZIL, even if they are mirrored, you should be prepared to have one die and be replaced very, very regularly. Performance is generally going to be pretty bad as well - USB sticks are not made to be written to rapidly. They are entirely different animals than SSDs. I would not be surprised (but would be curious to know, if you still move forward on this) if you find performance even worse trying to do this. On Jul 18, 2011, at 1:54 AM, Fajar A. Nugraha wrote: > First of all, using USB disks for permanent storage is a bad idea. Go > for e-sata instead (http://en.wikipedia.org/wiki/Serial_ata#eSATA). It
Re: [zfs-discuss] Resizing ZFS partition, shrinking NTFS?
On 17 Jun 11, at 21:14 , Bob Friesenhahn wrote: > On Fri, 17 Jun 2011, Jim Klimov wrote: >> I gather that he is trying to expand his root pool, and you can >> not add a vdev to one. Though, true, it might be possible to >> create a second, data pool, in the partition. I am not sure if >> zfs can make two pools in different partitions of the same >> device though - underneath it still uses Solaris slices, and >> I think those can be used on one partition. That was my >> assumption for a long time, though never really tested. > > This would be a bad assumption. Zfs should not care and you are able to do > apparently silly things with it. Sometimes allowing potentially silly things > is quite useful. > This is true. If one has mirrored disks, you could do something like I explain here WRT partitioning and resizing pools. http://www.kamiogi.net/Kamiogi/Frame_Dragging/Entries/2009/5/19_Everything_in_Its_Place_-_Moving_and_Reorganizing_ZFS_Storage.html I did some shuffling using Solaris partitions here on a home server, but it was using mirrors of the same geometry disks. You might be able to o a similar shuffle using an external USB drive which was appropriately sized and turn on autoexpand. Mike --- Michael Sullivan m...@axsh.us http://www.axsh.us/ Phone: +1-662-259- Mobile: +1-662-202-7716 ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] question about COW and snapshots
On 17 Jun 11, at 21:02 , Ross Walker wrote: > On Jun 16, 2011, at 7:23 PM, Erik Trimble wrote: > >> On 6/16/2011 1:32 PM, Paul Kraus wrote: >>> On Thu, Jun 16, 2011 at 4:20 PM, Richard Elling >>> wrote: >>> >>>> You can run OpenVMS :-) >>> Since *you* brought it up (I was not going to :-), how does VMS' >>> versioning FS handle those issues ? >>> >> It doesn't, per se. VMS's filesystem has a "versioning" concept (i.e. every >> time you do a close() on a file, it creates a new file with the version >> number appended, e.g. foo;1 and foo;2 are the same file, different >> versions). However, it is completely missing the rest of the features we're >> talking about, like data *consistency* in that file. It's still up to the >> app using the file to figure out what data consistency means, and such. >> Really, all VMS adds is versioning, nothing else (no API, no additional >> features, etc.). > > I believe NTFS was built on the same concept of file streams the VMS FS used > for versioning. > > It's a very simple versioning system. > > Personnally I use Sharepoint, but there are other content management systems > out there that provide what your looking for, so no need to bring out the > crypt keeper. > I think from following this whole discussion people are wanting "Versions" which will be offered by OS X Lion soon. However, it is dependent upon applications playing nice,behaving and using the "standard" API's. It would likely take a major overhaul in the way ZFS handles snapshots to create them at the object level rather than the filesystems level. Might be a nice exploratory exercise for those in the know with the ZFS roadmap, but then there are two "roadmaps" right? Also consistency and integrity cannot be guaranteed on the object level since an application may have more than a single filesystem object in use at a time and operations would need to be transaction based with commits and rollbacks. 
Way off-topic, but Smalltalk and its variants do this by maintaining the state of everything in an operating environment image. But then again, I could be wrong. Mike --- Michael Sullivan m...@axsh.us http://www.axsh.us/ Phone: +1-662-259- Mobile: +1-662-202-7716 ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Resizing ZFS partition, shrinking NTFS?
On 17.06.2011 01:44, John D Groenveld wrote: In message<444915109.61308252125289.JavaMail.Twebapp@sf-app1>, Clive Meredith writes: I currently run a dual boot machine with a 45GB partition for Win7 Ultimate and a 25GB partition for OpenSolaris 10 (134). I need to shrink NTFS to 20GB and increase the ZFS partition to 45GB. Is this possible please? I have looked at using the partition tool in OpenSolaris but both partitions are locked, even under admin. Win7 won't allow me to shrink the dynamic volume, as the Finish button is always greyed out, so no luck in that direction. Shrink the NTFS filesystem first. I've used the Knoppix LiveCD against a defragmented NTFS. Then use beadm(1M) to duplicate your OpenSolaris BE to a USB drive and also send snapshots of any other rpool ZFS there. I'd suggest a somewhat different approach:

1) boot a live CD and use something like parted to shrink the NTFS partition
2) create a new, empty partition in the space freed from NTFS
3) boot OpenSolaris and add the partition from 2) as a vdev to your zpool.

HTH Michael -- Michael Schuster http://recursiveramblings.wordpress.com/
Re: [zfs-discuss] question about COW and snapshots
On 15.06.2011 14:30, Simon Walter wrote: Another one is that snapshots are per-filesystem, while the intention here is to capture a document in one user session. Taking a snapshot will of course say nothing about the state of other user sessions. Any document in the process of being saved by another user, for example, will be corrupt. Would it be? I think that's pretty lame for ZFS to corrupt data. I think "corrupt" is not the right word to use here - "inconsistent" is probably better. ZFS has no idea when a document is "OK", so if your snapshot happens between two writes (even from a single user), it will be consistent from the POV of the FS, but may not be from the POV of the application. HTH Michael -- Michael Schuster http://recursiveramblings.wordpress.com/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Have my RMA... Now what??
Yes, particularly if you have older drives with 512 sectors and then buy a newer drive that seems the same, but is not, because it has 4k sectors. Looks like it works, and will work, but performance drops. On May 28, 2011, at 4:59 PM, Hung-Sheng Tsao (Lao Tsao 老曹) Ph.D. wrote: > yes good idea, another things to keep in mind > technology change so fast, by the time you want a replacement, may be HDD > does exist any more > or the supplier changed, so the drives are not exactly like your original > drive > > > > > On 5/28/2011 6:05 PM, Michael DeMan wrote: >> Always pre-purchase one extra drive to have on hand. When you get it, >> confirm it was not dead-on-arrival by hooking up on an external USB to a >> workstation and running whatever your favorite tools are to validate it is >> okay. Then put it back in its original packaging, and put a label on it >> about what it is, and that it is a spare for box(s) XYZ disk system. >> >> When a drive fails, use that one off the shelf to do your replacement >> immediately then deal with the RMA, paperwork, and snailmail to get the bad >> drive replaced. >> >> Also, depending how many disks you have in your array - keeping multiple >> spares can be a good idea as well to cover another disk dying while waiting >> on that replacement one. >> >> In my opinion, the above goes whether you have your disk system configured >> with hot spare or not. And the technique is applicable to both >> personal/home-use and commercial uses if your data is important. >> >> >> - Mike >> >> On May 28, 2011, at 9:30 AM, Brian wrote: >> >>> I have a raidz2 pool with one disk that seems to be going bad, several >>> errors are noted in iostat. I have an RMA for the drive, however - no I am >>> wondering how I proceed. I need to send the drive in and then they will >>> send me one back. If I had the drive on hand, I could do a zpool replace. >>> >>> Do I do a zpool offline? zpool detach? >>> Once I get the drive back and put it in the same drive bay.. 
Is it just a >>> zpool replace? >>> -- >>> This message posted from opensolaris.org >>> ___ >>> zfs-discuss mailing list >>> zfs-discuss@opensolaris.org >>> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss >> ___ >> zfs-discuss mailing list >> zfs-discuss@opensolaris.org >> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss > ___ > zfs-discuss mailing list > zfs-discuss@opensolaris.org > http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Have my RMA... Now what??
Always pre-purchase one extra drive to have on hand. When you get it, confirm it was not dead-on-arrival by hooking up on an external USB to a workstation and running whatever your favorite tools are to validate it is okay. Then put it back in its original packaging, and put a label on it about what it is, and that it is a spare for box(s) XYZ disk system. When a drive fails, use that one off the shelf to do your replacement immediately then deal with the RMA, paperwork, and snailmail to get the bad drive replaced. Also, depending how many disks you have in your array - keeping multiple spares can be a good idea as well to cover another disk dying while waiting on that replacement one. In my opinion, the above goes whether you have your disk system configured with hot spare or not. And the technique is applicable to both personal/home-use and commercial uses if your data is important. - Mike On May 28, 2011, at 9:30 AM, Brian wrote: > I have a raidz2 pool with one disk that seems to be going bad, several errors > are noted in iostat. I have an RMA for the drive, however - no I am > wondering how I proceed. I need to send the drive in and then they will send > me one back. If I had the drive on hand, I could do a zpool replace. > > Do I do a zpool offline? zpool detach? > Once I get the drive back and put it in the same drive bay.. Is it just a > zpool replace ? > -- > This message posted from opensolaris.org > ___ > zfs-discuss mailing list > zfs-discuss@opensolaris.org > http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
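With a spare already on the shelf, the swap Brian asks about reduces to offline, physical swap, replace — no detach needed when the new disk goes into the same bay. Stubbed dry run; the pool name "tank" and disk name c0t5d0 are hypothetical:

```shell
# Dry run of replacing a failing raidz2 member with the on-hand spare.
# zpool is stubbed to print; "tank" and c0t5d0 are hypothetical names.
RAN=""
zpool() { RAN="$RAN zpool $*;"; echo "zpool $*"; }

zpool offline tank c0t5d0   # take the failing disk out of service
# ...physically swap the drive in the same bay, then:
zpool replace tank c0t5d0   # resilver onto the new disk in that slot
zpool status tank           # watch until the resilver completes
```

Redundancy is degraded from the offline until the resilver finishes, which is why the RMA round-trip is done afterwards, with the pool already healthy again.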
Re: [zfs-discuss] best migration path from Solaris 10
I think on this, the big question is going to be whether Oracle continues to release ZFS updates under CDDL after their commercial releases. Overall, in the past it has obviously and necessarily been the case that FreeBSD has been a '2nd class citizen'. Moving forward, that 2nd class idea becomes very mutable - and ironically it becomes more so in regards to dealing with organizations that have longevity. Moving forward... If Oracle continues to release critical ZFS feature sets under CDDL to the community, then: A) They are no longer pre-releasing those features to OpenSolaris B) FreeBSD gets them at the same time. If Oracle does not continue to release ZFS feature sets under CDDL, then the game changes. Pick your choice of operating systems - one that has a history of surviving for nearly two decades on its own with community support, or the 'green leaf off the dead tree' that just decided to jump into the willy-nilly world without direct/giant corporate support. The 2nd class citizen issue for FreeBSD disappears either way. The only remaining question would be the remaining cruft of legal disposition. I could for instance see NetApp or somebody try and sue ixSystems, but I have a really, really rough time seeing Oracle/LarryEllison suing the FreeBSD foundation overall or something? Oh yeah - plus BTRFS on the horizon? Honestly - I am not here to start a flame war - I am asking these questions because businesses both big and small need to know what to do. My hunch is, we all have to wait and see if Oracle releases ZFS updates after Solaris 11, and if so, whether that is a subset of functionality or full functionality. - mike On Mar 19, 2011, at 11:54 PM, Fajar A. Nugraha wrote: > On Sun, Mar 20, 2011 at 4:05 AM, Pawel Jakub Dawidek wrote: >> On Fri, Mar 18, 2011 at 06:22:01PM -0700, Garrett D'Amore wrote: >>> Newer versions of FreeBSD have newer ZFS code. >> >> Yes, we are at v28 at this point (the lastest open-source version). 
>> >>> That said, ZFS on FreeBSD is kind of a 2nd class citizen still. [...] >> >> That's actually not true. There are more FreeBSD committers working on >> ZFS than on UFS. > > How is the performance of ZFS under FreeBSD? Is it comparable to that > in Solaris, or still slower due to some needed compatibility layer? > > -- > Fajar > ___ > zfs-discuss mailing list > zfs-discuss@opensolaris.org > http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] best migration path from Solaris 10
ZFSv28 is in HEAD now and will be out in 8.3. ZFS + HAST in 9.x means being able to cluster off different hardware. With regard to OpenSolaris and Indiana - can somebody clarify the relationship there? It was clear with OpenSolaris that the latest/greatest ZFS would always be available, since it was a guinea-pig product for cost-conscious folks and served as an excellent area for Sun to get marketplace feedback and bug fixes done before rolling updates into full Solaris. To me it seems that OpenIndiana is basically a green branch off of a dead tree - if I am wrong, please enlighten me. On Mar 18, 2011, at 6:16 PM, Roy Sigurd Karlsbakk wrote: >> I think we all feel the same pain with Oracle's purchase of Sun. >> >> FreeBSD that has commercial support for ZFS maybe? > > FreeBSD currently has a very old zpool version, not suitable for running with > SLOGs, since if you lose it, you may lose the pool, which isn't very > amusing... > > Vennlige hilsener / Best regards > > roy > -- > Roy Sigurd Karlsbakk > (+47) 97542685 > r...@karlsbakk.net > http://blogg.karlsbakk.net/ > -- > In all pedagogy it is essential that the curriculum be presented intelligibly. It is an elementary imperative for all pedagogues to avoid excessive use of idioms of foreign origin. In most cases, adequate and relevant synonyms exist in Norwegian.
Re: [zfs-discuss] [OpenIndiana-discuss] best migration path from Solaris 10
Hi David, Caught your note about bonnie - actually doing some testing myself over the weekend. All on older hardware for fun - dual Opteron 285 with 16GB RAM. Disk system is off a pair of SuperMicro SATA cards, with a combination of WD enterprise and Seagate ES 1TB drives. No ZIL, no L2ARC, no tuning at all from the base FreeNAS install. 10 drives total. I'm going to be running tests as below, mostly curious about IOPS and to sort out a little debate with a co-worker:
- all 10 in one raidz2 (running now)
- 5 by 2-way mirrors
- 2 by 5-disk raidz1
Script is as below - if folks would find the data I collect useful at all, let me know and I will post it publicly somewhere.

freenas# cat test.sh
#!/bin/sh
# Basic test for file I/O. We run lots and lots of the traditional
# 'bonnie' tool at 50GB file size, starting one every minute. Resulting
# data should give us a good work mixture in the middle given all the different
# tests that bonnie runs, 100 instances running at the same time, and at different
# stages of their processing.
MAX=100
COUNT=0
FILESYSTEM=testrz2
LOG=${FILESYSTEM}.log
date > ${LOG}
echo "Test with file system named ${FILESYSTEM} and Configuration of..." >> ${LOG}
zpool status >> ${LOG}
# DEMAN grab zfs and regular dev iostats every 10 minutes during test
zpool iostat -v 600 >> ${LOG} &
iostat -w 600 ada0 ada1 ada2 ada3 ada4 ada5 ada6 ada7 ada8 ada9 > ${LOG}.iostat &
while [ ${COUNT} -le ${MAX} ]; do
    echo kicking off bonnie
    bonnie -d /mnt/${FILESYSTEM} -s 5 &
    sleep 60
    COUNT=$((COUNT+1))
done

On Mar 18, 2011, at 3:26 PM, David Brodbeck wrote: > I'm in a similar position, so I'll be curious what kinds of responses you > get. I can give you a thumbnail sketch of what I've looked at so far: > > I evaluated FreeBSD, and ruled it out because I need NFSv4, and FreeBSD's > NFSv4 support is still in an early stage. The NFS stability and performance > just isn't there yet, in my opinion. 
> > Nexenta Core looked promising, but locked up in bonnie++ NFS testing with our > RedHat nodes, so its stability is a bit of a question mark for me. > > I haven't gotten the opportunity to thoroughly evaluate OpenIndiana, yet. > It's only available as a DVD ISO, and my test machine currently has only a > CD-ROM drive. Changing that is on my to-do list, but other things keep > slipping in ahead of it. > > For now I'm running OpenSolaris, with a locally-compiled version of Samba. > (The OpenSolaris Samba package is very old and has several unpatched security > holes, at this point.) > > -- > David Brodbeck > System Administrator, Linguistics > University of Washington > > ___ > zfs-discuss mailing list > zfs-discuss@opensolaris.org > http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] best migration path from Solaris 10
I think we all feel the same pain with Oracle's purchase of Sun. FreeBSD that has commercial support for ZFS maybe? Not here quite yet, but it is something being looked at by an F500 that I am currently on contract with. www.freenas.org, www.ixsystems.com. Not saying this would be the right solution by any means, but for that 'corporate barrier', sometimes the option to get both the hardware and ZFS from the same place, with support, helps out. - mike On Mar 18, 2011, at 2:56 PM, Paul B. Henson wrote: > We've been running Solaris 10 for the past couple of years, primarily to > leverage zfs to provide storage for about 40,000 faculty, staff, and students > as well as about 1000 groups. Access is provided via NFSv4, CIFS (by samba), > and http/https (including a local module allowing filesystem acl's to be > respected via web access). This has worked reasonably well barring some > ongoing issues with scalability (approximately a 2 hour reboot window on an > x4500 with ~8000 zfs filesystems, complete breakage of live upgrade) and > acl/chmod interaction madness. > > We were just about to start working on a cutover to OpenSolaris (for the > in-kernel CIFS server, and quicker access to new features/developments) when > Oracle finished assimilating Sun and killed off the OpenSolaris distribution. > We've been sitting pat for a while to see how things ended up shaking out, > and at this point want to start reevaluating our best migration option to > move forward from Solaris 10. > > There's really nothing else available that is comparable to zfs (perhaps > btrfs someday in the indefinite future, but who knows when that day might > come), so our options would appear to be Solaris 11 Express, Nexenta (either > NexentaStor or NexentaCore), and OpenIndiana (FreeBSD is occasionally > mentioned as a possibility, but I don't really see that as suitable for our > enterprise needs). 
> > Solaris 11 is the official successor to OpenSolaris, has commercial support, > and the backing of a huge corporation which historically has contributed the > majority of Solaris forward development. However, that corporation is Oracle, > and frankly, I don't like doing business with Oracle. With no offense > intended to the no doubt numerous talented and goodhearted people that might > work there, Oracle is simply evil. We've dealt with Oracle for a long time > (in addition to their database itself, we're a PeopleSoft shop) and a > positive interaction with them is quite rare. Since they took over Sun, costs > on licensing, support contracts, and hardware have increased dramatically, at > least in the cases where we've actually been able to get a quote. Arguably, > we are not their target market, and they make that quite clear ;). There's > also been significant brain drain of prior Sun employees since the takeover, > so while they might still continue to contribute the most money into Solaris > development, they might not be the future source of the most innovation. Given our needs, and our budget, I really don't consider this a viable option. > > Nexenta, on the other hand, seems to be the kind of company I'd like to deal > with. Relatively small, nimble, with a ton of former Sun zfs talent working > for them, and what appears to be actual consideration for the needs of their > customers. I think I'd more likely get my needs addressed through Nexenta, > they've already started work on adding aclmode back and I've had some initial > discussion with one of their engineers on the possibility of adding > additional options such as denying or ignoring attempted chmod updates on > objects with acls. It looks like they only offer commercial support for > NexentaStor, not NexentaCore. 
Commercial support isn't a strict requirement, > a sizable chunk of our infrastructure runs on a non-commercial linux > distribution and open source software, but it can make management happier. > NexentaStor seems positioned as a storage appliance, which isn't really what > we need. I'm not particularly interested in a web gui or cli interface that > hides the underlying complexity of the operating system and zfs, on the contrary, I want full access to the guts :). We have our zfs deployment integrated into our identity management system, which automatically provisions, destroys, and maintains filespace for our user/groups, as well as providing an API for end-users and administrators to manage quotas and other attributes. We also run apache with some custom modules. I still need to investigate further, but I'm not even sure if NexentaStor provides access into the underlying OS or encapsulates everything and only allows control through its own administrative functionality. > > NexentaCore is more of the raw operating system we're probably looking for, > but with only community-based support. Given that NexentaCore and OpenIndiana > are now both going to be based off of the illumos core, I'm no
Re: [zfs-discuss] zfs-discuss Digest, Vol 64, Issue 21
I obtained smartmontools (which includes smartctl) from the standard apt repository (I'm using Nexenta, however). In addition, it's necessary to use the device type sat,12 with smartctl to get it to read attributes correctly on OpenSolaris, AFAIK. Also, regarding device IDs on the system: from what I've seen they are assigned to ports and therefore do not change; however, upon changing a controller they will most likely change, unless it's the same chipset with exactly the same port configuration. Hope this helps. On 7 Feb 2011, at 18:04, zfs-discuss-requ...@opensolaris.org wrote: > Having managed to muddle through this weekend without loss (though with a > certain amount of angst and duplication of efforts), I'm in the mood to > label things a bit more clearly on my system :-). > > smartctl doesn't seem to be on my system, though. I'm running > snv_134. I'm still pretty badly lost in the whole repository / > package thing with Solaris, most of my brain cells were already > occupied with Red Hat, Debian, and Perl package information :-( . > Where do I look? > > Are the controller port IDs, the "C9T3D0" things that ZFS likes, > reasonably stable? They won't change just because I add or remove > drives, right; only maybe if I change controller cards?
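For anyone searching the archives later, the invocation being described looks something like this - a sketch only; the device path is an illustrative placeholder, not taken from the poster's system, so substitute the cXtYdZ name that `format` or `zpool status` reports:

```shell
# Read full SMART info through the 12-byte SCSI-to-ATA (SAT) pass-through,
# which is what's reportedly needed on Nexenta/OpenSolaris per the note above.
# /dev/rdsk/c9t3d0 is a placeholder device path - substitute your own.
smartctl -a -d sat,12 /dev/rdsk/c9t3d0
```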
[zfs-discuss] deduplication requirements
Hi guys, I'm currently running 2 zpools, each in a raidz1 configuration, totalling around 16TB of usable data. I'm running it all on an OpenSolaris-based box with 2GB of memory and an old Athlon 64 3700 CPU. I understand this is very poor and underpowered for deduplication, so I'm looking at building a new system, but wanted some advice first. Here is what I've planned so far:
- Core i7 2600 CPU
- 16GB DDR3 memory
- 64GB SSD for ZIL (optional)
Would this produce decent results for deduplication of 16TB worth of pools, or would I need more RAM still?
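For a rough sense of the RAM question, the commonly cited rule of thumb is on the order of 320 bytes of dedup table (DDT) per unique block. A back-of-the-envelope sketch - the 320-byte figure and 128K average block size are assumptions; real pools vary with recordsize and dedup ratio:

```shell
#!/bin/sh
# Rough DDT sizing sketch for a 16TB pool. All figures are assumptions:
# ~320 bytes per DDT entry and a 128K average block size.
POOL_BYTES=$((16 * 1024 * 1024 * 1024 * 1024))  # 16 TB of data
BLOCK_SIZE=$((128 * 1024))                      # default 128K recordsize
ENTRY_BYTES=320                                 # per unique block, roughly

BLOCKS=$((POOL_BYTES / BLOCK_SIZE))
DDT_BYTES=$((BLOCKS * ENTRY_BYTES))
DDT_GIB=$((DDT_BYTES / 1024 / 1024 / 1024))

echo "blocks: ${BLOCKS}, estimated DDT: ~${DDT_GIB} GiB"
```

That works out to roughly 40 GiB of DDT for 16TB of 128K blocks, which is why 16GB of RAM plus an L2ARC SSD is usually treated as the practical floor here - and why smaller average block sizes make the picture much worse.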
Re: [zfs-discuss] zfs-discuss Digest, Vol 64, Issue 13
Additionally, the way I do it is to draw a diagram of the drives in the system, labelled with the drive serial numbers. Then when a drive fails, I can find out from smartctl which drive it is and remove/replace without trial and error. On 5 Feb 2011, at 21:54, zfs-discuss-requ...@opensolaris.org wrote: > > Message: 7 > Date: Sat, 5 Feb 2011 15:42:45 -0500 > From: rwali...@washdcmail.com > To: David Dyer-Bennet > Cc: zfs-discuss@opensolaris.org > Subject: Re: [zfs-discuss] Identifying drives (SATA) > Message-ID: <58b53790-323b-4ae4-98cd-575f93b66...@washdcmail.com> > Content-Type: text/plain; charset=us-ascii > > > On Feb 5, 2011, at 2:43 PM, David Dyer-Bennet wrote: > >> Is there a clever way to figure out which drive is which? And if I have to >> fall back on removing a drive I think is right, and seeing if that's true, >> what admin actions will I have to perform to get the pool back to safety? >> (I've got backups, but it's a pain to restore of course.) (Hmmm; in >> single-user mode, use dd to read huge chunks of one disk, and see which >> lights come on? Do I even need to be in single-user mode to do that?) > > Obviously this depends on your lights working to some extent (the right light > doing something when the right disk is accessed), but I've used: > > dd if=/dev/rdsk/c8t3d0s0 of=/dev/null bs=4k count=10 > > which someone mentioned on this list. Assuming you can actually read from > the disk (it isn't completely dead), it should allow you to direct traffic to > each drive individually. > > Good luck, > Ware ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] NFS slow for small files: idle disks
sks each, and 1 channel with 2 disks). Richard Elling's zilstat gives:

   N-Bytes  N-Bytes/s  N-Max-Rate    B-Bytes  B-Bytes/s  B-Max-Rate   ops  <=4kB  4-32kB  >=32kB
      9552       9552        9552     671744     671744      671744   164    164       0       0
     10192      10192       10192     724992     724992      724992   177    177       0       0
      9568       9568        9568     679936     679936      679936   166    166       0       0
     11712      11712       11712     823296     823296      823296   201    201       0       0
     10784      10784       10784     765952     765952      765952   187    187       0       0
     10024      10024       10024     708608     708608      708608   173    173       0       0

About 200 ZIL ops at most, all < 4k. As said, the disks aren't busy during this test. The test zfs is configured with atime off. logbias makes almost no difference; with logbias=latency the IOPS rate is a little lower. Attached are some bonnie++ results to show that all disks and the whole pool are quite healthy. I get > 1000 random reads/sec locally and still nearly 900 reads/sec via NFS. For large files I easily get gbit wirespeed (105 MB/sec read) with NFS. And for random reads in a bonnie or iozone test the disks really are 80%-100% busy. Just for small files the array sits almost idle; the array can do way more. I discovered this on different Solaris versions, not only this test system. Is there any explanation for this behaviour? 
Thanks, Michael -- This message posted from opensolaris.org

local
Version 1.03c     --Sequential Output--           --Sequential Input-  --Random-
                  -Per Chr- --Block-- -Rewrite-   -Per Chr- --Block--  --Seeks--
Machine      Size K/sec %CP K/sec %CP K/sec %CP   K/sec %CP K/sec %CP   /sec %CP
ibmr10        16G           108972 25  89923 21             263540 26    1074  3

                  --Sequential Create--           Random Create
                  -Create-- --Read--- -Delete--   -Create-- --Read--- -Delete--
            files  /sec %CP  /sec %CP  /sec %CP    /sec %CP  /sec %CP  /sec %CP
               16 30359  99     + +++     + +++   24836  99     + +++     + +++
ibmr10,16G,,,108972,25,89923,21,,,263540,26,1073.5,3,16,30359,99,+,+++,+,+++,24836,99,+,+++,+,+++

NFS
Version 1.03d     --Sequential Output--           --Sequential Input-  --Random-
                  -Per Chr- --Block-- -Rewrite-   -Per Chr- --Block--  --Seeks--
Machine      Size K/sec %CP K/sec %CP K/sec %CP   K/sec %CP K/sec %CP   /sec %CP
nfsibmr10     16G            50022 11  42524 14             105335 18   884.8 20

                  --Sequential Create--           Random Create
                  -Create-- --Read--- -Delete--   -Create-- --Read--- -Delete--
            files  /sec %CP  /sec %CP  /sec %CP    /sec %CP  /sec %CP  /sec %CP
               16   152   3     + +++   182   1     151   3     + +++   183   1
nfsibmr10,16G,,,50022,11,42524,14,,,105335,18,884.8,20,16,152,3,+,+++,182,1,151,3,+,+++,183,1
Re: [zfs-discuss] Troubleshooting help on ZFS
On Thu, Jan 20, 2011 at 01:47, Steve Kellam wrote: > I have a home media server set up using OpenSolaris. All my experience with > OpenSolaris has been through setting up and maintaining this server so it is > rather limited. I have run in to some problems recently and I am not sure > how the best way to troubleshoot this. I was hoping to get some feedback on > possible fixes for this. > > I am running SunOS 5.11 snv_134. It is running on a tower with 6 HDD > configured in as raidz2 array. Motherboard: ECS 945GCD-M(1.0) Intel Atom 330 > Intel 945GC Micro ATX Motherboard/CPU Combo. Memory: 4GB. > > I set this up about a year ago and have had very few problems. I was > streaming a movie off the server a few days ago and it all of a sudden lost > connectivity with the server. When I checked the server, there was no output > on the display from the server but the power supply seemed to be running and > the fans were going. > The next day it started working again and I was able to log in. The SMB and > NFS file server was connecting without problems. > > Now I am able to connect remotely via SSH. I am able to bring up a zpool > status screen that shows no problems. It reports no known data errors. I am > able to go to the top level data directories but when I cd into the > sub-directories the SSH connection freezes. > > I have tried to do a ZFS scrub on the pool and it only gets to 0.02% and > never gets beyond that but does not report any errors. Now, also, I am > unable to stop the scrub. I use the zpool scrub -s command but this freezes > the SSH connection. > When I reboot, it is still trying to scrub but not making progress. > > I have the system set up to a battery back up with surge protection and I'm > not aware of any spikes in electricity recently. I have not made any > modifications to the system. All the drives have been run through SpinRite > less than a couple months ago without any data errors. 
> > I can't figure out how this happened all of the sudden and how best to > troubleshoot it. > > If you have any help or technical wisdom to offer, I'd appreciate it as this > has been frustrating. look in /var/adm/messages (.*) to see whether there's anything interesting around the time you saw the loss of connectivity, and also since, then take it from there. HTH Michael -- regards/mit freundlichen Grüssen Michael Schuster ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Is my bottleneck RAM?
Ah ok, I won't be using dedup anyway, just wanted to try it. I'll be adding more RAM though - I guess you can't have too much. Thanks Erik Trimble wrote: >You can't really do that. > >Adding an SSD for L2ARC will help a bit, but L2ARC storage also consumes >RAM to maintain a cache table of what's in the L2ARC. Using 2GB of RAM >with an SSD-based L2ARC (even without Dedup) likely won't help you too >much vs not having the SSD. > >If you're going to turn on Dedup, you need at least 8GB of RAM to go >with the SSD. > >-Erik > > >On Tue, 2011-01-18 at 18:35 +, Michael Armstrong wrote: >> Thanks everyone, I think overtime I'm gonna update the system to include an >> ssd for sure. Memory may come later though. Thanks for everyone's responses >> >> Erik Trimble wrote: >> >> >On Tue, 2011-01-18 at 15:11 +, Michael Armstrong wrote: >> >> I've since turned off dedup, added another 3 drives and results have >> >> improved to around 148388K/sec on average, would turning on compression >> >> make things more CPU bound and improve performance further? >> >> >> >> On 18 Jan 2011, at 15:07, Richard Elling wrote: >> >> >> >> > On Jan 15, 2011, at 4:21 PM, Michael Armstrong wrote: >> >> > >> >> >> Hi guys, sorry in advance if this is somewhat a lowly question, I've >> >> >> recently built a zfs test box based on nexentastor with 4x samsung 2tb >> >> >> drives connected via SATA-II in a raidz1 configuration with dedup >> >> >> enabled compression off and pool version 23. 
From running bonnie++ I >> >> >> get the following results: >> >> >> >> >> >> Version 1.03b --Sequential Output-- --Sequential Input- >> >> >> --Random- >> >> >> -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- >> >> >> --Seeks-- >> >> >> MachineSize K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP >> >> >> /sec %CP >> >> >> nexentastor 4G 60582 54 20502 4 12385 3 53901 57 105290 10 >> >> >> 429.8 1 >> >> >> --Sequential Create-- Random >> >> >> Create >> >> >> -Create-- --Read--- -Delete-- -Create-- --Read--- >> >> >> -Delete-- >> >> >> files /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP >> >> >> /sec %CP >> >> >>16 7181 29 + +++ + +++ 21477 97 + +++ >> >> >> + +++ >> >> >> nexentastor,4G,60582,54,20502,4,12385,3,53901,57,105290,10,429.8,1,16,7181,29,+,+++,+,+++,21477,97,+,+++,+,+++ >> >> >> >> >> >> >> >> >> I'd expect more than 105290K/s on a sequential read as a peak for a >> >> >> single drive, let alone a striped set. The system has a relatively >> >> >> decent CPU, however only 2GB memory, do you think increasing this to >> >> >> 4GB would noticeably affect performance of my zpool? The memory is >> >> >> only DDR1. >> >> > >> >> > 2GB or 4GB of RAM + dedup is a recipe for pain. Do yourself a favor, >> >> > turn off dedup >> >> > and enable compression. >> >> > -- richard >> >> > >> >> >> >> ___ >> >> zfs-discuss mailing list >> >> zfs-discuss@opensolaris.org >> >> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss >> > >> > >> >Compression will help speed things up (I/O, that is), presuming that >> >you're not already CPU-bound, which it doesn't seem you are. >> > >> >If you want Dedup, you pretty much are required to buy an SSD for L2ARC, >> >*and* get more RAM. >> > >> > >> >These days, I really don't recommend running ZFS as a fileserver without >> >a bare minimum of 4GB of RAM (8GB for anything other than light use), >> >even with Dedup turned off. 
>> > >> > >> >-- >> >Erik Trimble >> >Java System Support >> >Mailstop: usca22-317 >> >Phone: x67195 >> >Santa Clara, CA >> >Timezone: US/Pacific (GMT-0800) >> > >-- >Erik Trimble >Java System Support >Mailstop: usca22-317 >Phone: x67195 >Santa Clara, CA >Timezone: US/Pacific (GMT-0800) > ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Is my bottleneck RAM?
Thanks everyone, I think overtime I'm gonna update the system to include an ssd for sure. Memory may come later though. Thanks for everyone's responses Erik Trimble wrote: >On Tue, 2011-01-18 at 15:11 +, Michael Armstrong wrote: >> I've since turned off dedup, added another 3 drives and results have >> improved to around 148388K/sec on average, would turning on compression make >> things more CPU bound and improve performance further? >> >> On 18 Jan 2011, at 15:07, Richard Elling wrote: >> >> > On Jan 15, 2011, at 4:21 PM, Michael Armstrong wrote: >> > >> >> Hi guys, sorry in advance if this is somewhat a lowly question, I've >> >> recently built a zfs test box based on nexentastor with 4x samsung 2tb >> >> drives connected via SATA-II in a raidz1 configuration with dedup enabled >> >> compression off and pool version 23. From running bonnie++ I get the >> >> following results: >> >> >> >> Version 1.03b --Sequential Output-- --Sequential Input- >> >> --Random- >> >> -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- >> >> --Seeks-- >> >> MachineSize K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP >> >> /sec %CP >> >> nexentastor 4G 60582 54 20502 4 12385 3 53901 57 105290 10 >> >> 429.8 1 >> >> --Sequential Create-- Random >> >> Create >> >> -Create-- --Read--- -Delete-- -Create-- --Read--- >> >> -Delete-- >> >> files /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP /sec >> >> %CP >> >>16 7181 29 + +++ + +++ 21477 97 + +++ + >> >> +++ >> >> nexentastor,4G,60582,54,20502,4,12385,3,53901,57,105290,10,429.8,1,16,7181,29,+,+++,+,+++,21477,97,+,+++,+,+++ >> >> >> >> >> >> I'd expect more than 105290K/s on a sequential read as a peak for a >> >> single drive, let alone a striped set. The system has a relatively decent >> >> CPU, however only 2GB memory, do you think increasing this to 4GB would >> >> noticeably affect performance of my zpool? The memory is only DDR1. >> > >> > 2GB or 4GB of RAM + dedup is a recipe for pain. 
Do yourself a favor, turn >> > off dedup >> > and enable compression. >> > -- richard >> > >> >> ___ >> zfs-discuss mailing list >> zfs-discuss@opensolaris.org >> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss > > >Compression will help speed things up (I/O, that is), presuming that >you're not already CPU-bound, which it doesn't seem you are. > >If you want Dedup, you pretty much are required to buy an SSD for L2ARC, >*and* get more RAM. > > >These days, I really don't recommend running ZFS as a fileserver without >a bare minimum of 4GB of RAM (8GB for anything other than light use), >even with Dedup turned off. > > >-- >Erik Trimble >Java System Support >Mailstop: usca22-317 >Phone: x67195 >Santa Clara, CA >Timezone: US/Pacific (GMT-0800) > ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Is my bottleneck RAM?
I've since turned off dedup, added another 3 drives and results have improved to around 148388K/sec on average, would turning on compression make things more CPU bound and improve performance further? On 18 Jan 2011, at 15:07, Richard Elling wrote: > On Jan 15, 2011, at 4:21 PM, Michael Armstrong wrote: > >> Hi guys, sorry in advance if this is somewhat a lowly question, I've >> recently built a zfs test box based on nexentastor with 4x samsung 2tb >> drives connected via SATA-II in a raidz1 configuration with dedup enabled >> compression off and pool version 23. From running bonnie++ I get the >> following results: >> >> Version 1.03b --Sequential Output-- --Sequential Input- >> --Random- >> -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks-- >> MachineSize K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec >> %CP >> nexentastor 4G 60582 54 20502 4 12385 3 53901 57 105290 10 429.8 >> 1 >> --Sequential Create-- Random Create >> -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete-- >> files /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP >>16 7181 29 + +++ + +++ 21477 97 + +++ + +++ >> nexentastor,4G,60582,54,20502,4,12385,3,53901,57,105290,10,429.8,1,16,7181,29,+,+++,+,+++,21477,97,+,+++,+,+++ >> >> >> I'd expect more than 105290K/s on a sequential read as a peak for a single >> drive, let alone a striped set. The system has a relatively decent CPU, >> however only 2GB memory, do you think increasing this to 4GB would >> noticeably affect performance of my zpool? The memory is only DDR1. > > 2GB or 4GB of RAM + dedup is a recipe for pain. Do yourself a favor, turn off > dedup > and enable compression. > -- richard > ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] Is my bottleneck RAM?
Hi guys, sorry in advance if this is somewhat a lowly question, I've recently built a zfs test box based on nexentastor with 4x samsung 2tb drives connected via SATA-II in a raidz1 configuration with dedup enabled, compression off and pool version 23. From running bonnie++ I get the following results:

Version 1.03b     --Sequential Output--           --Sequential Input-  --Random-
                  -Per Chr- --Block-- -Rewrite-   -Per Chr- --Block--  --Seeks--
Machine      Size K/sec %CP K/sec %CP K/sec %CP   K/sec %CP K/sec %CP   /sec %CP
nexentastor    4G 60582  54 20502   4 12385   3   53901  57 105290 10  429.8   1

                  --Sequential Create--           Random Create
                  -Create-- --Read--- -Delete--   -Create-- --Read--- -Delete--
            files  /sec %CP  /sec %CP  /sec %CP    /sec %CP  /sec %CP  /sec %CP
               16  7181  29     + +++     + +++   21477  97     + +++     + +++
nexentastor,4G,60582,54,20502,4,12385,3,53901,57,105290,10,429.8,1,16,7181,29,+,+++,+,+++,21477,97,+,+++,+,+++

I'd expect more than 105290K/s on a sequential read as a peak for a single drive, let alone a striped set. The system has a relatively decent CPU, however only 2GB memory; do you think increasing this to 4GB would noticeably affect performance of my zpool? The memory is only DDR1. Thanks in advance.
Re: [zfs-discuss] A few questions
Just to add a bit to this, I just love sweeping generalizations... On 9 Jan 2011, at 19:33 , Richard Elling wrote: > On Jan 9, 2011, at 4:19 PM, Edward Ned Harvey > wrote: > >>> From: Pasi Kärkkäinen [mailto:pa...@iki.fi] >>> >>> Other OS's have had problems with the Broadcom NICs aswell.. >> >> Yes. The difference is, when I go to support.dell.com and punch in my >> service tag, I can download updated firmware and drivers for RHEL that (at >> least supposedly) solve the problem. I haven't tested it, but the dell >> support guy told me it has worked for RHEL users. There is nothing >> available to download for solaris. > > The drivers are written by Broadcom and are, AFAIK, closed source. > By going through Dell, you are going through a middle-man. For example, > > http://www.broadcom.com/support/ethernet_nic/netxtremeii10.php > > where you see the release of the Solaris drivers was at the same time > as Windows. > What Richard says is true. Broadcom have been a source of contention in the Linux world as well as the *BSD world due to the proprietary nature of their firmware. OpenSolaris/Solaris users are not the only ones who have complained about this. There's been much uproar in the FOSS community about Broadcom and their drivers. As a result, I've seen some pretty nasty hacks like people using the Windows drivers linked into their kernel - *gack* I forget all the gory details, but it was rather disgusting as I recall, bubblegum, bailing wire, duct tape and all. Dell and Red Hat aren't exactly a marriage made in heaven either. I've had problems getting support from both Dell and Red Hat, them pointing fingers at each other rather than solving the problem. Like most people, I've had to come up with my own work-arounds, like others with the Broadcom issue, using a "known quantity" NIC. When dealing with Dell as a corporate buyer, they have always made it quite clear that they are primarily a Windows platform. Linux, oh yes, we have that too... 
>> Also, the bcom is not the only problem on that server. After I added-on an >> intel network card and disabled the bcom, the weekly crashes stopped, but >> now it's ... I don't know ... once every 3 weeks with a slightly different >> mode of failure. This is yet again, rare enough that the system could very >> well pass a certification test, but not rare enough for me to feel >> comfortable putting into production as a primary mission critical server. I've never been particularly warm and fuzzy with Dell servers. They seem to like to change their chipsets slightly while a model is in production. This can cause all sorts of problems which are difficult to diagnose since an "identical" Dell system will have no problems, and it's mate will crash weekly. >> >> I really think there are only two ways in the world to engineer a good solid >> server: >> (a) Smoke your own crack. Systems engineering teams use the same systems >> that are sold to customers. > > This is rarely practical, not to mention that product development > is often not in the systems engineering organization. > >> or >> (b) Sell millions of 'em. So despite whether or not the engineering team >> uses them, you're still going to have sufficient mass to dedicate engineers >> to the purpose of post-sales bug solving. > > yes, indeed :-) > -- richard As for certified systems, It's my understanding that Nexenta themselves don't "certify" anything. They have systems which are recommended and supported by their network of VAR's. It just so happens that SuperMicro is one of the brands of choice, but even then one must adhere to a fairly tight HCL. The same holds true for Solaris/OpenSolaris with third-party hardware. SATA Controllers and multiplexers are also another example of the drivers being written by the manufacturer and Solaris/OpenSolaris are not a priority over Windows and Linux, in that order. 
Deviating to items which are not somewhat "plain vanilla" and are not listed on the HCL is just asking for trouble.

Mike

---
Michael Sullivan
michael.p.sulli...@me.com
http://www.kamiogi.net/
Mobile: +1-662-202-7716
US Phone: +1-561-283-2034
JP Phone: +81-50-5806-6242

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] (Fletcher+Verification) versus (Sha256+No Verification)
On Jan 7, 2011, at 6:13 AM, David Magda wrote:

> On Fri, January 7, 2011 01:42, Michael DeMan wrote:
>> Then - there is the other side of things. The 'black swan' event. At
>> some point, given percentages on a scenario like the example case above,
>> one simply has to make the business justification case internally at their
>> own company about whether to go SHA-256 only or Fletcher+Verification?
>> Add Murphy's Law to the 'black swan event' and of course the only data
>> that is lost is that .01% of your data that is the most critical?
>
> The other thing to note is that by default (with de-dupe disabled), ZFS
> uses Fletcher checksums to prevent data corruption. Add also the fact that
> most other file systems don't have any checksums, and simply rely on the
> fact that disks have a bit error rate of (at best) 10^-16.

Agreed - but I think it is still missing the point of what the original poster was asking about. In all honesty I think the debate is a business decision - the highly improbable vs. certainty. Somebody somewhere must have written this stuff up, along with simple use cases? Perhaps even a new acronym? MTBC - mean time before collision? And even with the 'certainty' factor being the choice - other things like human error come in to play and are far riskier?
Re: [zfs-discuss] (Fletcher+Verification) versus (Sha256+No Verification)
At the end of the day this issue essentially is about mathematical improbability versus certainty? To be quite honest, I too am skeptical about using de-dupe just based on SHA256. In prior posts it was asked that the potential adopter of the technology provide the mathematical reason to NOT use SHA-256 only. However, if Oracle believes that it is adequate to do that, would it be possible for somebody to provide:

(A) The theoretical documents and associated mathematics specific to, say, one simple use case?
(A1) Total data size is 1PB (let's say the zpool is 2PB to not worry about that part of it).
(A2) Daily, 10TB of data is updated, 1TB of data is deleted, and 1TB of data is 'new'.
(A3) Out of the dataset, 25% of the data is capable of being de-duplicated.
(A4) Between A2 and A3 above, the 25% rule from A3 also applies to everything in A2.

I think the above would be a pretty 'soft' case for justifying the claim that SHA-256 works? I would presume some simple scenario like this was run mathematically by somebody inside Oracle/Sun long ago, when first proposing that ZFS be funded internally at all?

Then - there is the other side of things. The 'black swan' event. At some point, given percentages on a scenario like the example case above, one simply has to make the business justification case internally at their own company about whether to go SHA-256 only or Fletcher+Verification? Add Murphy's Law to the 'black swan event' and of course the only data that is lost is that .01% of your data that is the most critical?

Not trying to be aggressive or combative here at all against people's opinions and understandings of it all - I would just like to see some hard information about it - it must exist somewhere already?
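For what it's worth, the use case above (A1-A4) is simple to run as a birthday-bound estimate. One assumption here is mine, not given in the scenario: an average ZFS block size of 128 KB (the default recordsize); smaller blocks mean more blocks and a proportionally larger, though still tiny, bound.

```python
# Rough birthday-bound run of the use case above (A1-A4).
# Assumption (mine, not the poster's): 128 KB average block size.

def collision_bound(n_blocks: int, hash_bits: int = 256) -> float:
    """Upper bound on P(any two blocks share a hash):
    n*(n-1)/2 pairs, each colliding with probability 2^-hash_bits."""
    return n_blocks * (n_blocks - 1) / 2 * 2.0 ** -hash_bits

PB = 2 ** 50                  # bytes in 1 PB (binary)
n = PB // (128 * 2 ** 10)     # unique 128 KB blocks in 1 PB
print(n)                      # 8589934592 (2^33 blocks)
print(collision_bound(n))     # ~3.2e-58
```

The daily churn in A2 doesn't change the picture: the bound is governed by the total unique block count, and 10TB/day of rewrites keeps that count in the same 2^33-ish regime.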
Thanks,
- Mike

On Jan 6, 2011, at 10:05 PM, Edward Ned Harvey wrote:

>> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
>> boun...@opensolaris.org] On Behalf Of Peter Taps
>>
>> Perhaps (Sha256+NoVerification) would work 99.99% of the time. But
>
> Append 50 more 9's on there.
> 99.99...(another fifty 9's)%
>
> See below.
>
>> I have been told that the checksum value returned by Sha256 is almost
>> guaranteed to be unique. In fact, if Sha256 fails in some case, we have a
>> bigger problem such as memory corruption, etc. Essentially, adding
>> verification to sha256 is an overkill.
>
> Someone please correct me if I'm wrong. I assume ZFS dedup matches both the
> blocksize and the checksum, right? A simple checksum collision (which is
> astronomically unlikely) is still not sufficient to produce corrupted data.
> It's even more unlikely than that.
>
> Using the above assumption, here's how you calculate the probability of
> corruption if you're not using verification:
>
> Suppose every single block in your whole pool is precisely the same size
> (which is unrealistic in the real world, but I'm trying to calculate worst
> case). Suppose the block is 4K, which is, again, unrealistically worst case.
> Suppose your dataset is purely random or sequential ... with no duplicated
> data ... which is unrealistic, because if your data is like that, then why in
> the world are you enabling dedupe? But again, assuming worst case
> scenario... At this point we'll throw in some evil clowns, spit on a voodoo
> priestess, and curse the heavens for some extra bad luck.
>
> If you have astronomically infinite quantities of data, then your
> probability of corruption approaches 100%. With infinite data, eventually
> you're guaranteed to have a collision. So the probability of corruption is
> directly related to the total amount of data you have, and the new question
> is: for anything Earthly, how near are you to 0% probability of collision
> in reality?
>
> Suppose you have 128TB of data.
> That is ... you have 2^35 unique 4k blocks
> of uniformly sized data. Then the probability you have any collision in
> your whole dataset is (sum(1 thru 2^35)) * 2^-256.
> Note: sum of integers from 1 to N is (N*(N+1))/2
> Note: 2^35 * (2^35+1) = 2^35 * 2^35 + 2^35 = 2^70 + 2^35
> Note: (N*(N+1))/2 in this case = 2^69 + 2^34
> So the probability of data corruption in this case is 2^-187 + 2^-222
> ~= 5.1E-57 + 1.5E-67
> ~= 5.1E-57
>
> In other words, even in the absolute worst case, cursing the gods, running
> without verification, using data that's specifically formulated to try and
> cause errors, on a dataset that I bet is larger than what you're doing ...
>
> Before we go any further ... The total number of bits stored on all the
> storage in the whole planet is a lot smaller than the total number of
> molecules in the planet.
>
> There are an estimated 8.87 * 10^49 molecules in planet Earth.
>
> The probability of a collision in your worst-case unrealistic dataset as
> described is even millions of times less likely than randomly finding a
> single specific molecule in the whole planet Earth by pure luck.
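Ed's figures are easy to re-derive numerically. Running his worst case reproduces the ~5.1E-57, though the closing molecule comparison comes out nearer a couple of million times less likely than a hundred million - same ballpark, same conclusion:

```python
# Quick numeric check of the arithmetic in the post above:
# 128 TB of all-unique 4 KB blocks, SHA-256, no verify.

n = (128 * 2 ** 40) // (4 * 2 ** 10)   # number of blocks = 2^35
p = (n * (n + 1) // 2) * 2.0 ** -256   # sum(1..n) * 2^-256
print(n == 2 ** 35)                    # True
print(p)                               # ~5.1e-57, matching the post

# The molecule comparison: ~8.87e49 molecules in Earth, so picking one
# specific molecule at random has probability ~1.13e-50.
print((1 / 8.87e49) / p)               # ~2.2e6
```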
Re: [zfs-discuss] (Fletcher+Verification) versus (Sha256+No Verification)
Ed, with all due respect to your math, I've seen rsync bomb out due to an SHA256 collision, so I know it can and does happen. I respect my data, so even with checksumming and comparing the block size, I'll still do a comparison check if those two match. Otherwise you can end up with silent data corruption, which could affect you in so many ways. Do you want to stake your career and reputation on that? With a client or employer's data? I sure don't.

"Those who walk on the razor's edge are destined to be cut to ribbons…" Someone I used to work with said that, not me.

For my home media server, maybe, but even then I'd hate to lose any of my family photos or video due to a hash collision. I'll play it safe if I dedup.

Mike

---
Michael Sullivan
michael.p.sulli...@me.com
http://www.kamiogi.net/
Mobile: +1-662-202-7716
US Phone: +1-561-283-2034
JP Phone: +81-50-5806-6242

On 7 Jan 2011, at 00:05 , Edward Ned Harvey wrote:

>> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
>> boun...@opensolaris.org] On Behalf Of Peter Taps
>>
>> Perhaps (Sha256+NoVerification) would work 99.99% of the time. But
>
> Append 50 more 9's on there.
> 99.99...(another fifty 9's)%
>
> See below.
>
>> I have been told that the checksum value returned by Sha256 is almost
>> guaranteed to be unique. In fact, if Sha256 fails in some case, we have a
>> bigger problem such as memory corruption, etc. Essentially, adding
>> verification to sha256 is an overkill.
>
> Someone please correct me if I'm wrong. I assume ZFS dedup matches both the
> blocksize and the checksum, right? A simple checksum collision (which is
> astronomically unlikely) is still not sufficient to produce corrupted data.
> It's even more unlikely than that.
>
> Using the above assumption, here's how you calculate the probability of
> corruption if you're not using verification:
>
> Suppose every single block in your whole pool is precisely the same size
> (which is unrealistic in the real world, but I'm trying to calculate worst
> case).
> Suppose the block is 4K, which is, again, unrealistically worst case.
> Suppose your dataset is purely random or sequential ... with no duplicated
> data ... which is unrealistic, because if your data is like that, then why in
> the world are you enabling dedupe? But again, assuming worst case
> scenario... At this point we'll throw in some evil clowns, spit on a voodoo
> priestess, and curse the heavens for some extra bad luck.
>
> If you have astronomically infinite quantities of data, then your
> probability of corruption approaches 100%. With infinite data, eventually
> you're guaranteed to have a collision. So the probability of corruption is
> directly related to the total amount of data you have, and the new question
> is: for anything Earthly, how near are you to 0% probability of collision
> in reality?
>
> Suppose you have 128TB of data. That is ... you have 2^35 unique 4k blocks
> of uniformly sized data. Then the probability you have any collision in
> your whole dataset is (sum(1 thru 2^35)) * 2^-256.
> Note: sum of integers from 1 to N is (N*(N+1))/2
> Note: 2^35 * (2^35+1) = 2^35 * 2^35 + 2^35 = 2^70 + 2^35
> Note: (N*(N+1))/2 in this case = 2^69 + 2^34
> So the probability of data corruption in this case is 2^-187 + 2^-222
> ~= 5.1E-57 + 1.5E-67
> ~= 5.1E-57
>
> In other words, even in the absolute worst case, cursing the gods, running
> without verification, using data that's specifically formulated to try and
> cause errors, on a dataset that I bet is larger than what you're doing ...
>
> Before we go any further ... The total number of bits stored on all the
> storage in the whole planet is a lot smaller than the total number of
> molecules in the planet.
>
> There are an estimated 8.87 * 10^49 molecules in planet Earth.
>
> The probability of a collision in your worst-case unrealistic dataset as
> described is even millions of times less likely than randomly finding a
> single specific molecule in the whole planet Earth by pure luck.
Re: [zfs-discuss] A few questions
On Wed, Jan 5, 2011 at 15:34, Edward Ned Harvey wrote:

>> From: Deano [mailto:de...@rattie.demon.co.uk]
>> Sent: Wednesday, January 05, 2011 9:16 AM
>>
>> So honestly do we want to innovate ZFS (I do) or do we just want to follow
>> Oracle?
>
> Well, you can't follow Oracle. Unless you wait till they release something,
> reverse engineer it, and attempt to reimplement it.

That's not my understanding - while we will have to wait, Oracle is supposed to release *some* source code afterwards, to satisfy some claim or other. I agree, some would argue that that should already have happened with S11 Express... I don't know whether it has, but that's not *the* release of S11, is it? And once the code is released, even if after the fact, it's not reverse-engineering anymore, is it?

Michael

PS: just in case: even while at Oracle, I had no insight into any of these plans, much less do I have now.
--
regards/mit freundlichen Grüssen
Michael Schuster
Re: [zfs-discuss] A couple of quick questions
I can't answer any of these authoritatively(?), but have a comment:

On Wed, Dec 22, 2010 at 10:55, Per Hojmark wrote:

> 1) What's the maximum number of disk devices that can be used to construct
> filesystems?

lots.

> 2) Is there a practical limit on #1? I've seen messages where folks suggested
> 40 physical devices is the practical maximum. That would seem to imply a
> maximum single volume size of 80TB...

how does that follow? Or, in other words, why do you believe zfs can only handle 2 TB per physical disc? (hint: look up GPT or EFI label ;-)

HTH
--
regards/mit freundlichen Grüssen
Michael Schuster
Re: [zfs-discuss] Ideas for ghetto file server data reliability?
Ummm… there's a difference between data integrity and data corruption.

Integrity is enforced programmatically by something like a DBMS. This sets up basic rules that ensure the programmer, program or algorithm adheres to a level of sanity and bounds. Corruption is where cosmic rays, bit rot, malware or some other item writes at the block level. ZFS protects systems from a lot of this by the way it's constructed to keep metadata, checksums, and duplicates of critical data. If the filesystem is given bad data, it will faithfully lay it down on disk. If that data later gets corrupted on disk, ZFS will come in and save the day.

Regards,

Mike

On Nov 16, 2010, at 11:28, Edward Ned Harvey wrote:

>> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
>> boun...@opensolaris.org] On Behalf Of Toby Thain
>>
>> The corruption will at least be detected by a scrub, even in cases where it
>> cannot be repaired.
>
> Not necessarily. Let's suppose you have some bad memory, and no ECC. Your
> application does 1 + 1 = 3. Then your application writes the answer to a
> file. Without ECC, the corruption happened in memory and went undetected.
> Then the corruption was written to file, with a correct checksum. So in
> fact it's not filesystem corruption, and ZFS will correctly mark the
> filesystem as clean and free of checksum errors.
>
> In conclusion:
>
> Use ECC if you care about your data.
> Do backups if you care about your data.
>
> Don't be a cheapskate, or else, don't complain when you get bitten by lack
> of adequate data protection.
Re: [zfs-discuss] Running on Dell hardware?
Congratulations Ed, and welcome to "open systems…"

Ah, but Nexenta is open and has "no vendor lock-in." What you probably should have done is bank everything on Illumos and Nexenta - a winning combination by all accounts. But then again, you could have used Linux on any hardware as well. Then your hardware and software issues would probably be multiplied even more.

Cheers,

Mike

---
Michael Sullivan
michael.p.sulli...@me.com
http://www.kamiogi.net/
Japan Mobile: +81-80-3202-2599
US Phone: +1-561-283-2034

On 23 Oct 2010, at 12:53 , Edward Ned Harvey wrote:

>> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
>> boun...@opensolaris.org] On Behalf Of Kyle McDonald
>>
>> I'm currently considering purchasing 1 or 2 Dell R515's.
>>
>> With up to 14 drives, and up to 64GB of RAM, it seems like it's well
>> suited for a low-end ZFS server.
>>
>> I know this box is new, but I wonder if anyone out there has any
>> experience with it?
>>
>> How about the H700 SAS controller?
>>
>> Anyone know where to find the Dell 3.5" sleds that take 2.5" drives? I
>> want to put some SSD's in a box like this, but there's no way I'm
>> going to pay Dell's SSD prices. $1300 for a 50GB 'mainstream' SSD? Are
>> they kidding?
>
> You are asking for a world of hurt. You may luck out, and it may work
> great, thus saving you money. Take my example for example ... I took the
> "safe" approach (as far as any non-Sun hardware is concerned). I bought an
> officially supported Dell server, with all Dell-blessed and Solaris-
> supported components, with support contracts on both the hardware and
> software, fully patched and updated on all fronts, and I am getting system
> failures approx once per week. I have support tickets open with both Dell
> and Oracle right now ... Have no idea how it's all going to turn out. But
> if you have a problem like mine, using unsupported hardware, you have no
> alternative.
> You're up a tree full of bees, naked, with a hunter on the
> ground trying to shoot you. And IMHO, I think the probability of having a
> problem like mine is higher when you use the unsupported hardware. But of
> course there's no definable way to quantify that belief.
>
> My advice to you is: buy the supported hardware, and the support contracts
> for both the hardware and software. But of course, that's all just a
> calculated risk, and I doubt you're going to take my advice. ;-)
Re: [zfs-discuss] [RFC] Backup solution
On Oct 8, 2010, at 4:33 AM, Edward Ned Harvey wrote:

>> From: Peter Jeremy [mailto:peter.jer...@alcatel-lucent.com]
>> Sent: Thursday, October 07, 2010 10:02 PM
>>
>> On 2010-Oct-08 09:07:34 +0800, Edward Ned Harvey wrote:
>>> If you're going raidz3, with 7 disks, then you might as well just make
>>> mirrors instead, and eliminate the slow resilver.
>>
>> There is a difference in reliability: raidzN means _any_ N disks can
>> fail, whereas mirror means one disk in each mirror pair can fail.
>> With a mirror, Murphy's Law says that the second disk to fail will be
>> the pair of the first disk :-).
>
> Maybe. But in reality, you're just guessing the probability of a single
> failure, the probability of multiple failures, and the probability of
> multiple failures within the critical time window and critical redundancy
> set.
>
> The probability of a 2nd failure within the critical time window is smaller
> whenever the critical time window is decreased, and the probability of that
> failure being within the critical redundancy set is smaller whenever your
> critical redundancy set is smaller. So if raidz2 takes twice as long to
> resilver than a mirror, and has a larger critical redundancy set, then you
> haven't gained any probable resiliency over a mirror.
>
> Although it's true with mirrors, it's possible for 2 disks to fail and
> result in loss of pool, I think the probability of that happening is smaller
> than the probability of a 3-disk failure in the raidz2.
>
> How much longer does a 7-disk raidz2 take to resilver as compared to a
> mirror? According to my calculations, it's in the vicinity of 10x longer.

This article has been posted elsewhere, is about 10 months old, but is a good read: http://queue.acm.org/detail.cfm?id=1670144

Really, there should be a ballpark / back-of-the-napkin formula to be able to calculate this? I've been curious about this too, so here goes a 1st cut...
DR = disk reliability, in terms of the chance of the disk dying in any given time period, say any given hour.
DFW = disk full write - time to write every sector on the disk. This will vary depending on system load, but is still an input item that can be determined by some testing.
RSM = resilver time for a mirror of two of the given disks
RSZ1 = resilver time for a 7-disk raidz1 vdev of the given disks
RSZ2 = resilver time for a 7-disk raidz2 vdev of the given disks

Chances of losing all data in a mirror: DLM = RSM * DR.
Chances of losing all data in a raidz1: DLRZ1 = RSZ1 * DR.
Chances of losing all data in a raidz2: DLRZ2 = RSZ2 * DR * DR.

Now, for the above, I'll make some other assumptions... Let's just guess at a 1-year MTBF for our disks and, for purposes here, flat-line that as a constant chance of failure per hour throughout the year. Let's presume rebuilding a mirror takes one hour. Let's presume that a 7-disk raidz1 takes 24 times longer to rebuild one disk than a mirror; I think this would be a 'safe' ratio, to the benefit of the mirror. Let's presume that a 7-disk raidz2 takes 72 times longer to rebuild one disk than a mirror; this should be 'safe' and again benefit the mirror.

DR for a one-hour period = 1 / (24 hours * 365 days) = .000114 - the chance a disk might die in any given hour.
DLM = 1 hour * DR = .000114
DLRZ1 = 24 hours * DR * 6 (x6 because there are six more drives in the pool, and any one of them could fail)
DLRZ2 = 72 hours * (DR * 6) * (DR * 5) = a much tinier chance of losing all that data.

A better way to think about it, maybe: based on our 1-year flat-line MTBF for disks, figure out how much faster the mirror must rebuild for reliability to be the same as a raidz2...
Set DLM = DLRZ2, with X as the fraction of the raidz2's 72-hour resilver time that the mirror would be allowed to take:

X * 72 hours * DR = 72 hours * (DR * 6) * (DR * 5)
X = 30 * DR
X = .00342

So the mirror would have to resilver roughly three hundred times faster than the raidz2 (1 / .00342 ~= 292) in order for it to offer the same level of reliability in regards to the chances of losing the entire vdev due to additional disk failures during a resilver.

The governing thing here is that O(2) level of reliability based on the expected chances of failure of additional disks at any given moment in time, vs. O(1) for mirrors and raidz1. Note that the above is O(2) for raidz2 and O(1) for mirror/raidz1, because we are working on the assumption we have already lost one disk. With raidz3, we would have 1 / (.000114 * 4 disks remaining in the pool), or about another 2,000 times more reliability?

Now, the above does not include things like proper statistics - the chances of that 2nd and 3rd disk failing (even correlations) may be higher than our 'flat-line' %/hr. based on a 1-year MTBF - or stuff like all the disks being purchased in the same lots and at the same time, so their chances of failing around the same time are higher, etc.
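The back-of-the-napkin model above can be run directly. Same inputs as the post - guesses, not measured data: a flat-lined 1-year MTBF, a 1-hour mirror resilver, and 7-disk raidz1/raidz2 resilvering 24x/72x slower than the mirror.

```python
# Numeric version of the napkin math above (inputs are the post's
# guesses, not measurements).

DR = 1 / (24 * 365)                 # chance a disk dies in any given hour

DLM   = 1 * DR                      # mirror: 1-hour window, 1 exposed disk
DLRZ1 = 24 * DR * 6                 # raidz1: 24-hour window, 6 exposed disks
DLRZ2 = 72 * (DR * 6) * (DR * 5)    # raidz2: needs 2 more failures in 72 h

print(f"{DLM:.3g} {DLRZ1:.3g} {DLRZ2:.3g}")   # 0.000114 0.0164 2.81e-05

# How much faster must the mirror resilver to match raidz2's risk?
X = 30 * DR                         # mirror resilver as a fraction of raidz2's
print(1 / X)                        # 292.0 -- "roughly three hundred times"
```

Note the crossover this exposes: under these assumptions the 7-disk raidz1 is actually riskier than a plain mirror, while the raidz2 is several times safer despite its much longer resilver.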
Re: [zfs-discuss] TLER and ZFS
Can you give us release numbers that confirm that this is 'automatic'? It is my understanding that the last available public release of OpenSolaris does not do this.

On Oct 5, 2010, at 8:52 PM, Richard Elling wrote:

> ZFS already aligns the beginning of data areas to 4KB offsets from the label.
> For modern OpenSolaris and Solaris implementations, the default starting
> block for partitions is also aligned to 4KB.
Re: [zfs-discuss] TLER and ZFS
Hi, and thanks upfront for the valuable information.

On Oct 5, 2010, at 4:12 PM, Peter Jeremy wrote:

>> Another annoying thing with the whole 4K sector size, is what happens
>> when you need to replace drives next year, or the year after?
>
> About the only mitigation needed is to ensure that any partitioning is
> based on multiples of 4KB.

I agree, but to be quite honest, I have no clue how to do this with ZFS. It seems like it should be covered under the regular tuning documentation:

http://www.solarisinternals.com/wiki/index.php/ZFS_Best_Practices_Guide
http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide

Is it going to be the case that basic information like how to deal with common scenarios like this is no longer going to be publicly available, and Oracle will simply keep it 'close to the vest', with the relevant information available only to those who choose to research it themselves, or to those with certain levels of support contracts from Oracle?

To put it another way - does the community that uses ZFS need to fork 'ZFS Best Practices' and 'ZFS Evil Tuning' to ensure that they stay reasonably up to date?

Sorry for the somewhat hostile tone in the above, but the changes with the merger have demoralized a lot of folks, I think.

- Mike
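The mitigation Peter describes boils down to a divisibility check: every partition or slice should start at an LBA whose byte offset is a multiple of 4096, i.e. a multiple of 8 on drives exposing 512-byte logical sectors. A minimal sketch (the sector numbers below are illustrative examples, not taken from any real label):

```python
# Sketch of the 4 KB alignment rule: start_lba is the partition's
# first logical sector, logical_bytes the drive's logical sector size.

def aligned_4k(start_lba: int, logical_bytes: int = 512) -> bool:
    """True if a partition starting at start_lba lands on a 4 KB boundary."""
    return (start_lba * logical_bytes) % 4096 == 0

print(aligned_4k(34))     # False - a common default GPT data start, misaligned
print(aligned_4k(40))     # True
print(aligned_4k(2048))   # True - the 1 MiB convention many partitioners use
```

The same check applies whatever tool does the partitioning; the only inputs you need from the label are the starting sector and the logical sector size.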
Re: [zfs-discuss] TLER and ZFS
On Oct 5, 2010, at 2:47 PM, casper@sun.com wrote:

> I've seen several important features when selecting a drive for a mirror:
>
> TLER (the ability of the drive to timeout a command)
> sector size (native vs virtual)
> power use (specifically at home)
> performance (mostly for work)
> price
>
> I've heard scary stories about a mismatch of the native sector size and
> unaligned Solaris partitions (4K sectors, unaligned cylinder).

Yes, avoiding the 4K sector sizes is a huge issue right now too - another item I forgot on the list of reasons to absolutely avoid those WD 'green' drives. Three good reasons to avoid WD 'green' drives for ZFS:

- TLER issues
- IntelliPower head-park issues
- 4K sector size issues

...they are an absolute nightmare. The WD 1TB 'enterprise' drives are still 512-byte sector size and safe to use - who knows though, maybe they just started shipping with 4K sector size as I write this e-mail?

Another annoying thing with the whole 4K sector size is what happens when you need to replace drives next year, or the year after? That part has me worried on this whole 4K sector migration thing more than what to buy today. Given the choice, I would prefer to buy 4K sector size now, but operating system support is still limited.

Does anybody know if there are any vendors shipping 4K sector drives that have a jumper option to make them 512 size? WD has a jumper, but it is there explicitly to work with Windows XP, and is not a real way to dumb down the drive to 512. I would presume that any vendor shipping 4K sector size drives now, with a jumper to make them 'real' 512, would be supporting that over the long run?

I would be interested, and probably others would be too, in what the original poster finally decides on this.

- Mike
Re: [zfs-discuss] TLER and ZFS
On Oct 5, 2010, at 1:47 PM, Roy Sigurd Karlsbakk wrote:

>> Western Digital RE3 WD1002FBYS 1TB 7200 RPM SATA 3.0Gb/s 3.5" Internal
>> Hard Drive - Bare Drive
>>
>> are only $129.
>>
>> vs. $89 for the 'regular' black drives.
>>
>> 45% higher price, but it is my understanding that the 'RAID Edition'
>> ones also are physically constructed for longer life, lower vibration
>> levels, etc.
>
> Well, here it's about 60% up, and for 150 drives, that makes a wee
> difference...
>
> Vennlige hilsener / Best regards
>
> roy

Understood on 1.6 times the cost, especially for 150 drives.

I think (and if I am wrong, somebody correct me) that if you are using commodity controllers, which seem to be generally fine for ZFS, then a drive that times out trying to constantly re-read a bad sector could stall out reads on the entire pool. On the other hand, if the drives are exported as JBOD from a RAID controller, I would think the RAID controller itself would just mark the drive as bad and offline it quickly, based on its own internal algorithms.

The above is also relevant to the anticipated usage. For instance, if it is some sort of backup machine, then delays due to some reads stalling without TLER are perhaps not a big deal. If it is for more of an up-front production use, that could be intolerable.
Re: [zfs-discuss] TLER and ZFS
I'm not sure on the TLER issues by themselves, but after the nightmares I have gone through dealing with the 'green' drives, which have both the TLER issue and the IntelliPower head-parking issues, I would just stay away from it all entirely and pay extra for the 'RAID Edition' drives.

Just out of curiosity, I took a peek at Newegg.

Western Digital RE3 WD1002FBYS 1TB 7200 RPM SATA 3.0Gb/s 3.5" Internal Hard Drive - Bare Drive

are only $129, vs. $89 for the 'regular' black drives. That's a 45% higher price, but it is my understanding that the 'RAID Edition' ones are also physically constructed for longer life, lower vibration levels, etc.

On Oct 5, 2010, at 1:30 PM, Roy Sigurd Karlsbakk wrote:

> Hi all
>
> I just discovered WD Black drives are rumored not to be set to allow TLER.
> Does anyone know how much performance impact the lack of TLER might have on a
> large pool? Choosing Enterprise drives will cost about 60% more, and on a
> large install, that means a lot of money...
>
> Vennlige hilsener / Best regards
>
> roy
> --
> Roy Sigurd Karlsbakk
> (+47) 97542685
> r...@karlsbakk.net
> http://blogg.karlsbakk.net/
> --
> I all pedagogikk er det essensielt at pensum presenteres intelligibelt. Det
> er et elementært imperativ for alle pedagoger å unngå eksessiv anvendelse av
> idiomer med fremmed opprinnelse. I de fleste tilfeller eksisterer adekvate og
> relevante synonymer på norsk.
Re: [zfs-discuss] "zfs unmount" versus "umount"?
On 30.09.10 15:42, Mark J Musante wrote:

> On Thu, 30 Sep 2010, Linder, Doug wrote:
>
>> Is there any technical difference between using "zfs unmount" to unmount a
>> ZFS filesystem versus the standard unix "umount" command? I always use
>> "zfs unmount" but some of my colleagues still just use umount. Is there
>> any reason to use one over the other?
>
> No, they're identical. If you use 'zfs umount' the code automatically maps
> it to 'unmount'. It also maps 'recv' to 'receive' and '-?' to call into the
> usage function. Here's the relevant code from main():

Mark, I think that wasn't the question; rather, "what's the difference between 'zfs u[n]mount' and '/usr/bin/umount'?"

HTH
Michael
--
michael.schus...@oracle.com http://blogs.sun.com/recursion
Recursion, n.: see 'Recursion'
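The snippet Mark pasted didn't survive the archive. Purely as a hedged illustration of the alias step he describes - the real table lives in C in the zfs command's source, and the "help" result here is my stand-in for its call into the usage function - the mapping amounts to:

```python
# Illustration only: mimics the subcommand aliasing Mark describes,
# not the actual C source of the zfs command.

ALIASES = {"umount": "unmount", "recv": "receive"}

def canonical_subcommand(cmd: str) -> str:
    """Map a user-typed zfs subcommand to its canonical name."""
    if cmd == "-?":
        return "help"        # stand-in for routing '-?' to usage()
    return ALIASES.get(cmd, cmd)

print(canonical_subcommand("umount"))   # unmount
print(canonical_subcommand("recv"))     # receive
print(canonical_subcommand("mount"))    # mount (unchanged)
```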
Re: [zfs-discuss] file recovery on lost RAIDZ array
I'm sorry to say that I am quite the newbie to ZFS. When you say zfs send/receive, what exactly are you referring to?

I had the ZFS array mounted at a specific location in my file system (/mnt/Share) and I was sharing that location over the network with a Samba server. The directory had read-write-execute permissions set to allow anyone to write to it, and I was copying data from Windows into it. At what point do file changes get committed to the file system? I sort of assumed that any additional files copied over would be committed once the next file began copying.

Thanks for your insight.

-Mike
Re: [zfs-discuss] file recovery on lost RAIDZ array
Oh and yes, raidz1.

-- This message posted from opensolaris.org
Re: [zfs-discuss] file recovery on lost RAIDZ array
I don't know what happened. I was in the process of copying files onto my new file server when the copy process from the other machine failed. I turned on the monitor for the fileserver and found that it had rebooted by itself at some point (machine fault maybe?), and when I remounted the drives every last thing was gone.

I am new to ZFS. How do you take snapshots? Does the system do it automagically for you?
[zfs-discuss] file recovery on lost RAIDZ array
I recently lost all of the data on my single-parity RAIDZ array. Each of the drives was encrypted, with the ZFS array built within the encrypted volumes. I am not exactly sure what happened. The files were there and accessible, and then they were all gone. The server apparently crashed and rebooted, and everything was lost. After the crash I remounted the encrypted drives and the zpool was still reporting that roughly 3TB of the 7TB array were used, but I could not see any of the files through the array's mount point. I unmounted the zpool and then remounted it, and suddenly zpool was reporting 0TB were used. I did not remap the virtual device. The only thing of note that I saw was that the name of the storage pool had changed: originally it was "Movies" and then it became "Movita". I am guessing that the file system became corrupted somehow (zpool status did not report any errors). So, my questions are these... Is there any way to undelete data from a lost RAIDZ array? If I build a new virtual device on top of the old one and the drive topology remains the same, can we scan the drives for files from old arrays? Also, is there any way to repair a corrupted storage pool? Is it possible to back up the file table or whatever partition index ZFS maintains? I imagine that you all are going to suggest that I scrub the array, but that is not an option at this point. I had a backup of all of the data lost, as I am moving between file servers, so at a certain point I gave up and decided to start fresh. This doesn't give me a warm fuzzy feeling about ZFS, though. Thanks, -Mike
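For anyone hitting something similar: newer builds added a recovery mode to 'zpool import' that can rewind a damaged pool to an earlier, consistent transaction group. Whether it applies depends on the pool version, so treat this as a hedged sketch ('tank' is a placeholder pool name):

```shell
zpool export tank       # release the pool from the running system
zpool import -F tank    # attempt a rewind to the last consistent txg (newer builds only)
zpool history tank      # review what operations the pool has actually seen
```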
Re: [zfs-discuss] ZFS with SAN's and HA
Lao, I had a look at HAStoragePlus etc. and from what I understand that's to mirror local storage across 2 nodes for services to be able to access, 'DRBD style'. Having a read through the documentation on the Oracle site, the cluster software from what I gather is about how to cluster services together (Oracle/Apache etc.), and again any documentation I've found on storage is how to duplicate local storage to multiple hosts for HA failover. Can't really see anything on clustering services to use shared storage/ZFS pools.
[zfs-discuss] ZFS with SAN's and HA
Hey all, I currently work for a company that has purchased a number of different SAN solutions (whatever was cheap at the time!) and I want to set up an HA ZFS file store over fiber channel. Basically I've taken slices from each of the SANs and added them to a ZFS pool on this box (which I'm calling a 'ZFS proxy'). I've then carved out LUNs from this pool and assigned them to other servers. I then have snapshots taken on each of the LUNs and replication off site for DR. This all works perfectly (backups for ESXi!). However, I'd like to be able to a) expand and b) make it HA. All the documentation I can find on setting up an HA cluster for file stores replicates data between 2 servers and then serves from those machines (I trust the SANs to take care of the data and don't want to replicate anything -- cost!). Basically all I want is for the node that serves the ZFS pool to be HA (if this were to be put into production we have around 128TB and are looking to expand to a PB). We have a couple of IBM SVCs that seem to handle the HA node setup in some obscure proprietary IBM way, so logically it seems possible. Clients would only be making changes via a single 'ZFS proxy' at a time (multi-pathing set up for failover only), so I don't believe I'd need to OCFS the setup? If I do need to set up OCFS, can I put ZFS on top of that? (I want snapshotting/rollback and replication to an off-site location, as well as all the goodness of thin provisioning and de-duplication.) However, when I imported the ZFS pool onto the 2nd box I got large warnings about it being mounted elsewhere and I needed to force the import; then when importing the LUNs I saw that the GUID was different, so multi-pathing doesn't pick up that the LUNs are the same? Can I change a GUID via stmfadm? Is any of this even possible over fiber channel? Is anyone able to point me at some documentation? Am I simply crazy? Any input would be most welcome.
Thanks in advance,
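On the HA part of the question above: without cluster software, the usual model is that exactly one node imports the pool at a time -- the "mounted elsewhere" warning exists precisely because importing a pool on two nodes at once corrupts it. A minimal manual-failover sketch ('tank' is a placeholder pool name):

```shell
# Active node releases the pool cleanly:
zpool export tank

# Standby node takes over:
zpool import tank

# If the active node died without exporting, force the import --
# ONLY safe if the other node is verifiably down:
zpool import -f tank
```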
[zfs-discuss] zfs/iSCSI: 0000 = SNS Error Type: Current Error (0x70)
Hi, I'm trying to track down an error with a 64-bit x86 OpenSolaris 2009.06 box sharing ZFS via iSCSI to an Ubuntu 10.04 client. The client can successfully log in, but no device node appears. I captured a session with wireshark. When the client attempts a "SCSI: Inquiry LUN: 0x00", OpenSolaris sends a "SCSI Response (Check Condition) LUN: 0x00" that contains the following:

.111 = SNS Error Type: Current Error (0x70)
Filemark: 0, EOM: 0, ILI: 0
0100 = Sense Key: Hardware Error (0x04)

The ZFS being exported is a 400GB chunk of a 1TB ZFS mirror. The underlying OS reports no hardware errors, and "zpool status" looks OK. Why would OpenSolaris give this error? Is there anything I can do about it? Any suggestions would be appreciated. (I discussed this with the open-iscsi people at http://groups.google.com/group/open-iscsi/browse_thread/thread/06b83227ffc6a31a/2e58a163e21ec74e#2e58a163e21ec74e.) Thanks, ==ml
Re: [zfs-discuss] 64-bit vs 32-bit applications
On 17.08.10 04:17, Will Murnane wrote: On Mon, Aug 16, 2010 at 21:58, Kishore Kumar Pusukuri wrote: Hi, I am surprised by the performance of some 64-bit multi-threaded applications on my AMD Opteron machine. For most of the applications, the performance of the 32-bit version is almost the same as the performance of the 64-bit version. However, for a couple of applications, the 32-bit versions provide better performance (running time is around 76 secs) than the 64-bit ones (running time is around 96 secs). Could anyone help me find the reason behind this, please? [...] This list discusses the ZFS filesystem. Perhaps you'd be better off posting to perf-discuss or tools-gcc? That said, you need to provide more information. What compiler and flags did you use? What does your program (broadly speaking) do? What did you measure to conclude that it's slower in 64-bit mode? Add to that: what OS are you using? Michael -- michael.schus...@oracle.com http://blogs.sun.com/recursion Recursion, n.: see 'Recursion'
[zfs-discuss] Degraded Pool, Spontaneous Reboots
Hello, I've been getting warnings that my ZFS pool is degraded. At first it was complaining about a few corrupt files, which were listed as hex numbers instead of filenames, e.g. VOL1:<0x0>. After a scrub, a couple of the filenames appeared - turns out they were in snapshots I don't really need, so I destroyed those snapshots and started a new scrub. Subsequently, I typed "zpool status -v VOL1" ... and the machine rebooted. When I could log on again, I looked at /var/log/messages, but found nothing interesting prior to the reboot. I typed "zpool status -v VOL1" again, whereupon the machine rebooted. When the machine was back up, I stopped the scrub, waited a while, then typed "zpool status -v VOL1" again, and this time got:

r...@nexenta1:~# zpool status -v VOL1
  pool: VOL1
 state: DEGRADED
status: One or more devices has experienced an unrecoverable error. An attempt was made to correct the error. Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
  scan: scrub canceled on Wed Aug 11 11:03:15 2010
config:

        NAME        STATE     READ WRITE CKSUM
        VOL1        DEGRADED     0     0     0
          raidz1-0  DEGRADED     0     0     0
            c2d0    DEGRADED     0     0     0  too many errors
            c3d0    DEGRADED     0     0     0  too many errors
            c4d0    DEGRADED     0     0     0  too many errors
            c5d0    DEGRADED     0     0     0  too many errors

So, I have the following questions:
1) How do I find out which file is corrupt, when I only get something like "VOL1:<0x0>"?
2) What could be causing these reboots?
3) How can I fix my pool?
Thanks!
Re: [zfs-discuss] ZFS p[erformance drop with new Xeon 55xx and 56xx cpus
On 08/12/10 04:16, Steve Gonczi wrote: Greetings, I am seeing some unexplained performance drop using the above CPUs, on a fairly up-to-date build (late 145). Basically, the system seems to be 98% idle, spending most of its time in this stack:

unix`i86_mwait+0xd
unix`cpu_idle_mwait+0xf1
unix`idle+0x114
unix`thread_start+0x8
455645

Most CPUs seem to be idling most of the time, sitting on the mwait instruction. No lock contention, not waiting on I/O; I am finding myself at a loss explaining what this system is doing. (I am monitoring the system w. lockstat, mpstat, prstat.) Despite the predominantly idle system, I see some latency reported by prstat microstate accounting on the zfs threads. This is a fairly beefy box: 24G memory, 16 CPUs. Doing a local zfs send | receive, I should be getting at least 100MB+, and I am only getting 5-10MB. I see some Intel errata on the 55xx series Xeons, a problem with the monitor/mwait instructions, that could conceivably cause a missed wake-up or mis-reported mwait status. I'd suggest you supply a bit more information (to the list, not to me, I don't know very much about zfs internals):
- zpool/zfs configuration
- history of this issue: has it been like this since you installed the machine?
- if not: what changes were introduced around the time you saw this first?
- does this happen on a busy machine too?
- describe your test in more detail
- provide measurements (lockstat, iostat, maybe some DTrace) before and during the test, and add some timestamps so people can correlate data to events.
- anything else you can think of that might be relevant.
HTH Michael
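Two measurements of the kind suggested above, as a sketch (run as root; the sampling rate and intervals are arbitrary choices, not anything prescribed in the thread):

```shell
# Sample on-CPU kernel stacks at ~997 Hz for 30 seconds,
# then print the hottest stacks:
dtrace -n 'profile-997 /arg0/ { @[stack()] = count(); } tick-30s { exit(0); }'

# Per-device I/O latency, timestamped so it can be correlated
# with other logs, every 5 seconds:
iostat -xnz -T d 5
```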
Re: [zfs-discuss] core dumps eating space in snapshots
On 27.07.10 14:21, Edward Ned Harvey wrote: From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of devsk I have many core files stuck in snapshots eating up gigs of my disk space. Most of these are BE's which I don't really want to delete right now. Ok, you don't want to delete them ... Is there a way to get rid of them? I know snapshots are RO but can I do some magic with clones and reclaim my space? You don't want to delete them, but you don't want them to take up space either? Um ... Sorry, can't be done. Move them to a different disk ... Or clarify what it is that you want. If you're saying you have core files in your present filesystem that you don't want to delete ... And you also have core files in snapshots that you *do* want to delete ... As long as the file hasn't been changing, it's not consuming space beyond what's in the current filesystem. (See the output of zfs list, looking at sizes, and you'll see that.) If it has been changing ... the cores in the snapshot are in fact different from the cores in the present filesystem ... then the only way to delete them is to destroy snapshots. Or have I still misunderstood the question? Yes, I think so. Here's how I read it: the snapshots contain lots more than the core files, and the OP wants to remove only the core files (I'm assuming they weren't discovered before the snapshot was taken) but retain the rest. Does that explain it better? HTH Michael -- michael.schus...@oracle.com http://blogs.sun.com/recursion Recursion, n.: see 'Recursion'
Re: [zfs-discuss] Help identify failed drive
On Mon, Jul 19, 2010 at 4:35 PM, Richard Elling wrote:
> It depends on if the problem was fixed or not. What says
> zpool status -xv
> -- richard

[r...@nas01 ~]# zpool status -xv
  pool: tank
 state: DEGRADED
status: One or more devices has experienced an unrecoverable error. An attempt was made to correct the error. Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: scrub completed after 14h2m with 0 errors on Sun Jul 18 18:32:38 2010
config:

        NAME        STATE     READ WRITE CKSUM
        tank        DEGRADED     0     0     0
          raidz2    ONLINE       0     0     0
            c0t3d0  ONLINE       0     0     0
            c0t2d0  ONLINE       0     0     0
            c0t4d0  ONLINE       0     0     0
            c0t1d0  ONLINE       0     0     0
            c0t6d0  ONLINE       0     0     0
            c0t7d0  ONLINE       0     0     0
            c0t0d0  ONLINE       0     0     0
            c0t5d0  ONLINE       0     0     0
          raidz2    DEGRADED     0     0     0
            c2t0d0  ONLINE       0     0     0
            c2t1d0  ONLINE       0     0     0
            c2t2d0  ONLINE       0     0     0
            c2t3d0  ONLINE       0     0     0
            c2t4d0  ONLINE       0     0     0
            c2t5d0  DEGRADED     0     0     0  too many errors
            c2t6d0  ONLINE       0     0     0
            c2t7d0  ONLINE       0     0     0

It was never fixed. I thought I needed to replace the drive. Should I mark it as "resolved" or whatever the syntax is and re-run a scrub?
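Following the 'action' text in the status output above, the clear-and-rescrub path looks like this (pool and device names taken from that output):

```shell
zpool clear tank c2t5d0   # reset the error counters on the degraded disk
zpool scrub tank          # re-read and verify the whole pool
zpool status -v tank      # see whether the errors come back
```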
Re: [zfs-discuss] Help identify failed drive
On Mon, Jul 19, 2010 at 4:26 PM, Richard Elling wrote:
> Aren't you assuming the I/O error comes from the drive?
> fmdump -eV

Okay - I guess I am. Is this just telling me "hey stupid, a checksum failed"? In which case, why did this never resolve itself and the specific device get marked as degraded?

Apr 04 2010 21:52:38.920978339 ereport.fs.zfs.checksum
nvlist version: 0
        class = ereport.fs.zfs.checksum
        ena = 0x64350d4040300c01
        detector = (embedded nvlist)
        nvlist version: 0
                version = 0x0
                scheme = zfs
                pool = 0xfd80ebd352cc9271
                vdev = 0x29282dc6fa073a2
        (end detector)
        pool = tank
        pool_guid = 0xfd80ebd352cc9271
        pool_context = 0
        pool_failmode = wait
        vdev_guid = 0x29282dc6fa073a2
        vdev_type = disk
        vdev_path = /dev/dsk/c2t5d0s0
        vdev_devid = id1,s...@sata_st31500341as9vs077gt/a
        parent_guid = 0xc2d5959dd2c07bf7
        parent_type = raidz
        zio_err = 0
        zio_offset = 0x40abbf2600
        zio_size = 0x200
        zio_objset = 0x10
        zio_object = 0x1c06000
        zio_level = 2
        zio_blkid = 0x0
        __ttl = 0x1
        __tod = 0x4bb96c96 0x36e503a3
Re: [zfs-discuss] Help identify failed drive
On Mon, Jul 19, 2010 at 4:16 PM, Marty Scholes wrote:
> Start a scrub or do an obscure find, e.g. "find /tank_mountpoint -name core", and watch the drive activity lights. The drive in the pool which isn't blinking like crazy is a faulted/offlined drive.

Actually I guess my real question is why iostat hasn't logged any errors in its counters, even though the device has been bad in there for months?
Re: [zfs-discuss] Help identify failed drive
On Mon, Jul 19, 2010 at 4:16 PM, Marty Scholes wrote:
> Start a scrub or do an obscure find, e.g. "find /tank_mountpoint -name core", and watch the drive activity lights. The drive in the pool which isn't blinking like crazy is a faulted/offlined drive.
> Ugly and oh-so-hackerish, but it works.

That was my idea, except figuring out something to make just specific drives write one at a time. Although if it has been offlined or whatever, then it shouldn't receive any requests, so that sounds even easier. :)
Re: [zfs-discuss] Help identify failed drive
On Mon, Jul 19, 2010 at 3:11 PM, Haudy Kazemi wrote:
> 'iostat -Eni' indeed outputs the Device ID on some of the drives, but I still can't understand how it helps me to identify the model of a specific drive.

Curious:

[r...@nas01 ~]# zpool status -x
  pool: tank
 state: DEGRADED
status: One or more devices has experienced an unrecoverable error. An attempt was made to correct the error. Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: scrub completed after 14h2m with 0 errors on Sun Jul 18 18:32:38 2010
config:

        NAME        STATE     READ WRITE CKSUM
        tank        DEGRADED     0     0     0
          raidz2    ONLINE       0     0     0
            ...
          raidz2    DEGRADED     0     0     0
            ...
            c2t5d0  DEGRADED     0     0     0  too many errors
            ...

c2t5d0 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: ATA Product: ST31500341AS Revision: SD1B
Device Id: id1,s...@sata_st31500341as9vs077gt
Size: 1500.30GB <1500301910016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0

Why has it been reported as bad (for probably 2 months now; I haven't got around to figuring out which disk in the case it is, etc.), but iostat isn't showing me any errors? Note: I do a weekly scrub too. Not sure if that matters or helps reset the device.
Re: [zfs-discuss] Recommended RAM for ZFS on various platforms
Garrett D'Amore wrote:
> On Fri, 2010-07-16 at 10:24 -0700, Michael Johnson wrote:
>> I'm currently planning on running FreeBSD with ZFS, but I wanted to double-check how much memory I'd need for it to be stable. The ZFS wiki currently says you can go as low as 1 GB, but recommends 2 GB; however, elsewhere I've seen someone claim that you need at least 4 GB. Does anyone here know how much RAM FreeBSD would need in this case?
>> Likewise, how much RAM does OpenSolaris need for stability when running ZFS? How about other OpenSolaris-based OSs, like NexentaStor? (My searching found that OpenSolaris recommended at least 1 GB, while NexentaStor said 2 GB was okay, 4 GB was better. I'd be interested in hearing your input, though.)
>
> 1GB isn't enough for a real system. 2GB is a bare minimum. If you're going to use dedup, plan on a *lot* more. I think 4 or 8 GB are good for a typical desktop or home NAS setup. With FreeBSD you may be able to get away with less. (Probably, in fact.)

Fortunately, I don't need deduplication; it's kind of a nice feature, but the extra RAM it would take isn't worth it. Just curious, why do you say I'd be able to get away with less RAM in FreeBSD (as compared to NexentaStor, I'm assuming)? I don't know tons about the OSs in question; is FreeBSD just leaner in general?

>> If it matters, I'm currently planning on RAID-Z2 with 4x500GB consumer-grade SATA drives. (I know that's not a very efficient configuration, but I'd really like the redundancy of RAID-Z2 and I just don't need more than 1 TB of available storage right now, or for the next several years.) This is on an AMD64 system, and the OS in question will be running inside of VirtualBox, with raw access to the drives.
>
> Btw, instead of RAIDZ2, I'd recommend simply using a stripe of mirrors. You'll have better performance, and good resilience against errors. And you can grow later as you need to by just adding additional drive pairs.
A pair of mirrors would be nice, but would only protect against 100% of single-drive failures and about two-thirds of two-drive failures (with four drives in two mirror pairs, only losing both halves of the same mirror is fatal). Performance is less important to me than redundancy; this setup won't be seeing tons of disk activity, but I want it to be as reliable as possible. Michael
[zfs-discuss] Recommended RAM for ZFS on various platforms
I'm currently planning on running FreeBSD with ZFS, but I wanted to double-check how much memory I'd need for it to be stable. The ZFS wiki currently says you can go as low as 1 GB, but recommends 2 GB; however, elsewhere I've seen someone claim that you need at least 4 GB. Does anyone here know how much RAM FreeBSD would need in this case? Likewise, how much RAM does OpenSolaris need for stability when running ZFS? How about other OpenSolaris-based OSs, like NexentaStor? (My searching found that OpenSolaris recommended at least 1 GB, while NexentaStor said 2 GB was okay, 4 GB was better. I'd be interested in hearing your input, though.) If it matters, I'm currently planning on RAID-Z2 with 4x500GB consumer-grade SATA drives. (I know that's not a very efficient configuration, but I'd really like the redundancy of RAID-Z2 and I just don't need more than 1 TB of available storage right now, or for the next several years.) This is on an AMD64 system, and the OS in question will be running inside of VirtualBox, with raw access to the drives. Thanks, Michael
Re: [zfs-discuss] Encryption?
Garrett wrote:
> I don't know about ramifications (though I suspect that a broadening error scope would decrease ZFS' ability to isolate and work around problematic regions on the media), but one thing I do know. If you use FreeBSD disk encryption below ZFS, then you won't be able to import your pools to another implementation -- you will be stuck with FreeBSD.

This is an excellent point. Geli isn't a good option for me, then, though using encryption outside of the VM would still work.

> Btw, if you want a commercially supported and maintained product, have you looked at NexentaStor? Regardless of what happens with OpenSolaris, we aren't going anywhere. (Full disclosure: I'm a Nexenta Systems employee. :-)

I probably ought to consider other OpenSolaris alternatives, like NexentaStor. (Though I'd be looking at the free version, not the commercial one: this is just for personal use, despite how careful I'm being with it. :) ) However (and please correct me if I'm wrong), isn't your future still tied to the future of OpenSolaris? The code is open, of course, but my understanding is that there isn't the same kind of developer community supporting OpenSolaris itself that you see with Linux (or even the BSDs). In other words, if Oracle stops development of OpenSolaris, there wouldn't be enough developers still working on it to keep it from stagnating. Or are you saying that you employ enough kernel hackers to keep up even without Oracle? (I am admittedly ignorant about the OpenSolaris developer community; this is all based on others' statements and opinions that I've read.) Michael
Re: [zfs-discuss] Encryption?
Nikola M wrote:
> Freddie Cash wrote:
>> You definitely want to do the ZFS bits from within FreeBSD.
> Why not using ZFS in OpenSolaris? At least it has the most stable/tested implementation and also the newest one if needed?

I'd love to use OpenSolaris for exactly those reasons, but I'm wary of using an operating system that may not continue to be updated/maintained. If OpenSolaris had continued to be regularly released after Oracle bought Sun I'd be choosing it. As it is, I don't want to be pessimistic, but the doubt about OpenSolaris's future is enough to make me choose FreeBSD instead. (I'm sure that such sentiments won't make me popular here, but so far Oracle has been frustratingly silent on their plans for OpenSolaris.) At the very least, if FreeBSD doesn't do what I want I can switch the system disk to OpenSolaris and keep using the same pool. (Right?) Going back to my original question: does anyone know of any problems that could be caused by using raidz on top of encrypted drives? If there were a physical read error, which would get amplified by the encryption layer (if I'm understanding full-disk encryption correctly, which I may not be), would ZFS still be able to recover?
Re: [zfs-discuss] Encryption?
on 11/07/2010 15:54 Andriy Gapon said the following:
> on 11/07/2010 14:21 Roy Sigurd Karlsbakk said the following:
>>> I'm planning on running FreeBSD in VirtualBox (with a Linux host) and giving it raw disk access to four drives, which I plan to configure as a raidz2 volume.
>> Wouldn't it be better or just as good to use fuse-zfs for such a configuration? I/O from VirtualBox isn't really very good, but then, I haven't tested the linux/fbsd configuration...

Like Freddie already mentioned, I'd heard that fuse-zfs wasn't really all that good of an option, and I wanted something that was more stable/reliable.

> Hmm, an unexpected question IMHO - wouldn't it be better to just install FreeBSD on the hardware? :-)
> If the original poster is using Linux as a host OS, then probably he has some very good reason to do that. But performance- etc.-wise, directly using FreeBSD, of course, should win over fuse-zfs. Right?
> [Installing and maintaining one OS instead of two is the first thing that comes to mind]

I'm going with a virtual machine because the box I ended up building for this was way more powerful than I needed for just my file server; thus, I figured I'd use it as a personal machine too. (I wanted ECC RAM, and there just aren't that many motherboards that support ECC RAM that are also really cheap and low-powered.) And since I'm much more comfortable with Linux, I wanted to use it for the "personal" side of things.
[zfs-discuss] Encryption?
I'm planning on running FreeBSD in VirtualBox (with a Linux host) and giving it raw disk access to four drives, which I plan to configure as a raidz2 volume. On top of that, I'm considering using encryption. I understand that ZFS doesn't yet natively support encryption, so my idea was to set each drive up with full-disk encryption in the Linux host (e.g., using TrueCrypt or dmcrypt), mount the encrypted drives, and then give the virtual machine access to the virtual unencrypted drives. So the encryption would be transparent to FreeBSD. However, I don't know enough about ZFS to know if this is a good idea. I know that I need to specifically configure VirtualBox to respect cache flushes, so that data really is on disk when ZFS expects it to be. Would putting ZFS on top of full-disk encryption like this cause any problems? E.g., if the (encrypted) physical disk has a problem and as a result a larger chunk of the unencrypted data is corrupted, would ZFS handle that well? Are there any other possible consequences of this idea that I should know about? (I'm not too worried about any hits in performance; I won't be reading or writing heavily, nor in time-sensitive applications.) I should add that since this is a desktop I'm not nearly as worried about encryption as if it were a laptop (theft or loss are less likely), but encryption would still be nice. However, data integrity is the most important thing (I'm storing backups of my personal files on this), so if there's a chance that ZFS wouldn't handle errors well when on top of encryption, I'll just go without it. Thanks, Michael
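A hedged sketch of the host-side layering described above. The device names, mapping names, and VM name are all made up, the controller key assumes the default IDE controller, and the IgnoreFlush knob is the VirtualBox setting usually cited for making the guest's cache flushes reach the physical disk:

```shell
# Encrypt and open one physical disk with dm-crypt/LUKS
# (repeat per drive; /dev/sdb is a placeholder):
cryptsetup luksFormat /dev/sdb
cryptsetup luksOpen /dev/sdb zdisk0        # yields /dev/mapper/zdisk0

# Hand the decrypted mapping to VirtualBox as a raw disk:
VBoxManage internalcommands createrawvmdk \
    -filename ~/zdisk0.vmdk -rawdisk /dev/mapper/zdisk0

# Tell VirtualBox to honour flushes on that (assumed) IDE port,
# so ZFS's write ordering is preserved:
VBoxManage setextradata "FreeBSD-NAS" \
    "VBoxInternal/Devices/piix3ide/0/LUN#0/Config/IgnoreFlush" 0
```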
[zfs-discuss] Consequences of resilvering failure
I'm just about to start using ZFS in a RAIDZ configuration for a home file server (mostly holding backups), and I wasn't clear on what happens if data corruption is detected while resilvering. For example: let's say I'm using RAIDZ1 and a drive fails. I pull it and put in a new one. While resilvering, ZFS detects corrupt data on one of the remaining disks. Will the resilvering continue, with some files marked as containing errors, or will it simply fail? (I found this process[1] to repair damaged data, but I wasn't sure what would happen if it was detected in the middle of resilvering.) I will of course have a backup of the pool, but I may opt for additional backup if the entire pool could be lost due to data corruption (as opposed to just a few files potentially being lost). Thanks, Michael [1] http://dlc.sun.com/osol/docs/content/ZFSADMIN/gbbwl.html
Re: [zfs-discuss] b134 pool borked!
Just in case any stray searches find their way here, this is what happened to my pool: http://phrenetic.to/zfs
Re: [zfs-discuss] Native ZFS for Linux
On Fri, Jun 11, 2010 at 2:50 AM, Alex Blewitt wrote:
> You are sadly mistaken.
> From GNU.org on license compatibilities:
> http://www.gnu.org/licenses/license-list.html
> Common Development and Distribution License (CDDL), version 1.0
> This is a free software license. It has a copyleft with a scope that's similar to the one in the Mozilla Public License, which makes it incompatible with the GNU GPL. This means a module covered by the GPL and a module covered by the CDDL cannot legally be linked together. We urge you not to use the CDDL for this reason.
> Also unfortunate in the CDDL is its use of the term “intellectual property”.
> Whether a license is classified as "Open Source" or not does not imply that all open source licenses are compatible with each other.

Can we stop the license talk *yet again*? Nobody here is a lawyer (IANAL!) and everyone has their own interpretations and is splitting hairs. In my opinion, the source code itself shouldn't be ported; the CONCEPTS should be. Then there's no licensing issue at all. No questions, etc. To me, ZFS is important for bitrot protection; pooled storage and snapshots come in handy in a couple of places. Getting a COW filesystem w/ snapshots and storage pooling would cover a lot of the demand for ZFS as far as I'm concerned. (However, that's when a comparison with Btrfs makes sense, as it is COW too.) The minute I saw "ZFS on Linux" I knew this would degrade into a virtual pissing contest of "my understanding is better than yours" and a licensing fight.
To me, this is what needs to happen:
a) Get a Sun/Oracle attorney involved who understands this and flat out explains what needs to be done to allow ZFS to be used with the Linux kernel, or
b) Port the concepts and not the code (or the portions of code under the restrictive license), or
c) Look at Btrfs or other filesystems which may be extended to give the same capabilities as ZFS without the licensing issue, and focus all this development time on extending those.
Re: [zfs-discuss] zpool replace lockup / replace process now stalled, how to fix?
For the record, in case anyone else experiences this behaviour: I tried various things which failed, and finally, as a last-ditch effort, upgraded my FreeBSD, giving me zpool v14 rather than v13 - and now it's resilvering as it should. Michael

On Monday 17 May 2010 09:26:23 Michael Donaghy wrote:
> Hi,
> I recently moved to a freebsd/zfs system for the sake of data integrity, after losing my data on linux. I've now had my first hard disk failure; the BIOS refused to even boot with the failed drive (ad18) connected, so I removed it. I have another drive, ad16, which had enough space to replace the failed one, so I partitioned it and attempted to use "zpool replace" to replace the failed partitions with new ones, i.e. "zpool replace tank ad18s1d ad16s4d". This seemed to simply hang, with no processor or disk use; any "zpool status" commands also hung. Eventually I attempted to reboot the system, which also eventually hung; after waiting a while, having no other option, rightly or wrongly, I hard-rebooted. Exactly the same behaviour happened with the other zpool replace.
>
> Now, my zpool status looks like:
> arcueid ~ $ zpool status
>   pool: tank
>  state: DEGRADED
>  scrub: none requested
> config:
>
>         NAME           STATE     READ WRITE CKSUM
>         tank           DEGRADED     0     0     0
>           raidz2       DEGRADED     0     0     0
>             ad4s1d     ONLINE       0     0     0
>             ad6s1d     ONLINE       0     0     0
>             ad9s1d     ONLINE       0     0     0
>             ad17s1d    ONLINE       0     0     0
>             replacing  DEGRADED     0     0     0
>               ad18s1d  UNAVAIL      0 9.62K     0  cannot open
>               ad16s4d  ONLINE       0     0     0
>             ad20s1d    ONLINE       0     0     0
>           raidz2       DEGRADED     0     0     0
>             ad4s1e     ONLINE       0     0     0
>             ad6s1e     ONLINE       0     0     0
>             ad17s1e    ONLINE       0     0     0
>             replacing  DEGRADED     0     0     0
>               ad18s1e  UNAVAIL      0 11.2K     0  cannot open
>               ad16s4e  ONLINE       0     0     0
>             ad20s1e    ONLINE       0     0     0
>
> errors: No known data errors
>
> It looks like the replace has taken in some sense, but ZFS doesn't seem to be resilvering as it should.
> Attempting to zpool offline doesn't work:
>
> arcueid ~ # zpool offline tank ad18s1d
> cannot offline ad18s1d: no valid replicas
>
> Attempting to scrub causes a similar hang to before. Data is still readable
> (from the zvol which is the only thing actually on this filesystem),
> although slowly.
>
> What should I do to recover this / trigger a proper replace of the failed
> partitions?
>
> Many thanks,
> Michael
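For anyone hitting a similarly stalled replace, the usual escape hatch is to cancel it rather than wait it out. A minimal sketch, reusing the pool and device names from the thread above; note that detaching the failed half of a `replacing` pair only succeeds once the new device holds a full copy of the data, which was exactly what this wedged pool could not finish, so treat these as things to try, not a guaranteed fix:

```shell
# Inspect the stalled replace
zpool status -v tank

# Cancel the stuck operation by detaching the failed half of the
# "replacing" pair (valid once the new device has fully resilvered):
zpool detach tank ad18s1d

# ...or, if the replace never made progress, try re-issuing it:
zpool replace tank ad18s1d ad16s4d
```

As the follow-up in this thread shows, what actually unwedged the resilver was upgrading to a newer zpool version (v13 to v14).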
Re: [zfs-discuss] zfs mount -a kernel panic
On 19.05.10 17:53, John Andrunas wrote:
> Not to my knowledge, how would I go about getting one? (CC'ing discuss)

man savecore and dumpadm.

Michael

> On Wed, May 19, 2010 at 8:46 AM, Mark J Musante wrote:
>> Do you have a coredump? Or a stack trace of the panic?
>>
>> On Wed, 19 May 2010, John Andrunas wrote:
>>> Running ZFS on a Nexenta box, I had a mirror get broken and apparently
>>> the metadata is corrupt now. If I try and mount vol2 it works, but if I
>>> try "mount -a" or mount vol2/vm2 it instantly kernel panics and reboots.
>>> Is it possible to recover from this? I don't care if I lose the file
>>> listed below, but the other data in the volume would be really nice to
>>> get back. I have scrubbed the volume to no avail. Any other thoughts?
>>>
>>> zpool status -xv vol2
>>>   pool: vol2
>>>  state: ONLINE
>>> status: One or more devices has experienced an error resulting in data
>>>         corruption. Applications may be affected.
>>> action: Restore the file in question if possible. Otherwise restore the
>>>         entire pool from backup.
>>>    see: http://www.sun.com/msg/ZFS-8000-8A
>>>  scrub: none requested
>>> config:
>>>
>>>         NAME        STATE     READ WRITE CKSUM
>>>         vol2        ONLINE       0     0     0
>>>           mirror-0  ONLINE       0     0     0
>>>             c3t3d0  ONLINE       0     0     0
>>>             c3t2d0  ONLINE       0     0     0
>>>
>>> errors: Permanent errors have been detected in the following files:
>>>
>>>         vol2/v...@snap-daily-1-2010-05-06-:/as5/as5-flat.vmdk
>>>
>>> --
>>> John
>>
>> Regards,
>> markm

--
michael.schus...@oracle.com http://blogs.sun.com/recursion
Recursion, n.: see 'Recursion'
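To answer the "how would I go about getting one?" question concretely: crash dumps are configured with dumpadm and retrieved with savecore. A minimal sketch for Solaris/OpenSolaris; the dump device and directory shown are the usual defaults, not taken from the poster's system:

```shell
# Show the current crash-dump configuration
dumpadm

# Dump kernel pages to the swap device and save cores under /var/crash
# (device and directory are illustrative; adjust for your system)
dumpadm -d swap -s /var/crash/`hostname`

# After the next panic and reboot, extract the dump from the dump device
savecore

# Examine the saved dump (unix.N / vmcore.N) with the modular debugger
mdb unix.0 vmcore.0
```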
[zfs-discuss] zpool replace lockup / replace process now stalled, how to fix?
Hi,

I recently moved to a freebsd/zfs system for the sake of data integrity, after losing my data on linux. I've now had my first hard disk failure; the BIOS refused to even boot with the failed drive (ad18) connected, so I removed it. I have another drive, ad16, which had enough space to replace the failed one, so I partitioned it and attempted to use "zpool replace" to swap the failed partitions for new ones, i.e. "zpool replace tank ad18s1d ad16s4d". This seemed to simply hang, with no processor or disk use; any "zpool status" commands also hung. Eventually I attempted to reboot the system, which also eventually hung; after waiting a while, having no other option, rightly or wrongly, I hard-rebooted. Exactly the same behaviour happened with the other zpool replace.

Now, my zpool status looks like:

arcueid ~ $ zpool status
  pool: tank
 state: DEGRADED
 scrub: none requested
config:

        NAME           STATE     READ WRITE CKSUM
        tank           DEGRADED     0     0     0
          raidz2       DEGRADED     0     0     0
            ad4s1d     ONLINE       0     0     0
            ad6s1d     ONLINE       0     0     0
            ad9s1d     ONLINE       0     0     0
            ad17s1d    ONLINE       0     0     0
            replacing  DEGRADED     0     0     0
              ad18s1d  UNAVAIL      0 9.62K     0  cannot open
              ad16s4d  ONLINE       0     0     0
            ad20s1d    ONLINE       0     0     0
          raidz2       DEGRADED     0     0     0
            ad4s1e     ONLINE       0     0     0
            ad6s1e     ONLINE       0     0     0
            ad17s1e    ONLINE       0     0     0
            replacing  DEGRADED     0     0     0
              ad18s1e  UNAVAIL      0 11.2K     0  cannot open
              ad16s4e  ONLINE       0     0     0
            ad20s1e    ONLINE       0     0     0

errors: No known data errors

It looks like the replace has taken in some sense, but ZFS doesn't seem to be resilvering as it should. Attempting to zpool offline doesn't work:

arcueid ~ # zpool offline tank ad18s1d
cannot offline ad18s1d: no valid replicas

Attempting to scrub causes a similar hang to before. Data is still readable (from the zvol which is the only thing actually on this filesystem), although slowly.

What should I do to recover this / trigger a proper replace of the failed partitions?

Many thanks,
Michael
Re: [zfs-discuss] Opteron 6100? Does it work with opensolaris?
I agree on the motherboard and peripheral chipset issue. This, and the last generation of AMD quad/six-core motherboards, all seem to use the AMD SP56x0/SP5100 chipset, for which I can't find much information about support under either OpenSolaris or FreeBSD. Another issue is the LSI SAS2008 SAS controller chipset, which is frequently offered as an onboard option on many motherboards and still seems to be somewhat of a work in progress with regard to being 'production ready'.

On May 11, 2010, at 3:29 PM, Brandon High wrote:
> On Tue, May 11, 2010 at 5:29 AM, Thomas Burgess wrote:
>> I'm specifically looking at this motherboard:
>> http://www.newegg.com/Product/Product.aspx?Item=N82E16813182230
>
> I'd be more concerned that the motherboard and its attached
> peripherals are unsupported than the processor. Solaris can handle 12
> cores with no problems.
>
> -B
>
> --
> Brandon High : bh...@freaks.com
Re: [zfs-discuss] osol monitoring question
On 10.05.10 08:57, Roy Sigurd Karlsbakk wrote:
> Hi all
>
> It seems that when using zfs, the usual tools like vmstat, sar, top etc.
> are quite worthless, since zfs i/o load is not reported as iowait etc.
> Are there any plans to rewrite the old performance monitoring tools, or
> the zfs parts, to allow for standard monitoring tools? If not, what
> other tools exist that can do the same?

"zpool iostat" for one.

Michael
--
michael.schus...@oracle.com http://blogs.sun.com/recursion
Recursion, n.: see 'Recursion'
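For anyone landing here with the same question, the ZFS-aware counterpart of iostat is built into the zpool command. A minimal sketch; the pool name `tank` and the 5-second interval are illustrative:

```shell
# Pool-wide bandwidth and operations, refreshed every 5 seconds
zpool iostat tank 5

# Per-vdev breakdown, useful for spotting one slow disk
zpool iostat -v tank 5
```

General views such as `iostat -xn 5` still show the underlying disks; it is the iowait-style per-process accounting that ZFS I/O bypasses.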
Re: [zfs-discuss] why both dedup and compression?
This is interesting, but what about iSCSI volumes for virtual machines? Compress or de-dupe? Assuming the virtual machine was made from a clone of the original iSCSI or a master iSCSI volume. Does anyone have any real-world data on this? I would think the iSCSI volumes would diverge quite a bit over time, even with compression and/or de-duplication. Just curious…

On 6 May 2010, at 16:39 , Peter Tribble wrote:
> On Thu, May 6, 2010 at 2:06 AM, Richard Jahnel wrote:
>> I've googled this for a bit, but can't seem to find the answer.
>>
>> What does compression bring to the party that dedupe doesn't cover
>> already?
>
> Compression will reduce the storage requirements for non-duplicate data.
>
> As an example, I have a system that I rsync the web application data
> from a whole bunch of servers (zones) to. There's a fair amount of
> duplication in the application files (java, tomcat, apache, and the
> like) so dedup is a big win. On the other hand, there's essentially no
> duplication whatsoever in the log files, which are pretty big, but
> compress really well. So having both enabled works really well.
>
> --
> -Peter Tribble
> http://www.petertribble.co.uk/ - http://ptribble.blogspot.com/

Mike

---
Michael Sullivan
michael.p.sulli...@me.com
http://www.kamiogi.net/
Japan Mobile: +81-80-3202-2599
US Phone: +1-561-283-2034
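Since compression and dedup are per-dataset ZFS properties, the split Peter describes maps directly onto `zfs set`. A sketch with hypothetical dataset names (`tank/apps`, `tank/logs`):

```shell
# Duplicate-heavy application trees: dedup pays off
zfs set dedup=on tank/apps

# Unique but highly compressible log data: compression pays off
zfs set compression=on tank/logs

# Check the results after some data has landed
zfs get compressratio tank/logs
zpool list tank    # the DEDUP column shows the pool-wide dedup ratio
```

For cloned iSCSI zvols the same reasoning applies: blocks shared with the master stay deduplicated, while the diverging blocks mostly benefit from compression.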
Re: [zfs-discuss] Loss of L2ARC SSD Behaviour
Hi Marc,

Well, if you are striping over multiple devices, then your I/O should be spread over the devices and you should be reading from them all simultaneously rather than accessing a single device. Traditional striping would give roughly an n-fold performance improvement, where n is the number of disks the stripe is spread across. The round-robin access I am referring to is the way the L2ARC vdevs appear to be accessed: any given object will be taken from a single device rather than from several devices simultaneously, which would otherwise increase the I/O throughput. So, theoretically, a stripe spread over 4 disks would give 4 times the performance of reading from a single disk. This also assumes the controller can handle multiple I/Os, or that you are striped over a different disk controller for each disk in the stripe. SSD's are fast, but if I can read a block from more devices simultaneously, it will cut the latency of the overall read.

On 7 May 2010, at 02:57 , Marc Nicholas wrote:
> Hi Michael,
>
> What makes you think striping the SSDs would be faster than round-robin?
>
> -marc
>
> On Thu, May 6, 2010 at 1:09 PM, Michael Sullivan wrote:
> Everyone,
>
> Thanks for the help. I really appreciate it.
>
> Well, I actually walked through the source code with an associate today
> and we found out how things work by looking at the code.
>
> It appears that L2ARC is just assigned in round-robin fashion. If a
> device goes offline, then it goes to the next one and marks that one as
> offline. The failure to retrieve the requested object is treated like a
> cache miss and everything goes along its merry way, as far as we can
> tell.
>
> I would have hoped it to be different in some way. Like if the L2ARC was
> striped for performance reasons, that would be really cool, using that
> device as an extension of the VM model it is modeled after.
> Which would mean using the L2ARC as an extension of the virtual address
> space and striping it to make it more efficient. Way cool. If it took out
> the bad device and reconfigured the stripe, that would be even cooler,
> and replacing it with a hot spare cooler still. However, it appears from
> the source code that the L2ARC is just a (sort of) jumbled collection of
> ZFS objects. Yes, it gives you better performance if you have it, but it
> doesn't really use it in a way you might expect something as cool as ZFS
> would.
>
> I understand why it is read-only, and that it invalidates its cache when
> a write occurs; to be expected for any object written.
>
> If an object is not there because of a failure, or because it has been
> removed from the cache, it is treated as a cache miss, all well and good:
> go fetch from the pool.
>
> I also understand why the ZIL is important and that it should be mirrored
> if it is to be on a separate device. Though I'm wondering how it is
> handled internally when there is a failure of one of its default devices;
> but then again, it's on a regular pool and should be redundant enough,
> with only some degradation in speed.
>
> Breaking these devices out from their default locations is great for
> performance, and I understand that. I just wish the knowledge of how they
> work and their internal mechanisms were not so much of a black box. Maybe
> that is due to the speed at which ZFS is progressing and the features it
> adds with each subsequent release.
>
> Overall, I am very impressed with ZFS, its flexibility, and even more so
> the way it breaks all the rules about how storage should be managed; I
> really like it. I have yet to see anything come close to its approach to
> disk data management. Let's just hope it keeps moving forward; it is
> truly a unique way to view disk storage.
>
> Anyway, sorry for the ramble, but to everyone, thanks again for the
> answers.
>
> Mike
>
> ---
> Michael Sullivan
> michael.p.sulli...@me.com
> http://www.kamiogi.net/
> Japan Mobile: +81-80-3202-2599
> US Phone: +1-561-283-2034
>
> On 7 May 2010, at 00:00 , Robert Milkowski wrote:
>
> > On 06/05/2010 15:31, Tomas Ögren wrote:
> >> On 06 May, 2010 - Bob Friesenhahn sent me these 0,6K bytes:
> >>
> >>> On Wed, 5 May 2010, Edward Ned Harvey wrote:
> >>>
> >>>> In the L2ARC (cache) there is no ability to mirror, because cache
> >>>> device removal has always been supported. You can't mirror a cache
> >>>> device, because you don't need it.
> >>>>
> >>> How do you know that I don't need it? The ability seems useful to me.
> >>
Re: [zfs-discuss] Loss of L2ARC SSD Behaviour
Everyone,

Thanks for the help. I really appreciate it.

Well, I actually walked through the source code with an associate today and we found out how things work by looking at the code.

It appears that L2ARC is just assigned in round-robin fashion. If a device goes offline, then it goes to the next one and marks that one as offline. The failure to retrieve the requested object is treated like a cache miss and everything goes along its merry way, as far as we can tell.

I would have hoped it to be different in some way. Like if the L2ARC was striped for performance reasons, that would be really cool, using that device as an extension of the VM model it is modeled after. Which would mean using the L2ARC as an extension of the virtual address space and striping it to make it more efficient. Way cool. If it took out the bad device and reconfigured the stripe, that would be even cooler, and replacing it with a hot spare cooler still. However, it appears from the source code that the L2ARC is just a (sort of) jumbled collection of ZFS objects. Yes, it gives you better performance if you have it, but it doesn't really use it in a way you might expect something as cool as ZFS would.

I understand why it is read-only, and that it invalidates its cache when a write occurs; to be expected for any object written.

If an object is not there because of a failure, or because it has been removed from the cache, it is treated as a cache miss, all well and good: go fetch from the pool.

I also understand why the ZIL is important and that it should be mirrored if it is to be on a separate device. Though I'm wondering how it is handled internally when there is a failure of one of its default devices; but then again, it's on a regular pool and should be redundant enough, with only some degradation in speed.

Breaking these devices out from their default locations is great for performance, and I understand that.
I just wish the knowledge of how they work and their internal mechanisms were not so much of a black box. Maybe that is due to the speed at which ZFS is progressing and the features it adds with each subsequent release.

Overall, I am very impressed with ZFS, its flexibility, and even more so the way it breaks all the rules about how storage should be managed; I really like it. I have yet to see anything come close to its approach to disk data management. Let's just hope it keeps moving forward; it is truly a unique way to view disk storage.

Anyway, sorry for the ramble, but to everyone, thanks again for the answers.

Mike

---
Michael Sullivan
michael.p.sulli...@me.com
http://www.kamiogi.net/
Japan Mobile: +81-80-3202-2599
US Phone: +1-561-283-2034

On 7 May 2010, at 00:00 , Robert Milkowski wrote:

> On 06/05/2010 15:31, Tomas Ögren wrote:
>> On 06 May, 2010 - Bob Friesenhahn sent me these 0,6K bytes:
>>
>>> On Wed, 5 May 2010, Edward Ned Harvey wrote:
>>>
>>>> In the L2ARC (cache) there is no ability to mirror, because cache
>>>> device removal has always been supported. You can't mirror a cache
>>>> device, because you don't need it.
>>>>
>>> How do you know that I don't need it? The ability seems useful to me.
>>>
>> The gain is quite minimal. If the first device fails (which doesn't
>> happen too often, I hope), then it will be read from the normal pool once
>> and then stored in ARC/L2ARC again. It just behaves like a cache miss
>> for that specific block. If this happens often enough to become a
>> performance problem, then you should throw away that L2ARC device
>> because it's broken beyond usability.
>>
> Well, if an L2ARC device fails there might be an unacceptable drop in
> delivered performance. If it were mirrored then the drop usually would be
> much smaller, or there could be no drop if a mirror had an option to read
> only from one side.
>
> Being able to mirror L2ARC might especially be useful once a persistent
> L2ARC is implemented, as after a node restart or a resource failover in a
> cluster the L2ARC will be kept warm. Then the only thing which might
> affect L2 performance considerably would be an L2ARC device failure...
>
> --
> Robert Milkowski
> http://milek.blogspot.com
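For reference, cache devices are added to a pool as a flat list, never as a mirror or raidz vdev, which matches the independent round-robin behaviour described in this thread. A sketch with hypothetical pool and device names:

```shell
# Attach two independent L2ARC devices; ZFS spreads cached blocks
# across them and simply skips a device that fails
zpool add tank cache c6t0d0 c6t1d0

# Mirroring is rejected for cache vdevs by design:
#   zpool add tank cache mirror c6t0d0 c6t1d0   -> errors out

# Watch per-cache-device activity
zpool iostat -v tank 5
```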
Re: [zfs-discuss] Loss of L2ARC SSD Behaviour
On 6 May 2010, at 13:18 , Edward Ned Harvey wrote:
>> From: Michael Sullivan [mailto:michael.p.sulli...@mac.com]
>>
>> While it explains how to implement these, there is no information
>> regarding failure of a device in a striped L2ARC set of SSD's. I have
>
> http://www.solarisinternals.com/wiki/index.php/ZFS_Best_Practices_Guide#Separate_Cache_Devices
>
> It is not possible to mirror or use raidz on cache devices, nor is it
> necessary. If a cache device fails, the data will simply be read from the
> main pool storage devices instead.

I understand this.

> I guess I didn't write this part, but: If you have multiple cache devices,
> they are all independent from each other. Failure of one does not negate
> the functionality of the others.

OK, this is what I wanted to know: the L2ARC devices assigned to the pool are not striped but are independent. Loss of one drive will just cause a cache miss and force ZFS to go out to the pool for its objects. But then I'm not talking about using RAIDZ on a cache device; I'm talking about a striped device, which would be RAID-0. If the SSD's are all assigned to L2ARC, then they are not striped in any fashion (RAID-0), but are completely independent, and the L2ARC will continue to operate, just missing a single SSD.

>> I'm running 2009.11 which is the latest OpenSolaris.
>
> Quoi?? 2009.06 is the latest available from opensolaris.com and
> opensolaris.org.
>
> If you want something newer, AFAIK, you have to go to a developer build,
> such as osol-dev-134
>
> Sure you didn't accidentally get 2008.11?

My mistake… snv_111b which is 2009.06. I know it went up to 11 somewhere.

>> I am also well aware that the loss of a ZIL device will cause
>> loss of the entire pool. Which is why I would never have a ZIL device
>> unless it was mirrored and on different controllers.
>
> Um ... the log device is not special. If you lose *any* unmirrored device,
> you lose the pool.
Except for cache devices, or log devices on zpool >= 19.

Well, if I've got a separate ZIL which is mirrored for performance, and mirrored because I think my data is valuable and important, I will have something more than RAID-0 on my main storage pool too. More than likely RAIDZ2, since I plan on using L2ARC to help improve performance along with separate mirrored SSD ZIL devices.

>> From the information I've been reading about the loss of a ZIL device,
>> it will be relocated to the storage pool it is assigned to. I'm not
>> sure which version this is in, but it would be nice if someone could
>> provide the release number it is included in (and actually works).
>
> What the heck? Didn't I just answer that question?
> I know I said this is answered in the ZFS Best Practices Guide.
> http://www.solarisinternals.com/wiki/index.php/ZFS_Best_Practices_Guide#Separate_Log_Devices
>
> Prior to pool version 19, if you have an unmirrored log device that fails,
> your whole pool is permanently lost.
> Prior to pool version 19, mirroring the log device is highly recommended.
> In pool version 19 or greater, if an unmirrored log device fails during
> operation, the system reverts to the default behavior, using blocks from
> the main storage pool for the ZIL, just as if the log device had been
> gracefully removed via the "zpool remove" command.

No need to get defensive here; all I'm looking for is the zpool version number which supports it and the version of OpenSolaris which supports that zpool version. I think that if you are building for performance, it would be almost intuitive to have a mirrored ZIL in the event of failure, and perhaps even a hot spare available as well. I don't like the idea of my ZIL being transferred back to the pool, but having it transferred back is better than the alternative, which would be data loss or corruption.

>> Also, will this functionality be included in the mythical 2010.03
>> release?
>
> Zpool 19 was released in build 125, Oct 16, 2009. You can rest assured it
> will be included in 2010.03, or 04, or whenever that thing comes out.

Thanks, build 125.

>> So what you are saying is that if a single device fails in a striped
>> L2ARC VDEV, then the entire VDEV is taken offline and the fallback is to
>> simply use the regular ARC and fetch from the pool whenever there is a
>> cache miss.
>
> It sounds like you're only going to believe it if you test it. Go for it.
> That's what I did before I wrote that section of the ZFS Best Practices
> Guide.
>
> In ZFS, there is no such thing as striping, although
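The behaviour under discussion, mirrored log devices plus the pool version that tolerates a log failure, is easy to express as commands. A sketch with hypothetical pool and device names; `zpool upgrade -v` shows which pool versions the installed ZFS supports (log device removal arrived with version 19):

```shell
# Which pool versions does this system support?
zpool upgrade -v

# Mirrored ZIL on two SSDs, ideally on different controllers
zpool add tank log mirror c4t0d0 c5t0d0

# On pool version >= 19 the whole log vdev can also be removed
# gracefully (the vdev name, e.g. mirror-1, comes from zpool status)
zpool remove tank mirror-1
```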
Re: [zfs-discuss] Loss of L2ARC SSD Behaviour
Hi Ed,

Thanks for your answers. They seem to make sense, sort of…

On 6 May 2010, at 12:21 , Edward Ned Harvey wrote:
>> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
>> boun...@opensolaris.org] On Behalf Of Michael Sullivan
>>
>> I have a question I cannot seem to find an answer to.
>
> Google for ZFS Best Practices Guide (on solarisinternals). I know this
> answer is there.

My Google is very strong, and I have the Best Practices Guide committed to bookmark, as well as most of it to memory. While it explains how to implement these, there is no information regarding failure of a device in a striped L2ARC set of SSD's. I have been hard pressed to find this information anywhere, short of testing it myself, but I don't have the necessary hardware in a lab to test correctly. If someone has pointers to references, could you please provide them, chapter and verse, rather than the advice to "Go read the manual."

>> I know if I set up ZIL on SSD and the SSD goes bad, then the ZIL will be
>> relocated back to the pool. I'd probably have it mirrored anyway,
>> just in case. However you cannot mirror the L2ARC, so...
>
> Careful. The "log device removal" feature exists, and is present in the
> developer builds of opensolaris today. However, it's not included in
> opensolaris 2009.06, and it's not included in the latest and greatest
> solaris 10 yet. Which means, right now, if you lose an unmirrored ZIL
> (log) device, your whole pool is lost, unless you're running a developer
> build of opensolaris.

I'm running 2009.11, which is the latest OpenSolaris. I should have made that clear, and that I don't intend this to be on a Solaris 10 system, and am waiting for the next production build anyway. As you say, it does not exist in 2009.06, but that is not the latest production OpenSolaris, which is 2009.11, and I'd be more interested in its behavior than an older release's. I am also well aware that the loss of a ZIL device will cause loss of the entire pool.
Which is why I would never have a ZIL device unless it was mirrored and on different controllers.

From the information I've been reading about the loss of a ZIL device, it will be relocated to the storage pool it is assigned to. I'm not sure which version this is in, but it would be nice if someone could provide the release number it is included in (and actually works). Also, will this functionality be included in the mythical 2010.03 release? I'd also be interested to know what features along these lines will be available in 2010.03, if it ever sees the light of day.

>> What I want to know is, what happens if one of those SSD's goes bad?
>> What happens to the L2ARC? Is it just taken offline, or will it
>> continue to perform even with one drive missing?
>
> In the L2ARC (cache) there is no ability to mirror, because cache device
> removal has always been supported. You can't mirror a cache device,
> because you don't need it.
>
> If one of the cache devices fails, no harm is done. That device goes
> offline. The rest stay online.

So what you are saying is that if a single device fails in a striped L2ARC VDEV, then the entire VDEV is taken offline and the fallback is to simply use the regular ARC and fetch from the pool whenever there is a cache miss. Or, does what you are saying here mean that if I have 4 SSD's in a stripe for my L2ARC and one device fails, the L2ARC will be reconfigured dynamically, using the remaining SSD's for L2ARC? It would be good to get an answer to this from someone who has actually tested this, or is more intimately familiar with the ZFS code, rather than all the speculation I've been getting so far.

>> Sorry if these questions have been asked before, but I cannot seem to
>> find an answer.
>
> Since you said this twice, I'll answer it twice. ;-)
> I think the best advice regarding cache/log device mirroring is in the
> ZFS Best Practices Guide.

Been there, read that, many, many times.
It's an invaluable reference, I agree.

Thanks,

Mike

---
Michael Sullivan
michael.p.sulli...@me.com
http://www.kamiogi.net/
Japan Mobile: +81-80-3202-2599
US Phone: +1-561-283-2034
Re: [zfs-discuss] b134 pool borked!
I got a suggestion to check the output of fmdump -eV for PCI errors, in case the controller is broken. Attached you'll find the fmdump -eV output from the last panic. It indicates that ZFS can't open the drives. That might suggest a broken controller, but my slog is on the motherboard's internal controller. One might think that the motherboard itself is toast, or do we have a case of unstable power?
-- This message posted from opensolaris.org

May 04 2010 19:44:31.716566239 ereport.fs.zfs.vdev.open_failed
nvlist version: 0
        class = ereport.fs.zfs.vdev.open_failed
        ena = 0xeeed67dca00c01
        detector = (embedded nvlist)
        nvlist version: 0
                version = 0x0
                scheme = zfs
                pool = 0x97541c1ea1ad833e
                vdev = 0x645834a4c69584e5
        (end detector)
        pool = tank
        pool_guid = 0x97541c1ea1ad833e
        pool_context = 1
        pool_failmode = wait
        vdev_guid = 0x645834a4c69584e5
        vdev_type = disk
        vdev_path = /dev/dsk/c13t1d0s0
        vdev_devid = id1,s...@sata_wdc_wd5001aals-0_wd-wmasy3260051/a
        parent_guid = 0x6041a7903a345374
        parent_type = raidz
        prev_state = 0x1
        __ttl = 0x1
        __tod = 0x4be05cff 0x2ab5eedf

May 04 2010 19:44:31.716565705 ereport.fs.zfs.vdev.open_failed
nvlist version: 0
        class = ereport.fs.zfs.vdev.open_failed
        ena = 0xeeed67dca00c01
        detector = (embedded nvlist)
        nvlist version: 0
                version = 0x0
                scheme = zfs
                pool = 0x97541c1ea1ad833e
                vdev = 0x928ecd01b281b313
        (end detector)
        pool = tank
        pool_guid = 0x97541c1ea1ad833e
        pool_context = 1
        pool_failmode = wait
        vdev_guid = 0x928ecd01b281b313
        vdev_type = disk
        vdev_path = /dev/dsk/c13t2d0s0
        vdev_devid = id1,s...@sata_samsung_hd103si___s1vsj90sc22634/a
        parent_guid = 0x6041a7903a345374
        parent_type = raidz
        prev_state = 0x1
        __ttl = 0x1
        __tod = 0x4be05cff 0x2ab5ecc9

May 04 2010 19:44:31.716565713 ereport.fs.zfs.vdev.open_failed
nvlist version: 0
        class = ereport.fs.zfs.vdev.open_failed
        ena = 0xeeed67dca00c01
        detector = (embedded nvlist)
        nvlist version: 0
                version = 0x0
                scheme = zfs
                pool = 0x97541c1ea1ad833e
                vdev = 0xc6c893601f1263cb
        (end detector)
        pool = tank
        pool_guid = 0x97541c1ea1ad833e
        pool_context = 1
        pool_failmode = wait
        vdev_guid = 0xc6c893601f1263cb
        vdev_type = disk
        vdev_path = /dev/dsk/c8t0d0s0
        vdev_devid = id1,s...@sata_intel_ssdsa2m080__cvpo003401vt080bgn/a
        parent_guid = 0x97541c1ea1ad833e
        parent_type = root
        prev_state = 0x1
        __ttl = 0x1
        __tod = 0x4be05cff 0x2ab5ecd1

May 04 2010 19:44:31.716566468 ereport.fs.zfs.vdev.open_failed
nvlist version: 0
        class = ereport.fs.zfs.vdev.open_failed
        ena = 0xeeed67dca00c01
        detector = (embedded nvlist)
        nvlist version: 0
                version = 0x0
                scheme = zfs
                pool = 0x97541c1ea1ad833e
                vdev = 0x381e0480469b4ed7
        (end detector)
        pool = tank
        pool_guid = 0x97541c1ea1ad833e
        pool_context = 1
        pool_failmode = wait
        vdev_guid = 0x381e0480469b4ed7
        vdev_type = disk
        vdev_path = /dev/dsk/c13t3d0s0
        vdev_devid = id1,s...@sata_samsung_hd103si___s1vsj90sc22045/a
        parent_guid = 0x6041a7903a345374
        parent_type = raidz
        prev_state = 0x1
        __ttl = 0x1
        __tod = 0x4be05cff 0x2ab5efc4

May 04 2010 19:44:31.716566182 ereport.fs.zfs.vdev.open_failed
nvlist version: 0
        class = ereport.fs.zfs.vdev.open_failed
        ena = 0xeeed67dca00c01
        detector = (embedded nvlist)
        nvlist version: 0
                version = 0x0
                scheme = zfs
                pool = 0x97541c1ea1ad833e
                vdev = 0x6e5ce9b416a3f8a4
        (end detector)
        pool = tank
        pool_guid = 0x97541c1ea1ad833e
        pool_context = 1
        pool_failmode = wait
        vdev_guid = 0x6e5ce9b416a3f8a4
        vdev_type = disk
        vdev_path = /dev/dsk/c13t6d0s0
        vdev_devid = id1,s...@sata_wdc_wd6400aacs-0_wd-wcauf0934679/a
        parent_guid = 0x4491e617ebc26c75
        parent_type = raidz
        prev_state = 0x1
        __ttl = 0x1
        __tod = 0x4be05cff 0x2ab5eea6

May 04 2010 19:44:31.716565740 ereport.fs.zfs.vdev.open_failed
nvlist version: 0
        class = ereport.fs.zfs.vdev.open_failed
        ena = 0xeeed67dca00c01
        detector = (embedded nvlist)
        nvlist version: 0
                version = 0x0
                scheme = zfs
                pool = 0x97541c1ea1ad833e
                vdev = 0x69f0986c92adda53
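For anyone facing a flood of ereports like these, the FMA error log can be summarized instead of dumped verbosely. A sketch using standard Solaris/OpenSolaris fault-management commands; the class filter shown matches the vdev open failures in this post:

```shell
# One line per error report: timestamp and class
fmdump -e

# Count report classes to see what dominates the log
fmdump -e | awk 'NR > 1 {print $NF}' | sort | uniq -c | sort -rn

# Full detail for just the ZFS vdev open failures
fmdump -eV -c ereport.fs.zfs.vdev.open_failed
```

Reports from many vdevs on one controller failing in the same instant (as above, note the near-identical __tod values) point at the controller or its power rather than the individual disks.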
Re: [zfs-discuss] b134 pool borked!
Thanks for your reply! I ran memtest86 and it did not report any errors. The disk controller I've not replaced yet. The server is up in multi-user mode with the broken pool in an un-imported state. format now works and properly lists all my devices without panicking. zpool import panics the box with the same stack trace as above. Could it still be the disk controller? I'd jump through the roof with happiness if that's the case. It's one of those Supermicro thumper controllers. Anyone know any good non-destructive diagnostics to run?
Re: [zfs-discuss] b134 pool borked!
This is what my zpool import looks like; attached you'll find the output of zdb -l for each device.

  pool: tank
    id: 10904371515657913150
 state: ONLINE
action: The pool can be imported using its name or numeric identifier.
config:

        tank        ONLINE
          raidz1-0  ONLINE
            c13t4d0 ONLINE
            c13t5d0 ONLINE
            c13t6d0 ONLINE
            c13t7d0 ONLINE
          raidz1-1  ONLINE
            c13t3d0 ONLINE
            c13t1d0 ONLINE
            c13t2d0 ONLINE
            c13t0d0 ONLINE
        cache
          c8t2d0
        logs
          c8t0d0    ONLINE

zdbl.gz
Description: Binary data
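For anyone wanting to reproduce the attached label dump, it is just zdb -l run over every device in the pool. A sketch; the device list is taken from the import output above, and slice s0 is an assumption about how these disks were labeled:

```shell
# Dump the four ZFS labels of each device; a label that is unreadable or
# disagrees with the others is a strong hint about what blocks the import
for d in c13t0d0 c13t1d0 c13t2d0 c13t3d0 c13t4d0 c13t5d0 c13t6d0 c13t7d0 \
         c8t0d0 c8t2d0; do
  echo "== $d =="
  zdb -l /dev/dsk/${d}s0
done | gzip > zdbl.gz
```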
Re: [zfs-discuss] b134 pool borked!
90 reads and not a single comment? Not the slightest hint of what's going on?
Re: [zfs-discuss] Loss of L2ARC SSD Behaviour
Ok, thanks. So, if I understand correctly, it will just remove the device from the VDEV and continue to use the good ones in the stripe.

Mike

---
Michael Sullivan
michael.p.sulli...@me.com
http://www.kamiogi.net/
Japan Mobile: +81-80-3202-2599
US Phone: +1-561-283-2034

On 5 May 2010, at 04:34 , Marc Nicholas wrote:
> The L2ARC will continue to function.
>
> -marc
>
> On 5/4/10, Michael Sullivan wrote:
>> Hi,
>>
>> I have a question I cannot seem to find an answer to.
>>
>> I know I can set up a stripe of L2ARC SSD's with, say, 4 SSD's.
>>
>> I know if I set up ZIL on SSD and the SSD goes bad, then the ZIL will be
>> relocated back to the pool. I'd probably have it mirrored anyway, just
>> in case. However you cannot mirror the L2ARC, so...
>>
>> What I want to know is, what happens if one of those SSD's goes bad?
>> What happens to the L2ARC? Is it just taken offline, or will it continue
>> to perform even with one drive missing?
>>
>> Sorry if these questions have been asked before, but I cannot seem to
>> find an answer.
>>
>> Mike
>
> --
> Sent from my mobile device