Re: [zfs-discuss] [developer] Re: History of EPERM for unlink() of directories on ZFS?

2012-06-27 Thread Garrett D'Amore


On Jun 26, 2012, at 4:46 PM, Lionel Cons wrote:

> On 25 June 2012 11:33,   wrote:
>> 
>> 
>> To be honest, I think we should also remove this from all other
>> filesystems and I think ZFS was created this way because all modern
>> filesystems do it that way.

I agree with Casper here.  This is a historical accident that would be nice to 
fix.

> 
> This may be wrong way to go if it breaks existing applications which
> rely on this feature. It does break applications in our case.

Really?!?  unlink on directories is an incredibly bad idea.  What is your 
application?  Do you know why it is doing this?

> 
> Anyway, we've added this to the list of mandatory features and see
> what we can procure with that.

Rule out ZFS, and most other filesystems unless you are happy to have your 
application run with elevated privilege.

- Garrett

> 
> Lionel
> 
> 
> ---
> illumos-developer
> Archives: https://www.listbox.com/member/archive/182179/=now
> RSS Feed: https://www.listbox.com/member/archive/rss/182179/21239177-c925e33f
> Modify Your Subscription: 
> https://www.listbox.com/member/?member_id=21239177&id_secret=21239177-4dba8197
> Powered by Listbox: http://www.listbox.com

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] [developer] History of EPERM for unlink() of directories on ZFS?

2012-06-25 Thread Garrett D'Amore
I don't know the precise history, but I think it's a mistake to permit direct 
link() or unlink() of directories.  I do note that on BSD (MacOS at least) 
unlink returns EPERM if the executing user is not superuser.  I do see that the 
man page for unlink() says this on illumos:

 The  named   file   is   a   directory   and
 {PRIV_SYS_LINKDIR}  is  not  asserted in the
 effective set of the calling process, or the
 filesystem  implementation  does not support
 unlink() or unlinkat() on directories.

I can't imagine why you'd *ever* want to support unlink() of a *directory* -- 
what's the use case for it anyway (outside of filesystem repair)?
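
A quick way to see the behaviour from a shell, assuming the unlink(1M) wrapper
over the system call is installed at its usual path (the dataset name below is
made up):

    mkdir /tank/testdir
    /usr/sbin/unlink /tank/testdir   # fails with EPERM on ZFS (and without
                                     # PRIV_SYS_LINKDIR on filesystems that
                                     # do allow it)
    rmdir /tank/testdir              # the supported way to remove a directory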

Garrett D'Amore
garr...@damore.org



On Jun 25, 2012, at 2:23 AM, Lionel Cons wrote:

> Does someone know the history which led to the EPERM for unlink() of
> directories on ZFS? Why was this done this way, and not something like
> allowing the unlink and execute it on the next scrub or remount?
> 
> Lionel
> 
> 
> ---
> illumos-developer
> Archives: https://www.listbox.com/member/archive/182179/=now
> RSS Feed: https://www.listbox.com/member/archive/rss/182179/21239177-c925e33f
> Modify Your Subscription: 
> https://www.listbox.com/member/?member_id=21239177&id_secret=21239177-4dba8197
> Powered by Listbox: http://www.listbox.com

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] making network configuration sticky in nexenta core/napp-it

2012-01-10 Thread Garrett D'Amore
Put the configuration in /etc/hostname.if0 (where if0 is replaced by the name 
of your interface, such as /etc/hostname.e1000g0)

Without an IP address in such a static file, the system will default to DHCP 
and hence override other settings.
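
A minimal sketch, from memory, with a made-up interface name and addresses
(double-check the details against your release):

    echo '192.168.1.50 netmask 255.255.255.0 up' > /etc/hostname.e1000g0
    echo '192.168.1.1' > /etc/defaultrouter
    rm -f /etc/dhcp.e1000g0     # its presence re-enables DHCP on that NIC
    svcadm restart svc:/network/physical:default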

- Garrett

On Jan 10, 2012, at 8:54 AM, Eugen Leitl wrote:

> 
> Sorry for an off-topic question, but anyone knows how to make
> network configuration (done with ifconfig/route add) sticky in 
> nexenta core/napp-it?
> 
> After reboot system reverts to 0.0.0.0 and doesn't listen
> to /etc/defaultrouter
> 
> Thanks.
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Any rhyme or reason to disk dev names?

2011-12-21 Thread Garrett D'Amore

On Dec 21, 2011, at 3:14 AM, James C. McPherson wrote:

> On 21/12/11 05:58 PM, Matthew R. Wilson wrote:
>> Hello,
>> 
>> I am curious to know if there is an easy way to guess or identify the
>> device names of disks. Previously the /dev/dsk/c0t0d0s0 system made sense
>> to me... I had a SATA controller card with 8 ports, and they showed up
>> with the numbers 1-8 in the "t" position of the device name.
>> 
>> But I just built a new system with two LSI SAS HBAs in it, and my device
>> names are along the lines of:
>> /dev/dsk/c0t5000CCA228C0E488d0
>> 
>> I could not find any correlation between that identifier and the a)
>> controller the disk was plugged in to, or b) the port number on the
>> controller. The only way I could make a mapping of device name to
>> controller port was to add one drive at a time, reboot the system, and run
>> "format" to see which new disk name shows up.
>> 
>> I'm guessing there's a better way, but I can't find any obvious answer as
>> to how to determine which port on my LSI controller card will correspond
>> with which seemingly random device name. Can anyone offer any suggestions
>> on a way to predict the device naming, or at least get the system to list
>> the disks after I insert one without rebooting?
> 
> Hi Matthew,
> By default, the names for disks attached via mpt_sas(7d), or
> mpt(7d) if your disks are new enough, are based on their WWN
> as reported in the SCSI INQUIRY Page83 response.
> 
> The old paradigm you refer to is based on the physical id
> of the device on a parallel SCSI bus. That doesn't scale
> with SAS, and is something we're trying to move away from.

More to the point, on SAS and other similar busses, there simply *isn't* such a 
thing as a simple target number.  The old numbering scheme from parallel SCSI 
was suitable when you could have only 7 or 15 or so devices on a single bus.  
With modern busses you can have many thousands of devices on the same fabric.  
So we address them by WWN.
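
For what it's worth, a few stock tools help map a WWN-style name back to a
physical drive without rebooting:

    format < /dev/null      # list all disks and their cXtWWNd0 names
    iostat -En              # model, serial number and error counters per device
    prtconf -v | less       # device tree, including devid / inquiry properties
    devfsadm                # pick up a newly inserted disk without a reboot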

- Garrett
> 
> If you'd like some info about how we use devids and guids,
> please refer to my presentation
> 
> http://www.jmcp.homeunix.com/~jmcp/WhatIsAGuid.pdf
> 
> 
> For your particular configuration, if you note the serial
> number and WWN of the device before you insert them, you
> can match that up with info from  iostat -En  and/or prtconf -v.
> 
> 
> hth,
> James C. McPherson
> --
> Oracle
> http://www.jmcp.homeunix.com/blog
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SATA hardware advice

2011-12-19 Thread Garrett D'Amore

On Dec 19, 2011, at 7:52 AM, Hung-Sheng Tsao (Lao Tsao 老曹) Ph.D. wrote:

> AFAIK, most ZFS-based storage appliances have moved to SAS drives at 7200 rpm
> or 15k rpm;
> most SSDs are SATA and connect to the on-board SATA I/O chips.

Most *cheap* SSDs are SATA.  But if you want to use them in a cluster 
configuration, you need to use a SAS device that supports multiple initiators, 
such as those from STEC.

- Garrett
> 
> 
> On 12/19/2011 9:59 AM, tono wrote:
>> Thanks for the suggestions, especially all the HP info and build
>> pictures.
>> 
>> Two things crossed my mind on the hardware front. The first is regarding
>> the SSDs you have pictured, mounted in sleds. Any Proliant that I've
>> read about connects the hotswap drives via a SAS backplane. So how did
>> you avoid that (physically) to make the direct SATA connections?
>> 
>> The second is regarding a conversation I had with HP pre-sales. A rep
>> actually told me, in no uncertain terms, that using non-HP HBAs, RAM, or
>> drives would completely void my warranty. I assume this is BS but I
>> wonder if anyone has ever gotten resistance due to 3rd party hardware.
>> In the States, at least, there is the Magnuson–Moss act. I'm just not
>> sure if it applies to servers.
>> 
>> Back to SATA though. I can appreciate fully about not wanting to take
>> unnecessary risks, but there are a few things that don't sit well with
>> me.
>> 
>> A little background: this is to be a backup server for a small/medium
>> business. The data, of course, needs to be safe, but we don't need
>> extreme HA.
>> 
>> I'm aware of two specific issues with SATA drives: the TLER/CCTL
>> setting, and the issue with SAS expanders. I have to wonder if these
>> account for most of the bad rap that SATA drives get. Expanders are
>> built into nearly all of the JBODs and storage servers I've found
>> (including the one in the serverfault post), so they must be in common
>> use.
>> 
>> So I'll ask again: are there any issues when connecting SATA drives
>> directly to a HBA? People are, after all, talking left and right about
>> using SATA SSDs... as long as they are connected directly to the MB
>> controller.
>> 
>> We might just do SAS at this point for peace of mind. It just bugs me
>> that you can't use "inexpensive disks" in a R.A.I.D. I would think that
>> RAIDZ and AHCI could handle just about any failure mode by now.
>> ___
>> zfs-discuss mailing list
>> zfs-discuss@opensolaris.org
>> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
> 
> -- 
> Hung-Sheng Tsao Ph D.
> Founder&  Principal
> HopBit GridComputing LLC
> cell: 9734950840
> http://laotsao.wordpress.com/
> http://laotsao.blogspot.com/
> 
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS not starting

2011-12-01 Thread Garrett D'Amore
You have just learned the hard way that dedup is *highly* toxic if misused.  If 
you have a backup of your data, then you should delete the *pool*.  Trying to 
destroy the dataset (the zfs level filesystem) will probably never succeed 
unless you have it located on an SSD or you have an enormous amount of RAM 
(maybe 100GB?  I haven't done the math on your system).  There really isn't any 
other solution to this that I'm aware of.  (Destroying the filesystem means a 
*lot* of random I/O… your drives are probably completely swamped by the 
workload.)

In general, deleting data (especially filesystems) should almost never be done 
in the face of dedup, and you should not use dedup unless you know that your 
data has a lot of natural redundancies in it *and* you have adequate memory.

In general, dedup is *wrong* for use by typical home/hobbyist users.  It can 
make sense when hosting a lot of VM images, or in some situations like backups 
with a lot of redundant copies. 

I really wish we made it harder for end-users to enable dedup.  For the first 
year or so after Nexenta shipped it, it was the single most frequent source of 
support calls.

If you've not done the analysis already, and you're not a storage 
administrator, you probably should not enable dedup.
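
For the "analysis" part, zdb can estimate what dedup would buy you before you
turn it on (the pool name is hypothetical, and the scan itself is slow and
memory hungry on large pools):

    zdb -S tank       # simulate dedup: projected DDT histogram and ratio
    zdb -DD tank      # if dedup is already on: actual DDT size on disk and in core
    zpool list -o name,size,allocated,dedupratio tank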

- Garrett

On Dec 1, 2011, at 1:43 PM, Gareth de Vaux wrote:

> Hi guys, when ZFS starts it ends up hanging the system.
> 
> We have a raidz over 5 x 2TB disks with 5 ZFS filesystems. (The
> root filesystem is on separate disks).
> 
> # uname -a
> FreeBSD fortinbras.XXX 8.2-STABLE FreeBSD 8.2-STABLE #0: Wed Oct 19 09:20:04 
> SAST 2011 r...@storage.xxx:/usr/obj/usr/src/sys/GENERIC  amd64
> 
> ZFS filesystem version 5
> ZFS storage pool version 28
> 
> The setup was working great until we decided to try out deduplication.
> After testing it on a few files I set deduplication on 1 of the
> filesystems and moved around 1.5TB of data into it. There was basically
> no space saving, just a big performance hit, so we decided to take it off.
> While deleting a directory on this filesystem in preparation the system
> hung. After an abnormal half an hour long bootup everything was fine,
> and a scrub was clean. We then decided to rather just destroy this
> filesystem as it happened to have disposable data on and would be 100
> times quicker(?) I ran the zfs destroy command which sat there for about
> 40 hours while the free space on the pool gradually increased, at which
> point the system hung again. Rebooting took hours, stuck at ZFS
> initialisation, after which the console returned:
> 
> pid 37 (zfs), uid 0, was killed: out of swap space
> pid 38 (sh), uid 0, was killed: out of swap space
> 
> This's now the state I'm stuck in. I can naturally boot up without ZFS
> but once I start it manually the disks in the pool start flashing for
> an hour or 2 and the system hangs before they finish doing whatever
> they're doing.
> 
> The system has 6GB of RAM and a 10GB swap partition. I added a 30GB
> swap file but this hasn't helped.
> 
> # sysctl hw.physmem
> hw.physmem: 6363394048
> 
> # sysctl vfs.zfs.arc_max
> vfs.zfs.arc_max: 5045088256
> 
> (I lowered arc_max to 1GB but hasn't helped)
> 
> I've included the output when starting ZFS after setting vfs.zfs.debug=1
> at the bottom. There's no more ZFS output for the next few hours while
> the disks are flashing and the system is responsive.
> 
> A series of top outputs after starting ZFS:
> 
> last pid:  1536;  load averages:  0.00,  0.02,  0.08
> 21 processes:  1 running, 20 sleeping
> CPU:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
> Mem: 12M Active, 7344K Inact, 138M Wired, 44K Cache, 10M Buf, 5678M Free
> Swap: 39G Total, 39G Free
> 
> last pid:  1567;  load averages:  0.13,  0.05,  0.08
> 25 processes:  1 running, 24 sleeping
> CPU:  0.0% user,  0.0% nice,  2.3% system,  2.1% interrupt, 95.6% idle
> Mem: 14M Active, 7880K Inact, 328M Wired, 44K Cache, 13M Buf, 5485M Free
> Swap: 39G Total, 39G Free
> 
> last pid:  1632;  load averages:  0.06,  0.04,  0.06
> 25 processes:  1 running, 24 sleeping
> CPU:  0.0% user,  0.0% nice,  0.5% system,  0.1% interrupt, 99.4% idle
> Mem: 14M Active, 8040K Inact, 2421M Wired, 40K Cache, 13M Buf, 3392M Free
> Swap: 39G Total, 39G Free
> 
> last pid:  1693;  load averages:  0.11,  0.10,  0.08
> 25 processes:  1 running, 24 sleeping
> CPU:  0.0% user,  0.0% nice,  0.3% system,  0.1% interrupt, 99.5% idle
> Mem: 14M Active, 8220K Inact, 4263M Wired, 40K Cache, 13M Buf, 1550M Free
> Swap: 39G Total, 39G Free
> 
> last pid:  1767;  load averages:  0.00,  0.00,  0.00
> 25 processes:  1 running, 24 sleeping
> CPU:  0.0% user,  0.0% nice, 27.6% system,  0.0% interrupt, 72.4% idle
> Mem: 14M Active, 8212K Inact, 4380M Wired, 40K Cache, 13M Buf, 1433M Free
> Swap: 39G Total, 39G Free
> 
> *sudden system freeze*
> 
> Whether it ends up utilising the swap space and/or thrashing I don't know -
> I only know that the zpool disks' fancy LED's have stoppe

Re: [zfs-discuss] zfs sync=disabled property

2011-11-11 Thread Garrett D'Amore
Generally, there should not be "corruption", only a roll-back to a previous 
state.  *HOWEVER*, its possible that an application which has state outside of 
the filesystem (such as effects on network peers, or even state written to 
*other* filesystems) will encounter a consistency problem as the application 
will not be expecting this potentially "partial" rollback of state.

This state *could* be state tracked in remote systems, or VMs, for example.

Generally, I discourage disabling the sync unless you know *exactly* what you 
are doing.  On my build filesystems I do it, because I can regenerate all the 
data, and a loss of up to 30 seconds of data is no problem for me.  But I don't 
do this on home directories, or filesystems used for "arbitrary" application 
storage.  And I would *never* do this for a filesystem that is backing a 
database.
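
For completeness, the knob itself (the dataset name is just an example):

    zfs set sync=disabled tank/build      # scratch data only
    zfs get -r sync tank                  # confirm everything else stays standard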

As they say, better safe than sorry.

- Garrett

On Nov 10, 2011, at 11:12 AM, Tomas Forsman wrote:

> On 10 November, 2011 - Bob Friesenhahn sent me these 1,6K bytes:
> 
>> On Wed, 9 Nov 2011, Tomas Forsman wrote:
 
>>>> At all times, if there's a server crash, ZFS will come back along at next
>>>> boot or mount, and the filesystem will be in a consistent state, that was
>>>> indeed a valid state which the filesystem actually passed through at some
>>>> moment in time.  So as long as all the applications you're running can
>>>> accept the possibility of "going back in time" as much as 30 sec, following
>>>> an ungraceful ZFS crash, then it's safe to disable ZIL (set sync=disabled).
>>> 
>>> Client writes block 0, server says OK and writes it to disk.
>>> Client writes block 1, server says OK and crashes before it's on disk.
>>> Client writes block 2.. waaiits.. waiits.. server comes up and, server
>>> says OK and writes it to disk.
>>> 
>>> Now, from the view of the clients, block 0-2 are all OK'd by the server
>>> and no visible errors.
>>> On the server, block 1 never arrived on disk and you've got silent
>>> corruption.
>> 
>> The silent corruption (of zfs) does not occur due to simple reason that 
>> flushing all of the block writes are acknowledged by the disks and then a 
>> new transaction occurs to start the next transaction group. The previous 
>> transaction is not closed until the next transaction has been 
>> successfully started by writing the previous TXG group record to disk.  
>> Given properly working hardware, the worst case scenario is losing the 
>> whole transaction group and no "corruption" occurs.
>> 
>> Loss of data as seen by the client can definitely occur.
> 
> When a client writes something, and something else ends up on disk - I
> call that corruption. Doesn't matter whose fault it is and technical
> details, the wrong data was stored despite the client being careful when
> writing.
> 
> /Tomas
> -- 
> Tomas Forsman, st...@acc.umu.se, http://www.acc.umu.se/~stric/
> |- Student at Computing Science, University of Umeå
> `- Sysadmin at {cs,acc}.umu.se
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Couple of questions about ZFS on laptops

2011-11-10 Thread Garrett D'Amore

On Nov 9, 2011, at 6:08 PM, Francois Dion wrote:

> Some laptops have pc card and expresscard slots, and you can get an adapter 
> for sd card, so you could set up your os non mirrored and just set up home on 
> a pair of sd cards. Something like
> http://www.amazon.com/Sandisk-SDAD109A11-Digital-Card-Express/dp/B000W3QLLW
> 
> I've done this in the past, variations of this, including using a partition 
> and a usb stick:

An SD card is suitable for boot *only* if it is connected via USB.  While the 
drivers I wrote for SDHCI work fine for using media, you can't boot off it 
generally -- usually the laptop BIOS simply lacks the support needed to see 
these. 

It used to be that CompactFlash was a preferred option, but I think CF is 
falling out of favor these days.
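
On the L2ARC questions quoted below: cache devices carry no pool state, so they
can be added and removed at will; a sketch with a made-up device name:

    zpool add rpool cache c2t0d0      # start using the card as L2ARC
    zpool remove rpool c2t0d0         # detach it again before pulling the card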

- Garrett

> 
> http://solarisdesktop.blogspot.com/2007/02/stick-to-zfs-or-laptop-with-mirrored.html
> Wow, where did the time go, that was almost 5 years ago...
> 
> Anyway, i pretty much ditched carrying the laptop, the current one i have is 
> too heavy (m4400). But it does run really nicely sol11 and openindiana. The 
> m4400 is set up with 2 drives, not mirrored. I'm tempted to put a sandforce 
> based ssd for faster booting and better zfs perf for demos. Then i have an 
> sdcard and expresscard adapter for sd. This gives me 16gb mirrored for my 
> documents, which is plenty. 
> 
> Francois
> Sent from my iPad
> 
> On Nov 8, 2011, at 12:05 PM, Jim Klimov  wrote:
> 
>> Hello all,
>> 
>> I am thinking about a new laptop. I see that there are
>> a number of higher-performance models (incidenatlly, they
>> are also marketed as "gamer" ones) which offer two SATA
>> 2.5" bays and an SD flash card slot. Vendors usually
>> position the two-HDD bay part as either "get lots of
>> capacity with RAID0 over two HDDs, or get some capacity
>> and some performance by mixing one HDD with one SSD".
>> Some vendors go as far as suggesting a highest performance
>> with RAID0 over two SSDs.
>> 
>> Now, if I were to use this for work with ZFS on an
>> OpenSolaris-descendant OS, and I like my data enough
>> to want it mirrored, but still I want an SSD performance
>> boost (i.e. to run VMs in real-time), I seem to have
>> a number of options:
>> 
>> 1) Use a ZFS mirror of two SSDs
>>  - seems too pricey
>> 2) Use a HDD with redundant data (copies=2 or mirroring
>>  over two partitions), and an SSD for L2ARC (+maybe ZIL)
>>  - possible unreliability if the only HDD breaks
>> 3) Use a ZFS mirror of two HDDs
>>  - lowest performance
>> 4) Use a ZFS mirror of two HDDs and an SD card for L2ARC.
>>  Perhaps add another "built-in flash card" with PCMCIA
>>  adapters for CF, etc.
>> 
>> Now, there is a couple of question points for me here.
>> 
>> One was raised in my recent questions about CF ports in a
>> Thumper. The general reply was that even high-performance
>> CF cards are aimed for "linear" RW patterns and may be
>> slower than HDDs for random access needed as L2ARCs, so
>> flash cards may actually lower the system performance.
>> I wonder if the same is the case with SD cards, and/or
>> if anyone encountered (and can advise) some CF/SD cards
>> with good random access performance (better than HDD
>> random IOPS). Perhaps an extra IO path can be beneficial
>> even if random performances are on the same scale - HDDs
>> would have less work anyway and can perform better with
>> their other tasks?
>> 
>> On another hand, how would current ZFS behave if someone
>> ejects an L2ARC device (flash card) and replaces it with
>> another unsuspecting card, i.e. one from a photo camera?
>> Would ZFS automatically replace the L2ARC device and
>> kill the photos, or would the cache be disabled with
>> no fatal implication for the pools nor for the other
>> card? Ultimately, when the ex-L2ARC card gets plugged
>> back in, would ZFS automagically attach it as the cache
>> device, or does this have to be done manually?
>> 
>> 
>> Second question regards single-HDD reliability: I can
>> do ZFS mirroring over two partitions/slices, or I can
>> configure "copies=2" for the datasets. Either way I
>> think I can get protection from bad blocks of whatever
>> nature, as long as the spindle spins. Can these two
>> methods be considered equivalent, or is one preferred
>> (and for what reason)?
>> 
>> 
>> Also, how do other list readers place and solve their
>> preferences with their OpenSolaris-based laptops? ;)
>> 
>> Thanks,
>> //Jim Klimov
>> 
>> ___
>> zfs-discuss mailing list
>> zfs-discuss@opensolaris.org
>> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs sync=disabled property

2011-11-08 Thread Garrett D'Amore

On Nov 8, 2011, at 6:38 AM, Evaldas Auryla wrote:

> Hi all,
> 
> I'm trying to evaluate what are the risks of running NFS share of zfs dataset 
> with sync=disabled property. The clients are vmware hosts in our environment 
> and server is SunFire X4540 "Thor" system. Though general recommendation 
> tells not to do this, but after testing performance with default setting and 
> sync=disabled - it's night and day, so it's really tempting to do 
> sync=disabled ! Thanks for any suggestion.

The risks are, any changes your software clients expect to be written to disk 
-- after having gotten a confirmation that they did get written -- might not 
actually be written if the server crashes or loses power for some reason.

You should consider a high performance low-latency SSD (doesn't have to be very 
big) as an SLOG… it will do a lot for your performance without having to give 
up the commit guarantees that you lose with sync=disabled.
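
Something like this, with hypothetical device names (mirror the log if you can
afford it):

    zpool add tank log mirror c3t0d0 c3t1d0
    zpool status tank       # the new vdev appears under "logs"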

Of course, if the data isn't precious to you, then running with sync=disabled 
is probably ok.  But if you love your data, don't do it.

- Garrett

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] File contents changed with no ZFS error

2011-10-24 Thread Garrett D'Amore
You're using an *old* version of both OpenSolaris and zpool.  There have been a 
few corruption bugs fixed since then.  I'd recommend updating.

- Garrett

On Oct 22, 2011, at 9:27 AM, Robert Watzlavick wrote:

> I've noticed something strange over the past few months with four files on my 
> raidz.  Here's the setup:
> OpenSolaris snv_111b
> ZFS Pool version 14
> AMD-based server with ECC RAM.
> 5 ST3500630AS 500 GB SATA drives (4 active plus spare) in raidz1
> 
> The other day, I observed what appears to be undetected file corruption in 4 
> of the files on the raidz.  I have two external USB hard drives that I use to 
> back up the contents of the ZFS raidz on alternating months.  The USB hard 
> drives use EXT3 so they are connected to a Linux box which in turn connects 
> to the raidz over NFS.  Occasionally, I use the checksum option on rsync 
> (rsync -ainc) to make sure everything on the USB hard drives match before I 
> perform the real rsync back from the raid to the USB disk and that's when I 
> noticed the changes.  In each file, there was a single byte changed.  Running 
> zpool status doesn't show any errors and running zpool scrub doesn't show any 
> problems either.
> 
> One of the changed files was a .ppt file that I downloaded from the web over 
> a year ago and the other 3 were Acronis incremental Backup files from my XP 
> machine that get stored on the raidz.  Since ZFS files aren't supposed to be 
> corrupted without notification (right?), I initially assumed the problem was 
> with the USB drive.  For the 3 Acronis backup files, I had no way of knowing 
> which version was the correct one because Acronis shows all of them to be 
> valid.  The .ppt file was not on the web anymore but with the help of the 
> Wayback machine, I was able to re-download it and that's when I confirmed the 
> "good" copy from the web matches the copy on my USB hard drive, not the copy 
> on the raidz.  I know I haven't modified the .ppt file because the date still 
> matches the date I downloaded it, 2010-01-12.
> 
> What failure scenario could have caused this?  The file was obviously 
> initially good on the raidz because it got backed up to the USB drive and 
> that matches the "good" version from the web.
> 
> Thanks in advance,
> -Bob
> 
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] repair [was: about btrfs and zfs]

2011-10-19 Thread Garrett D'Amore

On Oct 19, 2011, at 1:52 PM, Richard Elling wrote:

> On Oct 18, 2011, at 5:21 PM, Edward Ned Harvey wrote:
> 
>>> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
>>> boun...@opensolaris.org] On Behalf Of Tim Cook
>>> 
>>> I had and have redundant storage, it has *NEVER* automatically fixed
>>> it.  You're the first person I've heard that has had it automatically fix
>> it.
>> 
>> That's probably just because it's normal and expected behavior to
>> automatically fix it - I always have redundancy, and every cksum error I
>> ever find is always automatically fixed.  I never tell anyone here because
>> it's normal and expected.
> 
> Yes, and in fact the automated tests for ZFS developers intentionally 
> corrupts data
> so that the repair code can be tested. Also, the same checksum code is used 
> to 
> calculate the checksum when writing and reading.
> 
>> If you have redundancy, and cksum errors, and it's not automatically fixed,
>> then you should report the bug.
> 
> For modern Solaris-based implementations, each checksum mismatch that is
> repaired reports the bitmap of the corrupted vs expected data. Obviously, if 
> the
> data cannot be repaired, you cannot know the expected data, so the error is 
> reported without identification of the broken bits.
> 
> In the archives, you can find reports of recoverable and unrecoverable errors 
> attributed to:
>   1. ZFS software (rare, but a bug a few years ago mishandled a raidz 
> case)
>   2. SAN switch firmware
>   3. "Hardware" RAID array firmware
>   4. Power supplies
>   5. RAM
>   6. HBA
>   7. PCI-X bus
>   8. BIOS settings
>   9. CPU and chipset errata
> 
> Personally, I've seen all of the above except #7, because PCI-X hardware is
> hard to find now.

I've seen #7.  I have some PCI-X hardware that is flaky in my home lab. ;-)

There was a case of #1 not very long ago, but it was a difficult to trigger 
race and is fixed in illumos and I presume other derivatives (including 
NexentaStor).

- Garrett
> 
> If consistently see unrecoverable data from a system that has protected data, 
> then
> there may be an issue with a part of the system that is a single point of 
> failure. Very,
> very, very few x86 systems are designed with no SPOF.
> -- richard
> 
> -- 
> 
> ZFS and performance consulting
> http://www.RichardElling.com
> VMworld Copenhagen, October 17-20
> OpenStorage Summit, San Jose, CA, October 24-27
> LISA '11, Boston, MA, December 4-9 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] about btrfs and zfs

2011-10-19 Thread Garrett D'Amore
I'd argue that from a *developer* point of view, an fsck tool for ZFS might 
well be useful.  Isn't that what zdb is for? :-)

But ordinary administrative users should never need something like this, unless 
they have encountered a bug in ZFS itself.  (And bugs are as likely to exist in 
the checker tool as in the filesystem. ;-)
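
For the curious, zdb already offers read-only consistency checks -- best run
against a quiet or exported pool, and the flags below are from memory, so check
your man page:

    zdb -b tank       # traverse all block pointers, report leaked/duplicate space
    zdb -cc tank      # additionally verify checksums of metadata and data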

- Garrett


On Oct 19, 2011, at 2:15 PM, Pawel Jakub Dawidek wrote:

> On Wed, Oct 19, 2011 at 08:40:59AM +1100, Peter Jeremy wrote:
>> fsck verifies the logical consistency of a filesystem.  For UFS, this
>> includes: used data blocks are allocated to exactly one file,
>> directory entries point to valid inodes, allocated inodes have at
>> least one link, the number of links in an inode exactly matches the
>> number of directory entries pointing to that inode, directories form a
>> single tree without loops, file sizes are consistent with the number
>> of allocated blocks, unallocated data/inodes blocks are in the
>> relevant free bitmaps, redundant superblock data is consistent.  It
>> can't verify data.
> 
> Well said. I'd add that people who insist on ZFS having a fsck are
> missing the whole point of ZFS transactional model and copy-on-write
> design.
> 
> Fsck can only fix known file system inconsistencies in file system
> structures. Because there is no atomicity of operations in UFS and other
> file systems it is possible that when you remove a file, your system can
> crash between removing directory entry and freeing inode or blocks.
> This is expected with UFS, that's why there is fsck to verify that no
> such thing happend.
> 
> In ZFS on the other hand there are no inconsistencies like that. If all
> blocks match their checksums and you find directory loop or something
> like that, it is a bug in ZFS, not expected inconsistency. It should be
> fixed in ZFS and not work-arounded with some fsck for ZFS.
> 
> -- 
> Pawel Jakub Dawidek   http://www.wheelsystems.com
> FreeBSD committer http://www.FreeBSD.org
> Am I Evil? Yes, I Am! http://yomoli.com
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] All (pure) SSD pool rehash

2011-09-29 Thread Garrett D'Amore

On Sep 28, 2011, at 8:44 PM, Edward Ned Harvey wrote:

>> From: Richard Elling [mailto:richard.ell...@gmail.com]
>> 
>> Also, the default settings for the resilver throttle are set for HDDs. For
> SSDs,
>> it is a
>> good idea to change the throttle to be more aggressive.
> 
> You mean...
> Be more aggressive, resilver faster?
> or Be more aggressive, throttling the resilver?
> 
> What's the reasoning that makes you want to set it differently from a HDD?

I think he means, resilver faster.

SSDs can be driven harder, and have more IOPs so we can hit them harder with 
less impact on the overall performance.  The reason we throttle at all is to 
avoid saturating the bandwidth of the drive with resilver which would prevent 
regular operations from making progress.  Generally I believe resilver 
operations are not "bandwidth bound" in the sense of pure throughput, but are 
IOPs bound.  As SSDs have no seek time, they can handle a lot more of these 
little operations than a regular hard disk.
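
For reference, the throttle lives in a couple of kernel tunables (names as
discussed elsewhere on this list; the values below are purely illustrative, so
test before trusting them on a production box):

    echo "zfs_resilver_delay/W0t0" | mdb -kw
    echo "zfs_resilver_min_time_ms/W0t5000" | mdb -kw
    # or persistently in /etc/system:
    #   set zfs:zfs_resilver_delay = 0
    #   set zfs:zfs_resilver_min_time_ms = 5000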

  - Garrett

> 
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Does the zpool cache file affect import?

2011-08-30 Thread Garrett D'Amore

On 08/30/2011 04:59 AM, "Hung-Sheng Tsao (Lao Tsao 老曹) Ph.D." wrote:


IMHO, you need to know how to recover the zpool if the data/metadata 
get corrupted due to an import by two hosts.

Maybe things have improved in ZFS recently.
You design a cluster by certain rules and hope that all SAs (not just you) 
will follow them; otherwise some SA may simply import a zpool, which 
creates a cachefile, and next time the system will want to import the 
zpool in that cachefile even though it is already imported by another host.
Sun Cluster uses its own cachefile to control this, but as with any 
design it cannot prevent an SA from importing a zpool by hand :-(


ZFS has some safeguards to protect against this error -- but those 
safeguards are bypassed when using "-f" to import a pool.  It's 
absolutely vital that system administrators are taught to try without -f 
first, and to only use -f to import a pool when they are *absolutely 
certain* that no other system can be accessing the pool concurrently.
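
In other words (pool name made up):

    zpool import            # scan: shows whether a pool looks active elsewhere
    zpool import tank       # normal import; refuses a pool that appears in use
    zpool import -f tank    # only once you are sure the other head is down or
                            # has cleanly exported the pool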


Clusters require a greater degree of care and skill than ordinary 
systems.  This is one reason that I try to steer folks towards 
application redundancy rather than cluster configurations, when 
redundancy at the application layer is available.


- Garrett


my 2c

On 8/29/2011 9:03 PM, Gary Mills wrote:

On Mon, Aug 29, 2011 at 07:56:16PM -0400, LaoTsao wrote:

Q?
Are you intending to import this zpool on a different host?

Yes, it can be imported on another server.  That part works when it
has been exported cleanly first.  I was concerned about a possible
import failure when the original server lost power.


Sent from my iPad

Sent from my Sun type 6 keyboard.



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs destory snapshot takes an hours.

2011-08-10 Thread Garrett D'Amore
Also, snapshot destroys are much slower with older releases such as 134.  I 
recommend an upgrade.  But an upgrade will not help much if you are using dedup.
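
After moving to a newer release, something like this brings the on-disk
versions forward (note that upgrading is one-way; "tank" is just an example):

    zpool upgrade -v        # show the versions the new release supports
    zpool upgrade tank
    zfs upgrade -r tank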

  -- Garrett D'Amore

On Aug 10, 2011, at 8:32 PM, Edward Ned Harvey 
 wrote:

>> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
>> boun...@opensolaris.org] On Behalf Of Ian Collins
>> 
>>> I am facing issue with zfs destroy, this takes almost 3 Hours to
>> delete the snapshot of size 150G.
>>> 
>> Do you have dedup enabled?
> 
> I have always found, zfs destroy takes some time.  zpool destroy takes no
> time.
> 
> Although zfs destroy takes some time, it's not terrible unless you have
> dedup enabled.  If you have dedup enabled, then yes it's terrible, as Ian
> suggested.
> 
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Entire client hangs every few seconds

2011-07-26 Thread Garrett D'Amore
This is actually a recently known problem, and a fix for it is in the
3.1 version, which should be available any minute now, if it isn't
already available.

The problem has to do with some allocations which are sleeping, and jobs
in the ZFS subsystem get backed behind some other work.

If you have adequate system memory, you are less likely to see this
problem, I think.

 - Garrett


On Tue, 2011-07-26 at 08:29 -0700, Rocky Shek wrote:
> Ian,
> 
> Did you enable DeDup? 
> 
> Rocky 
> 
> 
> -Original Message-
> From: zfs-discuss-boun...@opensolaris.org
> [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Ian D
> Sent: Tuesday, July 26, 2011 7:52 AM
> To: zfs-discuss@opensolaris.org
> Subject: [zfs-discuss] Entire client hangs every few seconds
> 
> Hi all-
> We've been experiencing a very strange problem for two days now.  
> 
> We have three client (Linux boxes) connected to a ZFS box (Nexenta) via
> iSCSI.  Every few seconds (seems random), iostats shows the clients go from
> an normal 80K+ IOPS to zero.  It lasts up to a few seconds and things are
> fine again.  When that happens, I/Os on the local disks stops too, even the
> totally unrelated ones. How can that be?  All three clients show the same
> pattern and everything was fine prior to Sunday.  Nothing has changed on
> neither the clients or the server. The ZFS box is not even close to be
> saturated, nor the network.
> 
> We don't even know where to start... any advices?
> Ian


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Pure SSD Pool

2011-07-12 Thread Garrett D'Amore
I think high-end SSDs, like those from Pliant, use a significant amount of 
over-allocation, plus internal remapping and internal COW, so that they can 
garbage collect automatically when they need to, without TRIM.  This only works 
if the drive knows about enough extra free space (because of that 
over-allocation, for example).

TRIM support is still something we want in ZFS, for a variety of reasons, 
including SSD performance.  I think you can expect to hear more on this front 
before too much longer, so stay tuned.

  -- Garrett D'Amore

On Jul 12, 2011, at 7:42 AM, "Eric Sproul"  wrote:

> On Tue, Jul 12, 2011 at 1:06 AM, Brandon High  wrote:
>> On Mon, Jul 11, 2011 at 7:03 AM, Eric Sproul  wrote:
>>> Interesting-- what is the suspected impact of not having TRIM support?
>> 
>> There shouldn't be much, since zfs isn't changing data in place. Any
>> drive with reasonable garbage collection (which is pretty much
>> everything these days) should be fine until the volume gets very full.
> 
> But that's exactly the problem-- ZFS being copy-on-write will
> eventually have written to all of the available LBA addresses on the
> drive, regardless of how much live data exists.  It's the rate of
> change, in other words, rather than the absolute amount that gets us
> into trouble with SSDs.  The SSD has no way of knowing what blocks
> contain live data and which have been freed, because the OS never
> tells it (that's what TRIM is supposed to do).  So after ZFS has
> written to almost every LBA, it starts writing to addresses previously
> used (and freed by ZFS, but unknown to the SSD), so the SSD has to
> erase the cell before it can be written anew.  This incurs a heavy
> performance penalty and seems like a worst-case-scenario use case.
> 
> Now, others have hinted that certain controllers are better than
> others in the absence of TRIM, but I don't see how GC could know what
> blocks are available to be erased without information from the OS.
> 
> Those with deep knowledge of SSD models/controllers: how does the
> Intel 320 perform under ZFS as primary storage (not ZIL or L2ARC)?
> 
> Eric
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] write cache partial-disk pools (was Server with 4 drives, how to configure ZFS?)

2011-06-20 Thread Garrett D'Amore
For SSD we have code in illumos that disables disksort.  Ultimately, we believe 
that the cost of disksort is in the noise for performance.
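
If memory serves, the same behaviour can also be forced per device type through
sd-config-list in sd.conf; treat this as a sketch only, since the vendor and
product strings must exactly match your SSD's INQUIRY data:

    # /kernel/drv/sd.conf
    sd-config-list = "ATA     ExampleSSD", "disksort:false";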

  -- Garrett D'Amore

On Jun 20, 2011, at 8:38 AM, "Andrew Gabriel"  wrote:

> Richard Elling wrote:
>> On Jun 19, 2011, at 6:04 AM, Andrew Gabriel wrote:
>>  
>>> Richard Elling wrote:
>>>
>>>> Actually, all of the data I've gathered recently shows that the number of 
>>>> IOPS does not significantly increase for HDDs running random workloads. 
>>>> However the response time does :-( My data is leading me to want to 
>>>> restrict the queue depth to 1 or 2 for HDDs.
>>>>   
>>> Thinking out loud here, but if you can queue up enough random I/Os, the 
>>> embedded disk controller can probably do a good job reordering them into 
>>> less random elevator sweep pattern, and increase IOPs through reducing the 
>>> total seek time, which may be why IOPs does not drop as much as one might 
>>> imagine if you think of the heads doing random seeks (they aren't random 
>>> anymore). However, this requires that there's a reasonable queue of I/Os 
>>> for the controller to optimise, and processing that queue will necessarily 
>>> increase the average response time. If you run with a queue depth of 1 or 
>>> 2, the controller can't do this.
>>>
>> 
>> I agree. And disksort is in the mix, too.
>>  
> 
> Oh, I'd never looked at that.
> 
>>> This is something I played with ~30 years ago, when the OS disk driver was 
>>> responsible for the queuing and reordering disc transfers to reduce total 
>>> seek time, and disk controllers were dumb.
>>>
>> 
>> ...and disksort still survives... maybe we should kill it?
>>  
> 
> It looks like it's possibly slightly worse than the pathologically worst 
> response time case I described below...
> 
>>> There are lots of options and compromises, generally weighing reduction in 
>>> total seek time against longest response time. Best reduction in total seek 
>>> time comes from planning out your elevator sweep, and inserting newly 
>>> queued requests into the right position in the sweep ahead. That also gives 
>>> the potentially worse response time, as you may have one transfer queued 
>>> for the far end of the disk, whilst you keep getting new transfers queued 
>>> for the track just in front of you, and you might end up reading or writing 
>>> the whole disk before you get to do that transfer which is queued for the 
>>> far end. If you can get a big enough queue, you can modify the insertion 
>>> algorithm to never insert into the current sweep, so you are effectively 
>>> planning two sweeps ahead. Then the worse response time becomes the time to 
>>> process one queue full, rather than the time to read or write the whole 
>>> disk. Lots of other tricks too (e.g. insertion into sweeps taking into 
>>> account priority, such as if the I/O is synchronous or asynchronous, and age
>>> of existing queue entries). I had much fun playing with this at the time.
>>>
>> 
>> The other wrinkle for ZFS is that the priority scheduler can't re-order I/Os 
>> sent to the disk.
>>  
> 
> Does that also go through disksort? Disksort doesn't seem to have any concept 
> of priorities (but I haven't looked in detail where it plugs in to the whole 
> framework).
> 
>> So it might make better sense for ZFS to keep the disk queue depth small for 
>> HDDs.
>> -- richard
>>  
> 
> -- 
> Andrew Gabriel
> 
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] JBOD recommendation for ZFS usage

2011-05-30 Thread Garrett D'Amore
Dunno about Germany, but LSI and DataON both have offerings.  (The LSI units 
are probably going fast, as LSI exits that business having sold that unit to 
NetApp.)

  -- Garrett D'Amore

On May 30, 2011, at 10:08 AM, "Thomas Nau"  wrote:

> Dear all
> 
> Sorry if it's kind of off-topic for the list but after talking
> to lots of vendors I'm running out of ideas...
> 
> We are looking for JBOD systems which
> 
> (1) hold 20+ 3.5" SATA drives
> 
> (2) are rack mountable
> 
> (3) have all the nice hot-swap stuff
> 
> (4) allow 2 hosts to connect via SAS (4+ lines per host) and see
>all available drives as disks, no RAID volume.
>In a perfect world both hosts would connect each using
>two independent SAS connectors
> 
> 
> The box will be used in a ZFS Solaris/based fileserver in a
> fail-over cluster setup. Only one host will access a drive
> at any given time.
> 
> It seems that a lot of vendors offer JBODs but so far I haven't found
> one in Germany which handles (4).
> 
> Any hints?
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS, Oracle and Nexenta

2011-05-26 Thread Garrett D'Amore
I actually didn't know that their meetings were totally open.  I'm more 
familiar with IEEE, T10, and similar bodies which are most definitely not open.

  -- Garrett D'Amore

On May 25, 2011, at 6:12 PM, "Bob Friesenhahn"  
wrote:

> On Wed, 25 May 2011, Garrett D'Amore wrote:
> 
>> You are welcome to your beliefs.  There are many groups that do standards 
>> that do not meet in public.  In fact, I can't think of any standards bodies 
>> that *do* hold open meetings.
> 
> The IETF holds totally open meetings.  I hope that you are appreciative of 
> that since they brought you the Internet and enabled us to send this email.  
> Clearly it works.
> 
> Bob
> -- 
> Bob Friesenhahn
> bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
> GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS, Oracle and Nexenta

2011-05-25 Thread Garrett D'Amore
You are welcome to your beliefs.   There are many groups that do standards that 
do not meet in public.  In fact, I can't think of any standards bodies that 
*do* hold open meetings.

  -- Garrett D'Amore

On May 25, 2011, at 4:09 PM, "Joerg Schilling" 
 wrote:

> "Garrett D'Amore"  wrote:
> 
>> I am sure that the group exists ... I am a part of it, as are many of the 
>> former Oracle ZFS engineers and a number of other ZFS contributors.
>> 
>> Whatever your proposal was, we have not seen it, but a solution has been 
>> agreed upon widely already, and implementation should be starting on it.  
>> Ultimately this solution is based on people with a huge amount of experience 
>> in ZFS, and with an eye towards future ZFS features.
> 
> I tend to believe that a group that acts in the secret does not exist.
> 
> Standardization nowerdays typically is done in the public. 
> 
> Jörg
> 
> -- 
> EMail:jo...@schily.isdn.cs.tu-berlin.de (home) Jörg Schilling D-13353 Berlin
>   j...@cs.tu-berlin.de(uni)  
>   joerg.schill...@fokus.fraunhofer.de (work) Blog: 
> http://schily.blogspot.com/
> URL:  http://cdrecord.berlios.de/private/ ftp://ftp.berlios.de/pub/schily
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS, Oracle and Nexenta

2011-05-25 Thread Garrett D'Amore
This will absolutely remain possible -- as the party responsible for Nexenta's 
kernel, I can assure that pool import/export compatibility is a key requirement 
for Nexenta's product.
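
In practice that means a cleanly exported pool should move between
implementations, and pinning the pool at a version every implementation
understands (v28 was the common denominator then) keeps the door open; pool and
device names below are made up:

    zpool export tank                           # on the old host
    zpool import tank                           # on the new host
    zpool create -o version=28 tank mirror c1t0d0 c1t1d0
    zpool get version tank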

  -- Garrett D'Amore

On May 25, 2011, at 3:39 PM, "Frank Van Damme"  wrote:

> On 24-05-11 22:58, LaoTsao wrote:
>> With various forks of open-source projects,
>> e.g. ZFS, OpenSolaris, OpenIndiana, etc., they are all different;
>> there is no guarantee they will be compatible.
> 
> I hope at least they'll try. Just in case I want to import/export zpools
> between Nexenta and OpenIndiana?
> 
> -- 
> No part of this copyright message may be reproduced, read or seen,
> dead or alive or by any means, including but not limited to telepathy
> without the benevolence of the author.
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Solaris vs FreeBSD question

2011-05-18 Thread Garrett D'Amore
We might have a better chance of diagnosing your problem if we had a copy of 
your panic message buffer.  Have you considered OpenIndiana and illumos as an 
option, or even NexentaStor if you are just looking for a storage appliance 
(though my guess is that you need more general purpose compute capabilities)? 

  -- Garrett D'Amore

On May 18, 2011, at 2:48 PM, "Paul Kraus"  wrote:

> 
>Over the past few months I have seen mention of FreeBSD a couple
> time in regards to ZFS. My question is how stable (reliable) is ZFS on
> this platform ?
> 
>This is for a home server and the reason I am asking is that about
> a year ago I bought some hardware based on it's inclusion on the
> Solaris 10 HCL, as follows:
> 
> SuperMicro 7045A-WTB (although I would have preferred the server
> version, but it wasn't on the HCL)
> Two quad core 2.0 GHz Xeon CPUs
> 8 GB RAM (I am NOT planning on using DeDupe)
> 2 x Seagate ES-2 250 GB SATA drives for the OS
> 4 x Seagate ES-2 1 TB SATA drives for data
> Nvidia Geforce 8400 (cheapest video card I could get locally)
> 
>I could not get the current production Solaris or OpenSolaris to
> load. The miniroot would GPF while loading the kernel. I could not get
> the problem resolved and needed to get the server up and running as my
> old server was dying (dual 550 MHz P3 with 1 GB RAM) and I needed to
> get my data (about 600 GB) off of it before I lost anything. That old
> server was running Solaris 10 and the data was in a zpool with
> mirrored vdevs of different sized drives. I had lost one drive in each
> vdev and zfs saved my data. So I loaded OpenSuSE and moved the data to
> a mirrored pair of 1 TB drives.
> 
>I still want to move my data to ZFS, and push has come to shove,
> as I am about to overflow the 1 TB mirror and I really, really hate
> the Linux options for multiple disk device management (I'm spoiled by
> SVM and ZFS). So now I really need to get that hardware loaded with an
> OS that supports ZFS. I have tried every variation of Solaris that I
> can get my hands on including Solaris 11 Express and Nexenta 3 and
> they all GPF loading the kernel to run the installer. My last hope is
> that I have a very plain vanilla (ancient S540) video card to swap in
> for the Nvidia on the very long shot chance that is the problem. But I
> need a backup plan if that does not work.
> 
>I have tested the hardware with FreeBSD 8 and it boots to the
> installer. So my question is whether the FreeBSD ZFS port is up to
> production use ? Is there anyone here using FreeBSD in production with
> good results (this list tends to only hear about serious problems and
> not success stories) ?
> 
> P.S. If anyone here has a suggestion as to how to get Solaris to load
> I would love to hear it. I even tried disabling multi-cores (which
> makes the CPUs look like dual core instead of quad) with no change. I
> have not been able to get serial console redirect to work so I do not
> have a good log of the failures.
> 
> -- 
> {1-2-3-4-5-6-7-}
> Paul Kraus
> -> Senior Systems Architect, Garnet River ( http://www.garnetriver.com/ )
> -> Sound Coordinator, Schenectady Light Opera Company (
> http://www.sloctheater.org/ )
> -> Technical Advisor, RPI Players
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] 350TB+ storage solution

2011-05-16 Thread Garrett D'Amore
Actually it is 100 or less, i.e. a 10 msec delay.

  -- Garrett D'Amore

On May 16, 2011, at 11:13 AM, "Richard Elling"  wrote:

> On May 16, 2011, at 10:31 AM, Brandon High wrote:
>> On Mon, May 16, 2011 at 8:33 AM, Richard Elling
>>  wrote:
>>> As a rule of thumb, the resilvering disk is expected to max out at around
>>> 80 IOPS for 7,200 rpm disks. If you see less than 80 IOPS, then suspect
>>> the throttles or broken data path.
>> 
>> My system was doing far less than 80 IOPS during resilver when I
>> recently upgraded the drives. The older and newer drives were both 5k
>> RPM drives (WD10EADS and Hitachi 5K3000 3TB) so I don't expect it to
>> be super fast.
>> 
>> The worst resilver was 50 hours, the best was about 20 hours. This was
>> just my home server, which is lightly used. The clients (2-3 CIFS
>> clients, 3 mostly idle VBox instances using raw zvols, and 2-3 NFS
>> clients) are mostly idle and don't do a lot of writes.
>> 
>> Adjusting zfs_resilver_delay and zfs_resilver_min_time_ms sped things
>> up a bit, which suggests that the default values may be too
>> conservative for some environments.
> 
> I am more inclined to change the hires_tick value. The "delays" are in 
> units of clock ticks. For Solaris, the default clock tick is 10ms, that I will
> argue is too large for modern disk systems. What this means is that when 
> the resilver, scrub, or memory throttle causes delays, the effective IOPS is
> driven to 10 or less. Unfortunately, these values are guesses and are 
> probably suboptimal for various use cases. OTOH, the prior behaviour of
> no resilver or scrub throttle was also considered a bad thing.
> -- richard
> 
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Summary: Dedup and L2ARC memory requirements

2011-05-09 Thread Garrett D'Amore
Just another data point.  The ddt is considered metadata, and by default the 
arc will not allow more than 1/4 of it to be used for metadata.   Are you still 
sure it fits?

Erik Trimble  wrote:

>On 5/7/2011 6:47 AM, Edward Ned Harvey wrote:
>>> See below.  Right around 400,000 blocks, dedup is suddenly an order of
>>> magnitude slower than without dedup.
>>>
>>> 40  10.7sec 136.7sec    143 MB  195 MB
>>> 80  21.0sec 465.6sec    287 MB  391 MB
>>
>> The interesting thing is - In all these cases, the complete DDT and the
>> complete data file itself should fit entirely in ARC comfortably.  So it
>> makes no sense for performance to be so terrible at this level.
>>
>> So I need to start figuring out exactly what's going on.  Unfortunately I
>> don't know how to do that very well.  I'm looking for advice from anyone -
>> how to poke around and see how much memory is being consumed for what
>> purposes.  I know how to lookup c_min and c and c_max...  But that didn't do
>> me much good.  The actual value for c barely changes at all over time...
>> Even when I rm the file, c does not change immediately.
>>
>> All the other metrics from kstat ... have less than obvious names ... so I
>> don't know what to look for...
>>
>> ___
>> zfs-discuss mailing list
>> zfs-discuss@opensolaris.org
>> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
>
>Some minor issues that might affect the above:
>
>(1) I'm assuming you run your script repeatedly in the same pool, 
>without deleting the pool. If that is the case, that means that a run of 
>X+1 should dedup completely with the run of X.  E.g. a run with 12 
>blocks will dedup the first 11 blocks with the prior run of 11.
>
>(2) can you NOT enable "verify" ?  Verify *requires* a disk read before 
>writing for any potential dedup-able block. If case #1 above applies, 
>then by turning on dedup, you *rapidly* increase the amount of disk I/O 
>you require on each subsequent run.  E.g. the run of 10 requires no 
>disk I/O due to verify, but the run of 11 requires 10 I/O 
>requests, while the run of 12 requires 11 requests, etc.  This 
>will skew your results as the ARC buffering of file info changes over time.
>
>(3) fflush is NOT the same as fsync.  If you're running the script in a 
>loop, it's entirely possible that ZFS hasn't completely committed things 
>to disk yet, which means that you get I/O requests to flush out the ARC 
>write buffer in the middle of your runs.   Honestly, I'd do the 
>following for benchmarking:
>
> i=0
> while [ $i -lt 80 ]
> do
>     # block count for this run: 10, 11, 12, ... (see the runs above)
>     j=$(( 10 + ( i * 1 ) ))
>     ./run_your_script $j
>     # flush outstanding writes and let ZFS settle before the next pass
>     sync
>     sleep 10
>     i=$(( i + 1 ))
> done
>
>
>
>-- 
>Erik Trimble
>Java System Support
>Mailstop:  usca22-123
>Phone:  x17195
>Santa Clara, CA
>
>___
>zfs-discuss mailing list
>zfs-discuss@opensolaris.org
>http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Summary: Dedup and L2ARC memory requirements

2011-05-08 Thread Garrett D'Amore
It is tunable, I don't remember the exact tunable name... Arc_metadata_limit or 
some such.
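
(If that turns out to be the limiting factor, a rough /etc/system sketch --
assuming the tunable is spelled zfs_arc_meta_limit on your build, so check
before relying on it -- would be:

    * allow up to 6GB of ARC to hold metadata, e.g. for a large DDT
    set zfs:zfs_arc_meta_limit=0x180000000

followed by a reboot.)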

  -- Garrett D'Amore

On May 8, 2011, at 7:37 AM, "Edward Ned Harvey" 
 wrote:

>> From: Garrett D'Amore [mailto:garr...@nexenta.com]
>> 
>> Just another data point.  The ddt is considered metadata, and by default the
>> arc will not allow more than 1/4 of it to be used for metadata.   Are you 
>> still
>> sure it fits?
> 
> That's interesting.  Is it tunable?  That could certainly start to explain 
> why my arc size arcstats:c never grew to any size I thought seemed 
> reasonable...  And in fact it grew larger when I had dedup disabled.  Smaller 
> when dedup was enabled.  "Weird," I thought.
> 
> Seems like a really important factor to mention in this summary.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Extremely Slow ZFS Performance

2011-05-06 Thread Garrett D'Amore
Sounds like a nasty bug, and not one I've seen in illumos or
NexentaStor.  What build are you running?

- Garrett

On Wed, 2011-05-04 at 15:40 -0700, Adam Serediuk wrote:
> Dedup is disabled (confirmed to be.) Doing some digging it looks like
> this is a very similar issue
> to http://forums.oracle.com/forums/thread.jspa?threadID=2200577&tstart=0.
> 
> 
> 
> On May 4, 2011, at 2:26 PM, Garrett D'Amore wrote:
> 
> > My first thought is dedup... perhaps you've got dedup enabled and
> > the DDT no longer fits in RAM?  That would create a huge performance
> > cliff.
> > 
> > -Original Message-
> > From: zfs-discuss-boun...@opensolaris.org on behalf of Eric D.
> > Mudama
> > Sent: Wed 5/4/2011 12:55 PM
> > To: Adam Serediuk
> > Cc: zfs-discuss@opensolaris.org
> > Subject: Re: [zfs-discuss] Extremely Slow ZFS Performance
> > 
> > On Wed, May  4 at 12:21, Adam Serediuk wrote:
> > >Both iostat and zpool iostat show very little to zero load on the
> > devices even while blocking.
> > >
> > >Any suggestions on avenues of approach for troubleshooting?
> > 
> > is 'iostat -en' error free?
> > 
> > 
> > --
> > Eric D. Mudama
> > edmud...@bounceswoosh.org
> > 
> > ___
> > zfs-discuss mailing list
> > zfs-discuss@opensolaris.org
> > http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
> > 
> > 
> > 
> > 
> 
> 


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Deduplication Memory Requirements

2011-05-05 Thread Garrett D'Amore
We have customers using dedup with lots of vm images... in one extreme case 
they are getting dedup ratios of over 200:1! 

You don't need dedup or sparse files for zero filling.  Simple zle compression 
will eliminate those for you far more efficiently and without needing massive 
amounts of ram.
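
(For example -- the dataset name here is just a placeholder:

    zfs set compression=zle tank/vmimages

zle only compresses runs of zeros, so the CPU cost is close to nil.)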

Our customers have the ability to access our systems engineers to design the 
solution for their needs.  If you are serious about doing this stuff right, 
work with someone like Nexenta that can engineer a complete solution instead of 
trying to figure out which of us on this forum are quacks and which are cracks. 
 :)

Tim Cook  wrote:

>On Wed, May 4, 2011 at 10:23 PM, Edward Ned Harvey <
>opensolarisisdeadlongliveopensola...@nedharvey.com> wrote:
>
>> > From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
>> > boun...@opensolaris.org] On Behalf Of Ray Van Dolson
>> >
>> > Are any of you out there using dedupe ZFS file systems to store VMware
>> > VMDK (or any VM tech. really)?  Curious what recordsize you use and
>> > what your hardware specs / experiences have been.
>>
>> Generally speaking, dedup doesn't work on VM images.  (Same is true for ZFS
>> or netapp or anything else.)  Because the VM images are all going to have
>> their own filesystems internally with whatever blocksize is relevant to the
>> guest OS.  If the virtual blocks in the VM don't align with the ZFS (or
>> whatever FS) host blocks...  Then even when you write duplicated data
>> inside
>> the guest, the host won't see it as a duplicated block.
>>
>> There are some situations where dedup may help on VM images...  For example
>> if you're not using sparse files and you have a zero-filed disk...  But in
>> that case, you should probably just use a sparse file instead...  Or ...
>>  If
>> you have a "golden" image that you're copying all over the place ... but in
>> that case, you should probably just use clones instead...
>>
>> Or if you're intimately familiar with both the guest & host filesystems,
>> and
>> you choose blocksizes carefully to make them align.  But that seems
>> complicated and likely to fail.
>>
>>
>>
>That's patently false.  VM images are the absolute best use-case for dedup
>outside of backup workloads.  I'm not sure who told you/where you got the
>idea that VM images are not ripe for dedup, but it's wrong.
>
>--Tim
>
>___
>zfs-discuss mailing list
>zfs-discuss@opensolaris.org
>http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Deduplication Memory Requirements

2011-05-05 Thread Garrett D'Amore
On Thu, 2011-05-05 at 09:02 -0400, Edward Ned Harvey wrote:
> > From: Garrett D'Amore [mailto:garr...@nexenta.com]
> > 
> > We have customers using dedup with lots of vm images... in one extreme
> > case they are getting dedup ratios of over 200:1!
> 
> I assume you're talking about a situation where there is an initial VM image, 
> and then to clone the machine, the customers copy the VM, correct?
> If that is correct, have you considered ZFS cloning instead?

No.  Obviously if you can clone, its better.  But sometimes you can't do
this even with v12n, and we have this situation at customer sites today.
(I have always said, zfs clone is far easier, far more proven, and far
more efficient, *if* you can control the "ancestral" relationship to
take advantage of the clone.)  For example, one area where cloning can't
help is with patches and updates.  In some instances these can get quite
large, and across 1000's of VMs the space required can be considerable.

> 
> When I said dedup wasn't good for VM's, what I'm talking about is:  If there 
> is data inside the VM which is cloned...  For example if somebody logs into 
> the guest OS and then does a "cp" operation...  Then dedup of the host is 
> unlikely to be able to recognize that data as cloned data inside the virtual 
> disk.

I disagree.  I believe that within the VMDKs data is aligned nicely,
since these are disk images.

At any rate, we are seeing real (and large) dedup ratios in the field
when used with v12n.  In fact, this is the killer app for dedup.
 
> 
> > Our customers have the ability to access our systems engineers to design the
> > solution for their needs.  If you are serious about doing this stuff right, 
> > work
> > with someone like Nexenta that can engineer a complete solution instead of
> > trying to figure out which of us on this forum are quacks and which are
> > cracks.  :)
> 
> Is this a zfs discussion list, or a nexenta sales & promotion list?

My point here was that there is a lot of half baked advice being
given... the idea that you should only use dedup if you have a bunch of
zeros on your disk images is absolutely and totally nuts for example.
It doesn't match real world experience, and it doesn't match the theory
either.

And sometimes real-world experience trumps the theory.  I've been shown
on numerous occasions that ideas that I thought were half-baked turned
out to be very effective in the field, and vice versa.  (I'm a
developer, not a systems engineer.  Fortunately I have a very close
working relationship with a couple of awesome systems engineers.)

Folks come here looking for advice.  I think the advice that if you're
contemplating these kinds of solutions, you should get someone with some
real world experience solving these kinds of problems every day, is very
sound advice.  Trying to pull out the truths from the myths I see stated
here nearly every day is going to be difficult for the average reader
here, I think.

- Garrett


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Going forward after Oracle - Let's get organized, let's get started.

2011-04-09 Thread Garrett D'Amore
On Sun, 2011-04-10 at 08:56 +1200, Ian Collins wrote:
> On 04/10/11 05:41 AM, Chris Forgeron wrote:
> > I see your point, but you also have to understand that sometimes too many 
> > helpers/opinions are a bad thing.  There is a set "core" of ZFS developers 
> > who make a lot of this move forward, and they are the key right now. The 
> > rest of us will just muddy the waters with conflicting/divergent opinions 
> > on direction and goals.
> >
> In the real world we would be called customers, you know the people who 
> actually use the product.

Right.  And in the real world, customers are generally not involved with
architectural discussions of products.  Their input is collected and
feed into the process, but they don't get to sit at the whiteboard with
developers as the work on the designs.

> 
> Developers, no matter how good, shouldn't work in a vacuum.

Agreed, and we don't.

> 
> If you want to see a good example of how things should be done in the 
> open, follow the caiman-discuss list.

Caiman-discuss may be an excellent example of a model that can work, but
it might not be the best model for ZFS.  There are many more contentious
issues, and more contentious personalities, and other considerations
that I don't want to get into.

Ultimately, our model is like an IEEE working group.  The members have
decided to run this list in this fashion, without any significant
dissension. 

Of course, if you don't like this, and want to start your own group, I
encourage you to do so.

I'll also point at zfs-discuss@opensolaris.org, which is monitored by a
number of the members of this cabal.  That's a great way to give
feedback.

- Garrett


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Going forward after Oracle - Let's get organized, let's get started.

2011-03-26 Thread Garrett D'Amore
There is ZFS development happening outside of Oracle.  Many of the
active ZFS developers at a *variety* of organizations are collaborating
within the illumos community using a private e-mail list much like a
standards body Working Group (we even call ourselves the ZFS Working
Group).  And not all of the participants here are coming from Solaris
backgrounds -- we have Linux, FreeBSD, and MacOS represented.
(Privacy is important to keep the discussions focused and technical.
And some participants would prefer to keep their participation
non-public.)

Generally, the questions you raise are being worked on.  We'll have more
to say in the coming weeks.

In the meantime, if you're engaged in active ZFS development, or want to
be, please contact me off-list and I'll see about getting you added to
the private email list.

- Garrett

On Fri, 2011-03-25 at 16:17 -0300, Chris Forgeron wrote:
> I’m curious where ZFS development is going. 
> 
>  
> 
> I’ve been reading through the lists, and watching Oracle, Nexenta,
> Illumos, and OpenIndiana for signs of life.
> 
>  
> 
> The feeling I get is that while there is plenty of userland work being
> done, there is next to nothing on ZFS development outside of the
> Oracle camp. 
> 
>  
> 
> So, I decided to see if I could help set something in motion. 
> 
>  
> 
> Agreeing with some of the opinions expressed here,
> (http://opensolaris.org/jive/thread.jspa?messageID=508798 ) I
> contacted Erik Trimble and we had a very quick/brief discussion that
> we want to bring to the list so a more public and wide-scope
> discussion can happen. 
> 
>  
> 
> I have my ideas of where I’d like to see ZFS go, Erik doesn’t fully
> agree, and has other ideas as well. We both know that the rest of the
> community will have further ideas on what should be happening, and
> that’s why we’re discussing it here.
> 
>  
> 
> However, I think it’s imperative that we don’t fracture ZFS into 4
> different OpenSource versions that are all incompatible with
> each other. 
> 
>  
> 
>  
> 
> I’d like to lay out some groundwork for this thread to keep it
> manageable;
> 
>  
> 
> 1)  This thread is about ZFS, not generic Solaris development,
> userland or non-ZFS development in _x_Solaris.  
> 
>  
> 
>  
> 
> In my mind, I see these issues as pressing and needing addressing;
> 
>  
> 
> 1)  Action. We need an official “we’re moving on” type of
> agreement, so we stop waiting for Oracle to do something. I don’t
> think we’re going to see v31 anytime soon, let’s stop waiting for it.
> What version do we branch from?
> 
>  
> 
> 2)  Home? ZFS requires a home. What is the home? Is it official?
> Will it always be able to live here? This home would also be in charge
> of version/change management. A new version system will need to be
> created to diff from the Oracle efforts going forward. 
> 
>  
> 
> 3)  Generic? Can this home be generic (i.e. lessen or remove the
> bias to Solaris)? Is it practical? FreeBSD has a very robust ZFS copy,
> running v28 in the beta builds. Pjd has done a lot of work porting
> this, and I know other people will do a lot of work porting ZFS in the
> future. Can we minimize the efforts of porting ZFS so the work can go
> into development and features, not constantly adapting changes to the
> OS in use? More than just Solaris developers want to contribute to
> ZFS.
> 
>  
> 
> 4)  Legal? Is there anything in ZFS that needs to be removed to
> ensure it has a long vibrant life in the Open Source community? Do we
> need a “Lite” version much like 4.4 BSD Lite to escape AT&T? This may
> need to spawn into a separate thread, as I’ve already seen a few epic
> threads about that topic alone. 
> 
>  
> 
>  
> 
> I think that’s plenty to get this started. 
> 
>  
> 
>  
> 
> 
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] best migration path from Solaris 10

2011-03-21 Thread Garrett D'Amore
On Mon, 2011-03-21 at 14:56 -0700, Paul B. Henson wrote:
> On 3/18/2011 3:15 PM, Garrett D'Amore wrote:

> 
> > c) NCP 4 is still 5-6 months away.  We're still developing it.
> 
> By the time I do some initial evaluation, then some prototyping, I don't
> anticipate migrating anything production wise until at the earliest
> Christmas break, so that timing shouldn't be a problem. Any thoughts on
> how soon a beta might be available? As it sounds like there will be
> significant changes, it might be better to evaluate with a beta of the
> new stuff rather than the production version of the older stuff. Plus I
> generally tend to break things in unexpected ways ;), so doing that in
> the beta cycle might be beneficial.

I *hate* talking about unreleased product schedules, but I think you can
expect a beta within a month or two, perhaps less.  We've already got an
alpha that we've handed out in limited quantities.

> 
> > d) NCP 4 will make much more use of the illumos userland, and only
> > use Debian when illumos doesn't have an equivalent.
> 
> Given both NCP and OpenIndiana will be based off of illumos, and as of
> version 4 NCP will be migrating as much as possible of the userland to
> solaris as opposed to gnu, other than the differing packaging formats
> what do you feel will distinguish NCP from openindiana? NCP is positioned as
> a bare-bones server, whereas openindiana is trying to be more general
> purpose including desktop use?

NCP is a core-technology thing.  Definitely not a general purpose OS at
all, and will be missing all the desktop stuff.

The idea behind NCP is that other distros build on top of, or people who
just want that bare bones OS use it.  It comes with debian packaging,
and we do have a bunch of the common server packages (Apache, etc.) set
up, but not everything that you might want.

> 
> > e) NCP comes entirely unsupported.  NexentaStor is a commercial
> > product with real support behind it, though.
> 
> Can you treat NexentaStor like a general purpose operating system, not
> use the management gui, and configure everything from a shell prompt, or
> is it more appliance like and you're locked out from the OS? In other
> words, would it be possible (although not necessarily cost-effective) to
> pay for NexentaStor for the support but treat it like NCP?

Once you dive under the controlled UI (which you can do), you basically
are breaking your support contract.

Going forward, NCP and NS will be more closely synchronized, so you'll
be able to get the same OS, and probably receive patches to it, that you
get with NS, albeit without official support and without the proprietary
add-on features like HA clustering, the management UI,
auto-tiering/auto-sync, etc.

> 
> Has your company considered basic support contracts for NCP? I've heard
> from at least one other site that might be interested in something like
> that. We don't need much in the way of handholding, the majority of our
> support calls end up being actual bugs or limitations in solaris. But if
> one of our file servers panics, doesn't import a pool when it boots, and
> crashes every time you try to import it by hand, it would be nice to
> have an engineer available :).

There have been some discussions, but figuring out how to make that
commercially worthwhile is challenging.  At some level, our engineers
are busy enough that we'd have to see enough commercial demand here to
justify adding engineers, because the number of calls we would take
would probably go up significantly with such a change.

- Garrett


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] best migration path from Solaris 10

2011-03-19 Thread Garrett D'Amore
Newer versions of FreeBSD have newer ZFS code.

That said, ZFS on FreeBSD is kind of a 2nd class citizen still.  FreeBSD
still gives equal (or higher) priority to ufs, and so some of the
changes in Solaris and derivatives (illumos) to make certain things like
NFS, CIFS, and COMSTAR/iSCSI work better with ZFS won't be present in
FreeBSD.

There are vendors who offer NexentaStor on hardware with full commercial
support from a single vendor (granted they get backline support from
Nexenta, but do you think ixSystems engineers personally fix bugs in
FreeBSD?)  Such vendors include PogoLinux and AreaData.

I've also started conversations with Pogo about offering an OpenIndiana
based workstation, which might be another option if you prefer more of a
general purpose solution.

- Garrett

On Sat, 2011-03-19 at 02:16 +0100, Roy Sigurd Karlsbakk wrote:
> > I think we all feel the same pain with Oracle's purchase of Sun.
> > 
> > FreeBSD that has commercial support for ZFS maybe?
> 
> Fbsd currently has a very old zpool version, not suitable for running with 
> SLOGs, since if you lose it, you may lose the pool, which isn't very 
> amusing...
> 
> Vennlige hilsener / Best regards
> 
> roy
> --
> Roy Sigurd Karlsbakk
> (+47) 97542685
> r...@karlsbakk.net
> http://blogg.karlsbakk.net/
> --
> I all pedagogikk er det essensielt at pensum presenteres intelligibelt. Det 
> er et elementært imperativ for alle pedagoger å unngå eksessiv anvendelse av 
> idiomer med fremmed opprinnelse. I de fleste tilfeller eksisterer adekvate og 
> relevante synonymer på norsk.
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] best migration path from Solaris 10

2011-03-18 Thread Garrett D'Amore
Thanks for thinking about us, Paul.

A few quick thoughts:

a) Nexenta Core Platform is a bare-bones OS.  No GUI, in other words (no
X11.)  It might well suit you.

b) NCP 3 will not have an upgrade path to NCP 4.  Its simply too much
change in the underlying packaging.

c) NCP 4 is still 5-6 months away.  We're still developing it.

d) NCP 4 will make much more use of the illumos userland, and only use
Debian when illumos doesn't have an equivalent.

e) NCP comes entirely unsupported.  NexentaStor is a commercial product
with real support behind it, though.

f) *Today*, NexentaStor 3 has newer code in it than NCP.  That will be
changing, as we will be keeping the two much more closely in sync
starting with 3.1.

g) If you want to self support, OpenIndiana or NCP are both good
options.  NCP has debian packaging, and lacks a bunch of the GUI
goodies.  NCP 3 is not as new as OI, but is probably a bit more proven.

Hopefully the additional information is helpful to you.

- Garrett


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] [illumos-Developer] ZFS spare disk usage issue

2011-03-04 Thread Garrett D'Amore
On Fri, 2011-03-04 at 18:03 +0100, Roy Sigurd Karlsbakk wrote:
> So should I post a bug, or is there one there already?
> 
> Btw, I can't reach http://bugs.illumos.org/ - it times out

Try again in a few minutes... the server just got rebooted.

- Garrett
> 
> roy
> 
> - Original Message -
> > We've talked about this, and I will be putting together a fix for this
> > incorrect state handling. :-)
> > 
> > - Garrett
> > 
> > On Fri, 2011-03-04 at 11:50 -0500, Eric Schrock wrote:
> > > This looks like a pretty simple bug. The issue is that the state of
> > > the SPARE vdev is being reported as REMOVED instead of DEGRADED. If
> > > it were the latter (as it should be), then everything would work
> > > just
> > > fine. Please file a bug at bugs.illumos.org.
> > >
> > >
> > > On a side note, this continues to expose the overly simplistic vdev
> > > state model used by ZFS (one which I can take a bulk of the
> > > responsibility for). Back before the days of ditto blocks and
> > > SPA3.0,
> > > it was sufficient to model state as a fairly binary proposition. But
> > > this now has ramifications that don't necessarily make sense. For
> > > example, one may be able to open a pool even if a toplevel vdev is
> > > faulted. And even when a spare has finished resilvering, it's left
> > > in
> > > the DEGRADED state, which has implications for allocation policies
> > > (though I remember discussions around changing this). But the pool
> > > state is derived directly from the toplevel vdev state, so if you
> > > switch spares to be ONLINE, then 'zpool status' would think your
> > > pool
> > > is perfectly healthy. In this case it's true from a data protection
> > > standpoint, but not necessarily from a "all is well in the world"
> > > standpoint, as you are down one spare, and that spare may not have
> > > the
> > > same RAS properties as other devices in your RAID-Z stripe (it may
> > > put
> > > 3 disks on the same controller in one stripe, for example).
> > >
> > >
> > > - Eric
> > >
> > > On Fri, Mar 4, 2011 at 7:06 AM, Roy Sigurd Karlsbakk
> > >  wrote:
> > > Hi all
> > >
> > > I just did a small test on RAIDz2 to check whether my
> > > suspicion was right about ZFS not treating spares as
> > > replicas/copies of drives, and I think I've found it true.
> > > The
> > > short story: If two spares replaces two drives in raidz2,
> > > losing a third drive, even with the spares active, makes the
> > > pool unavailable. See full report on
> > >
> > > ODT: http://karlsbakk.net/ZFS/ZFS%20Spare%20disk%20usage.odt
> > > PDF: http://karlsbakk.net/ZFS/ZFS%20Spare%20disk%20usage.pdf
> > >
> > > Vennlige hilsener / Best regards
> > >
> > > roy
> > > --
> > > Roy Sigurd Karlsbakk
> > > (+47) 97542685
> > > r...@karlsbakk.net
> > > http://blogg.karlsbakk.net/
> > > --
> > > I all pedagogikk er det essensielt at pensum presenteres
> > > intelligibelt. Det er et elementært imperativ for alle
> > > pedagoger å unngå eksessiv anvendelse av idiomer med fremmed
> > > opprinnelse. I de fleste tilfeller eksisterer adekvate og
> > > relevante synonymer på norsk.
> > >
> > > ___
> > > Developer mailing list
> > > develo...@lists.illumos.org
> > > http://lists.illumos.org/m/listinfo/developer
> > >
> > >
> > >
> > > --
> > > Eric Schrock
> > > Delphix
> > >
> > >
> > > 275 Middlefield Road, Suite 50
> > > Menlo Park, CA 94025
> > > http://www.delphix.com
> > >
> > >
> > > ___
> > > Developer mailing list
> > > develo...@lists.illumos.org
> > > http://lists.illumos.org/m/listinfo/developer
> 


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Format returning bogus controller info

2011-03-03 Thread Garrett D'Amore
On Thu, 2011-03-03 at 21:02 +0100, Roy Sigurd Karlsbakk wrote:
> > > Last I checked, it didn't help much. IMHO we need a driver that can
> > > display the drives in the order they're plugged in. Like Windoze.
> > > Like Linux. Like FreeBSD. I really don't understand what should be
> > > so hard to do it like the others. As one said "I don't have their
> > > sources", both Linux and FreeBSD are OSS software, so the source
> > > should be ready quite easuly.
> > 
> > The difference is in the license. While I could look at what a
> > BSD-licensed driver does, I do not want to look at a GPL-licensed
> > driver and create problems for myself or my employer.
> 
> It's no problem reusing techniques from a GPL driver in CDDL code, but the code 
> must be rewritten. Also, copying trivial parts won't break the license, but 
> then, I guess the Solaris driver API is quite different from the one on 
> Linux. Also, perhaps the FreeBSD driver works better at this?
> 

What you're not realizing is that FreeBSD and Linux were designed with a
different set of assumptions.  The basic assumption there is that you
won't disturb the probe order, either by adding hardware, or by having
hardware that simply is non-deterministic in probing.

Solaris doesn't do that.  It enumerates the hardware the first time it
sees it, and then remembers the "physical" path to that disk as an
alias, usually both in /etc/path_to_inst and in the symbolic links
that populate /dev/.
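
You can see this on any Solaris-derived box (the device names below are just
placeholders):

    # /dev entries are stable symlinks into the physical device tree
    ls -l /dev/dsk/c0t0d0s0
    # and the instance bindings persist across reboots here
    cat /etc/path_to_inst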

The failure here is in your assumption that the Linux and *BSD behavior
is correct.  It is correct in a small system that doesn't have to
concern itself with fallout from changes in the probe order due to e.g.
dynamic reconfiguration, but in enterprise systems where devices and
buses can be hot swapped, it fails miserably.  The Solaris behavior is
simply better, once you learn to accept it.

- Garrett


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Good SLOG devices?

2011-03-01 Thread Garrett D'Amore
The PCIe based ones are good (typically they are quite fast), but check
the following first:

a) do you need an SLOG at all?  Some workloads (asynchronous ones) will
never benefit from an SLOG.  (A rough way to check this is sketched just
after this list.)

b) form factor.  at least one manufacturer uses a PCIe card which is
not compliant with the PCIe form-factor and will not fit in many cases
-- especially typical 1U boxes.

c) driver support.

d) do they really just go straight to ram/flash, or do they have an
on-device SAS or SATA bus?  Some PCIe devices just stick a small flash
device on a SAS or SATA controller.  I suspect that those devices won't
see a lot of benefit relative to an external drive (although they could
theoretically drive that private SAS/SATA bus at much higher rates than
an external bus -- but I've not checked into it.)
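
As a rough check for (a), a DTrace one-liner like the following -- run
against your real workload -- counts ZIL commits per second; if it sits
near zero, an SLOG is unlikely to buy you much:

    dtrace -qn 'fbt::zil_commit:entry { @c = count(); }
        tick-1s { printa("zil_commit/s: %@d\n", @c); clear(@c); }
        tick-60s { exit(0); }'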

The other thing with PCIe based devices is that they consume an IO slot,
which may be precious to you depending on your system board and other
I/O needs. 

- Garrett

On Tue, 2011-03-01 at 17:03 +0100, Roy Sigurd Karlsbakk wrote:
> Hi
> 
> I'm running OpenSolaris 148 on a few boxes, and newer boxes are getting 
> installed as we speak. What would you suggest for a good SLOG device? It 
> seems some new PCI-E-based ones are hitting the market, but will those 
> require special drivers? Cost is obviously alsoo an issue here
> 
> Vennlige hilsener / Best regards
> 
> roy
> --
> Roy Sigurd Karlsbakk
> (+47) 97542685
> r...@karlsbakk.net
> http://blogg.karlsbakk.net/
> --
> I all pedagogikk er det essensielt at pensum presenteres intelligibelt. Det 
> er et elementært imperativ for alle pedagoger å unngå eksessiv anvendelse av 
> idiomer med fremmed opprinnelse. I de fleste tilfeller eksisterer adekvate og 
> relevante synonymer på norsk.
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SIL3114 and sparc solaris 10

2011-02-23 Thread Garrett D'Amore
On Wed, 2011-02-23 at 13:16 -0500, Mauricio Tavares wrote:
> hardware.
> >
> > I +1 the suggestion to find something more modern if at all possible.
> >
>  Oh, just lovely. What would you suggest instead? I mean, besides
> canning the machine altogether ;)

LSI have some adapters than can do both SAS and SATA, and would probably
get the job done.  But, I'll point out that you're talking about S10...
perhaps you should ask *Oracle* ?

- Garrett



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS dedup success stories (take two)

2011-02-02 Thread Garrett D'Amore

On 01/31/11 04:48 PM, Roy Sigurd Karlsbakk wrote:

As I've said here on the list a few times earlier, the last on the
thread 'ZFS not usable (was ZFS Dedup question)', I've been doing some
rather thorough testing on zfs dedup, and as you can see from the
posts, it wasn't very satisfactory. The docs claim 1-2GB memory usage
per terabyte stored, ARC or L2ARC, but as you can read from the post,
I don't find this very likely.
 

Sorry about the initial post - it was wrong. The hardware configuration was 
right, but for initial tests, I use NFS, meaning sync writes. This obviously 
stresses the ARC/L2ARC more than async writes, but the result remains the same.

With 140GB worth of L2ARC on two X25-Ms and some 4GB partitions on the same 
devices, 4GB each, in a mirror, the write speed was reduced to something like 
20% of the original speed. This was with about 2TB used on the zpool with a 
single data stream, no parallelism whatsoever. Still with 8GB ARC and 140GB of 
L2ARC on two SSDs, this speed is fairly low. I could not see substantially high 
CPU or I/O load during this test.
   


I would not expect good performance on dedup with writes... dedup isn't 
going to make writes fast - it's something you want on a system with a 
lot of duplicated data that sustains a lot of reads.  (That said, highly 
duplicated data with a DDT that fits entirely in RAM might see a benefit 
from not having to write metadata as frequently.  But I suspect an SLOG 
here is going to be critical to get good performance since you'll still 
have a lot of synchronous metadata writes.)


- Garrett

Vennlige hilsener / Best regards

roy
--
Roy Sigurd Karlsbakk
(+47) 97542685
r...@karlsbakk.net
http://blogg.karlsbakk.net/
--
I all pedagogikk er det essensielt at pensum presenteres intelligibelt. Det er 
et elementært imperativ for alle pedagoger å unngå eksessiv anvendelse av 
idiomer med fremmed opprinnelse. I de fleste tilfeller eksisterer adekvate og 
relevante synonymer på norsk.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
   


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS and TRIM

2011-02-01 Thread Garrett D'Amore

On 01/31/11 01:09 PM, Pasi Kärkkäinen wrote:

On Mon, Jan 31, 2011 at 03:41:52PM +0100, Joerg Schilling wrote:
   

Brandon High  wrote:

 

On Sat, Jan 29, 2011 at 8:31 AM, Edward Ned Harvey
  wrote:
   

What is the status of ZFS support for TRIM?
 

I believe it's been supported for a while now.
http://www.c0t0d0s0.org/archives/6792-SATA-TRIM-support-in-Opensolaris.html
   

The command is implemented in the sata driver but there does not seem to be any
user of the code.

 

Btw is the SCSI equivalent also implemented? iirc it was called SCSI UNMAP (for 
SAS).
   


No.

- Garrett


-- Pasi

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
   


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS and L2ARC memory requirements?

2011-01-31 Thread Garrett D'Amore

On 01/31/11 06:40 PM, Roy Sigurd Karlsbakk wrote:

- Original Message -
   

Even *with* an L2ARC, your memory requirements are *substantial*,
because the L2ARC itself needs RAM. 8 GB is simply inadequate for
your
test.
   

With 50TB storage, and 1TB of L2ARC, with no dedup, what amount of ARC
would you recommend?
 


First off... a *big* caveat.  I am *not* a tuning expert.  We have 
people in our company who can help you out from operational experience 
if you want to configure a system like this -- one of them -- Richard 
Elling -- is a frequently seen face around here.   That said, I'm going 
to respond from my *very* rough understanding of how these structures 
play together.


So, I'd say: a lot.  1TB of L2ARC sounds like a rather largish amount.   
I don't know offhand the typical ratio of ARC -> L2ARC, but note that 
every entry in the L2ARC requires at least some book-keeping in the ARC 
(which is in RAM).  I've seen people say that you can have anywhere from 
10x RAM to 20x RAM for L2ARC.   It sounds like this means between 50GB 
and 100GB *just* for an L2ARC of this size.


That's without dedup.


And then, _with_ dedup, what would you recommend?
 

make that 100TB of storage
   


With 100TB of storage, fully consumed, your DDT is going to need to be 
about 500GB (assuming 64K block size, which may or may not be a good 
average).


That whole DDT will fit into the L2ARC above, so you probably can get by 
with just the 50-100GB of RAM.  But I recommend allocating *more* than 
that, because you really don't want *every* write to the DDT to go to 
L2ARC, and you really *do* want to have some memory available for things 
besides just the ARC.


Generally, this feels like a 256GB memory configuration to me.

Fundamentally, the best way to reduce the memory impact is to use dedup 
much more sparingly, and configure a much smaller L2ARC.


You also need to analyze your workload to see if you'll benefit from 
having L2ARC apart from the DDT itself.


- Garrett


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] VDI, ZFS and comstar

2011-01-30 Thread Garrett D'Amore
I'm not personally familiar with with VDI, but it feels like the VDI
bits are trying to run pkginfo on a NexentaStor target, which is a
syntax error.

I'm not sure what the fix for that would be.

- Garrett

On Sun, 2011-01-30 at 09:37 +, Thierry Delaitre wrote:
> Hello,
> 
> I’ve got VDI 3.2.1 and I’m experiencing ZFS iscsi persistence problems after
> rebooting the ZFS Solaris 10 (s9/10 s10x_u9wos_14a X86) server, so I
> tried to use NexentaOS_134f since, according 
> to http://sun.systemnews.com/articles/145/5/Virtualization/22991, VDI
> 3.1.1 supports COMSTAR
> 
> However, with nexenta, I’m getting the following message after
> selecting the ZFS pool when trying to add the nexenta ZFS server to
> VDI 3.2.1:
> 
> “A required package (SUNWiscsir) is missing on the storage server”
> 
> root@vdi02:~# pkginfo | grep iscsi
> base SUNWiscsiu   Sun iSCSI Management Utilities
> (usr)
> base SUNWiscsir   Sun iSCSI Device Driver (root)
> base SUNWiscsitr  Sun iSCSI COMSTAR Port Provider
> (root)
> base SUNWiscsitu  Sun iSCSI COMSTAR Port Provider
> base SUNWiscsidmr Sun iSCSI Data Mover (Root)
> base SUNWiscsidmu Sun iSCSI Data Mover (Usr)
> 
> root@vdi02:~# dpkg -l | grep iscsi
> ii  sunwiscsidmr  5.11.134-12
>   Sun iSCSI Data Mover (Root)
> ii  sunwiscsidmu  5.11.134-12
>   Sun iSCSI Data Mover (Usr)
> ii  sunwiscsir5.11.134-12
>   Sun iSCSI Device Driver (root)
> ii  sunwiscsitr   5.11.134-12a
>  Sun iSCSI COMSTAR Port Provider (root)
> ii  sunwiscsitu   5.11.134-12
>   Sun iSCSI COMSTAR Port Provider
> ii  sunwiscsiu5.11.134-12
>   Sun iSCSI Management Utilities (usr)
> 
> Am I missing something?
> 
> Cheers,
> 
> Thierry. 
>The University of Westminster is a charity and a company
>   limited by guarantee. Registration number: 977818 England.
> Registered Office: 309 Regent Street, London W1B 2UW.
> 
> 
> 
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS dedup success stories?

2011-01-30 Thread Garrett D'Amore
I'm not sure about *docs*, but my rough estimations:

Assume 1TB of actual used storage.  Assume 64K block/slab size.  (Not
sure how realistic that is -- it depends totally on your data set.)
Assume 300 bytes per DDT entry.

So we have (1024^4 / 65536) * 300 = 5033164800 or about 5GB RAM for one
TB of used disk space.

Dedup is *hungry* for RAM.  8GB is not enough for your configuration,
most likely!  First guess: double the RAM and then you might have better
luck.

The other takeaway here: dedup is the wrong technology for a typical small
home server (e.g. systems that max out at 4 or even 8 GB).

Look into compression and snapshot clones as better alternatives to
reduce your disk space needs without incurring the huge RAM penalties
associated with dedup.
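
For the clone case, a minimal sketch (the dataset names are placeholders):

    # one "golden" image, many cheap writable copies
    zfs snapshot tank/vm/gold@base
    zfs clone tank/vm/gold@base tank/vm/guest01
    zfs clone tank/vm/gold@base tank/vm/guest02

Each clone only consumes space for the blocks it later changes.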

Dedup is *great* for a certain type of data set with configurations that
are extremely RAM heavy.  For everyone else, it's almost universally the
wrong solution.  Ultimately, disk is usually cheaper than RAM -- think
hard before you enable dedup -- are you making the right trade off?

- Garrett

On Sun, 2011-01-30 at 22:53 +0100, Roy Sigurd Karlsbakk wrote:
> Hi all
> 
> As I've said here on the list a few times earlier, the last on the thread 
> 'ZFS not usable (was ZFS Dedup question)', I've been doing some rather 
> thorough testing on zfs dedup, and as you can see from the posts, it wasn't 
> very satisfactory. The docs claim 1-2GB memory usage per terabyte stored, ARC 
> or L2ARC, but as you can read from the post, I don't find this very likely.
> 
> So, is there anyone in here using dedup for large storage (2TB? 10TB? more?) 
> and can document sustained high performance?
> 
> The reason I ask, is if this is the case, something is badly wrong with my 
> test setup.
> 
> The test box is a supermicro thing with a Core2duo CPU, 8 gigs of RAM, 4 gigs 
> of mirrored SLOG and some 150 gigs of L2ARC on 80GB x25-M drives. The data 
> drives are 7 2TB drives in RAIDz2. We're getting down to 10-20MB/s on Bacula 
> backup to this system, meaning streaming, which should be good for RAIDz2. 
> Since the writes are local (bacula-sd running), async writes will be the main 
> thing. Initial results show pretty good I/O perfrmance, but after about 2TB 
> used, the I/O speed is down to the numbers I mentioned
> 
> PS: I know those drives aren't optimal for this, but the box is a year old or 
> so. Still, they should help out a bit.
> 
> Vennlige hilsener / Best regards
> 
> roy
> --
> Roy Sigurd Karlsbakk
> (+47) 97542685
> r...@karlsbakk.net
> http://blogg.karlsbakk.net/
> --
> I all pedagogikk er det essensielt at pensum presenteres intelligibelt. Det 
> er et elementært imperativ for alle pedagoger å unngå eksessiv anvendelse av 
> idiomer med fremmed opprinnelse. I de fleste tilfeller eksisterer adekvate og 
> relevante synonymer på norsk.
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Changed ACL behavior in snv_151 ?

2011-01-27 Thread Garrett D'Amore
We are working on a change to illumos (and NexentaStor) to revive
acl_mode... lots and lots of people have had very bad experiences as a
result of that particular change.
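
(Once that lands, the idea is that something like the old behavior comes
back -- assuming the property keeps its historical name and values:

    zfs set aclmode=passthrough tank/export/home

so a chmod() no longer throws away the NFSv4 ACL.)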

- Garrett

On Thu, 2011-01-27 at 07:32 +, Ryan John wrote:
> > -Original Message-
> > From: Frank Lahm [mailto:frankl...@googlemail.com] 
> > Sent: 25 January 2011 14:50
> > To: Ryan John
> > Cc: zfs-discuss@opensolaris.org
> > Subject: Re: [zfs-discuss] Changed ACL behavior in snv_151 ?
> 
> > John,
> 
> > welcome onboard!
> 
> > 2011/1/25 Ryan  John :
> >> I’m sharing file systems using a smb and nfs, and since I’ve upgraded to
> >> snv_151, when I do a chmod from an NFS client, I lose all the NFSv4 ACLs.
> 
> > 
> 
> > I'd summarize as follows:
> > in order to play nice with Windows ACL semantics via builtin CIFS,
> > they choose the approach of throwing away ACLs on chmod(). Makes
> > Windows happy, others not so.
> 
> > -f
> Hi Frank,
> 
> This really breaks our whole setup.
> Under snv_134 our users were happy with Windows ACLs, and NFSv3 and NFSv4 
> Linux clients.
> They all worked very well together. The only problem we had with the deny 
> ACLs, was when using the MacOS "Finder"
> 
> I don't think there's a way we can tell our users not to do a chmod.
> 
> Was it a result of PSARC/2010/029? 
> http://arc.opensolaris.org/caselog/PSARC/2010/029/20100126_mark.shellenbaum
> If so, I think that was implemented around snv_137.
> This would also mean it's the same in Illumos.
> 
> Regards
> John
> 
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] A few questions

2011-01-08 Thread Garrett D'Amore

On 01/ 8/11 10:43 AM, Stephan Budach wrote:

Am 08.01.11 18:33, schrieb Edward Ned Harvey:

From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
boun...@opensolaris.org] On Behalf Of Garrett D'Amore

When you purchase NexentaStor from a top-tier Nexenta Hardware Partner,
you get a product that has been through a rigorous qualification 
process

How do I do this, exactly?  I am serious.  Before too long, I'm going to
need another server, and I would very seriously consider 
reprovisioning my
unstable Dell Solaris server to become a linux or some other stable 
machine.
The role it's currently fulfilling is the "backup" server, which 
basically

does nothing except "zfs receive" from the primary Sun solaris 10u9 file
server.  Since the role is just for backups, it's a perfect 
opportunity for
experimentation, hence the Dell hardware with solaris.  I'd be happy 
to put

some other configuration in there experimentally instead ... say ...
nexenta.  Assuming it will be just as good at "zfs receive" from the 
primary

server.

Is there some specific hardware configuration you guys sell?  Or 
recommend?

How about a Dell R510/R610/R710?  Buy the hardware separately and buy
NexentaStor as just a software product?  Or buy a somehow more certified
hardware&  software bundle together?

If I do encounter a bug, where the only known fact is that the system 
keeps

crashing intermittently on an approximately weekly basis, and there is
absolutely no clue what's wrong in hardware or software...  How do 
you guys

handle it?


Such problems are handled on a case by case basis.  Usually we can do 
some analysis from a crash dump, but not always.   My team includes 
several people who are experienced with such analysis, and when problems 
like this occur, we are called into action.
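
(For the curious, the first pass over a dump is usually something like the
following -- assuming savecore is enabled and writes to its default location:

    cd /var/crash/`uname -n`
    mdb unix.0 vmcore.0
    # then, at the mdb prompt:
    #   ::status    - panic string and dump summary
    #   ::msgbuf    - console messages leading up to the panic
    #   $C          - stack of the panicking thread

From there it gets very bug-specific.)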


Ultimately this usually results in a patch, sometimes workaround 
suggestions, and sometimes even binary relief (which happens faster than 
a regular patch, but without the deeper QA.)


  - Garrett
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] A few questions

2011-01-08 Thread Garrett D'Amore

On 01/ 6/11 05:28 AM, Edward Ned Harvey wrote:

From: Khushil Dep [mailto:khushil@gmail.com]

I've deployed large SAN's on both SuperMicro 825/826/846 and Dell
R610/R710's and I've not found any issues so far. I always make a point of
installing Intel chipset NIC's on the DELL's and disabling the Broadcom ones
but other than that it's always been plain sailing - hardware-wise anyway.
 

"not found any issues," "except the broadcom one which causes the system to crash 
regularly in the default factory configuration."

How did you learn about the broadcom issue for the first time?  I had to learn 
the hard way, and with all the involvement of both Dell and Oracle support 
teams, nobody could tell me what I needed to change.  We literally replaced 
every component of the server twice over a period of 1 year, and I spent 
mandays upgrading and downgrading firmwares randomly trying to find a stable 
configuration.  I scoured the internet to find this little tidbit about 
replacing the broadcom NIC, and randomly guessed, and replaced my nic with an 
intel card to make the problem go away.

The same system doesn't have a problem running RHEL/centos.

What will be the new problem in the next line of servers?  Why, during my 
internet scouring, did I find a lot of other reports, of people who needed to 
disable c-states (didn't work for me) and lots of false leads indicating 
firmware downgrade would fix my broadcom issue?

See my point?  Next time I buy a server, I do not have confidence to simply 
expect solaris on dell to work reliably.  The same goes for solaris 
derivatives, and all non-sun hardware.  There simply is not an adequate 
qualification and/or support process.
   


When you purchase NexentaStor from a top-tier Nexenta Hardware Partner, 
you get a product that has been through a rigorous qualification process 
which includes the hardware and software configuration matched together, 
tested with an extensive battery.  You also can get a higher level of 
support than is offered to people who build their own systems.


Oracle is *not* the only company capable of performing in depth testing 
of Solaris.


I also know enough about the problems that Oracle customers (or rather 
Sun customers) faced with Solaris on Sun hardware -- such as the 
terrible nvidia ethernet problems on first generation U20 and U40 
systems, or the marvell SATA problems on Thumper -- to know that 
your picture of Oracle isn't nearly as rosy as you believe.  Of course, 
I also lived (as a Sun employee) through the UltraSPARC-II ECC fiasco...


  - Garrett

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] A few questions

2011-01-05 Thread Garrett D'Amore

On 01/ 4/11 11:48 PM, Tim Cook wrote:



On Tue, Jan 4, 2011 at 8:21 PM, Garrett D'Amore <garr...@nexenta.com> wrote:


On 01/ 4/11 09:15 PM, Tim Cook wrote:



On Mon, Jan 3, 2011 at 5:56 AM, Garrett D'Amore
<garr...@nexenta.com> wrote:

On 01/ 3/11 05:08 AM, Robert Milkowski wrote:

On 12/26/10 05:40 AM, Tim Cook wrote:



On Sat, Dec 25, 2010 at 11:23 PM, Richard Elling
<richard.ell...@gmail.com> wrote:


There are more people outside of Oracle developing for
ZFS than inside Oracle.
This has been true for some time now.




Pardon my skepticism, but where is the proof of this claim
(I'm quite certain you know I mean no disrespect)?
 Solaris11 Express was a massive leap in functionality and
bugfixes to ZFS.  I've seen exactly nothing out of "outside
of Oracle" in the time since it went closed.  We used to
see updates bi-weekly out of Sun.  Nexenta spending
hundreds of man-hours on a GUI and userland apps isn't work
on ZFS.




Exactly my observation as well. I haven't seen any ZFS
related development happening at illumos or Nexenta, at least
not yet.


Just because you've not seen it yet doesn't imply it isn't
happening.  Please be patient.

   - Garrett



Or, conversely, don't make claims of all this code contribution
prior to having anything to show for your claimed efforts.  Duke
Nukem Forever was going to be the greatest video game ever
created... we were told to "be patient"... we're still waiting
for that too.



Um, have you not been paying attention?  I've delivered quite a
lot of contribution to illumos already, just not in ZFS.   Take a
close look -- there almost certainly wouldn't *be* an open source
version of OS/Net had I not done the work to enable this in libc,
kernel crypto, and other bits.  This work is still higher priority
than ZFS innovation for a variety of reasons -- mostly because we
need a viable and supportable illumos upon which to build those
ZFS innovations.

That said, much of the ZFS work I hope to contribute to illumos
needs more baking, but some of it is already open source in
NexentaStor.  (You can for a start look at zfs-monitor, the WORM
support, and support for hardware GZIP acceleration all as things
that Nexenta has innovated in ZFS, and which are open source today
if not part of illumos.  Check out http://www.nexenta.org for
source code access.)

So there, money placed where mouth is.  You?

   - Garrett



The claim was that there are more people contributing code from 
outside of Oracle than inside to zfs.  Your contributions to Illumos 
do absolutely nothing to backup that claim.  ZFS-monitor is not ZFS 
code (it's an FMA module), WORM also isn't ZFS code, it's an OS level 
operation, and GZIP hardware acceleration is produced by Indra 
networks, and has absolutely nothing to do with ZFS.  Does it help 
ZFS?  Sure, but that's hardly a code contribution to ZFS when it's 
simply a hardware acceleration card that accelerates ALL gzip code.


Um... you have obviously not looked at the code.

Our WORM code is not just a set of OS-level guarantees layered on top of 
ZFS, but modifications to the ZFS code itself, so that ZFS *itself* honors 
the WORM property, which is implemented as a property on the ZFS filesystem.


Likewise, the GZIP hardware acceleration support includes specific 
modifications to the ZFS kernel filesystem code.


Of course, we've not done anything major to change the fundamental way 
that ZFS stores data... is that what you're talking about?


I think you must have a very narrow idea of what constitutes an 
"innovation" in ZFS.




So, great job picking three projects that are not proof of developers 
working on ZFS.  And great job not providing any proof to the claim 
there are more developers working on ZFS outside of Oracle than within.


Nexenta doesn't represent that majority, actually.  A large number of ZFS 
folks -- people with names like Leventhal, Ahrens, Wilson, and Gregg, 
are working on ZFS related work at Delphix and Joyent, or so I've been 
told.  I don't have first hand knowledge of *what* the details are, but 
I'm looking forward to seeing the results.


This ignores the contributions from people working on ZFS on other 
platforms as well.


Of course, since I no longer work there, I don't really know how many 
people Oracle still has working on ZFS.  They could have tasked 1,000 
people with it.  Or they could have shut the project down entirely.  But 
of the people who had, up until Oracle shut down the open code, made 
non-trivial contributions to ZFS, I think the majority are now outside 
of Oracle.

Re: [zfs-discuss] A few questions

2011-01-04 Thread Garrett D'Amore

On 01/ 3/11 05:08 AM, Robert Milkowski wrote:

On 12/26/10 05:40 AM, Tim Cook wrote:



On Sat, Dec 25, 2010 at 11:23 PM, Richard Elling 
<richard.ell...@gmail.com> wrote:



There are more people outside of Oracle developing for ZFS than
inside Oracle.
This has been true for some time now.




Pardon my skepticism, but where is the proof of this claim (I'm quite 
certain you know I mean no disrespect)?  Solaris11 Express was a 
massive leap in functionality and bugfixes to ZFS.  I've seen exactly 
nothing out of "outside of Oracle" in the time since it went closed. 
 We used to see updates bi-weekly out of Sun.  Nexenta spending 
hundreds of man-hours on a GUI and userland apps isn't work on ZFS.





Exactly my observation as well. I haven't seen any ZFS related 
development happening at illumos or Nexenta, at least not yet.


Just because you've not seen it yet doesn't imply it isn't happening.  
Please be patient.


   - Garrett



--
Robert Milkowski
http://milek.blogspot.com


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
   


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] stupid ZFS question - floating point operations

2010-12-24 Thread Garrett D'Amore

Thanks for the clarification.  I guess I need to go back and figure out how ZFS 
crypto keying is performed.  I guess most likely the key is generated from some 
sort of one-way hash from a passphrase?

  - Garrett

-Original Message-
From: Darren J Moffat [mailto:darren.mof...@oracle.com]
Sent: Thu 12/23/2010 1:32 AM
To: Garrett D'Amore
Cc: Erik Trimble; Jerry Kemp; zfs-discuss@opensolaris.org
Subject: Re: [zfs-discuss] stupid ZFS question - floating point operations
 
On 22/12/2010 20:27, Garrett D'Amore wrote:
> That said, some operations -- and cryptographic ones in particular --
> may use floating point registers and operations because for some
> architectures (sun4u rings a bell) this can make certain expensive

Well remembered!  There are sun4u optimisations that use the floating 
point unit but those only apply to the bignum code which in kernel is 
only used by RSA.

> operations go faster. I don't think this is the case for secure
> hash/message digest algorithms, but if you use ZFS encryption as found
> in Solaris 11 Express you might find that on certain systems these
> registers are used for performance reasons, either on the bulk crypto or
> on the keying operations. (More likely the latter, but my memory of
> these optimizations is still hazy.)

RSA isn't used at all by ZFS encryption, everything is AES (including 
key wrapping) and SHA256.

So those optimistations for floating point don't come into play for ZFS 
encryption.

-- 
Darren J Moffat

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Looking for 3.5" SSD for ZIL

2010-12-23 Thread Garrett D'Amore

We should get the reformatter(s) ported to illumos/solaris, if source is 
available.  Something to consider.

  - Garrett

-Original Message-
From: zfs-discuss-boun...@opensolaris.org on behalf of Erik Trimble
Sent: Wed 12/22/2010 10:36 PM
To: Christopher George
Cc: zfs-discuss@opensolaris.org
Subject: Re: [zfs-discuss] Looking for 3.5" SSD for ZIL
 
On 12/22/2010 7:05 AM, Christopher George wrote:
>> I'm not sure if TRIM will work with ZFS.
> Neither ZFS nor the ZIL code in particular support TRIM.
>
>> I was concerned that with trim support the SSD life and
>> write throughput will get affected.
> Your concerns about sustainable write performance (IOPS)
> for a Flash based SSD are valid, the resulting degradation
> will vary depending on the controller used.
>
> Best regards,
>
> Christopher George
> Founder/CTO
> www.ddrdrive.com

Christopher is correct, in that SSDs will suffer from (non-trivial) 
performance degradation after they've exhausted their free list, and 
haven't been told to reclaim emptied space.  True battery-backed DRAM is 
the only permanent solution currently available which never runs into 
this problem.  Even TRIM-supported SSDs eventually need reconditioning.

However, this *can* be overcome by frequently re-formatting the SSD (not 
the Solaris format, a low-level format using a vendor-supplied 
utility).  It's generally a simple thing, but requires pulling the SSD 
from the server, connecting it to either a Linux or Windows box, running 
the reformatter, then replacing the SSD.  Which, is a PITA.

But, still a bit cheaper than buying a DDRdrive. 


-- 
Erik Trimble
Java System Support
Mailstop:  usca22-123
Phone:  x17195
Santa Clara, CA
Timezone: US/Pacific (GMT-0800)

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss



Re: [zfs-discuss] stupid ZFS question - floating point operations

2010-12-22 Thread Garrett D'Amore
Generally, ZFS does not use floating point.

And further, use of floating point in the kernel is exceptionally rare.  The 
kernel does not save floating point context automatically, which means that 
code that uses floating point needs to take special care to make sure any 
context from userland is saved and restored before it can use the registers 
itself.   This rather onerous burden tends to exclude "easy" consumption of 
floating point.
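
To illustrate the bookkeeping involved (the names below are hypothetical
stand-ins, not actual Solaris/illumos kernel interfaces), any kernel code that
wants to touch the FP registers has to bracket that use with an explicit save
and restore of whatever userland context may be live:

#include <string.h>

/*
 * Hedged sketch only: fp_ctx_t, fpu_save_user() and fpu_restore_user()
 * are hypothetical stand-ins, not real kernel interfaces.  The point is
 * the bracketing a kernel FP user must do, because the kernel does not
 * save FP state on entry the way it preserves the integer registers.
 */
typedef struct { unsigned char regs[512]; } fp_ctx_t;

static void fpu_save_user(fp_ctx_t *c)    { memset(c, 0, sizeof (*c)); }
static void fpu_restore_user(fp_ctx_t *c) { (void)c; }

void
kernel_fp_user(void)
{
        fp_ctx_t saved;

        fpu_save_user(&saved);     /* preserve any userland FP state first */
        /* ... FP-assisted work (e.g. a bignum multiply) would go here ... */
        fpu_restore_user(&saved);  /* restore before returning to userland */
}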

That said, some operations -- and cryptographic ones in particular -- may use 
floating point registers and operations because for some architectures (sun4u 
rings a bell) this can make certain expensive operations go faster.  I don't 
think this is the case for secure hash/message digest algorithms, but if you 
use ZFS encryption as found in Solaris 11 Express you might find that on 
certain systems these registers are used for performance reasons, either on the 
bulk crypto or on the keying operations.  (More likely the latter, but my 
memory of these optimizations is still hazy.)

Note that *if* this is done, it is only done where such an operation is a 
performance win, and not because any of the math is inherently floating point.  
So in this case, I would say that this optimization would be an advantage, 
rather than a disadvantage.

Oh, and this usage only applies to Solaris, and is optional.  I doubt FreeBSD 
has these particular enhancements -- indeed, IIRC, these optimizations are 
specific to certain classes of SPARC cpus and probably are not performed at all 
for x86 cpus.

  - Garrett


-Original Message-
From: zfs-discuss-boun...@opensolaris.org on behalf of Erik Trimble
Sent: Wed 12/22/2010 12:08 PM
To: Jerry Kemp; zfs-discuss@opensolaris.org
Subject: Re: [zfs-discuss] stupid ZFS question - floating point operations
 
On 12/22/2010 11:49 AM, Tomas Ögren wrote:
> On 22 December, 2010 - Jerry Kemp sent me these 1,0K bytes:
>
>> I have a coworker, who's primary expertise is in another flavor of Unix.
>>
>> This coworker lists floating point operations as one of ZFS detriments.
>>
>> I'm not really sure what he means specifically, or where he got this
>> reference from.
> Then maybe ask him first? Guilty until proven innocent isn't the regular
> path...
>
>> In an effort to refute what I believe is an error or misunderstanding on
>> his part, I have spent time on Yahoo, Google, the ZFS section of
>> OpenSolaris.org, etc.  I really haven't turned up much of anything that
>> would prove or disprove his comments.  The one thing I haven't done is
>> to go through the ZFS source code, but its been years since I have done
>> any serious programming.
>>
>> If someone from Oracle, or anyone on this mailing list could point me
>> towards any documentation, or give me a definitive word, I would sure
>> appreciate it.  If there were floating point operations going on within
>> ZFS, at this point I am uncertain as to what they would be.
>>
>> TIA for any comments,
>>
>> Jerry
>> ___
>> zfs-discuss mailing list
>> zfs-discuss@opensolaris.org
>> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
>
> /Tomas

So far as my understanding of the codebase goes (and, while I've read a 
significant portion, I'm not really an expert here):

Assuming he means that ZFS has a weakness of heavy floating-point 
calculation requirements (i.e. using ZFS requires heavy FP usage), that's 
wrong.

Like all normal filesystems, the "ordinary" operations are all integer, 
load, and store.  The ordinary work of caching, block allocation, and 
fetching/writing is of course all integer-based. I can't imagine someone 
writing a filesystem which does such operations using floating point.

A quick grep through the main ZFS sources doesn't find anything of type 
"double" or "float".

I think he might be confused with what is happening on Checksums (which 
is still all Integer, but looks/sounds "expensive").  Yes, ZFS is 
considerably *more* compute intensive than other filesystems.  However, 
it's all Integer, and one of the base assumptions of ZFS is that modern 
systems have lots of excess CPU cycles around, so stealing 5% for use 
with ZFS won't impact performance much, and the added features of ZFS 
more than make up for any CPU cycles lost.
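
For the curious, this is roughly what "compute intensive but all integer" looks
like: a hedged sketch of a Fletcher-4-style checksum loop, the general shape of
one of ZFS's standard checksums (simplified for illustration; this is not the
actual ZFS code):

#include <stdint.h>
#include <stddef.h>

/*
 * Sketch of a Fletcher-4-style checksum: four 64-bit integer
 * accumulators running over 32-bit words.  Simplified; not the
 * actual ZFS implementation.
 */
void
fletcher4_sketch(const void *buf, size_t size, uint64_t out[4])
{
        const uint32_t *ip = buf;
        const uint32_t *end = ip + (size / sizeof (uint32_t));
        uint64_t a = 0, b = 0, c = 0, d = 0;

        for (; ip < end; ip++) {
                a += *ip;       /* plain integer adds -- no FP anywhere */
                b += a;
                c += b;
                d += c;
        }
        out[0] = a; out[1] = b; out[2] = c; out[3] = d;
}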

-- 
Erik Trimble
Java System Support
Mailstop:  usca22-123
Phone:  x17195
Santa Clara, CA
Timezone: US/Pacific (GMT-0800)

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss



Re: [zfs-discuss] ZFS ... open source moving forward?

2010-12-11 Thread Garrett D'Amore

We have ZFS version 28.  Whether we ever get another open source update of ZFS 
from *Oracle* is at this point doubtful.  However, I will point out that there 
are a lot of former Oracle engineers, including both inventors of ZFS and many 
of the people who have worked on it over the years, who are no longer part of 
Oracle.  A number of those people have committed to working on ZFS related 
projects outside of Oracle, and I think ZFS will continue to evolve on its own 
in the open.

We'll have more to say on the matter early next year, I think.

-Original Message-
From: zfs-discuss-boun...@opensolaris.org on behalf of Edward Ned Harvey
Sent: Fri 12/10/2010 5:31 AM
To: zfs-discuss@opensolaris.org
Subject: [zfs-discuss] ZFS ... open source moving forward?
 
It's been a while since I last heard anybody say anything about this.
What's the latest version of publicly released ZFS?  Has Oracle made it
closed-source moving forward?

 

Nexenta ... openindiana ... etc ... Are they all screwed?


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] 64-bit vs 32-bit applications

2010-08-19 Thread Garrett D'Amore
On Fri, 2010-08-20 at 09:23 +1200, Ian Collins wrote:

> >>  
> > There is no common C++ ABI.  So you get into compatibility concerns
> > between code built with different compilers (like Studio vs. g++).
> > Fail.
> >
> 
> Which is why we have extern "C".  Just about any Solaris driver, library 
> or kernel module could be implemented in C++ behind the C compatibility 
> layer and no one would notice.

As long as they don't depend on other features, like the C++ standard
library.  Once they consume any external C++ interfaces, you're dead.
Because you can't mix g++ standard libraries with those from Studio.

-- Garrett


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] 64-bit vs 32-bit applications

2010-08-19 Thread Garrett D'Amore
On Thu, 2010-08-19 at 15:48 -0500, Bob Friesenhahn wrote:
> On Thu, 19 Aug 2010, Garrett D'Amore wrote:

> 
> Since 1996, all of my professional programming work (for products) has 
> been done in C++.  Most of my open source work has been done in C. 
> There should be little doubt that C++ is a much better 
> implementation/design language than C, however, it does suffer from 
> the interoperability concerns (and cross-compiler portability 
> concerns) that Garrett mentions.  I have not encountered issues with 
> excessive memory consumption or slow execution.

My most recent experience with excessive memory consumption came when
trying to take a certain chunk of C++ code from a third party vendor,
and insert it into a tiny kernel.  The code basically added encryption,
and should not have been huge.  But it doubled the memory footprint of
the project.  (The rest of the code was the Sun Ray firmware.)  It was
significantly challenging to prune enough space to make the code fit in
this embedded environment.  The problems largely come, I think, from the
standard C++ library.  I know that it is *possible* to write C++ code
that doesn't do this -- but then you're basically writing C code using
the C++ compiler. :-)

> 
> There are plenty of people who don't know how to program in C++ or 
> other object oriented languages.  Most of these people should not be 
> programming at all.

Yes.  The problem is that C++ makes it too easy, IMO, to write code that
is completely illegible even when the author doesn't mean to.  C can do
this too, but at least for the most part the language is explicit.  Using
the + operator in C generally doesn't cause memory allocations to occur,
for example.
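
To make that explicitness concrete, here is a hedged little sketch of string
concatenation in plain C -- the allocation is right there at the call site,
whereas std::string's operator+ can allocate (and fail) with nothing visible in
the caller:

#include <stdlib.h>
#include <string.h>

/*
 * Illustrative sketch: concatenating two strings in C.  The malloc()
 * is explicit and visible, unlike the allocations std::string's
 * operator+ may perform behind the scenes.
 */
char *
concat(const char *a, const char *b)
{
        size_t la = strlen(a), lb = strlen(b);
        char *s = malloc(la + lb + 1);

        if (s != NULL) {
                memcpy(s, a, la);
                memcpy(s + la, b, lb + 1);   /* +1 copies the terminating NUL */
        }
        return (s);             /* caller must free() -- also explicit */
}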

> 
> Zfs could have been implemented in C++, but it would not be as 
> friendly in a kernel which is already implemented in C.

Which includes all of the kernels which currently use it.

-- Garrett


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] 64-bit vs 32-bit applications

2010-08-19 Thread Garrett D'Amore
On Fri, 2010-08-20 at 03:26 +0700, "C. Bergström" wrote:
> Ian Collins wrote:
> > On 08/20/10 07:48 AM, Garrett D'Amore wrote:
> >> On Thu, 2010-08-19 at 20:14 +0100, Daniel Taylor wrote:
> >>   
> >>> On 19 Aug 2010, at 19:42, Garrett D'Amore wrote:
> >>>
> >>> Out of interest, what language do you recommend?
> >>>  
> >> Depends on the job -- I'm a huge fan of choosing the right tool for the
> >> job.  I just think C++ tries to be jack of all trades and winds up being
> >> master of none.
> >>
> >>
> > Drifting slightly back on topic, a lot of the ZFS code (and even more 
> > driver code) I've looked at would be cleaner in C++.  As long as a 
> > library has a C linkage public interface, there aren't any 
> > compatibility issues.  The rest is FUD.
> I believe his root concern is/was that libCrun is closed source and a 
> drop-in replacement won't be easily possible until the compiler switches 
> over to using the IA64 C++ ABI.  (Garrett please feel to correct me if 
> my assumption is wrong)

That is a major concern.  But the problem is also that the ABIs created
by different compilers vary.  You can't mix g++ and Studio-generated
code, for example.  That's not FUD, it's technical fact.

-- Garrett


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] 64-bit vs 32-bit applications

2010-08-19 Thread Garrett D'Amore
On Fri, 2010-08-20 at 07:58 +1200, Ian Collins wrote:
> On 08/20/10 07:48 AM, Garrett D'Amore wrote:
> > On Thu, 2010-08-19 at 20:14 +0100, Daniel Taylor wrote:
> >
> >> On 19 Aug 2010, at 19:42, Garrett D'Amore wrote:
> >>
> >> Out of interest, what language do you recommend?
> >>  
> > Depends on the job -- I'm a huge fan of choosing the right tool for the
> > job.  I just think C++ tries to be jack of all trades and winds up being
> > master of none.
> >
> >
> Drifting slightly back on topic, a lot of the ZFS code (and even more 
> driver code) I've looked at would be cleaner in C++.  As long as a 
> library has a C linkage public interface, there aren't any compatibility 
> issues.  The rest is FUD.

There is no common C++ ABI.  So you get into compatibility concerns
between code built with different compilers (like Studio vs. g++).
Fail.

There are many many things to dislike about C++ -- you *can* write good
clean code in C++, but almost none of the C++ code I've seen fits that
description.

The various side effects and unexpected memory explosions that occur
with the "favored" C++ constructs tend to make C++ completely
unsuitable for use in a kernel.

I still have the scars from when Linus tried to experiment with a Linux
kernel in C++. :-)  The effort was *very* short lived.  Granted C++ has
changed a lot since then, but I think the ways it has changed make it
even more unsuitable for kernel/embedded work.

- Garrett


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] 64-bit vs 32-bit applications

2010-08-19 Thread Garrett D'Amore
On Thu, 2010-08-19 at 20:14 +0100, Daniel Taylor wrote:
> On 19 Aug 2010, at 19:42, Garrett D'Amore wrote:
> 
> Out of interest, what language do you recommend?

Depends on the job -- I'm a huge fan of choosing the right tool for the
job.  I just think C++ tries to be jack of all trades and winds up being
master of none.

For the work I do, I mostly prefer C.  

- Garrett


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] 64-bit vs 32-bit applications

2010-08-19 Thread Garrett D'Amore
On Thu, 2010-08-19 at 21:25 +1200, Ian Collins wrote:
> On 08/19/10 08:51 PM, Joerg Schilling wrote:
> > Ian Collins  wrote:
> >
> >
> >> A quick test with a C++ application I'm working with which does a lot of
> >> string and container manipulation shows it
> >> runs about 10% slower in 64 bit mode on AMD64 and about the same in 32
> >> or 64 bit on a core i7. Built with -fast.
> >>  
> > This may be a result of the way the libC you are using was compiled.
> >
> > Try to compare performance tests that only depend on code you did write by 
> > your
> > own.
> >
> >
> Most of the C++ standard library (at least the containers part I'm 
> using) is header only code, so it is mainly code I compile my self.
> 
> Not using libC is somewhat impractical in real world applications!

Not if the program isn't written in C++!

The binary compatibility problems (plus a million other reasons) of C++
make me strongly urge people not to choose C++ as the language for their
project unless they are forced to by other constraints.  (And then they
will have to live with the consequent problems.)

-- Garrett


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Opensolaris is apparently dead

2010-08-18 Thread Garrett D'Amore
All of this is entirely legal conjecture, by people who aren't lawyers,
about issues that have not been tested in court and are clearly subject to
interpretation.  Since it is no longer relevant to the topic of the
list, can we please either take the discussion offline, or agree to just
let the topic die (on the basis that there cannot be an authoritative
answer until there is some case law upon which to base it)?

- Garrett


On Wed, 2010-08-18 at 09:43 -0500, Bob Friesenhahn wrote:
> On Wed, 18 Aug 2010, Joerg Schilling wrote:
> >
> > Linus is right with his primary decision, but this also applies for static
> > linking. See Lawrence Rosen for more information, the GPL does not distinct
> > between static and dynamic linking.
> 
> GPLv2 does not address linking at all and only makes vague references 
> to the "program".  There is no insinuation that the program needs to 
> occupy a single address space or mention of address spaces at all. 
> The "program" could potentially be a composition of multiple 
> cooperating executables (e.g. like GCC) or multiple modules.  As you 
> say, everything depends on the definition of a "derived work".
> 
> If a shell script may be dependent on GNU 'cat', does that make the 
> shell script a "derived work"?  Note that GNU 'cat' could be replaced 
> with some other 'cat' since 'cat' has a well defined interface.  A 
> very similar situation exists for loadable modules which have well 
> defined interfaces (like 'cat').  Based on the argument used for 
> 'cat', the mere injection of a loadable module into an execution 
> environment which includes GPL components should not require that 
> module to be distributable under GPL.  The module only needs to be 
> distributable under GPL if it was developed in such a way that it 
> specifically depends on GPL components.
> 
> Bob


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Solaris startup script location

2010-08-18 Thread Garrett D'Amore
On Wed, 2010-08-18 at 01:20 -0700, Alxen4 wrote:
> Thanks...Now I think I understand...
> 
> Let me summarize it and let me know if I'm wrong.
> 
> Disabling the ZIL converts all synchronous calls to asynchronous, which makes 
> ZFS report data acknowledgment before it has actually been written to stable 
> storage, which in turn improves performance but might cause data corruption 
> in case of a server crash.
> 
> Is it correct ?
> 
> In my case I'm having serious performance issues with NFS over ZFS.
> My NFS Client is ESXi so the major question is there risk of corruption for 
> VMware images if I disable ZIL ?

Yes.  If your server crashes, you can lose data.
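
To make the tradeoff concrete, here is a minimal userland sketch (plain POSIX,
nothing ZFS- or ESXi-specific) of what a synchronous write means.  The fsync()
return is the acknowledgment the ZIL normally backs with stable storage; with
the ZIL disabled the same call can return success while the data still lives
only in RAM until the next transaction group commits:

#include <sys/types.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Sketch of a synchronous write from the application's point of view.
 * With a working ZIL, fsync() returning 0 means the data is on stable
 * storage.  With the ZIL disabled it can still return 0 while the data
 * exists only in memory until the next transaction group is written.
 */
int
write_sync(const char *path, const void *buf, size_t len)
{
        int fd = open(path, O_WRONLY | O_CREAT, 0644);

        if (fd < 0)
                return (-1);
        if (write(fd, buf, len) != (ssize_t)len || fsync(fd) != 0) {
                (void) close(fd);
                return (-1);
        }
        return (close(fd));     /* success == "it is on stable storage" */
}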

- Garrett

> 
> 
> Thanks.


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Solaris startup script location

2010-08-18 Thread Garrett D'Amore
On Wed, 2010-08-18 at 00:49 -0700, Alxen4 wrote:
> Any argumentation why ?


Because a RAMDISK defeats the purpose of a ZIL, which is to provide
fast *stable storage* for data being written.  If you are using a
RAMDISK, you are not getting any non-volatility guarantees that the ZIL
is supposed to offer.  You may as well run without one.  (Which will go
fast, but at the expense of data integrity.)

- Garrett


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Solaris startup script location

2010-08-18 Thread Garrett D'Amore
On Wed, 2010-08-18 at 00:16 -0700, Alxen4 wrote:
> Is there any way run start-up script before non-root pool is mounted ?
> 
> For example I'm trying to use ramdisk as ZIL device (ramdiskadm )
> So I need to create ramdisk before actual pool is mounted otherwise it 
> complains that log device is missing :)
> 
> For sure I can manually remove and add it by script and put the script in 
> the regular rc2.d location... I'm just looking for a more elegant way to do it.
> 
> 
> Thanks a lot.


You *really* don't want to use a ramdisk as your ZIL.  You'd be better
off just disabling the zil altogether.

- Garrett

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Opensolaris is apparently dead

2010-08-17 Thread Garrett D'Amore

Oh, as for insmod, I think the question is quite cloudy indeed, since you
get into questions about what forms a derivative product.

I was looking at the original statement of the two licenses running
together in the same program far too simply.  Of course, when
considered with dynamic linking (which insmod may be considered to be a
form of), the boundaries of what is the program, and what is a
derivative work, are very murky.

Unfortunately, AFAIK, the boundaries have never been tested.  I think
asking a non-technical court to judge the differences between static,
dynamic, and insmod style linking is probably going to be difficult.

- Garrett


On Tue, 2010-08-17 at 17:07 -0400, Miles Nordin wrote:
> >>>>> "gd" == Garrett D'Amore  writes:
> 
>  >> Joerg is correct that CDDL code can legally live right
>  >> alongside the GPLv2 kernel code and run in the same program.
> 
> gd> My understanding is that no, this is not possible.
> 
> GPLv2 and CDDL are incompatible:
> 
>  
> http://www.fsf.org/licensing/education/licenses/index_html/#GPLIncompatibleLicenses
> 
> however Linus's ``interpretation'' of the GPL considers that 'insmod'
> is ``mere aggregation'' and not ``linking'', but subject to rules of
> ``bad taste''.  Although this may sound ridiculous, there are blob
> drivers for wireless chips, video cards, and storage controllers
> relying on this ``interpretation'' for over a decade.  I think a ZFS
> porting project could do the same and end up emitting the same warning
> about a ``tainted'' kernel that proprietary modules do:
> 
>  http://lwn.net/Articles/147070/
> 
> the quickest link I found of Linus actually speaking about his
> ``interpretation'', his thoughts are IMHO completely muddled (which
> might be intentional):
> 
>  http://lkml.org/lkml/2003/12/3/228
> 
> thus ultimately I think the question of whether it's legal or not
> isn't very interesting compared to ``is it moral?'' (what some of us
> might care about), and ``is it likely to survive long enough and not
> blow back in your face fiercely enough that it's a good enough
> business case to get funded somehow?'' (the question all the hardware
> manufacturers shipping blob drivers presumably asked themselves)
> 
> My own view on blob modules is: 
> 
>  * that it's immoral, and that Linus is both taking the wrong position
>and doing it without authority.  Even if his position is
>``everyone, please let's not fight,'' in practice that is a strong
>position favouring GPL violation, and his squirrelyness may look
>like taking a soft view but in practice it throws so much sand into
>the debate it ends up being actually a much stronger position than
>saying outright, ``I think insmod is mere aggregation.''  My
>copyright shouldn't have to bow to your celebrity.
> 
>  * and secondly that it does make business sense and is unlikely to
>cause any problems, because no one is able to challenge his
>authority.
> 
> Whatever is the view on binary blob modules, I think it's the same
> view on ZFS w.r.t. the law, but not necessarily the same view
> w.r.t. morality or business, because the copyright law itself is
> immoral according to the views of many and the business risk depends
> on how much you piss people off.
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Opensolaris is apparently dead

2010-08-17 Thread Garrett D'Amore
On Tue, 2010-08-17 at 14:04 -0500, Bob Friesenhahn wrote:
> On Tue, 17 Aug 2010, Ross Walker wrote:
> >
> > And there lies the problem, you need the agreement of all copyright 
> > holders in a GPL project to change it's licensing terms and some 
> > just will not budge.
> 
> Joerg is correct that CDDL code can legally live right alongside the 
> GPLv2 kernel code and run in the same program.  It is a "mind set" 
> issue with the Linux developers rather than a legal one.

My understanding is that no, this is not possible.  IANAL, but I think
the provisions of CDDL with respect to granting patent license and
choice of law venue are incompatible with GPL's stipulations.
Conventional wisdom and detailed analysis done by lawyers is that you
can't mix and match these licenses this way.

> 
> If ZFS was not tied to a big greedy controlling company then the Linux 
> kernel developers would be more likely to change their mind.

No, it's a license problem too.  There may be NIH factors and reasonable
engineering disagreements too.

- Garrett


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] 64-bit vs 32-bit applications

2010-08-16 Thread Garrett D'Amore
It can be as simple as impact on the cache.  64-bit programs tend to be
bigger, and so they have a worse effect on the i-cache.

Unless your program does something that can inherently benefit from
64-bit registers, or can take advantage of the richer instruction set
that is available to amd64 programs, you probably will see a degradation
when running 64-bit programs.
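
A hedged toy illustration of where the extra bytes come from on the data side
(the code gets bigger too): the same pointer-heavy structure roughly doubles in
size under LP64, so fewer of them fit in each cache line.  Exact sizes depend on
the ABI and padding:

#include <stdio.h>

/*
 * Toy example: a pointer-heavy node is typically 16 bytes under ILP32
 * (4-byte pointers and longs) and 32 bytes under LP64 (8-byte pointers
 * and longs, plus padding), so fewer nodes fit per cache line.
 */
struct node {
        struct node *next;
        struct node *prev;
        long         key;
        int          flags;
};

int
main(void)
{
        printf("sizeof (struct node) = %zu\n", sizeof (struct node));
        return (0);
}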

That said, I think a great number of programs *do* benefit from the
larger registers, and from the richer ISA available to 64-bit programs.

- Garrett

On Mon, 2010-08-16 at 18:58 -0700, Kishore Kumar Pusukuri wrote:
> Hi,
> I am surprised with the performances of some 64-bit multi-threaded 
> applications on my AMD Opteron machine. For most of the applications, the 
> performance of 32-bit version is almost same as the performance of 64-bit 
> version. However, for a couple of applications, 32-bit versions provide 
> better performance (running-time is around 76 secs) than 64-bit (running time 
> is around 96 secs). Could anyone help me to find the reason behind this, 
> please?
> 
> 
> $ldd program-64  (64-bit version)
> libpthread.so.1 =>   /lib/64/libpthread.so.1
> libstdc++.so.6 =>/usr/lib/64/libstdc++.so.6
> libm.so.2 => /lib/64/libm.so.2
> libgcc_s.so.1 => /usr/lib/64/libgcc_s.so.1
> libc.so.1 => /lib/64/libc.so.1
> 
> $ ldd program-32 (32-bit version)
> libpthread.so.1 =>   /lib/libpthread.so.1
> libstdc++.so.6 =>/usr/lib/libstdc++.so.6
> libm.so.2 => /lib/libm.so.2
> libgcc_s.so.1 => /usr/lib/libgcc_s.so.1
> libc.so.1 => /lib/libc.so.1


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Opensolaris is apparently dead

2010-08-16 Thread Garrett D'Amore


> 
> see, that's good, and is a realistic future scenario for ZFS, AFAICT:
> there can be a branch that's safe to collaborate on, which cannot go
> into Solaris 11 and cannot be taken proprietary by Nexenta, either.

In fact, we are in the process of creating a non-profit foundation for
Illumos which can receive copyright assignment, and which will have a
board that will not be dominated by any one company, and a set of rules
which will guarantee that the code is not dependent on the good will or
good behavior of any company or even group of companies.

In fact, Nexenta is *strongly* in favor of this kind of organization, so
while the funding for it comes from Nexenta (mostly), Nexenta will not
have any controlling influence.

It takes time to set this stuff up, so please be patient.

- Garrett


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Opensolaris is apparently dead

2010-08-16 Thread Garrett D'Amore
On Mon, 2010-08-16 at 08:52 -0700, Ray Van Dolson wrote:
> On Mon, Aug 16, 2010 at 08:48:31AM -0700, Joerg Schilling wrote:
> > Ray Van Dolson  wrote:
> > 
> > > > I absolutely guarantee Oracle can and likely already has
> > > > dual-licensed BTRFS.
> > >
> > > Well, Oracle obviously would want btrfs to stay as part of the Linux
> > > kernel rather than die a death of anonymity outside of it... 
> > >
> > > As such, they'll need to continue to comply with GPLv2 requirements.
> > 
> > No, there is definitely no need for Oracle to comply with the GPL as they
> > own the code.
> > 
> 
> Maybe there's not legally, but practically there is.  If they're not
> GPL compliant, why would Linus or his lieutenants continue to allow the
> code to remain part of the Linux kernel?
> 
> And what purpose would btrfs serve Oracle outside of the Linux kernel?

If they wanted to port it to Solaris under a different license, they
could.  This may actually be a backup plan in case the NetApp suit goes
badly.  But this is pure conjecture.

- Garrett

> 
> Ray
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
> 


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Opensolaris is apparently dead

2010-08-15 Thread Garrett D'Amore
Any code can become abandonware, where it effectively bitrots into
oblivion.

For either ZFS or BTRFS (or any other filesystem) to survive, there have
to be sufficiently skilled developers with an interest in developing and
maintaining it (whether the interest is commercial or recreational).

Honestly, I think both ZFS and btrfs will continue to be invested in by
Oracle.

(The only way I could see this changing would be if there was a sudden
license change which would permit either ZFS to overtake btrfs in the
Linux kernel, or permit btrfs to overtake zfs in the Solaris kernel.  I
think from a technical perspective, the latter of those two is
exceedingly unlikely -- if I understand correctly btrfs has a lot of
ground to make up to catch zfs, and zfs continues to receive
improvements and innovation.  The only way I could see zfs being
abandoned would be if there were some legal reason why Oracle couldn't
continue to develop it.  I don't think that is in the cards, honestly.)

- Garrett

On Sun, 2010-08-15 at 19:33 -0400, Edward Ned Harvey wrote:
> > From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
> > boun...@opensolaris.org] On Behalf Of Jerome Warnier
> > 
> > Do not forget Btrfs is mainly developed by ... Oracle. Will it survive
> > better than Free Solaris/ZFS?
> 
> It's gpl.  Just as zfs is cddl.  They cannot undo, or revoke the free
> license they've granted to use and develop upon whatever they've released.
> 
> ZFS is not dead, although it is yet to be seen if future development will be
> closed source.
> 
> BTRFS is not dead, and cannot be any more dead than zfs.
> 
> So honestly ... your comment above ... really has no bearing in reality.
> 
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
> 


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS diaspora (was Opensolaris is apparently dead)

2010-08-15 Thread Garrett D'Amore
the latest pkg-gate code in place. Still, even
> > if I get the latest code to install, it's not viable for the long
> > term unless I'm willing to live with stasis. 
> > 
> > 4. FreeBSD. I could live with it if I had to, but I'm not fond of
> > its packaging system; the last time I tried it I couldn't get the
> > package tools to pull a quick binary update. Even IPS works better.
> > I could go to the ports tree instead, but if I wanted to spend my
> > time recompiling everything, I'd run Gentoo instead. 
> > 
> > 5. Linux/FUSE. It works, but it's slow. 
> > 5a. Compile-it-yourself ZFS kernel module for Linux. This would be a
> > hassle (though DKMS would make it less of an issue), but usable -
> > except that the current module only supports zvols, so it's not
> > ready yet, unless I wanted to run ext3-on-zvol. Neither of these
> > solutions are practical for booting from ZFS. 
> > 
> > 6. Abandon ZFS completely and go back to LVM/MD-RAID. I ran it for
> > years before switching to ZFS, and it works - but it's a bitter pill
> > to swallow after drinking the ZFS Kool-Aid.
> 
> 7.) Linux/BTRFS.  Still green, but moving quickly.  It will have
> crossed a minimum usability and stability threshold when Ubuntu or
> Fedora is willing to support it as default.  Might happen with Ubuntu
> 11.04, although in mid-May there was talk that 10.10 had a slight
> chance as well (but that seems unlikely now).
> 
> 8.) EON NAS or other OpenSolaris based distros.  They don't seem to
> have a bright future in store as they're derivatives of OpenSolaris,
> unless they are able to transition to being based on IllumOS (which is
> conditional on how IllumOS progresses.)  On the other hand, it may not
> matter much if there aren't more updates to them as long as they work
> well enough in their current form for NAS type applications.  I.e.
> they're used until the next solution is available, like some are still
> using OpenSolaris 2009.06 instead of one of the development releases.
> 
> 
> 
> In another thread about a month ago Garrett D'Amore (from Nexenta and
> working with the IllumOS project which Nexenta is a sponsor of)
> wrote: 
> > There is another piece I'll add: even if Oracle were to stop
> > releasing
> > ZFS or OpenSolaris source code, there are enough of us with a vested
> > interest (commercial!) in its future that we would continue to develop
> > it outside of Oracle.  It won't just go stagnant and die.  I believe I
> > can safely say that Nexenta is committed to the continued development
> > and enhancement of this code base -- and to doing so in the open.
> >   
> 
> 
> From
> http://blogs.nexenta.org/blog/2010/08/13/opensolaris-no-more-and-nexenta/
> > It appears that the rumors may be true and that Oracle may have
> > decided to move towards a more closed model for the development of
> > Solaris.  You can see a blog post with the leaked internal memo
> > here:
> > 
> > http://sstallion.blogspot.com/2010/08/opensolaris-is-dead.html
> > 
> > If so, what does this mean for Nexenta?
> > 
> > Well, for NexentaStor customers and partners nothing will change.
> >  We’ve been planning for this contingency for a long time.  Clearly
> > we’ll have to fork.  Thanks in part to the take off of Illumos we’ll
> > be able to continue to our core development in the open, and we’ll
> > continue to contribute back all fixes such as the fairly recent ZFS
> > Monitor to the community.
> > 
> > The leaked memo does state that Oracle will open source the CDDL
> > components — however they’ll only do so when they release their
> > Solaris commercial releases and not before.
> > 
> > We are already seeing hundreds of new customers that are experienced
> > with OpenSolaris for storage – and we welcome these customers with
> > open arms.  NexentaStor can address your storage needs better than
> > OpenSolaris ever could, and we look forward to proving this to you
> > every day.
> > 
> > We also hereby make more explicit our support for Illumos.  In
> > addition to being a key contributor of engineering resources we are
> > happy to announce that we are going to contribute 1% of the equity
> > of Nexenta Systems to the forthcoming Illumos foundation.  I’m
> > confident that this 1% will be worth millions to the Illumos
> > foundation.   We would suggest that other companies consider a
> > similar approach.   We were planning to announce this when the
> > Illumos foundation was announced but given today’s rumors, I think
> > it i

Re: [zfs-discuss] ZFS development moving behind closed doors

2010-08-15 Thread Garrett D'Amore
On Sun, 2010-08-15 at 07:38 -0700, Richard Jahnel wrote:
> FWIW I'm making a significant bet that Nexenta plus Illumos will be the 
> future for the space in which I operate.
> 
> I had already begun the process of migrating my 134 boxes over to Nexenta 
> before Oracle's cunning plans became known. This just reaffirms my decision.


It warms my heart to hear you say that. :-)  After all, I made a similar
bet with my career. :-)

- Garrett

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Opensolaris is apparently dead

2010-08-14 Thread Garrett D'Amore

On 08/14/10 03:32 PM, Mark Bennett wrote:

That's a very good question actually. I would think that COMSTAR would
stay because its used by the Fishworks appliance... however, COMSTAR is
a competitive advantage for DIY storage solutions. Maybe they will rip
it out of S11 and make it an add-on or something. That would suck.
 


   

I guess the only real reason you can't yank COMSTAR is because its now
the basis for iSCSI Target support. But again, there is nothing saying
that Target support has to be part of the standard OS offering.
 
   

Scary to think about. :)
 
   

benr.
 

That would be the sensible commercial decision, and kill off the competition in 
the storage market using OpenSolaris based product.
   


No, it wouldn't.  We (Nexenta) are probably the biggest player here.  If 
Oracle yanks the code, we'll keep a copy ourselves.  Indeed, we are in 
the process of making some enhancements to this code which will make it into 
Illumos, but probably not into Oracle Solaris unless they pull from 
Illumos. :-)



I haven't found a linux that can reliably spin the 100Tb I currently have 
behind OpenSolaris and ZFS.
Luckily b134 doesn't seem to have any major issues, and I'm currently looking 
into a USB boot/raidz root combination for 1U storage.

I ran Red Hat 9 with updated packages for quite a few years.
As long as the kernel is stable, and you can work through the hurdles, it can 
still do the job.

   


Sure.

- Garrett


Mark.
   


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Opensolaris is apparently dead

2010-08-14 Thread Garrett D'Amore

On 08/14/10 09:36 AM, Paul B. Henson wrote:

On Fri, 13 Aug 2010, Tim Cook wrote:

   

http://www.theregister.co.uk/2010/08/13/opensolaris_is_dead/
 

"Oracle will spend *more* money on OpenSolaris development than Sun did."

At least, as a Sun customer, that's the line they were trying to feed me
during the buy out.

Why exactly would I want to do business with a company that lies to its
customers?

   


They've *never* said "OpenSolaris" in this context.  The quote was for 
"Solaris".


Oracle *will* spend more on Solaris than Sun did.  I believe that.

The question is whether they will get as much for their development 
dollar as Sun did.  With the brain drain happening (I know things I 
can't say, but I was one of the parties to leave a couple of months 
ago), I think that it will cost Oracle more money to keep Solaris 
development active than it did Sun.  Of course, they won't be "wasting" 
money on things like community collaboration, open ARC review, etc...


-- Garrett


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Opensolaris is apparently dead

2010-08-13 Thread Garrett D'Amore

On 08/13/10 09:02 PM, "C. Bergström" wrote:

Erast wrote:



On 08/13/2010 01:39 PM, Tim Cook wrote:

http://www.theregister.co.uk/2010/08/13/opensolaris_is_dead/

I'm a bit surprised at this development... Oracle really just doesn't
get it.  The part that's most disturbing to me is the fact they 
won't be

releasing nightly snapshots.  It appears they've stopped Illumos in its
tracks before it really even got started (perhaps that explains the
timing of this press release)


Wrong. Be patient, with the pace of current Illumos development it 
soon will have all the closed binaries liberated and ready to sync up 
with promised ON code drops as dictated by GPL and CDDL licenses.
Illumos is just a source tree at this point.  You're delusional, 
misinformed, or have some big wonderful secret if you believe you have 
all the bases covered for a pure open source distribution though..


What's closed binaries liberated really mean to you?

Does it mean
   a. You copy over the binary libCrun and continue to use some 
version of Sun Studio to build onnv-gate
   b. You debug the problems with and start to use ancient gcc-3 (at 
the probable expense of performance regressions which most people 
would find unacceptable)

   c. Your definition is narrow and has missed some closed binaries


I think it's great people are still hopeful, working hard and going to 
steward this forward, but I wonder.. What pace are you referring to?  
The last commit to illumos-gate was 6 days ago and you're already not 
even keeping it in sync..  Can you even build it yet and if so where's 
the binaries?



I was on vacation.  Give me a break.  There will be lots more in the 
coming week.


- Garrett



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] NFS performance?

2010-07-25 Thread Garrett D'Amore
On Sun, 2010-07-25 at 21:39 -0500, Mike Gerdts wrote:
> On Sun, Jul 25, 2010 at 8:50 PM, Garrett D'Amore  wrote:
> > On Sun, 2010-07-25 at 17:53 -0400, Saxon, Will wrote:
> >>
> >> I think there may be very good reason to use iSCSI, if you're limited
> >> to gigabit but need to be able to handle higher throughput for a
> >> single client. I may be wrong, but I believe iSCSI to/from a single
> >> initiator can take advantage of multiple links in an active-active
> >> multipath scenario whereas NFS is only going to be able to take
> >> advantage of 1 link (at least until pNFS).
> >
> > There are other ways to get multiple paths.  First off, there is IP
> > multipathing, which offers some of this at the IP layer.  There is also
> > 802.3ad link aggregation (trunking).  So you can still get high
> > performance beyond a single link with NFS.  (It works with iSCSI too,
> > btw.)
> 
> With both IPMP and link aggregation, each TCP session will go over the
> same wire.  There is no guarantee that load will be evenly balanced
> between links when there are multiple TCP sessions.  As such, any
> scalability you get using these configurations will be dependent on
> having a complex enough workload, wise configuration choices, and
> a bit of luck.

If you're really that concerned, you could use UDP instead of TCP.  But
that may have other detrimental performance impacts; I'm not sure how
bad they would be in a data center with generally lossless ethernet
links.
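
On the quoted point about each TCP session riding a single wire: 802.3ad-style
aggregation typically chooses the outbound link by hashing the flow's addresses
and ports, so a given connection always lands on the same physical link and only
many connections spread out, statistically.  A toy sketch of that policy -- not
any real driver's code:

#include <stdint.h>

/*
 * Toy sketch of L3/L4 hash-based link selection in an aggregation.
 * Not any real driver's code.  A fixed (saddr, daddr, sport, dport)
 * tuple always hashes to the same link, which is why one TCP session
 * cannot exceed a single link's bandwidth.
 */
unsigned
pick_link(uint32_t saddr, uint32_t daddr, uint16_t sport, uint16_t dport,
    unsigned nlinks)
{
        uint32_t h = saddr ^ daddr ^ (((uint32_t)sport << 16) | dport);

        h ^= h >> 16;            /* mix the bits a little */
        return (h % nlinks);     /* same flow -> same link, every time */
}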

Btw, I am not certain that the multiple initiator support (mpxio) is
necessarily any better as far as guaranteed performance/balancing.  (It
may be; I've not looked closely enough at it.)

I should look more closely at NFS as well -- if multiple applications on
the same client are accessing the same filesystem, do they use a single
common TCP session, or can they each have separate instances open?
Again, I'm not sure.

> 
> Note that with Sun Trunking there was an option to load balance using
> a round robin hashing algorithm.  When pushing high network loads this
> may cause performance problems with reassembly.

Yes.  Reassembly is Evil for TCP performance.

Btw, the iSCSI balancing act that was described does seem a bit
contrived -- a single initiator and a COMSTAR server, both client *and
server* with multiple ethernet links instead of a single 10GbE link.

I'm not saying it doesn't happen, but I think it happens infrequently
enough that its reasonable that this scenario wasn't one that popped
immediately into my head. :-)

- Garrett


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] NFS performance?

2010-07-25 Thread Garrett D'Amore
On Sun, 2010-07-25 at 17:53 -0400, Saxon, Will wrote:
> 
> I think there may be very good reason to use iSCSI, if you're limited
> to gigabit but need to be able to handle higher throughput for a
> single client. I may be wrong, but I believe iSCSI to/from a single
> initiator can take advantage of multiple links in an active-active
> multipath scenario whereas NFS is only going to be able to take
> advantage of 1 link (at least until pNFS). 

There are other ways to get multiple paths.  First off, there is IP
multipathing, which offers some of this at the IP layer.  There is also
802.3ad link aggregation (trunking).  So you can still get high
performance beyond a single link with NFS.  (It works with iSCSI too,
btw.)

-- Garrett


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] NFS performance?

2010-07-24 Thread Garrett D'Amore
On Sat, 2010-07-24 at 19:54 -0400, Edward Ned Harvey wrote:
> > From: Garrett D'Amore [mailto:garr...@nexenta.com]
> > 
> > Fundamentally, my recommendation is to choose NFS if your clients can
> > use it.  You'll get a lot of potential advantages in the NFS/zfs
> > integration, so better performance.  Plus you can serve multiple
> > clients, etc.
> > 
> > The only reason to use iSCSI is when you don't have a choice, IMO.  You
> > should only use iSCSI with a single initiator at any point in time
> > unless you have some higher level contention management in place.
> 
> So ... You don't think filesystems like gfs etc, should ever be used?

"gfs" provides such higher level contention management.  I can't speak
for it myself, but my gut reaction is that unless you have a need for
the features of gfs, you are probably better served by NFS.

Running a more traditional filesystem (that does not allow concurrent
block device access) is almost certainly a bad idea unless you have
special needs.

- Garrett



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] NFS performance?

2010-07-23 Thread Garrett D'Amore
Fundamentally, my recommendation is to choose NFS if your clients can
use it.  You'll get a lot of potential advantages in the NFS/zfs
integration, so better performance.  Plus you can serve multiple
clients, etc.

The only reason to use iSCSI is when you don't have a choice, IMO.  You
should only use iSCSI with a single initiator at any point in time
unless you have some higher level contention management in place.

- Garrett


On Fri, 2010-07-23 at 22:20 -0400, Edward Ned Harvey wrote:
> > From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
> > boun...@opensolaris.org] On Behalf Of Linder, Doug
> > 
> > On a related note - all other things being equal, is there any reason
> > to choose NFS over ISCI, or vice-versa?  I'm currently looking at this
> 
> iscsi and NFS are completely different technologies.  If you use iscsi, then
> all the initiators (clients) are the things which format and control the
> filesystem.  So the limitations of the filesystem are determined by
> whichever clustering filesystem you've chosen to implement.  It probably
> won't do snapshots and so forth.  Although the ZFS filesystem could make a
> snapshot, it wouldn't be automatically mounted or made available without the
> clients doing explicit mounts...
> 
> With NFS, the filesystem is formatted and controlled by the server.  Both
> WAFL and ZFS do some pretty good things with snapshotting, and making
> snapshots available to users without any effort.
> 
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
> 


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] L2ARC and ZIL on same SSD?

2010-07-21 Thread Garrett D'Amore
On Wed, 2010-07-21 at 09:42 -0700, Orvar Korvar wrote:
> Are there any drawbacks to partition a SSD in two parts and use L2ARC on one 
> partition, and ZIL on the other? Any thoughts?


It's probably a reasonable approach.  The ZIL can be fairly small... only
about 8 GB is probably sufficient for most typical uses.  (Even 4GB
would be a win in most cases. :-)

- Garrett

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zpool throughput: snv 134 vs 138 vs 143

2010-07-21 Thread Garrett D'Amore
On Wed, 2010-07-21 at 02:21 -0400, Richard Lowe wrote:
> I built in the normal fashion, with the CBE compilers
> (cc: Sun C 5.9 SunOS_i386 Patch 124868-10 2009/04/30), and 12u1 lint.
> 
> I'm not subscribed to zfs-discuss, but have you established whether the
> problematic build is DEBUG? (the bits I uploaded were non-DEBUG).

That would make a *huge* difference.  DEBUG bits have zero optimization,
and also have a great number of sanity tests included that are absent
from the non-DEBUG bits.  If these are expensive checks on a hot code
path, it can have a very nasty impact on performance.
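
As a rough illustration of where those checks come from (simplified; the real
ON assertion macros in sys/debug.h are more elaborate), DEBUG bits compile
assertions into real tests on hot paths, while non-DEBUG bits compile them away
entirely:

#include <stdio.h>
#include <stdlib.h>

/*
 * Simplified illustration only -- not the actual ON macros.  On a DEBUG
 * build the check costs a branch (and possibly a function call) on
 * every pass through a hot path; on a non-DEBUG build it vanishes.
 */
#ifdef DEBUG
#define MY_ASSERT(x)    ((x) ? (void)0 : \
        (void)(fprintf(stderr, "assertion failed: %s\n", #x), abort()))
#else
#define MY_ASSERT(x)    ((void)0)
#endif

int
checked_div(int a, int b)
{
        MY_ASSERT(b != 0);      /* present only in DEBUG bits */
        return (a / b);
}

int
main(void)
{
        printf("%d\n", checked_div(10, 2));
        return (0);
}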

Now that said, I *hope* the bits that Nexenta delivered were *not*
DEBUG.  But I've seen at least one bug that makes me think we might be
delivering DEBUG binaries.  I'll check into it.

-- Garrett

> 
> -- Rich
> 
> Haudy Kazemi wrote:
> >>> Could it somehow not be compiling 64-bit support?
> >>>
> >>>
> >>> -- 
> >>> Brent Jones
> >>> 
> >>
> >> I thought about that but it says when it boots up that it is 64-bit, and 
> >> I'm able to run
> >> 64-bit binaries.  I wonder if it's compiling for the wrong processor 
> >> optimization though?
> >> Maybe if it is missing some of the newer SSEx instructions the zpool 
> >> checksum checking is
> >> slowed down significantly?  I don't know how to check for this though and 
> >> it seems strange
> >> it would slow it down this significantly.  I'd expect even a non-SSE 
> >> enabled
> >> binary to be able to calculate a few hundred MB of checksums per second for
> >> a 2.5+ghz processor.
> >>
> >> Chad
> >
> > Would it be possible to do a closer comparison between Rich Lowe's fast 142
> > build and your slow 142 build?  For example run a diff on the source, build
> > options, and build scripts.  If the build settings are close enough, a
> > comparison of the generated binaries might be a faster way to narrow things
> > down (if the optimizations are different then a resultant binary comparison
> > probably won't be useful).
> >
> > You said previously that:
> >> The procedure I followed was basically what is outlined here:
> >> http://insanum.com/blog/2010/06/08/how-to-build-opensolaris
> >>
> >> using the SunStudio 12 compilers for ON and 12u1 for lint.
> >>   
> > Are these the same compiler versions Rich Lowe used?  Maybe there is a
> > compiler optimization bug.  Rich Lowe's build readme doesn't tell us which
> > compiler he used.
> > http://genunix.org/dist/richlowe/README.txt
> >
> >> I suppose the easiest way for me to confirm if there is a regression or if 
> >> my
> >> compiling is flawed is to just try compiling snv_142 using the same 
> >> procedure
> >> and see if it works as well as Rich Lowe's copy or if it's slow like my 
> >> other
> >> compilations.
> >>
> >> Chad
> >
> > Another older compilation guide:
> > http://hub.opensolaris.org/bin/view/Community+Group+tools/building_opensolaris
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
> 


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] CPU requirements for zfs performance

2010-07-21 Thread Garrett D'Amore
On Wed, 2010-07-21 at 17:12 +0200, Saso Kiselkov wrote:
> -BEGIN PGP SIGNED MESSAGE-
> Hash: SHA1
> 
> If you plan on using it as a storage server for multimedia data
> (movies), don't even bother considering compression, as most media files
> already come heavily compressed. Dedup might still come in handy, though.

Dedup with uncompressible or pre-compressed data is unlikely to be
useful, for much the same reason that compression isn't going to help.
You don't have repeated data.  There might be some commonality in
headers (but even that is dubious), but the main data will be
different for each of these files.

Dedup only wins when you have significantly similar data in different
files.  Even a one byte difference in a compressed stream will ruin the
chance of dedup giving any gains.
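
A toy illustration of why: dedup matches whole blocks by their checksum (a
strong hash such as SHA-256 in the real thing; the simple FNV-1a hash below is
just a stand-in), so blocks either match exactly or not at all.  A single
differing byte in a compressed stream yields a completely different checksum
and therefore no savings:

#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Toy illustration of block-level dedup matching.  Real ZFS dedup keys
 * on a strong block checksum (e.g. SHA-256); FNV-1a here is only a
 * stand-in to show that a one-byte change gives a completely different
 * key, so the block cannot be deduplicated.
 */
static uint64_t
block_key(const unsigned char *buf, size_t len)
{
        uint64_t h = 14695981039346656037ULL;   /* FNV-1a offset basis */

        for (size_t i = 0; i < len; i++) {
                h ^= buf[i];
                h *= 1099511628211ULL;          /* FNV-1a prime */
        }
        return (h);
}

int
main(void)
{
        unsigned char a[4096] = { 0 }, b[4096] = { 0 };

        b[100] = 1;                              /* one byte differs */
        printf("dedup match? %s\n",
            block_key(a, sizeof (a)) == block_key(b, sizeof (b)) ? "yes" : "no");

        memcpy(b, a, sizeof (b));                /* identical blocks */
        printf("dedup match? %s\n",
            block_key(a, sizeof (a)) == block_key(b, sizeof (b)) ? "yes" : "no");
        return (0);
}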

- Garrett
> 
> - --
> Saso
> 
> On 07/21/2010 05:03 PM, Eugen Leitl wrote:
> > On Wed, Jul 21, 2010 at 04:56:26PM +0200, Roy Sigurd Karlsbakk wrote:
> > 
> >> It'll probably be ok. If you use lzjb compresion, it'll probably suffice 
> >> as well. Give it gzip-9 compression, and you might have a cpu bottleneck, 
> >> but then, for most use, that config will probably do. What sort of traffic 
> >> do you expect?
> > 
> > Thanks for the thumbs-up. Just local GBit LAN. If it does
> > ~20-40 MByte/s it should be quite enough.
> > 
> 
> -BEGIN PGP SIGNATURE-
> Version: GnuPG v1.4.10 (GNU/Linux)
> Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/
> 
> iEYEARECAAYFAkxHDlYACgkQRO8UcfzpOHChLgCgpwAwsJnmoiDAQ3DCY7YJQpgl
> +ysAoLelbOxnjq4ONU3Hgf+VD1cG7c0H
> =k96G
> -END PGP SIGNATURE-
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
> 


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] slog/L2ARC on a hard drive and not SSD?

2010-07-21 Thread Garrett D'Amore
On Wed, 2010-07-21 at 07:56 -0700, Hernan F wrote:
> Hi,
> Out of pure curiosity, I was wondering, what would happen if one tries to use 
> a regular 7200RPM (or 10K) drive as slog or L2ARC (or both)?
> 
> I know these are designed with SSDs in mind, and I know it's possible to use 
> anything you want as cache. So would ZFS benefit from it? Would it be the 
> same? Would it slow down?
> 
> I guess it would slow things down, because it would be trying to read/write 
> from a single spindle instead of a multidisk array, right? I haven't found any 
> articles discussing this, only ones talking about SSD-based slogs/caches.
> 
> Thanks,
> Hernan


I think yes, it would probably slow things down, at least for typical
usage.  However, there is a small chance it might improve things by
offloading this functionality from the main spindle(s) to separate ones.
But I think you'd be better off expanding a stripe than using a disk in
this way.

- Garrett


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zpool throughput: snv 134 vs 138 vs 143

2010-07-20 Thread Garrett D'Amore

Your config makes me think this is an atypical ZFS configuration.  As a
result, I'm not as concerned.  But I think the multithread/concurrency
may be the biggest concern here.  Perhaps the compilers are doing
something different that causes significant cache issues.  (Perhaps the
compilers themselves are in need of an update?)

- Garrett

On Tue, 2010-07-20 at 14:10 -0700, Marcelo H Majczak wrote:
> If I can help narrow the variables, I compiled both 137 and 144 (137 is 
> minimum req. to build 144) using the same recommended compiler and lint, 
> nightly options etc. 137 works fine but 144 suffer the slowness reported. 
> System wise, I'm using only the 32bit non-debug version in an "old" 
> single-core/thread pentium-m laptop.
> 
> What I notice is that the zpool_$pool daemon had a lot more threads (total 
> 136, iirc), so something changed there but not necessarily related to the 
> problem. It also seems to be issuing a lot more writing to rpool, though I 
> can't tell what. In my case it causes a lot of read contention since my rpool 
> is a USB flash device with no cache. iostat says something like up to 10w/20r 
> per second. Up to 137 the performance has been enough, so far, for my 
> purposes on this laptop.


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zpool throughput: snv 134 vs 138 vs 143

2010-07-20 Thread Garrett D'Amore
So the next question is, let's figure out what richlowe did
differently. ;-)

- Garrett


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zpool throughput: snv 134 vs 138 vs 143

2010-07-19 Thread Garrett D'Amore
On Mon, 2010-07-19 at 17:40 -0700, Chad Cantwell wrote:
> fyi, everyone, I have some more info here.  in short, rich lowe's 142 works
> correctly (fast) on my hardware, while both my compilations (snv 143, snv 144)
> and also the nexanta 3 rc2 kernel (134 with backports) are horribly slow.

The idea that it's a regression introduced into NCP 3RC2 is not very
far-fetched at all.

It certainly could stand some more analysis.

- Garrett

> 
> I finally got around to trying rich lowe's snv 142 compilation in place of
> my own compilation of 143 (and later 144, not mentioned below), and unlike
> my own two compilations, his works very fast again on my same zpool (
> scrubbing avg increased from low 100s to over 400 MB/s within a few
> minutes after booting into this copy of 142.  I should note that since
> my original message, I also tried booting from a Nexanta Core 3.0 RC2 ISO
> after realizing it had zpool 26 support backported into 134 and was in
> fact able to read my zpool despite upgrading the version.  Running a
> scrub from the F2 shell on the Nexanta CD was also slow scrubbing, just
> like the 143 and 144 that I compiled.  So, there seem to be two possibilities.
> Either (and this seems unlikely) there is a problem introduced post-142 which
> slows things down, and it occured in 143, 144, and was brought back to 134
> with Nexanta's backports, or else (more likely) there is something different
> or wrong with how I'm compiling the kernel that makes the hardware not
> perform up to its specifications with a zpool, and possibly the Nexanta 3
> RC2 ISO has the same problem as my own compilations.
> 
> Chad
> 
> On Tue, Jul 06, 2010 at 03:08:50PM -0700, Chad Cantwell wrote:
> > Hi all,
> > 
> > I've noticed something strange in the throughput in my zpool between
> > different snv builds, and I'm not sure if it's an inherent difference
> > in the build or a kernel parameter that is different in the builds.
> > I've set up two similar machines and this happens with both of them.
> > Each system has 16 2TB Samsung HD203WI drives (total) directly connected
> > to two LSI 3081E-R 1068e cards with IT firmware in one raidz3 vdev.
> > 
> > In both computers, after a fresh installation of snv 134, the throughput
> > is a maximum of about 300 MB/s during scrub or something like
> > "dd if=/dev/zero bs=1024k of=bigfile".
> > 
> > If I bfu to snv 138, I then get throughput of about 700 MB/s with both
> > scrub or a single thread dd.
> > 
> > I assumed at first this was some sort of bug or regression in 134 that
> > made it slow.  However, I've now tested also from the fresh 134
> > installation, compiling the OS/Net build 143 from the mercurial
> > repository and booting into it, after which the dd throughput is still
> > only about 300 MB/s just like snv 134.  The scrub throughput in 143
> > is even slower, rarely surpassing 150 MB/s.  I wonder if the scrubbing
> > being extra slow here is related to the additional statistics displayed
> > during the scrub that didn't used to be shown.
> > 
> > Is there some kind of debug option that might be enabled in the 134 build
> > and persist if I compile snv 143 which would be off if I installed a 138
> > through bfu?  If not, it makes me think that the bfu to 138 is changing
> > the configuration somewhere to make it faster rather than fixing a bug or
> > being a debug flag on or off.  Does anyone have any idea what might be
> > happening?  One thing I haven't tried is bfu'ing to 138, and from this
> > faster working snv 138 installing the snv 143 build, which may possibly
> > create a 143 that performs faster if it's simply a configuration parameter.
> > I'm not sure offhand if installing source-compiled ON builds from a bfu'd
> > rpool is supported, although I suppose it's simple enough to try.
> > 
> > Thanks,
> > Chad Cantwell
> > ___
> > zfs-discuss mailing list
> > zfs-discuss@opensolaris.org
> > http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
> 


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Performance advantages of zpool with 2x raidz2 vdevs vs. single vdev

2010-07-19 Thread Garrett D'Amore
On Mon, 2010-07-19 at 12:06 -0500, Bob Friesenhahn wrote:
> On Mon, 19 Jul 2010, Garrett D'Amore wrote:
> >
> > With those same 14 drives, you can get 7x the performance instead of 2x
> > the performance by using mirrors instead of raidz2.
> 
> This is of course constrained by the limits of the I/O channel. 
> Sometimes the limits of PCI-E or interface cards become the dominant 
> factor.
> 
> Bob

Of course. ;-)  One *hopes* that if you have direct connects (not via
expanders), each of the SATA channels has its own channel to the
HBA, and that the HBA has enough PCIe bandwidth to support all of its
channels being used fully.  This may or may not be the case in your
system.

- Garrett

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Performance advantages of zpool with 2x raidz2 vdevs vs. single vdev

2010-07-19 Thread Garrett D'Amore
On Mon, 2010-07-19 at 01:28 -0700, tomwaters wrote:
> Hi guys, I am about to reshape my data pool and am wondering what 
> performance difference I can expect from the new config vs. the old.
> 
> The old config is a pool with a single vdev of 8 disks in raidz2.
> The new pool config is 2 vdevs of 7-disk raidz2 in a single pool.
> 
> I understand it should be better with higher I/O throughput and better 
> read/write rates...but I am interested to hear the science behind it.
> 
> I have googled and read the ZFS best practices guide and evil tuning 
> guide, but neither covers my question.
> 
> Appreciate any advice.
> 
> FYI, it's just a home server...but I like it.


Very simple.  2 vdevs give 2 active "spindles", so you get about twice
the performance of a single disk.

raidz2 generally gives the performance of a single disk.

For high performance, if you can sacrifice the storage, I recommend a
pool made up of two-drive mirror vdevs.  This gives pretty good
resilience, and good performance.  (It's not as "safe" as, say, raidz2,
though, because with raidz2 you can lose two drives per vdev, whereas
with mirrors you have many more vdevs and can only lose one drive per
vdev.)

With those same 14 drives, you can get 7x the performance instead of 2x
the performance by using mirrors instead of raidz2.
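
For example -- the pool and device names here are made up, so substitute
whatever format(1M) shows for your disks -- a 14-drive pool of mirrors
would look something like:

  zpool create tank \
      mirror c0t0d0 c0t1d0 mirror c0t2d0 c0t3d0 mirror c0t4d0 c0t5d0 \
      mirror c0t6d0 c0t7d0 mirror c1t0d0 c1t1d0 mirror c1t2d0 c1t3d0 \
      mirror c1t4d0 c1t5d0

Writes get striped across all seven mirror vdevs, which is where the
rough 7x figure comes from.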

-- Garrett

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Debunking the dedup memory myth

2010-07-18 Thread Garrett D'Amore
On Sun, 2010-07-18 at 16:18 -0700, Richard L. Hamilton wrote:

> 
> I would imagine that if it's read-mostly, it's a win, but
> otherwise it costs more than it saves.  Even more conventional
> compression tends to be more resource intensive than decompression...
> 
> What I'm wondering is when dedup is a better value than compression.
> Most obviously, when there are a lot of identical blocks across different
> files; but I'm not sure how often that happens, aside from maybe
> blocks of zeros (which may well be sparse anyway).

Shared/identical blocks come into play in several specific scenarios:

1) Multiple VMs, cloud.  If you have multiple guest OS' installed,
they're going to benefit heavily from dedup.  Even Zones can benefit
here.

2) Situations with lots of copies of large amounts of data where only
some of the data is different between each copy.  The classic example is
a Solaris build server, hosting dozens or even hundreds, of copies of
the Solaris tree, each being worked on by different developers.
Typically the developer is working on something less than 1% of the
total source code, so the other 99% can be shared via dedup.

For general purpose usage, e.g. hosting your music or movie collection,
I doubt that dedup offers any real advantage.  If I were talking about
deploying dedup, I'd only use it in situations like the two I mentioned,
and not for just a general purpose storage server.  For general purpose
applications I think compression is better.  (Though I think dedup will
have higher savings -- significantly so -- in the particular situation
where you know you have lots and lots of duplicate/redundant data.)
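
If you want a rough idea of whether your particular data is
dedup-friendly before committing to it, something like this works (the
pool and dataset names here are made up):

  # estimate the achievable dedup ratio without changing anything
  zdb -S tank

  # turn dedup on only for the dataset that holds the VM images
  zfs set dedup=on tank/vmimages

  # watch the pool-wide DEDUP ratio over time
  zpool list tank

Just keep in mind that the dedup table has to stay comfortably in RAM
(or L2ARC) for this to be a win, which is really what this thread is
about.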

Note also that with dedup your duplicated data may gain an effective
increase in redundancy/security, because ZFS makes sure that deduped
blocks are stored with higher redundancy than non-deduped data.  (This
sounds counterintuitive, but as long as you have at least 3 copies of
the duplicated data, it's a net win.)

Btw, compression on top of dedup may actually kill your benefit of
dedup.  My hypothesis (unproven, admittedly) is that many compression
algos cause small permutations of the data to significantly change the
bit values in the overall compressed object (even just by changing
their offset in the binary), which can seriously defeat dedup's
efficacy.

- Garrett

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Recommended RAM for ZFS on various platforms

2010-07-16 Thread Garrett D'Amore
On Fri, 2010-07-16 at 11:57 -0700, Michael Johnson wrote:
> us, why do you say I'd be able to get away with less RAM in FreeBSD 
> (as compared to NexentaStor, I'm assuming)?  I don't know tons about
> the OSs in 
> question; is FreeBSD just leaner in general? 

Compared to Solaris, in my estimation, yes, it's a little leaner.  Not
necessarily a lot -- the bulk of memory consumption these days is ZFS
and applications (Firefox!)

- Garrett


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Recommended RAM for ZFS on various platforms

2010-07-16 Thread Garrett D'Amore
1GB isn't enough for a real system.  2GB is a bare minimum.  If you're
going to use dedup, plan on a *lot* more.  I think 4 or 8 GB are good
for a typical desktop or home NAS setup.  With FreeBSD you may be able
to get away with less.  (Probably, in fact.)

Btw, instead of RAIDZ2, I'd recommend simply using a stripe of mirrors.
You'll have better performance, and good resilience against errors.  And
you can grow later as you need to by just adding additional drive pairs.
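
For instance (hypothetical pool and device names), you'd start with two
pairs and grow later like this:

  zpool create tank mirror c0t0d0 c0t1d0 mirror c0t2d0 c0t3d0

  # later, when you need more space, just add another pair
  zpool add tank mirror c0t4d0 c0t5d0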

-- Garrett

On Fri, 2010-07-16 at 10:24 -0700, Michael Johnson wrote:
> I'm currently planning on running FreeBSD with ZFS, but I wanted to 
> double-check 
> how much memory I'd need for it to be stable.  The ZFS wiki currently says 
> you 
> can go as low as 1 GB, but recommends 2 GB; however, elsewhere I've seen 
> someone 
> claim that you need at least 4 GB.  Does anyone here know how much RAM 
> FreeBSD 
> would need in this case?
> 
> Likewise, how much RAM does OpenSolaris need for stability when running ZFS? 
>  How about other OpenSolaris-based OSs, like NexentaStor?  (My searching 
> found 
> that OpenSolaris recommended at least 1 GB, while NexentaStor said 2 GB was 
> okay, 4 GB was better.  I'd be interested in hearing your input, though.)
> 
> If it matters, I'm currently planning on RAID-Z2 with 4x500GB consumer-grade 
> SATA drives.  (I know that's not a very efficient configuration, but I'd 
> really 
> like the redundancy of RAID-Z2 and I just don't need more than 1 TB of 
> available 
> storage right now, or for the next several years.)  This is on an AMD64 
> system, 
> and the OS in question will be running inside of VirtualBox, with raw access 
> to 
> the drives.
> 
> Thanks,
> Michael
> 
> 
>   
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
> 


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS bug - CVE-2010-2392

2010-07-15 Thread Garrett D'Amore
On Thu, 2010-07-15 at 13:47 -0500, Dave Pooser wrote:
> Looks like the bug affects through snv_137. Patches are available from the
> usual location--  for OpenSolaris.


Got a CR number for this?  (Or a link to where I can find out about the
CVE number?)

- Garrett


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] How do I clean up corrupted files from zpool status -v?

2010-07-15 Thread Garrett D'Amore
There's probably a way to clean up those old entries, I'm just not sure
what it is.  Is the data shared with any snapshots or clones?  I'd
expect you have to remove all references to the blocks, not just the
files but also in snapshots or cloned images.
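
If it were my pool, once every reference to the bad blocks (files,
snapshots, clones) is gone, I'd try something along these lines -- in
my experience the stale entries only age out of the error log after a
scrub or two completes:

  zpool scrub zroot
  # wait for it to finish, then see whether the list has shrunk
  zpool status -v zroot
  # once the file list is clean, reset the error counters
  zpool clear zroot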

- Garrett

On Thu, 2010-07-15 at 10:12 -0700, Kris Kasner wrote:
> Today at 09:44, Garrett D'Amore  wrote:
> 
> >
> > Those corrupt files are corrupt forever. Until they are removed.  I
> > recommend doing a scrub.  There are probably other experts here
> > (Richard?) who can suggest a permanent fix.
> >
> 
> Right, and we're OK with that.. We were lucky - all of the corrupt files are 
> non-essential. When I remove the files and replace them, I get something that 
> looks like a hex device:block number (ie: <0x86>:<0x38fcd>).
> 
> The server is a v440, so I was able to have someone add some extra drives.
> zpool replace zroot c1t1d0s2 c1t2d0s2 
> failed to complete.. it left zpool status looking like this:
> 10:40:33 catalina(36)> sudo zpool status -v
> Password:
>pool: zroot
>   state: DEGRADED
> status: One or more devices has experienced an error resulting in data
>  corruption.  Applications may be affected.
> action: Restore the file in question if possible.  Otherwise restore the
>  entire pool from backup.
> see: http://www.sun.com/msg/ZFS-8000-8A
>   scrub: resilver completed after 0h31m with 4 errors on Tue Jul 13 11:47:07 
> 2010
> config:
> 
>  NAMESTATE READ WRITE CKSUM
>  zroot   DEGRADED28 0 0
>mirrorDEGRADED60 027
>  replacing   DEGRADED31 053
>c1t1d0s2  DEGRADED   129 099  too many errors
>c1t2d0s2  ONLINE   0 084  24.5G resilvered
>  c1t0d0s2ONLINE   0 087  24.4G resilvered
> 
> errors: Permanent errors have been detected in the following files:
> 
>  //usr/dt/lib/sparcv9/libDtWidget.so.2
>  //platform/sun4us/failsafe
>  //opt/staroffice8/share/gallery/www-graf/bluleft.gif
> 
> /var/tmp/patches/10_Recommended/125541-04/SUNWthunderbird/reloc/lib/thunderbird/components/librdf.so
> 
> 
> If I delete one of these files, zpool status -v shows that device/block 
> identifier I mentioned previously.. I've run a few scrubs, but they don't 
> change anything.
> 
> 
> The system appears stable right now, our internal customers have no idea 
> anything is wrong (IE, their apps are stable). We're planning on migrating 
> them 
> to a Niagara blade to return things to "known good".
> 
> 
> I'm still curious to know if anyone knows a fix for this kind of issue, if 
> there is one. I fully expect that if I was running UFS on one drive and it 
> failed like this zfs drive failed the system would have panicked. That's a 
> big win. I would still like to get to the bottom of this issue. :-)
> 
> 
> Thanks again for your replies.
> 
> --Kris
> 
> 
> 
> 
> 
> 
> >>
> >> Today at 16:15, Garrett D'Amore  wrote:
> >>
> >>> Hey Kris (glad to see someone from my QCOM days!):
> >>>
> >>> It should automatically clear itself when you replace the disk.  Right
> >>> now you're still degraded since you don't have full redundancy.
> >>>
> >>>   - Garrett
> >>>
> >>>
> >>> On Mon, 2010-07-12 at 16:10 -0700, Kris Kasner wrote:
> >>>> Hi Folks..
> >>>>
> >>>> I have a system that was inadvertently left unmirrored for root. We were 
> >>>> able
> >>>> to add a mirror disk, resilver, and fix the corrupted files (nothing very
> >>>> interesting was corrupt, whew), but zpool status -v still shows errors..
> >>>>
> >>>> Will this self correct when we replace the degraded disk and resilver? 
> >>>> Or is
> >>>> there something else that I'm not finding that I need to do to clean up?
> >>>>
> >>>> This is Solaris 10 u8, zpool v15
> >>>> 15:52:50 catalina(34)> sudo zpool status -v
> >>>>pool: zroot
> >>>>   state: DEGRADED
> >>>> status: One or more devices has experienced an error resulting in data
> >>>>  corruption.  Applications may be affected.
> >>>> action: Restore the file in question if possible.  Otherwise restore the
> >>>>  entire pool from backup.
> >>>>

Re: [zfs-discuss] How do I clean up corrupted files from zpool status -v?

2010-07-15 Thread Garrett D'Amore
On Mon, 2010-07-12 at 16:25 -0700, Kris Kasner wrote:
> Thanks for the reply..
> 
> I got derailed by a DBA while writing the email, I should have been more 
> clear - I realize that the 'DEGRADED' states should resolve after I replace 
> the 
> disk, but what about the section that states:
> " errors: Permanent errors have been detected in the following files: "
> 
> 
> Will those resolve too? or will it still think that there are corrupt files 
> lying around. They all had valid paths at the start of the process; when I 
> unlinked them and replaced them with good copies, they changed to the
> >>  zroot/packages:<0x2531d>
> >>  <0x6e>:<0xc0f2>
> format.
> 
> I'm mostly concerned because I want zpool status to show up clean and error 
> free so our monitoring can catch it correctly.

Those corrupt files are corrupt forever. Until they are removed.  I
recommend doing a scrub.  There are probably other experts here
(Richard?) who can suggest a permanent fix.

- Garrett

> 
> Thanks again.
> 
> --Kris
> 
> Today at 16:15, Garrett D'Amore  wrote:
> 
> > Hey Kris (glad to see someone from my QCOM days!):
> >
> > It should automatically clear itself when you replace the disk.  Right
> > now you're still degraded since you don't have full redundancy.
> >
> > - Garrett
> >
> >
> > On Mon, 2010-07-12 at 16:10 -0700, Kris Kasner wrote:
> >> Hi Folks..
> >>
> >> I have a system that was inadvertently left unmirrored for root. We were 
> >> able
> >> to add a mirror disk, resilver, and fix the corrupted files (nothing very
> >> interesting was corrupt, whew), but zpool status -v still shows errors..
> >>
> >> Will this self correct when we replace the degraded disk and resilver? Or 
> >> is
> >> there something else that I'm not finding that I need to do to clean up?
> >>
> >> This is Solaris 10 u8, zpool v15
> >> 15:52:50 catalina(34)> sudo zpool status -v
> >>pool: zroot
> >>   state: DEGRADED
> >> status: One or more devices has experienced an error resulting in data
> >>  corruption.  Applications may be affected.
> >> action: Restore the file in question if possible.  Otherwise restore the
> >>  entire pool from backup.
> >> see: http://www.sun.com/msg/ZFS-8000-8A
> >>   scrub: resilver completed after 0h48m with 15 errors on Mon Jul 12 
> >> 15:41:50
> >> 2010
> >> config:
> >>
> >>  NAME  STATE READ WRITE CKSUM
> >>  zroot DEGRADED18 0 0
> >>mirror  DEGRADED44 023
> >>  c1t1d0s2  DEGRADED74 023  too many errors
> >>  c1t0d0s2  ONLINE   0 067  29.8G resilvered
> >>
> >> errors: Permanent errors have been detected in the following files:
> >>
> >>  zroot/packages:<0xad58>
> >>  zroot/packages:<0x11477>
> >>  zroot/packages:<0x2531d>
> >>  <0x6e>:<0xc0f2>
> >>  <0x6e>:<0xce68>
> >>  <0x6e>:<0x28d9f>
> >>  <0x6e>:<0x2b5c1>
> >>  <0x76>:<0x17369>
> >>  <0x86>:<0x11fda>
> >>  <0x86>:<0x13253>
> >>  <0x86>:<0x13346>
> >>  <0x86>:<0x33ed3>
> >>  <0x86>:<0x38fcd>
> >>  <0x86>:<0x39007>
> >> 15:53:04 catalina(35)>
> >>
> >>
> >> Thanks for any suggestions. The system is in another city, so I can't 
> >> quickly
> >> test replacing the disk and see what happens..
> >>
> >> Kris
> >>
> >
> >
> 


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Legality and the future of zfs...

2010-07-14 Thread Garrett D'Amore
On Thu, 2010-07-15 at 11:48 +0900, BM wrote:

> 
> But hey, why fork ZFS and mess with stale Solaris code, if the
> entire future of Solaris is closed, proprietary payware anyway? And
> unlike ZFS, we have the totally free BTRFS that has been moved to
> kernel.org and is *free* and is for Linux that is *already* popular
> AND *free*. Yes, Linux is not the best OS if you compare it to Solaris
> in some technical parts, but that comparison would just make things
> more sophisticated. But on the other hand Linux is totally free, cheap,
> and you can live with these inconveniences perfectly (just drink more
> water and breathe more deeply). You can curse these inconveniences, but
> at the end it still works cheaply and reliably and is just OK to get
> things done. Well, BTRFS sucks at some points (software RAID at kernel
> level comes to mind), but it is still a better FS for Linux in many
> places than extN, and it is still free and more popular. Maybe today
> BTRFS is not the right answer to the market the way ZFS is, but tomorrow
> it will probably be just the opposite, I think: geeks will use BTRFS and
> Linux, and soon Oracle will deeply regret they've killed Solaris, but
> no one will throw their energy into making Solaris at least as strong as
> Linux is now.

I think you're wrong on so many points here about ZFS that I don't know
where to begin, so I won't even try.  I am curious why you're hanging
about in here though, if you're so convinced that there is no future in
ZFS.

> 
> > I believe I can safely say that Nexenta is committed to the continued 
> > development and enhancement of this code base -- and to doing so in the 
> > open.
> Yeah, and Nexenta is also committed to backporting the newest updates
> from build 140 and newer just back to snv_134. So I can imagine that
> soon the new OS from Nexenta will be called "Super Nexenta Version 134".

We're working on upgrading to a newer version, but it's too risky to put
the latest code into production yet.  For 4.x which is in early
development right now, we will have much newer bits.

> :-)
> 
> Currently, from what I see, I think Nexenta will also die eventually,
> because of BTRFS for Linux, Linux's popularity itself, and also thanks
> to Oracle's help. Sorry to tell you this while you're working
> @nexenta.com... You guys are doing a very good job, but in fact your
> days are numbered, I think.

I think you're wrong.  Very much so.  And unlike you, I've put my money
where my mouth is.

That said, as you appear to be so firmly convinced that there is no
possible positive way forward for ZFS or Solaris, I recommend you go
elsewhere instead of apparently wasting your time here.  You seem to be
totally convinced of the future of Linux and BTRFS, so I recommend you
leave this community and join that one.

Meanwhile I'm committed to positive solutions.  And I'll have more to
say about specific answers to some of the concerns raised here soon.
But I'm focused on solving real problems right now, and don't want to
get caught up in a mail storm debate before the critical foundations are
laid, so for now you'll just have to be patient.

In short, I'm not interested in hearing any more of the whining about
how terrible things are.  However, if you want to work on a positive
solution, contact me out of band and I'll talk with you more.

- Garrett


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Legality and the future of zfs...

2010-07-14 Thread Garrett D'Amore
On Wed, 2010-07-14 at 13:59 -0700, Paul B. Henson wrote:
> On Wed, 14 Jul 2010, Roy Sigurd Karlsbakk wrote:
> 
> > Once the code is in the open, it'll remain there. To quote Cory Doctorow
> > on this, it's easy to release the source of a project; it's like adding ink
> > to your swimming pool, but it's a little harder to remove the ink from
> > the pool...
> 
> Woo-hoo, the code already released won't be taken back ;). But considering
> virtually all zfs development has been and presumably will continue to be
> by Sun/Oracle employees, that code is going to get stale pretty quick if
> they stop contributing to it...
> 
> 


I wish folks would realize something important:

Continued release of the source code for ON is *not* dependent on any
"OpenSolaris" community or on any binary distribution.  I *strongly*
doubt that Oracle is going to stop making source code available -- it
costs them almost nothing (compared to the rest of the "community"
efforts, which did cost Sun significantly with little or no gain).
Furthermore, the open source code of Solaris is critical to many Solaris
customers, and I think exec mgmt realizes that if they tried to close it
back up, they'd lose far more than just Solaris customers -- probably
Oracle DB customers as well.

The *code* is probably not going away (even updates to the kernel).
Even if the community dies, is killed, or commits OGB induced suicide.

There is another piece I'll add: even if Oracle were to stop releasing
ZFS or OpenSolaris source code, there are enough of us with a vested
interest (commercial!) in its future that we would continue to develop
it outside of Oracle.  It won't just go stagnant and die.  I believe I
can safely say that Nexenta is committed to the continued development
and enhancement of this code base -- and to doing so in the open.

- Garrett


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Encryption?

2010-07-14 Thread Garrett D'Amore
On Wed, 2010-07-14 at 01:06 -0700, Peter Taps wrote:
> > Btw, if you want a commercially supported and maintained product, have
> > you looked at NexentaStor? Regardless of what happens with OpenSolaris,
> > we aren't going anywhere. (Full disclosure: I'm a Nexenta Systems
> > employee. :-)
> > 
> > -- Garrett
> 
> Hi Garrett,
> 
> I would like to know why you think Nexenta would continue to stay if 
> OpenSolaris goes away.
> 
> I feel the fate of Nexenta is no different than the fate of my startup 
> company. Both of us are heavily dependent on zfs. And we know the OpenSolaris 
> version of zfs is the most stable version. 
> 
> Any business that is dependent on zfs must plan for two things as a 
> contingency:
> 
> 1. Look for an alternative for zfs
> 2. Look for an alternative for OpenSolaris
> 
> Preferably both need to be open source with no licenses attached.
> 
> Ideally, the zfs lawsuit will be put to rest and Oracle will commit to 
> continuing to support OpenSolaris.
> 
> Regards,
> Peter


Nexenta is investing in the technology within OpenSolaris heavily -- and
we are working on an effort which will decouple our product from a hard
dependency on bits from Oracle's Solaris.  I can't talk too much about
this right now, but I will probably have a lot more to say about this
early next month.   The upshot of this is that if OpenSolaris goes away,
we will continue to be able to work with the source code we have (plus
in house source code we are creating to replace certain bits we don't
have source to today!), so that our future is not so dependent on
Oracle's continued good will.

We're also hiring in engineering and growing a world class kernel team,
so we can continue to sustain and improve the code even if Oracle were
to pull the plug on OpenSolaris sources.  We already have innovations in
our code base which Oracle's tree lacks -- even though we have made our
sources for the OS available to Oracle.

That said, I feel extremely confident that while Oracle *may* pull the
plug on this "OpenSolaris" thing, the *source* code which makes up the
critical platform bits (ON) for both Solaris and OpenSolaris will
probably continue to be released as source code going forward.  There
are simply too many positive benefits to its customers, and frankly too
many customers would leave Oracle (not just Solaris, but go from Oracle
to DB2) if Oracle were to pull a stunt such as ceasing the delivery of
source code.  (And further, even though the "request-sponsor" process
has been a dismal failure, I suspect that there will always be a way for
contributors to send improvements back to Oracle, even if the formal
OpenSolaris process for it were to go away.)

Now, the question of ZFS is IMO a lot more concerning.  If we were
suddenly unable to continue to work with ZFS due to patent
considerations, there is no question that this would have a devastating
impact on our business.  It would have a devastating impact on Solaris
too.  We have mitigation plans in place for that eventuality which I
cannot discuss, but it would be extremely painful to us, as well as
everyone else in this ecosystem.

The good news is that so far it seems likely that NetApp's suit will
fail.

- Garrett



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Legality and the future of zfs...

2010-07-13 Thread Garrett D'Amore
On Tue, 2010-07-13 at 10:51 -0400, Edward Ned Harvey wrote:
> > From: Bob Friesenhahn [mailto:bfrie...@simple.dallas.tx.us]
> >
> > > A private license, with support and indemnification from Sun, would
> > > shield Apple from any lawsuit from Netapp.
> > 
> > The patent holder is not compelled
> > in any way to offer a license for use of the patent.  Without a patent
> > license, shipping products can be stopped dead in their tracks.
> 
> It may be true that Netapp could stop Apple from shipping OSX, if Apple had
> ZFS in OSX, and Netapp won the lawsuit.  But there was a time when it was
> absolutely possible for Sun & Apple to reach an agreement which would limit
> Apple's liability in the event of lawsuit waged against them.
> 
> CDDL contains an explicit disclaimer of warranty, which means, if Apple were
> to download CDDL ZFS source code and compile and distribute it themselves,
> they would be fully liable for any lawsuit waged against them.  But CDDL
> also allows for Sun to distribute ZFS binaries under a different license, in
> which Sun could have assumed responsibility for losses, in the event Apple
> were to be sued.

That would not, IMO, have prevented a potential stop-ship order from
halting MacOS X shipments.  I just think it would have created a
situation where Apple could have insisted that Oracle (well Sun)
reimburse it for lost revenue.

The lawyers at Sun were typically defensive in that they frowned (very
much) upon any legal agreements which left Sun in a position of
unlimited legal liability.  This actually nearly prevented the
development of certain software, since that software required an NDA
clause which provided for unlimited liability due to lost revenue were
Sun to leak the NDA content.  (We developed the software using openly
obtainable materials rather than NDA content, to prevent this
possibility.)

- Garrett


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] How do I clean up corrupted files from zpool status -v?

2010-07-12 Thread Garrett D'Amore
Hey Kris (glad to see someone from my QCOM days!):

It should automatically clear itself when you replace the disk.  Right
now you're still degraded since you don't have full redundancy.

- Garrett


On Mon, 2010-07-12 at 16:10 -0700, Kris Kasner wrote:
> Hi Folks..
> 
> I have a system that was inadvertently left unmirrored for root. We were able 
> to add a mirror disk, resilver, and fix the corrupted files (nothing very 
> interesting was corrupt, whew), but zpool status -v still shows errors..
> 
> Will this self correct when we replace the degraded disk and resilver? Or is 
> there something else that I'm not finding that I need to do to clean up?
> 
> This is Solaris 10 u8, zpool v15
> 15:52:50 catalina(34)> sudo zpool status -v
>pool: zroot
>   state: DEGRADED
> status: One or more devices has experienced an error resulting in data
>  corruption.  Applications may be affected.
> action: Restore the file in question if possible.  Otherwise restore the
>  entire pool from backup.
> see: http://www.sun.com/msg/ZFS-8000-8A
>   scrub: resilver completed after 0h48m with 15 errors on Mon Jul 12 15:41:50 
> 2010
> config:
> 
>  NAME  STATE READ WRITE CKSUM
>  zroot DEGRADED18 0 0
>mirror  DEGRADED44 023
>  c1t1d0s2  DEGRADED74 023  too many errors
>  c1t0d0s2  ONLINE   0 067  29.8G resilvered
> 
> errors: Permanent errors have been detected in the following files:
> 
>  zroot/packages:<0xad58>
>  zroot/packages:<0x11477>
>  zroot/packages:<0x2531d>
>  <0x6e>:<0xc0f2>
>  <0x6e>:<0xce68>
>  <0x6e>:<0x28d9f>
>  <0x6e>:<0x2b5c1>
>  <0x76>:<0x17369>
>  <0x86>:<0x11fda>
>  <0x86>:<0x13253>
>  <0x86>:<0x13346>
>  <0x86>:<0x33ed3>
>  <0x86>:<0x38fcd>
>  <0x86>:<0x39007>
> 15:53:04 catalina(35)>
> 
> 
> Thanks for any suggestions. The system is in another city, so I can't quickly 
> test replacing the disk and see what happens..
> 
> Kris
> 


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Encryption?

2010-07-12 Thread Garrett D'Amore
On Mon, 2010-07-12 at 12:55 -0700, Brandon High wrote:
> On Mon, Jul 12, 2010 at 10:00 AM, Garrett D'Amore wrote:
> Btw, if you want a commercially supported and maintained product, have
> you looked at NexentaStor?  Regardless of what happens with OpenSolaris,
> we aren't going anywhere. (Full disclosure: I'm a Nexenta Systems
> employee. :-)
> 
> 
> I'm trying to decide for myself when I'll give up on Oracle releasing
> another dev or release build and move to something like Nexenta
> Core.
> 
> 
> I actually *like* the Solaris user space, so GNU/Debian userspace
> isn't that compelling for me. 

The distinction is quickly shrinking.  I think the trend has been to
value compatibility with Linux over compatibility with legacy Solaris,
at least for OpenSolaris and probably also for whatever next release of
Solaris might be forthcoming.  (At least at the shell/command line
level.  The *library* level -- i.e., the C API -- is a totally different story,
of course.)

> I see it enough at work that using something different at home is
> novel and helps keep me honest. I also don't see a roadmap for the
> upcoming releases or what release of Debian or Ubuntu they'll be based
> on.

We have plans centered around 3.0.x and 3.1, and our plans for 4.0 are
still forming.  For 3.0.x and 3.1, we will remain based on the same
release of Ubuntu.  For 4.0, there will be a major change, but the
ultimate base of this is still under debate.

I don't know if marketing has released any timelines yet, so I won't do
so here.  But you should contact our sales group if you want to find out
more -- they can probably say more than I can.

- Garrett

> 
> 
> -B
> 
> -- 
> Brandon High : bh...@freaks.com
> 


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Encryption?

2010-07-12 Thread Garrett D'Amore
On Mon, 2010-07-12 at 09:41 -0700, Michael Johnson wrote:
> Nikola M wrote:
> >Freddie Cash wrote:
> >> You definitely want to do the ZFS bits from within FreeBSD.
> >Why not using ZFS in OpenSolaris? At least it has most stable/tested
> >implementation and also the newest one if needed?
> 
> 
> I'd love to use OpenSolaris for exactly those reasons, but I'm wary of using 
> an 
> operating system that may not continue to be updated/maintained.  If 
> OpenSolaris 
> had continued to be regularly released after Oracle bought Sun I'd be 
> choosing 
> it.  As it is, I don't want to be pessimistic, but the doubt about 
> OpenSolaris's 
> future is enough to make me choose FreeBSD instead.  (I'm sure that such 
> sentiments won't make me popular here, but so far Oracle has been 
> frustratingly 
> silent on their plans for OpenSolaris.)  At the very least, if FreeBSD 
> doesn't 
> do what I want I can switch the system disk to OpenSolaris and keep using the 
> same pool.  (Right?)
> 
> Going back to my original question: does anyone know of any problems that 
> could 
> be caused by using raidz on top of encrypted drives?  If there were a 
> physical 
> read error, which would get amplified by the encryption layer (if I'm 
> understanding full-disk encryption correctly, which I may not be), would ZFS 
> still be able to recover?
> 

I don't know about the ramifications (though I suspect that a broadening
error scope would decrease ZFS' ability to isolate and work around
problematic regions on the media), but one thing I do know: if you use
FreeBSD disk encryption below ZFS, then you won't be able to import
your pools into another implementation -- you will be stuck with FreeBSD.

Btw, if you want a commercially supported and maintained product, have
you looked at NexentaStor?  Regardless of what happens with OpenSolaris,
we aren't going anywhere. (Full disclosure: I'm a Nexenta Systems
employee. :-)

-- Garrett
> 
>   
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
> 


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Legality and the future of zfs...

2010-07-12 Thread Garrett D'Amore
On Mon, 2010-07-12 at 17:05 +0100, Andrew Gabriel wrote:
> Linder, Doug wrote:
> > Out of sheer curiosity - and I'm not disagreeing with you, just wondering - 
> > how does ZFS make money for Oracle when they don't charge for it?  Do you 
> > think it's such an important feature that it's a big factor in customers 
> > picking Solaris over other platforms?
> >   
> 
> Yes, it is one of many significant factors in customers choosing Solaris 
> over other OS's.
> Having chosen Solaris, customers then tend to buy Sun/Oracle systems to 
> run it on.
> 
> Of course, there are the 7000 series products too, which are heavily 
> based on the capabilities of ZFS, amongst other Solaris features.
> 

And, the next release of Solaris (whenever it comes out) is supposed to
make far more use of zfs for things like its packaging system (upgrades
using snapshots, etc.) and zones.  Indeed, it's possible (I've not
checked in a long time) that S10 makes use of snapshots for live upgrade
if root is zfs.

ZFS is a key strategic component of Solaris going forward.  Having to
abandon it would be a heavy blow -- quite possibly (IMO) fatal -- at
least to its future with Oracle.

- Garrett


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

