Re: [zfs-discuss] Tuning the ARC towards LRU

2010-04-06 Thread m...@bruningsystems.com

Hi,

In simple terms, the ARC is divided into an MRU side and an MFU side.
   target size (c) = target MRU size (p) + target MFU size (c-p)

On Solaris, to get from the MRU to the MFU side, the block must be
read at least once within 62.5 milliseconds.  For pure read-once workloads,
the data won't move to the MFU side and the ARC will behave exactly like an
(adaptable) MRU cache.


Richard,
I am looking at the code that moves a buffer from MRU to MFU,
and as I read it, if the block is read again and more than 62 milliseconds
have passed since its last access, it moves from MRU to MFU (lines ~2256
to ~2265 in arc.c).  Also, I have a program that reads the same block once
every 5 seconds, and on a relatively idle machine, I can find the block in
the MFU, not the MRU (using mdb).  If the block is read again in less than
62 milliseconds, it stays in the MRU.
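
In case anyone wants to poke at the same thing, here is a rough sketch of
how to peek at the ARC targets from the formula above (the arcstats names
below are the standard kstat ones; output format varies between builds):

   # print the overall target size (c) and the MRU target (p)
   kstat -p zfs:0:arcstats:c zfs:0:arcstats:p

   # the ::arc dcmd prints a similar summary straight from the live kernel
   echo ::arc | mdb -k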


max

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] dedup and memory/l2arc requirements

2010-04-06 Thread Darren J Moffat

On 03/04/2010 00:57, Richard Elling wrote:


This is annoying. By default, zdb is compiled as a 32-bit executable and
it can be a hog. Compiling it yourself is too painful for most folks :-(


/usr/sbin/zdb is actually a link to /usr/lib/isaexec

$ ls -il /usr/sbin/zdb /usr/lib/isaexec
300679 -r-xr-xr-x  92 root bin 8248 Nov 16 10:26 /usr/lib/isaexec*
300679 -r-xr-xr-x  92 root bin 8248 Nov 16 10:26 /usr/sbin/zdb*



$ ls -il /usr/sbin/i86/zdb /usr/sbin/amd64/zdb
200932 -r-xr-xr-x   1 root bin   173224 Mar 15 10:20 /usr/sbin/amd64/zdb*
200933 -r-xr-xr-x   1 root bin   159960 Mar 15 10:20 /usr/sbin/i86/zdb*


This means both 32-bit and 64-bit versions are already installed, and if the
kernel is running 64-bit then the 64-bit zdb binary is the one executed when
you run /usr/sbin/zdb.
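
If you want to double-check which ISA actually runs on a given box, a quick
sketch (standard commands, nothing zdb-specific):

   # is the running kernel 32-bit or 64-bit?
   isainfo -kv

   # confirm what the ISA-specific binaries are
   file /usr/sbin/i86/zdb /usr/sbin/amd64/zdb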


--
Darren J Moffat
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Are there (non-Sun/Oracle) vendors selling OpenSolaris/ZFS based NAS Hardware?

2010-04-06 Thread Markus Kovero
 Install nexenta on a dell poweredge ? 
 or one of these http://www.pogolinux.com/products/storage_director
FYI: more recent PowerEdges (R410, R710, possibly blades too; the ones with
integrated Broadcom chips) are not working very well with OpenSolaris due to
Broadcom network issues: hang-ups, packet loss, etc.
And since OpenSolaris is not a supported OS, Dell is not interested in fixing
these issues.

Yours
Markus Kovero
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Diagnosing Permanent Errors

2010-04-06 Thread Tim Cook
On Tue, Apr 6, 2010 at 12:47 AM, Daniel Carosone d...@geek.com.au wrote:

 On Tue, Apr 06, 2010 at 12:29:35AM -0500, Tim Cook wrote:
  On Tue, Apr 6, 2010 at 12:24 AM, Daniel Carosone d...@geek.com.au
 wrote:
 
   On Mon, Apr 05, 2010 at 09:35:21PM -0700, Willard Korfhage wrote:
    By the way, I see that now one of the disks is listed as degraded -
    too many errors. Is there a good way to identify exactly which of the
    disks it is?
  
   It's hidden in iostat -E, of all places.
  
   --
   Dan.
  
  
  I think he wants to know how to identify which physical drive maps to the
  dev ID in solaris.  The only way I can think of is to run something like
 DD
  against the drive to light up the activity LED.

 or look at the serial numbers printed in iostat -E

 --
 Dan.



And then what?  Cross your fingers and hope you pull the right drive on the
first go?  I don't know of any drives that come from the factory in a
hot-swap bay with the serial number printed on the front of the caddy.
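
For what it's worth, the dd approach amounts to something like the following
(device name is hypothetical); it just keeps the suspect disk busy so you can
watch which activity LED stays lit:

   # read the whole disk continuously; interrupt with Ctrl-C when done
   dd if=/dev/rdsk/c2t3d0p0 of=/dev/null bs=1024k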

--Tim
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Diagnosing Permanent Errors

2010-04-06 Thread Willard Korfhage
Yes, I was hoping to find the serial numbers. Unfortunately, it doesn't show
any serial numbers for the disks attached to the Areca RAID card.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Are there (non-Sun/Oracle) vendors selling OpenSolaris/ZFS based NAS Hardware?

2010-04-06 Thread Eric D. Mudama

On Tue, Apr  6 at 13:03, Markus Kovero wrote:

Install nexenta on a dell poweredge ? 
or one of these http://www.pogolinux.com/products/storage_director

FYI; More recent poweredges (R410,R710, possibly blades too, those with 
integrated Broadcom chips) are not working very well with opensolaris due 
broadcom network issues, hang-ups packet loss etc.
And as opensolaris is not supported OS Dell is not interested to fix these 
issues.


Our Dell T610 is and has been working just fine for the last year and
a half, without a single network problem.  Do you know if they're
using the same integrated part?

--eric


--
Eric D. Mudama
edmud...@mail.bounceswoosh.org

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Are there (non-Sun/Oracle) vendors selling OpenSolaris/ZFS based NAS Hardware?

2010-04-06 Thread Markus Kovero
 Our Dell T610 is and has been working just fine for the last year and
 a half, without a single network problem.  Do you know if they're
 using the same integrated part?

 --eric

Hi, as I should have mentioned, the integrated NICs that cause issues use the
Broadcom BCM5709 chipset, and these connectivity issues have been quite
widespread amongst Linux people too. Red Hat is trying to fix this
(http://kbase.redhat.com/faq/docs/DOC-26837), but I believe it's somehow messed
up in firmware, as our tests show the 4.6.8-series firmware seems to be more
stable.
As for workarounds, disabling MSI is bad if it creates latency for
network/disk controllers, and disabling C-states on Nehalem processors is just
stupid (you lose turbo, power saving, etc.).

Definitely a no-go for storage, IMO.

Yours
Markus Kovero

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Are there (non-Sun/Oracle) vendors selling OpenSolaris/ZFS based NAS Hardware?

2010-04-06 Thread Bruno Sousa
Hi,

I also ran into the Dell+Broadcom problem. I fixed it by downgrading the
firmware from version 5.xxx to version 4.xxx.
You may want to try that as well.

Bruno

On 6-4-2010 16:54, Eric D. Mudama wrote:
 On Tue, Apr  6 at 13:03, Markus Kovero wrote:
 Install nexenta on a dell poweredge ? 
 or one of these http://www.pogolinux.com/products/storage_director
 FYI; More recent poweredges (R410,R710, possibly blades too, those
 with integrated Broadcom chips) are not working very well with
 opensolaris due broadcom network issues, hang-ups packet loss etc.
 And as opensolaris is not supported OS Dell is not interested to
 fix these issues.

 Our Dell T610 is and has been working just fine for the last year and
 a half, without a single network problem.  Do you know if they're
 using the same integrated part?

 --eric






___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] SSD sale on newegg

2010-04-06 Thread Anil
Seems a nice sale on Newegg for SSD devices. Talk about choices. What are the
latest recommendations for a log device?

http://bit.ly/aL1dne
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Are there (non-Sun/Oracle) vendors selling OpenSolaris/ZFS based NAS Hardware?

2010-04-06 Thread R.G. Keen
Hmmm.. Tried to post this before, but it doesn't appear. I'll try again.

I've been discussing the concept of a reference design for Opensolaris systems 
with a few people. This comes very close to a system you can just buy.

I spent about six months burning up google and pestering people here about this 
issue. In the end, I largely copied a system which someone (Constantin 
Gonzalez) had blogged about here: 
=
http://www.google.com/url?sa=tsource=webct=rescd=1ved=0CAYQFjAAurl=http%3A%2F%2Fblogs.sun.com%2Fconstantin%2Fentry%2Fa_small_and_energy_efficientei=bU-7S97KKY2gnQf25bytCAusg=AFQjCNFhP99ZqaNZrhCOFgsLXHcumcVDOw
=
It was inexpensive for what I got, and worked largely the first time I 
connected it up. It would make a good reference design, excepting only that in 
the several weeks since I made it, the motherboard has been discontinued by 
ASUS, although it's still available in many places. 

A reference design is a setup that some knowledgeable person or group has put
together and verified to work. It can then be replicated by people with less
skill, with little or no exposure to malfunction or long debugging.

Here's the system I did:
ASUS M3A78-CM (about $60 when I got mine)
AMD Athlon II 240e ($70, the 240 is cheaper, but a few more watts)
Kingston 800MHz DDR2 unbuffered ECC ram, 2x 2GB ($80)
Syba PCIe x1 dual port SATA card ($26)
2x 40GB 2.5" SATA drives for mirrored boot pool ($52)
6x 750GB SATA drives for raidz2 storage pool, giving about 3TB usable and
2-disk failure immunity.
Case, power supply, cables, etc. to taste. I bought new, because I was looking 
for a long-term reliable backup server, but used would work as well for lower 
cost. 

In spite of reported issues with the ethernet chipset on the mobo, it just 
worked on my network, as installed. In fact, all of it just worked on install. 
The driver test utility reported zero issues. USB worked. Keyboard, mouse, and 
integrated video worked. So did the Syba card. No driver finagling.  Bring up 
time was only extended by my not knowing which commands to type. That includes 
making the remote console, remote desktop, and storage array available through 
the network on my Windows XP email machine.

Now that I know what commands to type, it would take me less than an hour to
set another one up from unpacking the shipping boxes. Learning which commands
to type took me a bit, but it's not terribly taxing. Most of it was finding
the help sections on the web and in the OpenSolaris Bible and typing what I
was told.

This would be a great candidate for a reference design except for Asus 
discontinuing it. That will be the bane of reference designs like this. It 
pretty much requires an ongoing effort of people assembling and documenting 
their work as new motherboards flow through the system. 

This is kind of what the HCL was probably intended to be, but it does not
measure up for neophytes. The HCL for Solaris proper is much more usable, in
that it seems to have a database back end and lets you select things,
bringing up trees of choices. Ah, well.

I think a local custom computer shop could replicate my server very quickly 
indeed. 

It's not quite a just-buy-and-unwrap product, but it's remarkably close.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] ZFS on-disk DDT block arrangement

2010-04-06 Thread taemun
I was wondering if someone could explain why the DDT is seemingly
(from empirical observation) kept in a huge number of individual blocks,
randomly written across the pool, rather than just a large binary chunk
somewhere.

Having been a victim of the really long time it takes to destroy a dataset
that has dedup=on, I was wondering why that was. From memory, when the
destroy process was running, something like iopattern -r showed constant 99%
random reads. This seems like a very wasteful approach to allocating blocks
for the DDT.

Having deleted the 900GB dataset, finally, I now only have around 152GB
(allocated PSIZE) left deduped on that pool.
# zdb -DD tank
DDT-sha256-zap-duplicate: 310684 entries, size 578 on disk, 380 in core
DDT-sha256-zap-unique: 1155817 entries, size 2438 on disk, 1783 in core

So 1466501 DDT blocks. For 152GB of data, that's around 108KB/block on
average, which seems sane.

To destroy the dataset holding the files which reference the DDT, I'm
looking at 1.46 million random reads to complete the operation (less those
elements in ARC or L2ARC). That's a lot of read operations for my poor
spindles.

I've seen some people saying that the DDT blocks are around 270 bytes each,
but does it really matter, if the smallest block that zfs can read/write
(for obvious reasons) is 512 bytes? Clearly 2x 270B > 512B, but couldn't
there be some way of grouping DDT elements together (in, say, 1MB blocks)?

Thoughts?

(side note: can someone explain the "size xxx on disk, xxx in core"
statements in that zdb output for me? The numbers never seem related to the
number of entries or anything.)
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] ZFS Rollback From ZFS Send

2010-04-06 Thread Tony MacDoodle
Can I roll back a snapshot that I did a zfs send on?

ie: zfs send testpool/w...@april6 > /backups/w...@april6_2010

Thanks
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Rollback From ZFS Send

2010-04-06 Thread Nicolas Williams
On Tue, Apr 06, 2010 at 11:53:23AM -0400, Tony MacDoodle wrote:
 Can I rollback a snapshot that I did a zfs send on?
 
 ie: zfs send testpool/w...@april6 > /backups/w...@april6_2010

That you did a zfs send does not prevent you from rolling back to a
previous snapshot.  Similarly for zfs recv -- that you went from one
snapshot to another by zfs receiving a send does not stop you from
rolling back to an earlier snapshot.

You do need to have an earlier snapshot to roll back to, if you want to
roll back.

Also, if you are using zfs send for backups, or for replication, and you
rollback the primary dataset, then you'll need to update your backups/
replicas accordingly.
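
A minimal sequence illustrating the point (dataset and file names are made
up for the example):

   # two snapshots; send the newer one to a file
   zfs snapshot testpool/work@april5
   zfs snapshot testpool/work@april6
   zfs send testpool/work@april6 > /backups/work@april6_2010

   # the send does not pin anything; you can still roll back
   # (-r destroys the intervening @april6 snapshot)
   zfs rollback -r testpool/work@april5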

Nico
-- 
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] ZFS: Raid and dedup

2010-04-06 Thread Learner Study
Hi Folks:

I'm wondering what the correct flow is when both raid5 and de-dup are
enabled on a volume.

I think we should do de-dup first and then raid5 ... is that
understanding correct?

Thanks!
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS: Raid and dedup

2010-04-06 Thread jeff.bonw...@oracle.com

Correct.

Jeff

Sent from my iPhone

On Apr 5, 2010, at 6:32 PM, Learner Study learner.st...@gmail.com  
wrote:



Hi Folks:

I'm wondering what is the correct flow when both raid5 and de-dup are
enabled on a storage volume

I think we should do de-dup first and then raid5 ... is that
understanding correct?

Thanks!
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Diagnosing Permanent Errors

2010-04-06 Thread James C.McPherson

On  6/04/10 11:47 PM, Willard Korfhage wrote:

Yes, I was hoping to find the serial numbers. Unfortunately, it doesn't
show any serial numbers for the disk attached to the Areca raid card.



You'll need to reboot and go into the card BIOS to
get that information.


James C. McPherson
--
Senior Software Engineer, Solaris
Oracle
http://www.jmcp.homeunix.com/blog
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] refreservation and ZFS Volume

2010-04-06 Thread Tony MacDoodle
I am trying to understand how refreservation works with snapshots.

If I have a 100G ZFS pool,

and I have 4 20G volumes in that pool,

with refreservation = 20G on all of the volumes:

when I want to take a snapshot, will the snapshot need 20G + the amount
changed (REFER)? If not, I get an 'out of space' error.

How does refreservation relate to snapshots?

Thanks
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Diagnosing Permanent Errors

2010-04-06 Thread Carson Gaspar

Willard Korfhage wrote:

Yes, I was hoping to find the serial numbers. Unfortunately, it
doesn't show any serial numbers for the disk attached to the Areca
raid card.


Does Areca provide any Solaris tools that will show you the drive info?

If you are using the Areca in JBOD mode, smartctl will frequently show 
serial numbers that iostat -E will not (iostat appears to be really 
stupid about getting serial numbers compared to just about any other 
tool out there).
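
For example, something along these lines (device names are placeholders, and
smartctl may need a -d option depending on the controller):

   # what Solaris itself reports; the serial field is often blank behind
   # a RAID card
   iostat -En c3t0d0

   # what SMART reports directly from the drive
   smartctl -i /dev/rdsk/c3t0d0s0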


--
Carson
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SSD sale on newegg

2010-04-06 Thread Erik Trimble
On Tue, 2010-04-06 at 08:26 -0700, Anil wrote: 
 Seems a nice sale on Newegg for SSD devices. Talk about choices. What's the 
 latest recommendations for a log device?
 
 http://bit.ly/aL1dne

The Vertex LE models should do well as ZIL (though not as well as an
X25-E or a Zeus) for all non-enterprise users.

The X25-M is still the best choice for an L2ARC device, but the Vertex
Turbo or Corsair Nova are good if you're on a budget.

If you really want an SSD as a boot drive, or just need something for
L2ARC, the various Intel X25-V models are cheap, if not really great
performers. I'd recommend one of these if you want an SSD for rpool, or
if you need a large L2ARC for dedup (or similar) and can't afford
anything in the X25-M price range.  You should also be OK with a Corsair
Reactor in this performance category.



-- 
Erik Trimble
Java System Support
Mailstop:  usca22-123
Phone:  x17195
Santa Clara, CA
Timezone: US/Pacific (GMT-0800)

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] refreservation and ZFS Volume

2010-04-06 Thread Daniel Carosone
On Tue, Apr 06, 2010 at 01:44:20PM -0400, Tony MacDoodle wrote:
 I am trying to understand how refreservation works with snapshots.
 
 If I have a 100G zfs pool
 
 I have 4 20G volume groups in that pool.
 
 refreservation = 20G on all volume groups.
 
 Now when I want to do a snapshot will this snapshot need 20G + the amount
 changed (REFER)? If not I get a out of space.
 
 How does refreservation relate to snapshots?

The refreservation is a commitment that X amount of space can be
written to. When space currently in the volume (usedbydataset) is
shared with snapshots, new writes to those blocks will need to
allocate new space, and the original copy remains in the snapshot.
Therefore, as a snapshot is taken, the usedbyrefreservation figure
is increased from whatever its current value is, back up to the size of
the refreservation. This represents the advance commitment of space
from the pool to hold the potential overwrite of the dataset, as
well as the new snapshot.  If there's not enough pool space for this
increase, the snapshot is denied.
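
The accounting is visible in the space-used properties; a rough sketch of
how to watch it (names are placeholders):

   # compare the reservation against what is currently charged to it
   zfs get refreservation,usedbydataset,usedbyrefreservation,usedbysnapshots pool/vol1

   # take a snapshot, then look again: usedbyrefreservation jumps back up
   # toward the full refreservation
   zfs snapshot pool/vol1@now
   zfs get usedbyrefreservation pool/vol1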

You have reminded me.. I wrote some patches to the zfs manpage to help
clarify this issue, while travelling, and never got around to posting
them when I got back.  I'll dig them up off my netbook later today. 

--
Dan.


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS on-disk DDT block arrangement

2010-04-06 Thread Daniel Carosone
On Wed, Apr 07, 2010 at 01:52:23AM +1000, taemun wrote:
 I was wondering if someone could explain why the DDT is seemingly
 (from empirical observation) kept in a huge number of individual blocks,
 randomly written across the pool, rather than just a large binary chunk
 somewhere.

It's not really a question of physical allocation contiguity, or
pre-allocating in larger chunks.  Remember that this would not be
maintained after updates in a CoW system anyway. 

It's a question of access pattern.  The DDT is indexed by block
hash. Hashes are effectively random (for the purposes of this
discussion), so updates to the DDT for blocks in any order other than
block-hash order are effectively random-order.

There's not really an effective way to (say) remove blocks in
block-hash order.  There might be room for some optimisations here and
there (maybe freeing the blocks of each object in hash-order) but the
overall access pattern is still going to be heavily random-order.

 (side note: can someone explain the size xxx on disk, xxx in core
 statements in that zdb output for me? The numbers never seem related to the
 number of entries or  anything.)

I've not yet seen a good one, though there has been some speculation,
from me included.

--
Dan.



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-06 Thread Jeroen Roodhart
Hi Roch,

 Can  you try 4 concurrent tar to four different ZFS
 filesystems (same pool). 

Hmmm, you're on to something here:

http://www.science.uva.nl/~jeroen/zil_compared_e1000_iostat_iops_svc_t_10sec_interval.pdf

In short: when using two exported file systems, total time goes down to
around 4 minutes (IOPS maxes out at around 5500 when adding all four vmods
together). When using four file systems, total time goes down to around
3 minutes 30 seconds (IOPS maxing out at about 9500).

I figured it is either NFS or a per-filesystem data structure in the ZFS/ZIL
interface. To rule out NFS I tried exporting two directories using default
NFS shares (via /etc/dfs/dfstab entries). To my surprise this seems to bypass
the ZIL altogether (dropping to 100 IOPS, which results from our RAIDZ2
configuration). So clearly ZFS sharenfs is more than a nice front end for NFS
configuration :).
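
For reference, the sharenfs route is simply along these lines (dataset names
are placeholders):

   # one filesystem per export, shared via the ZFS property rather than dfstab
   zfs create tank/export1
   zfs create tank/export2
   zfs set sharenfs=on tank/export1
   zfs set sharenfs=on tank/export2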

But back to your suggestion: You clearly had a hypothesis behind your question. 
Care to elaborate?

With kind regards,

Jeroen
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SSD sale on newegg

2010-04-06 Thread Kyle McDonald
On 4/6/2010 3:41 PM, Erik Trimble wrote:
 On Tue, 2010-04-06 at 08:26 -0700, Anil wrote: 
   
 Seems a nice sale on Newegg for SSD devices. Talk about choices. What's the 
 latest recommendations for a log device?

 http://bit.ly/aL1dne
 
 The Vertex LE models should do well as ZIL  (though not as well as an
 X25-E or a Zeus) for all non-enterprise users.

 The X25-M is still the best choice for a L2ARC device, but the Vertex
 Turbo or Cosair Nova are good if you're on a budget.

 If you really want an SSD a boot drive, or just need something for
 L2ARC, the various Intel X25-V models are cheap, if not a really great
 performers. I'd recommend one of these if you want an SSD for rpool, or
 if you need a large L2ARC for dedup (or similar) and can't afford
 anything in the X25-M price range.  You should also be OK with a Corsair
 Reactor in this performance category.

   
What about if you want to get one that you can use for both the rpool,
and ZIL (for another data pool?)
What if you want one for all 3 (rpool, ZIL, L2ARC)??

 -Kyle


   

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] refreservation and ZFS Volume

2010-04-06 Thread Daniel Carosone
On Wed, Apr 07, 2010 at 06:27:09AM +1000, Daniel Carosone wrote:
 You have reminded me.. I wrote some patches to the zfs manpage to help
 clarify this issue, while travelling, and never got around to posting
 them when I got back.  I'll dig them up off my netbook later today. 

http://defect.opensolaris.org/bz/show_bug.cgi?id=15514

--
Dan. 


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SSD sale on newegg

2010-04-06 Thread Erik Trimble
On Tue, 2010-04-06 at 19:43 -0400, Kyle McDonald wrote:
 On 4/6/2010 3:41 PM, Erik Trimble wrote:
  On Tue, 2010-04-06 at 08:26 -0700, Anil wrote: 

  Seems a nice sale on Newegg for SSD devices. Talk about choices. What's 
  the latest recommendations for a log device?
 
  http://bit.ly/aL1dne
  
  The Vertex LE models should do well as ZIL  (though not as well as an
  X25-E or a Zeus) for all non-enterprise users.
 
  The X25-M is still the best choice for a L2ARC device, but the Vertex
  Turbo or Cosair Nova are good if you're on a budget.
 
  If you really want an SSD a boot drive, or just need something for
  L2ARC, the various Intel X25-V models are cheap, if not a really great
  performers. I'd recommend one of these if you want an SSD for rpool, or
  if you need a large L2ARC for dedup (or similar) and can't afford
  anything in the X25-M price range.  You should also be OK with a Corsair
  Reactor in this performance category.
 

 What about if you want to get one that you can use for both the rpool,
 and ZIL (for another data pool?)
 What if you want one for all 3 (rpool, ZIL, L2ARC)??
 
  -Kyle
 

It all boils down to performance and the tradeoffs you are willing to
make.  For good ZIL, you want something that has a very high IOPS rating
(50,000+ if possible, 10,000+ minimum, particularly when writing small
chunks). For L2ARC, you are more concerned with total size/capacity, and
modest IOPS (3000-1 IOPS, or the ability to write at least 100Mb/s
at 4-8k write sizes, plus as high as possible read I/O). For rpool use,
you don't really care about performance so much, as it's almost
exclusively read-only (one should generally not configure a swap device
on an SSD-based rpool).

You could probably live with an X25-M as something to use for all three,
but of course you're making tradeoffs all over the place.



-- 
Erik Trimble
Java System Support
Mailstop:  usca22-123
Phone:  x17195
Santa Clara, CA
Timezone: US/Pacific (GMT-0800)

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SSD sale on newegg

2010-04-06 Thread Richard Elling
On Apr 6, 2010, at 5:00 PM, Erik Trimble wrote:
 On Tue, 2010-04-06 at 19:43 -0400, Kyle McDonald wrote:
 On 4/6/2010 3:41 PM, Erik Trimble wrote:
 On Tue, 2010-04-06 at 08:26 -0700, Anil wrote: 
 
 Seems a nice sale on Newegg for SSD devices. Talk about choices. What's 
 the latest recommendations for a log device?
 
 http://bit.ly/aL1dne
 
 The Vertex LE models should do well as ZIL  (though not as well as an
 X25-E or a Zeus) for all non-enterprise users.
 
 The X25-M is still the best choice for a L2ARC device, but the Vertex
 Turbo or Cosair Nova are good if you're on a budget.
 
 If you really want an SSD a boot drive, or just need something for
 L2ARC, the various Intel X25-V models are cheap, if not a really great
 performers. I'd recommend one of these if you want an SSD for rpool, or
 if you need a large L2ARC for dedup (or similar) and can't afford
 anything in the X25-M price range.  You should also be OK with a Corsair
 Reactor in this performance category.
 
 
 What about if you want to get one that you can use for both the rpool,
 and ZIL (for another data pool?)
 What if you want one for all 3 (rpool, ZIL, L2ARC)??
 
 -Kyle
 
 
 It all boils down to performance and the tradeoffs you are willing to
 make.  For good ZIL, you want something that has a very high IOPS rating
 (50,000+ if possible, 10,000+ minimum, particularly when writing small
 chunks).

High write IOPS :-)

 For L2ARC, you are more concerned with total size/capacity, and
 modest IOPS (3000-1 IOPS, or the ability to write at least 100Mb/s
 at 4-8k write sizes, plus as high as possible read I/O).

The L2ARC fill rate is throttled to 16 MB/sec at boot and 8 MB/sec later.
Many SSDs work well as L2ARC cache devices.

 For rpool use,
 you don't really care about performance so much, as it's almost
 exclusively read-only

Yep

 (one should generally not configure a swap device
 on an SSD-based rpool).

Disagree.  Swap is a perfectly fine workload for SSDs.  Under ZFS, 
even more so.  I'd really like to squash this rumour and thought we 
were making progress on that front :-(  Today, there are millions or 
thousands of systems with deployed SSDs as boot and swap on a
wide variety of OSes.  Go for it.

 You could probably live with an X25-M as something to use for all three,
 but of course you're making tradeoffs all over the place.

That would be better than almost any HDD on the planet because
the HDD tradeoffs result in much worse performance.
 -- richard

ZFS storage and performance consulting at http://www.RichardElling.com
ZFS training on deduplication, NexentaStor, and NAS performance
Las Vegas, April 29-30, 2010 http://nexenta-vegas.eventbrite.com 

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SSD sale on newegg

2010-04-06 Thread Carson Gaspar

Erik Trimble wrote:
On Tue, 2010-04-06 at 08:26 -0700, Anil wrote: 

Seems a nice sale on Newegg for SSD devices. Talk about choices. What's the 
latest recommendations for a log device?

http://bit.ly/aL1dne


The Vertex LE models should do well as ZIL  (though not as well as an
X25-E or a Zeus) for all non-enterprise users.


I just found an 8 GB SATA Zeus (Z4S28I) for £83.35 (~US$127) shipped to 
California. That should be more than large enough for my ZIL @home, 
based on zilstat.


The web site says EOL, limited to current stock.

http://www.dpieshop.com/stec-zeus-z4s28i-8gb-25-sata-ssd-solid-state-drive-industrial-temp-p-410.html

Of course this seems _way_ too good to be true, but I decided to take 
the risk.


--
Carson
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SSD sale on newegg

2010-04-06 Thread Erik Trimble
On Tue, 2010-04-06 at 17:17 -0700, Richard Elling wrote:
 On Apr 6, 2010, at 5:00 PM, Erik Trimble wrote:

[snip]

  For L2ARC, you are more concerned with total size/capacity, and
  modest IOPS (3000-1 IOPS, or the ability to write at least 100Mb/s
  at 4-8k write sizes, plus as high as possible read I/O).
 
 The L2ARC fill rate is throttled to 16 MB/sec at boot and 8 MB/sec later.
 Many SSDs work well as L2ARC cache devices.
 

Where is that limit set? That's completely new to me. :-(

In any case, L2ARC devices should probably have at least reasonable
write performance for small sizes, given the propensity to put things
like the DDT and other table structures/metadata into it, all of which
are small write chunks. I tried one of the old JMicron-based 1st-gen SSDs
as an L2ARC, and it wasn't much of a success.

Fast read speed is good for an L2ARC, but that's not generally a problem
with even the cheap SSDs these days.


  (one should generally not configure a swap device
  on an SSD-based rpool).
 
 Disagree.  Swap is a perfectly fine workload for SSDs.  Under ZFS, 
 even more so.  I'd really like to squash this rumour and thought we 
 were making progress on that front :-(  Today, there are millions or 
 thousands of systems with deployed SSDs as boot and swap on a
 wide variety of OSes.  Go for it.

Really?  I generally don't do well running swap on lower-performing
SSDs over here in Java-land, but that may have to do with my specific
workload.  I'll take your word for it (of course, I'm voting for swap
not being necessary on many machines these days).



  You could probably live with an X25-M as something to use for all three,
  but of course you're making tradeoffs all over the place.
 
 That would be better than almost any HDD on the planet because
 the HDD tradeoffs result in much worse performance.
  -- richard
 

True. Viva la SSD!




-- 
Erik Trimble
Java System Support
Mailstop:  usca22-123
Phone:  x17195
Santa Clara, CA
Timezone: US/Pacific (GMT-0800)

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SSD sale on newegg

2010-04-06 Thread Bill Sommerfeld

On 04/06/10 17:17, Richard Elling wrote:

You could probably live with an X25-M as something to use for all three,
but of course you're making tradeoffs all over the place.


That would be better than almost any HDD on the planet because
the HDD tradeoffs result in much worse performance.


Indeed.  I've set up a couple small systems (one a desktop workstation, 
and the other a home fileserver) with root pool plus the l2arc and slog 
for a data pool on an 80G X25-M and have been very happy with the result.


The recipe I'm using is to slice the SSD, with the rpool in s0 taking
roughly half the space, 1GB in s3 for the slog, and the rest of the space as
L2ARC in s4.  That may actually be overly generous for the root pool,
but I run with copies=2 on rpool/ROOT and I tend to keep a bunch of BEs
around.
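
In rough terms, the data-pool half of that recipe looks like the following
(the SSD device name is hypothetical; s0 already holds the root pool):

   # add the 1GB slice as a separate log device and the large slice as cache
   zpool add tank log c4t1d0s3
   zpool add tank cache c4t1d0s4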


- Bill


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] To slice, or not to slice

2010-04-06 Thread Edward Ned Harvey
 I have reason to believe that both the drive, and the OS are correct.
 I have suspicion that the HBA simply handled the creation of this
 volume somehow differently than how it handled the original.  Don't
 know the answer for sure yet.

Ok, that's confirmed now.  Apparently when the drives ship from the factory,
they're pre-initialized for the HBA, so the HBA happily imports them and
creates a simple volume (aka JBOD) using the factory initialization.
Unfortunately, the factory init includes HBA metadata at both the start and
end of the drive ... so I lose 1MB.

The fix for the problem is to initialize the disk again with the HBA, and
then create a new simple volume.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-06 Thread Edward Ned Harvey
  We ran into something similar with these drives in an X4170 that
 turned
  out to
  be  an issue of the preconfigured logical volumes on the drives. Once
  we made
  sure all of our Sun PCI HBAs where running the exact same version of
  firmware
  and recreated the volumes on new drives arriving from Sun we got back
  into sync
  on the X25-E devices sizes.
 
 Can you elaborate?  Just today, we got the replacement drive that has
 precisely the right version of firmware and everything.  Still, when we
 plugged in that drive, and create simple volume in the storagetek
 raid utility, the new drive is 0.001 Gb smaller than the old drive.
 I'm still hosed.
 
 Are you saying I might benefit by sticking the SSD into some laptop,
 and zero'ing the disk?  And then attach to the sun server?
 
 Are you saying I might benefit by finding some other way to make the
 drive available, instead of using the storagetek raid utility?
 
 Thanks for the suggestions...

Sorry for the double post.  Since the wrong-sized drive was discussed in two
separate threads, I want to stick a link here to the other one, where the
question was answered, just in case anyone comes across this discussion by
search or whatever...

http://mail.opensolaris.org/pipermail/zfs-discuss/2010-April/039669.html

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SSD sale on newegg

2010-04-06 Thread Richard Elling

On Apr 6, 2010, at 5:38 PM, Erik Trimble wrote:

 On Tue, 2010-04-06 at 17:17 -0700, Richard Elling wrote:
 On Apr 6, 2010, at 5:00 PM, Erik Trimble wrote:
 
 [snip]
 
 For L2ARC, you are more concerned with total size/capacity, and
 modest IOPS (3000-1 IOPS, or the ability to write at least 100Mb/s
 at 4-8k write sizes, plus as high as possible read I/O).
 
 The L2ARC fill rate is throttled to 16 MB/sec at boot and 8 MB/sec later.
 Many SSDs work well as L2ARC cache devices.
 
 
 Where is that limit set? That's completely new to me. :-(

L2ARC_WRITE_SIZE (8MB) is the default size of data to be written and 
L2ARC_FEED_SECS (1) is the interval.  When arc_warm is FALSE, the
L2ARC_WRITE_SIZE is doubled (16MB). Look somewhere around
http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/fs/zfs/arc.c#553

This change was made per CR 6709301, "An empty L2ARC cache device is slow to
warm up":
http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6709301

I'll agree the feed rate is somewhat arbitrary, but probably suits many 
use cases.
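
If you want to watch the throttle in action, l2_size in arcstats is the
number of bytes currently held on the cache device, so something like this
shows the fill rate per interval:

   # sample every 5 seconds, 5 times; the delta reflects the feed throttle
   kstat -p zfs:0:arcstats:l2_size 5 5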

 In any case, L2ARC devices should probably have at least reasonable
 write performance for small sizes, given the propensity to put things
 like the DDT and other table structures/metadata into it, all of which
 is small write chunks. I tried one of the old JMicron-based 1st-gen SSDs
 as an L2ARC, and it wasn't much of a success.

I haven't done many L2ARC measurements, but I suspect the writes are large.

 Fast read speed is good for an L2ARC, but that's not generally a problem
 with even the cheap SSDs these days.

yep.

 (one should generally not configure a swap device
 on an SSD-based rpool).
 
 Disagree.  Swap is a perfectly fine workload for SSDs.  Under ZFS, 
 even more so.  I'd really like to squash this rumour and thought we 
 were making progress on that front :-(  Today, there are millions or 
 thousands of systems with deployed SSDs as boot and swap on a
 wide variety of OSes.  Go for it.
 
 Really?  I'm generally not good for running swap on lower-performing
 SSDs over here in Java-land, but that may have to do with my specific
 workload.  I'll take your word for it (of course, I'm voting for swap
 not being necessary on many machines these days).

If you have to swap, you have no performance.  But people with SSDs
(eg MacBook Air) seem happy to see fewer spinning beach balls :-)
 -- richard

ZFS storage and performance consulting at http://www.RichardElling.com
ZFS training on deduplication, NexentaStor, and NAS performance
Las Vegas, April 29-30, 2010 http://nexenta-vegas.eventbrite.com 





___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SSD sale on newegg

2010-04-06 Thread Daniel Carosone
On Tue, Apr 06, 2010 at 06:53:04PM -0700, Richard Elling wrote:
  Disagree.  Swap is a perfectly fine workload for SSDs.  Under ZFS, 
  even more so.  I'd really like to squash this rumour and thought we 
  were making progress on that front :-(  Today, there are millions or 
  thousands of systems with deployed SSDs as boot and swap on a
  wide variety of OSes.  Go for it.

+1

  Really?  I'm generally not good for running swap on lower-performing
  SSDs over here in Java-land, but that may have to do with my specific
  workload.  I'll take your word for it (of course, I'm voting for swap
  not being necessary on many machines these days).
 
 If you have to swap, you have no performance.

Disagree.  If you're thrashing heavily, yes.  An SSD will make a
difference in swap latency up until that point, but that won't help
much when everything's stuck short for memory. 

However, a lot can happen before that point.  Swapping out unused
stuff (including idle services/processes and old tmpfs pages) can be
very useful for performance, making room for the performance-sensitive
working set.  Some of your lower-priority processes can page in and
out faster with an SSD, smoothing the curve from memory pressure to
total gridlock.

Finally, this middle ground is where an SSD root also helps, because
executable text is paged from there.

--
Dan.



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Are there (non-Sun/Oracle) vendors selling OpenSolaris/ZFS based NAS Hardware?

2010-04-06 Thread Eric D. Mudama

On Tue, Apr  6 at 17:56, Markus Kovero wrote:

Our Dell T610 is and has been working just fine for the last year and
a half, without a single network problem.  Do you know if they're
using the same integrated part?



--eric


Hi, as I should have mentioned, integrated nics that cause issues
are using Broadcom BCM5709 chipset and these connectivity issues
have been quite widespread amongst linux people too, Redhat tries to
fix this; http://kbase.redhat.com/faq/docs/DOC-26837 but I believe
it's messed up in firmware somehow, as in our tests show
4.6.8-series firmware seems to be more stable.

And what comes to workarounds, disabling msi is bad if it creates
latency for network/disk controllers and disabling c-states from
Nehalem processors is just stupid (having no turbo, power saving
etc).

Definitely no go for storage imo.


Seems like this issue only occurs when MSI-X interrupts are enabled
for the BCM5709 chips, or am I reading it wrong?

If I type 'echo ::interrupts | mdb -k', and isolate for
network-related bits, I get the following output:


 IRQ  Vect IPL Bus   Trg Type   CPU Share APIC/INT# ISR(s)
 36   0x60 6   PCI   Lvl Fixed  3   1 0x1/0x4   bnx_intr_1lvl
 48   0x61 6   PCI   Lvl Fixed  2   1 0x1/0x10  bnx_intr_1lvl


Does this imply that my system is not in a vulnerable configuration?
Supposedly I'm losing some performance without MSI-X, but I'm not sure
in which environments or workloads we would notice, since the load on
this server is relatively low and the L2ARC serves data at greater
than 100MB/s (wire speed) without stressing much of anything.

The BIOS settings in our T610 are exactly as they arrived from Dell
when we bought it over a year ago.

Thoughts?
--eric

--
Eric D. Mudama
edmud...@mail.bounceswoosh.org

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS getting slower over time

2010-04-06 Thread Marcus Wilhelmsson
The dips are gone. I've run simple copy operations via CIFS for two days and
the problem hasn't reappeared.

I'll try to find out what caused it though, thanks for trying to help me.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss