Re: [zfs-discuss] SPARC SATA, please.

2009-06-25 Thread James Lever


On 25/06/2009, at 5:16 AM, Miles Nordin wrote:


and mpt is the 1068 driver, proprietary, works on x86 and SPARC.



then there is also itmpt, the third-party-downloadable closed-source
driver from LSI Logic, dunno much about it but someone here used it.


I'm confused.  Why do you say the mpt driver is proprietary and the  
LSI provided tool is closed source?


I thought they were both closed source and that the LSI chipset  
specifications were proprietary.


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] x4500 resilvering spare taking forever?

2009-06-25 Thread Joe Kearney
 Yep, it also suffers from the bug that restarts
 resilvers when you take a
 snapshot. This was fixed in b94.
 http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6343667
  -- richard

Hats off to Richard for saving the day.  This was exactly the issue.  I shut 
off my automatic snapshots and 3 days later my resilver is done.
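
(For reference, with the zfs-auto-snapshot SMF service that usually means disabling its instances, for example:

# svcadm disable svc:/system/filesystem/zfs/auto-snapshot:frequent
# svcadm disable svc:/system/filesystem/zfs/auto-snapshot:hourly

and likewise for the daily/weekly/monthly instances, then re-enabling them with svcadm enable once the resilver completes.)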

Joe
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SPARC SATA, please.

2009-06-25 Thread Carson Gaspar

Miles Nordin wrote:


There's also been talk of two tools, MegaCli and lsiutil, which are
both binary only and exist for both Linux and Solaris, and I think are
used only with the 1078 cards but maybe not.


lsiutil works with LSI chips that use the Fusion-MPT interface (SCSI, 
SAS, and FC), including the 1068. I've used it with both the mpt and 
itmpt driver.


MegaCLI appears to be for MegaRAID SAS and SATA II controllers (using 
the mega_sas driver), including the 1078. I've never used it.


--
Carson
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Increase size of ZFS mirror

2009-06-25 Thread Ben
Thanks very much everyone.

Victor, I did think about using VirtualBox, but I have a real machine and a 
supply of hard drives for a short time, so I'll test it out using that if I 
can.

Scott, of course: at work we use three mirrors and it works very well. It has 
saved us on occasion where we have detached the third mirror, upgraded, found 
the upgrade failed, and been able to revert from the third mirror instead 
of having to go through backups.

George, it will be great to see the 'autoexpand' in the next release.  I'm 
keeping my home server on stable releases for the time being :)
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zpoll status -x output

2009-06-25 Thread Tomasz Kłoczko
 It might be easier to look for the pool status thusly
 zpool get health poolname

Correct me if I'm wrong, but zpool get is available only in fairly recent
OS versions and Solaris 10 updates (on some boxes we are still running
older versions of Solaris 10).

Nevertheless, IMO zpool status -x should work as described in the
manual, and the current behaviour does not match that description :)
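
For example (the pool name is illustrative), the two checks give roughly:

# zpool get health tank
NAME  PROPERTY  VALUE   SOURCE
tank  health    ONLINE  -

# zpool status -x tank
pool 'tank' is healthy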

Tomasz



--
Wydział Zarządzania i Ekonomii 
Politechnika Gdańska
http://www.zie.pg.gda.pl/

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] auto snapshots 0.12

2009-06-25 Thread Tim Foster
Hi all,

Just a quick plug: the latest version of ZFS Automatic Snapshots SMF
service hit the hg repository yesterday.

If you're using 0.11 or older, it's well worth upgrading to get the few
bugfixes (especially if you're using CIFS - we use '_' instead of ':' in
snapshot names now)

More at:
http://blogs.sun.com/timf/entry/zfs_automatic_snapshots_0_12

cheers,
tim

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] how to convert zio-io_offset to disk block number?

2009-06-25 Thread zhihui Chen
I use following dtrace script to trace the postion of one file on zfs:

#!/usr/sbin/dtrace -qs
zio_done:entry
/((zio_t *)(arg0))->io_vd/
{
zio = (zio_t *)arg0;
printf("Offset:%x and Size:%x\n", zio->io_offset, zio->io_size);
printf("vd:%x\n", (unsigned long)(zio->io_vd));
printf("process name:%s\n", execname);
tracemem(zio->io_data, 40);
stack();
}

and I run the dd command: dd if=/export/dsk1/test1 bs=512 count=1. The dtrace
script then generates the following output:

Offset:657800 and Size:200
vd:ff02d6a1a700
process name:sched
  
  zfs`zio_execute+0xa0
  genunix`taskq_thread+0x193
  unix`thread_start+0x8
^C

The tracemem output shows the correct content of file test1, which is a 512-byte
text file. zpool status gives the following output:

pool: tpool
state: ONLINE
scrub: none requested
config:
NAME      STATE     READ WRITE CKSUM
tpool     ONLINE       0     0     0
  c2t0d0  ONLINE       0     0     0
errors: No known data errors

My question is how to translate the zio->io_offset (0x657800, equal to decimal
number 6649856) output by dtrace into a block number on disk c2t0d0?
I tried dd if=/dev/dsk/c2t0d0 of=text iseek=6650112 bs=512 count=1
as a check, but the result is not right.

Thanks
Zhihui
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS write I/O stalls

2009-06-25 Thread Ross
 I am not sure how zfs would know the rate of the
 underlying disk storage 

Easy:  Is the buffer growing?  :-)

If the amount of data in the buffer is growing, you need to throttle back a bit 
until the disks catch up.  Don't stop writes until the buffer is empty, just 
slow them down to match the rate at which you're clearing data from the buffer.

In your case I'd expect to see ZFS buffer the early part of the write (so you'd 
see a very quick initial burst), but from then on you would want a continual 
stream of data to disk, at a steady rate.

To the client it should respond just like storing to disk; the only difference 
is that there's actually a small delay before the data hits the disk, which will be 
proportional to the buffer size.  ZFS won't have so much opportunity to 
optimize writes, but you wouldn't get such stuttering performance.

However, reading through the other messages, if it's a known bug with ZFS 
blocking reads while writing, there may not be any need for this idea.  But 
then, that bug has been open since 2006, is flagged as fix in progress, and was 
planned for snv_51 o_0.  So it probably is worth having this discussion.

And I may be completely wrong here, but reading that bug, it sounds like ZFS 
issues a whole bunch of writes at once as it clears the buffer, which ties in 
with the experiences of stalling actually being caused by reads being blocked.

I'm guessing given ZFS's aims it made sense to code it that way - if you're 
going to queue a bunch of transactions to make them efficient on disk, you 
don't want to interrupt that batch with a bunch of other (less efficient) 
reads. 

But the unintended side effect of this is that ZFS's attempt to optimize writes 
will cause jerky read and write behaviour any time you have a large amount of 
writes going on, and when you should be pushing the disks to 100% usage you're 
never going to reach that, as it's always going to have 5s of inactivity, 
followed by 5s of running the disks flat out.

In fact, I wonder if it's as simple as the disks ending up doing 5s of reads, a 
delay for processing, 5s of writes, 5s of reads, etc...

It's probably efficient, but it's going to *feel* horrible, a 5s delay is 
easily noticeable by the end user, and is a deal breaker for many applications.

In situations like that, 5s is a *huge* amount of time, especially so if you're 
writing to a disk or storage device which has its own caching!  Might it be 
possible to keep the 5s buffer for ordering transactions, but then commit it 
as a larger number of small transactions instead of one huge one?

The number of transactions could even be based on how busy the system is - if 
there are a lot of reads coming in, I'd be quite happy to split that into 50 
transactions.  On 10GbE, 5s is potentially 6.25GB of data.  Even split into 50 
transactions you're writing 128MB at a time, and that sounds plenty big enough 
to me!

Either way, something needs to be done.  If we move to ZFS our users are not 
going to be impressed with 5s delays on the storage system.

Finally, I do have one question for the ZFS guys:  How does the L2ARC interact 
with this?  Are reads from the L2ARC blocked, or will they happen in parallel 
with the writes to the main storage?  I suspect that a large L2ARC (potentially 
made up of SSD disks) would eliminate this problem the majority of the time.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] auto snapshots 0.12

2009-06-25 Thread Ross
Thanks Tim, do you know which build this is going to appear in?
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Is the PROPERTY compression will increase the ZFS I/O throughput?

2009-06-25 Thread Chookiex
thank you ;)
I mean: would reading compressed data be faster IF writing with compression is 
faster than writing uncompressed? Just like lzjb.

But I can't understand why the read performance is generally unaffected by 
compression. Decompression (lzjb, gzip) is algorithmically faster than compression, 
so I would think reading compressed data needs even less CPU time.

So I don't agree with the blog's conclusion that read performance is generally 
unaffected by compression.

Unless the ARC cached the data in the read test and there was no random read 
test?

My data is text data set, about 320,000 text files or emails. The compression 
ratio is:
lzjb 1.55x
gzip-1 2.54x
gzip-2 2.58x
gzip 2.72x
gzip-9 2.73x

for your curiosity :)
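
(For anyone reproducing this, the per-dataset figures can be read with something like:

# zfs get compression,compressratio tank/textdata

where the dataset name is, of course, illustrative.)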





From: David Pacheco david.pach...@sun.com
To: Chookiex hexcoo...@yahoo.com
Cc: zfs-discuss@opensolaris.org
Sent: Thursday, June 25, 2009 2:00:49 AM
Subject: Re: [zfs-discuss] Is the PROPERTY compression will increase the ZFS 
I/O throughput?

Chookiex wrote:
 Thank you for your reply.
 I had read the blog. The most interesting thing is WHY is there no 
 performance improve when it set any compression?

There are many potential reasons, so I'd first try to identify what your 
current bandwidth limiter is. If you're running out of CPU on your current 
workload, for example, adding compression is not going to help performance. If 
this is over a network, you could be saturating the link.. Or you might not 
have enough threads to drive the system to bandwidth.

Compression will only help performance if you've got plenty of CPU and other 
resources but you're out of disk bandwidth. But even if that's the case, it's 
possible that compression doesn't save enough space that you actually decrease 
the number of disk I/Os that need to be done.

 The compressed read I/O is less than uncompressed data,  and decompress is 
 faster than compress.

Out of curiosity, what's the compression ratio?

-- Dave

 so if lzjb write is better than non-compressed, the lzjb read would be better 
 than write?
  Is the ARC or L2ARC do any tricks?
  Thanks
 
 
 *From:* David Pacheco david.pach...@sun.com
 *To:* Chookiex hexcoo...@yahoo.com
 *Cc:* zfs-disc...@opensolaris.org
 *Sent:* Wednesday, June 24, 2009 4:53:37 AM
 *Subject:* Re: [zfs-discuss] Is the PROPERTY compression will increase the 
 ZFS I/O throughput?
 
 Chookiex wrote:
   Hi all.
  
   Because the property compression could decrease the file size, and the 
file IO will be decreased also.
   So, would it increase the ZFS I/O throughput with compression?
  
   for example:
   I turn on gzip-9,on a server with 2*4core Xeon, 8GB RAM.
   It could compress my files with compressratio 2.5x+. could it be?
   or I turn on lzjb, about 1.5x with the same files.
 
 It's possible, but it depends on a lot of factors, including what your 
 bottleneck is to begin with, how compressible your data is, and how hard you 
 want the system to work compressing it. With gzip-9, I'd be shocked if you 
 saw bandwidth improved. It seems more common with lzjb:
 
 http://blogs.sun.com/dap/entry/zfs_compression
 
 (skip down to the results)
 
 -- Dave
 
  
   could it be? Is there anyone have a idea?
  
   thanks
  
   
  
   ___
   zfs-discuss mailing list
   zfs-discuss@opensolaris.org
   http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
 
 
 -- David Pacheco, Sun Microsystems Fishworks.    http://blogs.sun.com/dap/
 


-- David Pacheco, Sun Microsystems Fishworks.    http://blogs.sun.com/dap/



  ___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] auto snapshots 0.12

2009-06-25 Thread Tim Foster
Hi Ross,

On Thu, 2009-06-25 at 04:24 -0700, Ross wrote:
 Thanks Tim, do you know which build this is going to appear in?

I've actually no idea - SUNWzfs-auto-snapshot gets delivered by the
Desktop consolidation, not me. I'm checking in with them to see what the
story is.

That said, it probably makes sense to wait till a build is available on
pkg.opensolaris.org that includes the 'zfs list -d' support and I get a
chance to do the (tiny bit of) work to start using that in the method
script and push 0.12.1.

- but any testing you feel like doing between now and then would be most
welcome :-)

cheers,
tim

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS write I/O stalls

2009-06-25 Thread Bob Friesenhahn

On Wed, 24 Jun 2009, Lejun Zhu wrote:


There is a bug in the database about reads blocked by writes which may be 
related:

http://bugs.opensolaris.org/view_bug.do?bug_id=6471212

The symptom is sometimes reducing queue depth makes read perform better.


This one certainly sounds promising.  Since Matt Ahrens has been 
working on it for almost a year, it must be almost fixed by now. :-)


I am not sure how the queue depth is managed, but it seems possible to 
detect when reads are blocked by bulk writes and make some automatic 
adjustments to improve the balance.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Regular panics: BAD TRAP: type=e

2009-06-25 Thread Anton Lundin
I'm having the same problems.

Approx. every 1-9 hours it crashes, and the backtrace is exactly the same as 
posted here.

The machine ran b98 rock-solid for a long time...

Anyone have a clue where to start?
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS for iSCSI based SAN

2009-06-25 Thread Scott Meilicke
 if those servers are on physical boxes right now i'd do some perfmon
 caps and add up the iops.

Using perfmon to get a sense of what is required is a good idea. Use the 95th 
percentile to be conservative. The counters I have used are in the PhysicalDisk 
object. Don't ignore the latency counters either. In my book, anything 
consistently over 20ms or so is excessive.

I run 30+ VMs on an Equallogic array with 14 sata disks, broken up as two 
striped 6 disk raid5 sets (raid 50) with 2 hot spares. That array is, on 
average, about 25% loaded from an IO standpoint. Obviously my VMs are pretty 
light. And the EQL gear is *fast*, which makes me feel better about spending 
all of that money :).

 Regarding ZIL usage, from what I have read you will only see 
 benefits if you are using NFS backed storage, but that it can be 
 significant.

 link?

From the ZFS Evil Tuning Guide 
(http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide):
ZIL stands for ZFS Intent Log. It is used during synchronous write 
operations.

further down:

If you've noticed terrible NFS or database performance on SAN storage array, 
the problem is not with ZFS, but with the way the disk drivers interact with 
the storage devices.
ZFS is designed to work with storage devices that manage a disk-level cache. 
ZFS commonly asks the storage device to ensure that data is safely placed on 
stable storage by requesting a cache flush. For JBOD storage, this works as 
designed and without problems. For many NVRAM-based storage arrays, a problem 
might come up if the array takes the cache flush request and actually does 
something rather than ignoring it. Some storage will flush their caches despite 
the fact that the NVRAM protection makes those caches as good as stable storage.
ZFS issues infrequent flushes (every 5 seconds or so) after the uberblock 
updates. The problem here is fairly inconsequential. No tuning is warranted 
here.
ZFS also issues a flush every time an application requests a synchronous write 
(O_DSYNC, fsync, NFS commit, and so on). The completion of this type of flush 
is waited upon by the application and impacts performance. Greatly so, in fact. 
From a performance standpoint, this neutralizes the benefits of having an 
NVRAM-based storage.

When I was testing iSCSI vs. NFS, it was clear iSCSI was not doing sync, NFS 
was. Here are some zpool iostat numbers:

iSCSI testing using iometer with the RealLife work load (65% read, 60% random, 
8k transfers - see the link in my previous post) - it is clear that writes are 
being cached in RAM, and then spun off to disk.

# zpool iostat data01 1

               capacity     operations    bandwidth
pool         used  avail   read  write   read  write
-----------  -----  -----  -----  -----  -----  -----
data01       55.5G  20.4T    691      0  4.21M      0
data01       55.5G  20.4T    632      0  3.80M      0
data01       55.5G  20.4T    657      0  3.93M      0
data01       55.5G  20.4T    669      0  4.12M      0
data01       55.5G  20.4T    689      0  4.09M      0
data01       55.5G  20.4T    488  1.77K  2.94M  9.56M
data01       55.5G  20.4T     29  4.28K   176K  23.5M
data01       55.5G  20.4T     25  4.26K   165K  23.7M
data01       55.5G  20.4T     20  3.97K   133K  22.0M
data01       55.6G  20.4T    170  2.26K  1.01M  11.8M
data01       55.6G  20.4T    678      0  4.05M      0
data01       55.6G  20.4T    625      0  3.74M      0
data01       55.6G  20.4T    685      0  4.17M      0
data01       55.6G  20.4T    690      0  4.04M      0
data01       55.6G  20.4T    679      0  4.02M      0
data01       55.6G  20.4T    664      0  4.03M      0
data01       55.6G  20.4T    699      0  4.27M      0
data01       55.6G  20.4T    423  1.73K  2.66M  9.32M
data01       55.6G  20.4T     26  3.97K   151K  21.8M
data01       55.6G  20.4T     34  4.23K   223K  23.2M
data01       55.6G  20.4T     13  4.37K  87.1K  23.9M
data01       55.6G  20.4T     21  3.33K   136K  18.6M
data01       55.6G  20.4T    468    496  2.89M  1.82M
data01       55.6G  20.4T    687      0  4.13M      0

Testing against NFS shows writes to disk continuously.

NFS Testing
               capacity     operations    bandwidth
pool         used  avail   read  write   read  write
-----------  -----  -----  -----  -----  -----  -----
data01       59.6G  20.4T     57    216   352K  1.74M
data01       59.6G  20.4T     41     21   660K  2.74M
data01       59.6G  20.4T     44     24   655K  3.09M
data01       59.6G  20.4T     41     23   598K  2.97M
data01       59.6G  20.4T     34     33   552K  4.21M
data01       59.6G  20.4T     46     24   757K  3.09M
data01       59.6G  20.4T     39     24   593K  3.09M
data01       59.6G  20.4T     45     25   687K  3.22M
data01       59.6G  20.4T     45     23   683K  2.97M
data01       59.6G  20.4T     33     23   492K  2.97M
data01       59.6G  20.4T     16     41   214K  1.71M
data01       59.6G  20.4T      3  2.36K  53.4K  30.4M
data01       59.6G  20.4T      1  2.23K  20.3K  29.2M
data01  

Re: [zfs-discuss] [storage-discuss] Backups

2009-06-25 Thread Greg
I think I am getting closer to a plan for how to back this up. I will do as 
you said to back up the OS: take an image or something of that nature. I will 
take a full backup of the virtual machines every one to three months; the data 
that the VMs work with will be mounted separately, so that if a virtual machine 
goes down, all that is needed is to restore the last backup of the VM, mount 
the storage, and we should be up and running.

Now my only worry is how to back up the data that the VMs are accessing. I guess 
my question is this: say I take a full backup every x days, say 7, so weekly 
backups, and I then take snapshots throughout the week. Then something happens, 
like a flood. Once I have the hardware and that side of things going again, can 
I restore from that full backup and then apply the snapshots to it? Will I then 
be up to yesterday, backup-wise, or are those snapshots useless and I am only up 
to last week?
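
(For what it's worth, snapshots only help after a disaster like that if they have been replicated off the box beforehand. A minimal sketch, with made-up dataset and host names:

# zfs snapshot data/vmstore@full-week26
# zfs send data/vmstore@full-week26 | ssh backuphost zfs receive backup/vmstore

and then during the week:

# zfs snapshot data/vmstore@tuesday
# zfs send -i @full-week26 data/vmstore@tuesday | ssh backuphost zfs receive backup/vmstore

After restoring from the full stream, incrementals received on top of it bring you forward to the most recent snapshot you managed to send, rather than back to last week.)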

Thanks for helping!
Greg
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS write I/O stalls

2009-06-25 Thread Bob Friesenhahn

On Thu, 25 Jun 2009, Ross wrote:

But the unintended side effect of this is that ZFS's attempt to 
optimize writes will cause jerky read and write behaviour any time 
you have a large amount of writes going on, and when you should be 
pushing the disks to 100% usage you're never going to reach that, as 
it's always going to have 5s of inactivity, followed by 5s of 
running the disks flat out.


In fact, I wonder if it's as simple as the disks ending up doing 5s 
of reads, a delay for processing, 5s of writes, 5s of reads, etc...


It's probably efficient, but it's going to *feel* horrible, a 5s 
delay is easily noticeable by the end user, and is a deal breaker 
for many applications.


Yes, 5 seconds is a long time.  For an application which mixes 
computation with I/O it is not really acceptable for read I/O to go 
away for up to 5 seconds.  This represents time that the CPU is not 
being used, and a time that the application may be unresponsive to the 
user.  When compression is used the impact is different, but the 
compression itself consumes considerable CPU (and quite abruptly) so 
that other applications (e.g. X11) stop responding during the 
compress/write cycle.


The read problem is one of congestion.  If I/O is congested with 
massive writes, then reads don't work.  It does not really matter how 
fast your storage system is.  If the 5 seconds of buffered writes are 
larger than what the device driver and storage system buffering allows 
for, then the I/O channel will be congested.


As an example, my storage array is demonstrated to be able to write 
359MB/second but ZFS will blast data from memory as fast as it can, 
and the storage path can not effectively absorb 1.8GB (359*5) of data 
since the StorageTek 2500's internal buffers are much smaller than 
that, and fiber channel device drivers are not allowed to consume much 
memory either.  To make matters worse, I am using ZFS mirrors so the 
amount of data written to the array in those five seconds is doubled 
to 3.6GB.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Best controller card for 8 SATA drives ?

2009-06-25 Thread Eric D. Mudama

On Wed, Jun 24 at 18:43, Bob Friesenhahn wrote:

On Wed, 24 Jun 2009, Eric D. Mudama wrote:


The main purpose for using SSDs with ZFS is to reduce latencies for  
synchronous writes required by network file service and databases.


In the available 5 months ago category, the Intel X25-E will write
sequentially at ~170MB/s according to the datasheets.  That is faster
than most, if not all rotating media today.


Sounds good.  Is that after the whole device has been re-written a 
few times, or just when you first use it?


Based on the various review sites, some tests experience a temporary
performance decrease when performing sequential IO over the top of
previously randomly written data, which resolves in some short time
period.

I am not convinced that simply writing the devices makes them slower.

Actual performance will be workload specific, YMMV.


How many of these devices do you own and use?


I own two of them personally, and work with many every day.


Seagate Cheetah drives can now support a sustained data rate of
204MB/second.  That is with 600GB capacity rather than 64GB and at a
similar price point (i.e. 10X less cost per GB).  Or you can just
RAID-0 a few cheaper rotating rust drives and achieve a huge
sequential data rate.


True.  In $ per sequential GB/s, rotating rust still wins by far.
However, your comment about all flash being slower than rotating media at
sequential writes was mistaken.  Even at 10x the price, if you're
working with a dataset that needs random IO, the IOPS per $ from flash
can be significantly greater than for any amount of rust, and typically
with much lower power consumption to boot.

Obviously the primary benefits of SSDs aren't in sequential
reads/writes, but they're not necessarily complete dogs there either.

--eric

--
Eric D. Mudama
edmud...@mail.bounceswoosh.org

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Is the PROPERTY compression will increase the ZFS I/O throughput?

2009-06-25 Thread David Pacheco

Chookiex wrote:

thank you ;)
I mean: would reading compressed data be faster IF writing with compression 
is faster than writing uncompressed? Just like lzjb.



Do you mean that it would be faster to read compressed data than 
uncompressed data, or it would be faster to read compressed data than to 
write it?



But I can't understand why the read performance is generally unaffected 
by compression. Decompression (lzjb, gzip) is algorithmically faster than 
compression, so I would think reading compressed data needs even less CPU time.


So I don't agree with the blog's conclusion that read performance is 
generally unaffected by compression.
Unless the ARC cached the data in the read test and there was no random 
read test?



My comment was just an empirical observation: in my experiments, read 
time was basically unaffected. I don't believe this was a result of ARC 
caching because I constructed the experiments to avoid that altogether 
by using working sets larger than the ARC and streaming through the data.


In my case the system's read bandwidth wasn't a performance limiter. We 
know this because the write bandwidth was much higher (see the graphs), 
and we were writing twice as much data as we were reading (because we 
were mirroring). So even if compression was decreasing the amount of I/O 
that was done on the read side, other factors (possibly the number of 
clients) limited the bandwidth we could achieve before we got to a point 
where compression would have made any difference.


-- Dave


My data is text data set, about 320,000 text files or emails. The 
compression ratio is:

lzjb 1.55x
gzip-1 2.54x
gzip-2 2.58x
gzip 2.72x
gzip-9 2.73x

for your curiosity :)



*From:* David Pacheco david.pach...@sun.com
*To:* Chookiex hexcoo...@yahoo.com
*Cc:* zfs-discuss@opensolaris.org
*Sent:* Thursday, June 25, 2009 2:00:49 AM
*Subject:* Re: [zfs-discuss] Is the PROPERTY compression will increase 
the ZFS I/O throughput?


Chookiex wrote:
  Thank you for your reply.
  I had read the blog. The most interesting thing is WHY is there no 
performance improve when it set any compression?


There are many potential reasons, so I'd first try to identify what your 
current bandwidth limiter is. If you're running out of CPU on your 
current workload, for example, adding compression is not going to help 
performance. If this is over a network, you could be saturating the 
link. Or you might not have enough threads to drive the system to bandwidth.


Compression will only help performance if you've got plenty of CPU and 
other resources but you're out of disk bandwidth. But even if that's the 
case, it's possible that compression doesn't save enough space that you 
actually decrease the number of disk I/Os that need to be done.


  The compressed read I/O is less than uncompressed data,  and 
decompress is faster than compress.


Out of curiosity, what's the compression ratio?

-- Dave

  so if lzjb write is better than non-compressed, the lzjb read would 
be better than write?

   Is the ARC or L2ARC do any tricks?
   Thanks
 
  
  *From:* David Pacheco david.pach...@sun.com

  *To:* Chookiex hexcoo...@yahoo.com
  *Cc:* zfs-discuss@opensolaris.org
  *Sent:* Wednesday, June 24, 2009 4:53:37 AM
  *Subject:* Re: [zfs-discuss] Is the PROPERTY compression will 
increase the ZFS I/O throughput?

 
  Chookiex wrote:
Hi all.
   
Because the property compression could decrease the file size, and 
the file IO will be decreased also.

So, would it increase the ZFS I/O throughput with compression?
   
for example:
I turn on gzip-9,on a server with 2*4core Xeon, 8GB RAM.
It could compress my files with compressratio 2.5x+. could it be?
or I turn on lzjb, about 1.5x with the same files.
 
  It's possible, but it depends on a lot of factors, including what 
your bottleneck is to begin with, how compressible your data is, and how 
hard you want the system to work compressing it. With gzip-9, I'd be 
shocked if you saw bandwidth improved. It seems more common with lzjb:

 
  http://blogs.sun.com/dap/entry/zfs_compression
 
  (skip down to the results)
 
  -- Dave
 
   
could it be? Is there anyone have a idea?
   
thanks
   



   
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org

http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
 
 
  -- David Pacheco, Sun Microsystems Fishworks.
http://blogs.sun.com/dap/

 


-- David Pacheco, Sun Microsystems Fishworks.

[zfs-discuss] unable to import zfs pool

2009-06-25 Thread Ketan
Hi, I had a zfs pool which I exported before our SAN maintenance and 
PowerPath upgrade, but now, after the PowerPath upgrade and maintenance, I'm 
unable to import the pool. It gives the following errors:

 # zpool import
  pool: emcpool1
id: 5596268873059055768
 state: UNAVAIL
status: One or more devices are missing from the system.
action: The pool cannot be imported. Attach the missing
devices and try again.
   see: http://www.sun.com/msg/ZFS-8000-3C
config:

emcpool1  UNAVAIL  insufficient replicas
  emcpower0c  UNAVAIL  cannot open
 # zpool import -f emcpool1
cannot import 'emcpool1': invalid vdev configuration

any idea what could be the reason for this ?
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] unable to import zfs pool

2009-06-25 Thread Daniel J. Priem
Could it be possible that your path changed?
Just run format (then CTRL+D to quit)
and look whether emcpower0c is now located somewhere else.

regards
daniel
Ketan no-re...@opensolaris.org writes:

 Hi, I had a zfs pool which I exported before our SAN maintenance
 and PowerPath upgrade, but now, after the PowerPath upgrade and
 maintenance,
 I'm unable to import the pool. It gives the following errors:

  # zpool import
   pool: emcpool1
 id: 5596268873059055768
  state: UNAVAIL
 status: One or more devices are missing from the system.
 action: The pool cannot be imported. Attach the missing
 devices and try again.
    see: http://www.sun.com/msg/ZFS-8000-3C
 config:

 emcpool1  UNAVAIL  insufficient replicas
   emcpower0c  UNAVAIL  cannot open
  # zpool import -f emcpool1
 cannot import 'emcpool1': invalid vdev configuration

 any idea what could be the reason for this ?

-- 
disy Informationssysteme GmbH
Daniel Priem
Netzwerk- und Systemadministrator
Tel: +49 721 1 600 6000, Fax: -605, E-Mail: daniel.pr...@disy.net

Entdecken Sie Lösungen mit Köpfchen
auf unserer neuen Website: www.disy.net

Firmensitz: Erbprinzenstr. 4-12, 76133 Karlsruhe
Registergericht: Amtsgericht Mannheim, HRB 107964
Geschäftsführer: Claus Hofmann

-
Environment . Reporting . GIS
-

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] unable to import zfs pool

2009-06-25 Thread Ketan
No idea whether the path changed or not, but the following is the output from my 
format, and nothing has changed:

AVAILABLE DISK SELECTIONS:
   0. c1t0d0 SUN146G cyl 14087 alt 2 hd 24 sec 848
  /p...@0/p...@0/p...@2/s...@0/s...@0,0
   1. c1t1d0 SUN146G cyl 14087 alt 2 hd 24 sec 848
  /p...@0/p...@0/p...@2/s...@0/s...@1,0
   2. c3t5006016841E0A08Dd0 DGC-RAID5-0326 cyl 65533 alt 2 hd 16 sec 890
  
/p...@0/p...@0/p...@8/p...@0/p...@2/SUNW,q...@0/f...@0,0/s...@w5006016841e0a08d,0
   3. c3t5006016041E0A08Dd0 DGC-RAID5-0326 cyl 65533 alt 2 hd 16 sec 890
  
/p...@0/p...@0/p...@8/p...@0/p...@2/SUNW,q...@0/f...@0,0/s...@w5006016041e0a08d,0
   4. c3t5006016041E0A08Dd1 DGC-RAID5-0326 cyl 51198 alt 2 hd 256 sec 16
  
/p...@0/p...@0/p...@8/p...@0/p...@2/SUNW,q...@0/f...@0,0/s...@w5006016041e0a08d,1
   5. c3t5006016841E0A08Dd1 DGC-RAID5-0326 cyl 51198 alt 2 hd 256 sec 16
  
/p...@0/p...@0/p...@8/p...@0/p...@2/SUNW,q...@0/f...@0,0/s...@w5006016841e0a08d,1
   6. c5t5006016141E0A08Dd0 DGC-RAID5-0326 cyl 65533 alt 2 hd 16 sec 890
  
/p...@0/p...@0/p...@8/p...@0/p...@a/SUNW,q...@0/f...@0,0/s...@w5006016141e0a08d,0
   7. c5t5006016941E0A08Dd0 DGC-RAID5-0326 cyl 65533 alt 2 hd 16 sec 890
  
/p...@0/p...@0/p...@8/p...@0/p...@a/SUNW,q...@0/f...@0,0/s...@w5006016941e0a08d,0
   8. c5t5006016141E0A08Dd1 DGC-RAID5-0326 cyl 51198 alt 2 hd 256 sec 16
  
/p...@0/p...@0/p...@8/p...@0/p...@a/SUNW,q...@0/f...@0,0/s...@w5006016141e0a08d,1
   9. c5t5006016941E0A08Dd1 DGC-RAID5-0326 cyl 51198 alt 2 hd 256 sec 16
  
/p...@0/p...@0/p...@8/p...@0/p...@a/SUNW,q...@0/f...@0,0/s...@w5006016941e0a08d,1
  10. emcpower0a DGC-RAID5-0326 cyl 65533 alt 2 hd 16 sec 890
  /pseudo/e...@0
  11. emcpower1a DGC-RAID5-0326 cyl 51198 alt 2 hd 256 sec 16
  /pseudo/e...@1
Specify disk (enter its number): Specify disk (enter its number):
r...@essapl020-u006 #
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] unable to import zfs pool

2009-06-25 Thread Daniel J. Priem

Ketan no-re...@opensolaris.org writes:

 No idea whether the path changed or not, but the following is the output from my 
 format, and nothing has changed:

 AVAILABLE DISK SELECTIONS:
0. c1t0d0 SUN146G cyl 14087 alt 2 hd 24 sec 848
   /p...@0/p...@0/p...@2/s...@0/s...@0,0
1. c1t1d0 SUN146G cyl 14087 alt 2 hd 24 sec 848
   /p...@0/p...@0/p...@2/s...@0/s...@1,0
2. c3t5006016841E0A08Dd0 DGC-RAID5-0326 cyl 65533 alt 2 hd 16 sec 890
   
 /p...@0/p...@0/p...@8/p...@0/p...@2/SUNW,q...@0/f...@0,0/s...@w5006016841e0a08d,0
3. c3t5006016041E0A08Dd0 DGC-RAID5-0326 cyl 65533 alt 2 hd 16 sec 890
   
 /p...@0/p...@0/p...@8/p...@0/p...@2/SUNW,q...@0/f...@0,0/s...@w5006016041e0a08d,0
4. c3t5006016041E0A08Dd1 DGC-RAID5-0326 cyl 51198 alt 2 hd 256 sec 16
   
 /p...@0/p...@0/p...@8/p...@0/p...@2/SUNW,q...@0/f...@0,0/s...@w5006016041e0a08d,1
5. c3t5006016841E0A08Dd1 DGC-RAID5-0326 cyl 51198 alt 2 hd 256 sec 16
   
 /p...@0/p...@0/p...@8/p...@0/p...@2/SUNW,q...@0/f...@0,0/s...@w5006016841e0a08d,1
6. c5t5006016141E0A08Dd0 DGC-RAID5-0326 cyl 65533 alt 2 hd 16 sec 890
   
 /p...@0/p...@0/p...@8/p...@0/p...@a/SUNW,q...@0/f...@0,0/s...@w5006016141e0a08d,0
7. c5t5006016941E0A08Dd0 DGC-RAID5-0326 cyl 65533 alt 2 hd 16 sec 890
   
 /p...@0/p...@0/p...@8/p...@0/p...@a/SUNW,q...@0/f...@0,0/s...@w5006016941e0a08d,0
8. c5t5006016141E0A08Dd1 DGC-RAID5-0326 cyl 51198 alt 2 hd 256 sec 16
   
 /p...@0/p...@0/p...@8/p...@0/p...@a/SUNW,q...@0/f...@0,0/s...@w5006016141e0a08d,1
9. c5t5006016941E0A08Dd1 DGC-RAID5-0326 cyl 51198 alt 2 hd 256 sec 16
   
 /p...@0/p...@0/p...@8/p...@0/p...@a/SUNW,q...@0/f...@0,0/s...@w5006016941e0a08d,1
   10. emcpower0a DGC-RAID5-0326 cyl 65533 alt 2 hd 16 sec 890
   /pseudo/e...@0
   11. emcpower1a DGC-RAID5-0326 cyl 51198 alt 2 hd 256 sec 16
   /pseudo/e...@1
 Specify disk (enter its number): Specify disk (enter its number):
 r...@essapl020-u006 #


reading your first post

 status: One or more devices are missing from the system.
 action: The pool cannot be imported. Attach the missing
devices and try again.
   see: http://www.sun.com/msg/ZFS-8000-3C
 config:

emcpool1  UNAVAIL  insufficient replicas
  emcpower0c  UNAVAIL  cannot open

One or more devices really are missing.
Check your connection to the EMC again.

regards
daniel

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] unable to import zfs pool

2009-06-25 Thread Ketan
That's the problem: this system has just 2 LUNs assigned, and both are present, 
as you can see from the format output:

10. emcpower0a DGC-RAID5-0326 cyl 65533 alt 2 hd 16 sec 890
/pseudo/e...@0
11. emcpower1a DGC-RAID5-0326 cyl 51198 alt 2 hd 256 sec 16
/pseudo/e...@1
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] unable to import zfs pool

2009-06-25 Thread Daniel J. Priem
Ketan no-re...@opensolaris.org writes:

 That's the problem: this system has just 2 LUNs assigned, and both are present, 
 as you can see from the format output:

 10. emcpower0a DGC-RAID5-0326 cyl 65533 alt 2 hd 16 sec 890
 /pseudo/e...@0
 11. emcpower1a DGC-RAID5-0326 cyl 51198 alt 2 hd 256 sec 16
 /pseudo/e...@1

Ahhh.
So the path has changed:
your old path was emcpower0c,
now you have emcpower0a and emcpower1a.

This config is cached somewhere. I am not sure, but IIRC you can clear
the cache and then activate the pool.
Can somebody else here jump in and point him to the right URL?
regards
daniel
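
(For reference, one thing that is often suggested in this situation, purely as an
illustrative sketch and with device paths that may differ on your box, is to point
the import directly at the directory holding the device nodes:

# zpool import -d /dev/dsk emcpool1

-d makes zpool scan that directory for the pool's devices rather than relying on
any cached paths.)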

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SPARC SATA, please.

2009-06-25 Thread Miles Nordin
 jl == James Lever j...@jamver.id.au writes:

jl I thought they were both closed source 

yes, both are closed source / proprietary.  If you are really confused
and not just trying to pick a dictionary fight, I can start saying
``closed source / proprietary'' on Solaris lists from now on.

On Linux lists, ``proprietary'' is clear enough, but maybe the people
around here are different.

jl and that the LSI chipset specifications were proprietary.

shrug I don't know about specifications, but I do know that Linux
has an open source driver for 1068, and Solaris has an open source
driver for 1078.

Getting source without specifications is a problem, though, yes, if
you want to track down a bug in the driver or write a driver for
another OS.

The other problem is, with both chips but especially with the 1078, it
sounds like these cards are very ``firmware'' heavy, and the firmware
is proprietary.  This causes the complaints here that 'hd' (smartctl
equivalent) doesn't work.  And that with PERC/1078 they have to make
RAID0's of each disk with LSI labels on the disk which blocks moving
the disk from one controller to another---meaning a broken controller
could potentially toast your whole zpool no matter what disk
redundancy you had, unless you figure out some way to escape the trap.
If not for the ``closed-source / proprietary'' firmware, these two
problems could never persist.

so, there is still no SATA driver for Solaris that:

 * is open-source.  

   like a fully-open stack, not just ``here look! here is some source.
   is that a rabbit over there?'' open-source meaning I can add
   smartctl or DVD writer or NCQ support without bumping into some
   strange blob that stops me.  open-source meaning I can swap out a
   disk without having to run any proprietary code to ``bless'' the
   disk first.  no BIOS bluescreen garbage either.

 * supports NCQ and hotplug

 * performs well and doesn't have a lot of bugs, like ``freezes'' and
   so on

 * works on x86 and SPARC

 * comes in card form so it can achieve high port density

on Linux, both Marvell and LSI 1068 driver come close to or meet all
these.  (smartctl DOES work with Linux's open source 1068 driver.)

Sun has more leverage with LSI than Linux, not less, because they are an
actual customer of LSI's chips for the hardware they sell---even
ditched Marvell for LSI!---yet they do worse on driver openness
negotiation and then try to blame LSI's whim, and tell the random schmuck
user to ``go complain to LSI'' when we are not LSI's customer, Sun is.

The issue gets more complicated, but not better, IMHO.


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS for iSCSI based SAN

2009-06-25 Thread Miles Nordin
 sm == Scott Meilicke no-re...@opensolaris.org writes:

sm Some storage will flush their caches despite the fact that the
sm NVRAM protection makes those caches as good as stable
sm storage. [...]  ZFS also issues a flush every time an
sm application requests a synchronous write (O_DSYNC, fsync, NFS
sm commit, and so on). [...] this neutralizes the benefits of
sm having an NVRAM-based storage.

if the external RAID array or the solaris driver is broken, yes.  If
not broken, the NVRAM should provide an extra-significant speed boost
for exactly the case of frequent synchronous writes.  Isn't that
section of the evil tuning guide you're quoting actually about
checking if the NVRAM/driver connection is working right or not?

sm When I was testing iSCSI vs. NFS, it was clear iSCSI was not
sm doing sync, NFS was.

I wonder if this is a bug in iSCSI, in either the VMWare initiator or
the Sun target.  With VM's there shouldn't be any opening and closing
of files to provoke an extra sync on NFS, only read, write, and sync
to the middle of big files, so I wouldn't think NFS should do any more
or less syncing than iSCSI.


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] unable to import zfs pool

2009-06-25 Thread Ketan
The zpool cache is in /etc/zfs/zpool.cache, or it can be viewed with zdb -C,

but in my case it's blank :-(
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] unable to import zfs pool

2009-06-25 Thread Ketan
And regarding the path, my other system has the same and it's working fine. See 
the output below:

 # zpool status
  pool: emcpool1
 state: ONLINE
status: The pool is formatted using an older on-disk format.  The pool can
still be used, but some features are unavailable.
action: Upgrade the pool using 'zpool upgrade'.  Once this is done, the
pool will no longer be accessible on older software versions.
 scrub: none requested
config:

NAME  STATE READ WRITE CKSUM
emcpool1  ONLINE   0 0 0
  emcpower0c  ONLINE   0 0 0

errors: No known data errors
  10. emcpower0a DGC-RAID5-0326 cyl 65533 alt 2 hd 16 sec 890
  /pseudo/e...@0
  11. emcpower1a DGC-RAID 5-0326-300.00GB
  /pseudo/e...@1
Specify disk (enter its number): Specify disk (enter its number):
r...@essapl020-u008 #
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SPARC SATA, please.

2009-06-25 Thread Simon Breden
The situation regarding lack of open source drivers for these LSI 
1068/1078-based cards is quite scary.

And did I understand you correctly when you say that these LSI 1068/1078 
drivers write labels to drives, meaning you can't move drives from an LSI 
controlled array to another arbitrary array due to these labels?

If this is the case then surely my best bet would be to go for the non-LSI 
controllers -- e.g. the AOC-SAT2-MV8 instead, which I presume does not write 
labels to the array drives?

Please correct me if I have misunderstood.

Cheers,
Simon
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] auto snapshots 0.12

2009-06-25 Thread Richard Elling

Thanks Tim!
-- richard

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SPARC SATA, please.

2009-06-25 Thread Miles Nordin
 sb == Simon Breden no-re...@opensolaris.org writes:

sb The situation regarding lack of open source drivers for these
sb LSI 1068/1078-based cards is quite scary.

meh I dunno.  The amount of confusion is a little scary, I guess.

sb And did I understand you correctly when you say that these LSI
sb 1068/1078 drivers write labels to drives,

no incorrect.  I'm using a 1068 (``closed-source / proprietary
driver''), and it doesn't write such labels.

The firmware piece is big, so not all 1068s are necessarily the same: 
I think some are capable of RAID0/RAID1, but so far I've not heard of
a 1068 demanding LSI labels, and mine doesn't.

The LSI 1078 (PERC) with the open-source x86-only driver is the one
with the big ``closed-source / proprietary'' firmware blob running on
the card itself.  Others have reported this blob demands LSI labels on
the disks.  I don't have one.  

who knows, maybe you can cross-flash some weird firmware from some
strange variant of card that doesn't need LSI labels on each disk, or
maybe some binary blob config tool will flip a magic undocumented
switch inside the card to make it JBOD-able.  I don't like to deal in
such circus-hoop messes unless someone else can do the work and tell
me exactly how.

sb go for the non-LSI controllers -- e.g. the AOC-SAT2-MV8

no, you misunderstood because there are two kinds of LSI card with two
different drivers.

compared to Marvell, LSI 1068 has a cheaper bus (PCIe), performs
better, and seems to have fewer bugs (ex. 6787312 is duplicate of a
secret Marvell bug), and its proprietary driver includes a SPARC
object.  The Marvell controller still has a ``closed-source /
proprietary'' driver (the Linux driver for the same chip: open source), so
you gain nothing there.  The one thing Marvell might gain you is its
SATA framework, so smartctl/hd may be closer to working.  On Linux
both cards use their uniform SCSI framework so smartctl works.

I have both the AOC-SAT2-MV8 and the AOC-USAS-L8i and suggest the latter.  You
have to unscrew the reverse-polarity card-edge bracket and buy some
octopus cables from thenerds.net or adaptec or similar, is all.
AOC-USAS-L8i works with these cables among others:

 
http://www.thenerds.net/3WARE.AMCC_Serial_Attached_SCSI_SAS_Internal_Cable.CBLSFF8087OCF10M.html


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Best controller card for 8 SATA drives ?

2009-06-25 Thread Nicholas Lee
On Fri, Jun 26, 2009 at 4:11 AM, Eric D. Mudama
edmud...@bounceswoosh.orgwrote:

 True.  In $ per sequential GB/s, rotating rust still wins by far.
 However, your comment about all flash being slower than rotating at
 sequential writes was mistaken.  Even at 10x the price, if you're
 working with a dataset that needs random IO, the $ per IOP from flash
 can be significantly greater than any amount of rust, and typically
 with much lower power consumption to boot.

 Obviously the primary benefits of SSDs aren't in sequential
 reads/writes, but they're not necessarilly complete dogs there either.


It's all about IOPS.  An HDD can do about 300 IOPS; an SSD can get up to 10k+
IOPS.  On sequential writes, low IOPS is obviously not a problem - 300 x
128kB is about 40MB/s.  But for small-packet random sync NFS traffic, 300 x 32kb is
hardly 1MB/s.

Nicholas
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SPARC SATA, please.

2009-06-25 Thread Richard Elling

Miles Nordin wrote:

sb == Simon Breden no-re...@opensolaris.org writes:



sb The situation regarding lack of open source drivers for these
sb LSI 1068/1078-based cards is quite scary.

meh I dunno.  The amount of confusion is a little scary, I guess.

sb And did I understand you correctly when you say that these LSI
sb 1068/1078 drivers write labels to drives,

no incorrect.  I'm using a 1068 (``closed-source / proprietary
driver''), and it doesn't write such labels.
  


I think the confusion is because the 1068 can do hardware RAID, it
can and does write its own labels, as well as reserve space for replacements
of disks with slightly different sizes.  But that is only one mode of 
operation.


Nit: the definition of proprietary relates to ownership.  One could
argue that Linus still owns Linux since he has such strong control over
what is accepted into the Linux kernel :-)  Similarly, one could argue that a
forker would own the fork.  In other words, open source and proprietary
are not mutually exclusive, nor is closed source a synonym for
proprietary.  You say tomato, I say 'mater.
-- richard

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] unable to import zfs pool

2009-06-25 Thread Ketan
Thanks to all for the efforts. I was able to import the zpool after disabling the 
first HBA card; I don't know the reason for this, but now the pool is imported 
and no disk was lost :-)
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SPARC SATA, please.

2009-06-25 Thread Simon Breden
Miles, thanks for helping clear up the confusion surrounding this subject!

My decision is now as above: for my existing NAS to leave the pool as-is, and 
seek a 2+ SATA port card for the 2-drive mirror for 2 x 30GB SATA boot SSDs 
that I want to add.

For the next NAS build later on this summer, I will go for an LSI 1068-based 
SAS/SATA configuration based on a PCIe expansion slot, rather than the ageing 
PCI-X slots.

Using PCIe instead of PCI-X also opens up a load more possible motherboards, 
although as I want ECC support this still limits choices for mobos. I was 
thinking of using something like a Xeon E5504 (Nehalem) in the new NAS, and 
I've been hunting for a good, highly compatible mobo that will give the least 
aggro (trouble) with OpenSolaris, and this one looks good as it's pretty much 
totally Intel chipsets, and it has an LSI SAS1068E, which I trust should be 
supported by Solaris, and it also has additional PCIe slots for additional 
future expansion, and basic onboard graphics chip, and dual Intel GbE NICs:
SuperMicro X8STi-3F: 
http://www.supermicro.com/products/motherboard/Xeon3000/X58/X8STi-3F.cfm

Any comments on this mobo welcome, plus suggestions for a possible PCIe-based 
2+ port SATA card that is reliable and has a solid driver.

Simon
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SPARC SATA, please.

2009-06-25 Thread Simon Breden
 I think the confusion is because the 1068 can do hardware RAID, it
can and does write its own labels, as well as reserve space for replacements
of disks with slightly different sizes. But that is only one mode of 
operation.

So, it sounds like if I use a 1068-based device, and I *don't* want it to write 
labels to the drives to allow easy portability of drives to a different 
controller, then I need to avoid the RAID mode of the device and instead 
force it to use JBOD mode. Is this easily selectable? I guess you just avoid 
the Use RAID mode option in the controller's BIOS or something?
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SPARC SATA, please.

2009-06-25 Thread James C. McPherson
On Thu, 25 Jun 2009 15:43:17 -0700 (PDT)
Simon Breden no-re...@opensolaris.org wrote:

  I think the confusion is because the 1068 can do hardware RAID, it
 can and does write its own labels, as well as reserve space for replacements
 of disks with slightly different sizes. But that is only one mode of 
 operation.
 
 So, it sounds like if I use a 1068-based device, and I *don't* want it to 
 write labels to the drives to allow easy portability of drives to a different 
 controller, then I need to avoid the RAID mode of the device and instead 
 force it to use JBOD mode. Is this easily selectable? I guess you just avoid 
 the Use RAID mode option in the controller's BIOS or something?

It's even simpler than that with the 1068 - just don't use raidctl
or the bios to create raid volumes and you'll have a bunch of plain
disks. No forcing required.

James C. McPherson
--
Senior Kernel Software Engineer, Solaris
Sun Microsystems
http://blogs.sun.com/jmcp   http://www.jmcp.homeunix.com/blog
Kernel Conference Australia - http://au.sun.com/sunnews/events/2009/kernel
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SPARC SATA, please.

2009-06-25 Thread Eric D. Mudama

On Fri, Jun 26 at  8:55, James C. McPherson wrote:

On Thu, 25 Jun 2009 15:43:17 -0700 (PDT)
Simon Breden no-re...@opensolaris.org wrote:


 I think the confusion is because the 1068 can do hardware RAID,
 it can and does write its own labels, as well as reserve space
 for replacements of disks with slightly different sizes. But
 that is only one mode of operation.

So, it sounds like if I use a 1068-based device, and I *don't* want
it to write labels to the drives to allow easy portability of
drives to a different controller, then I need to avoid the RAID
mode of the device and instead force it to use JBOD mode. Is this
easily selectable? I guess you just avoid the Use RAID mode
option in the controller's BIOS or something?


It's even simpler than that with the 1068 - just don't use raidctl
or the bios to create raid volumes and you'll have a bunch of plain
disks. No forcing required.


Exactly.  Worked as such out-of-the-box with no forcing of any kind
for me.

--eric

--
Eric D. Mudama
edmud...@mail.bounceswoosh.org

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS for iSCSI based SAN

2009-06-25 Thread Scott Meilicke
 Isn't that section of the evil tuning guide you're quoting actually about
 checking if the NVRAM/driver connection is working right or not?

Miles, yes, you are correct. I just thought it was interesting reading about 
how syncs and such work within ZFS.

Regarding my NFS test, you remind me that my test was flawed, in that my iSCSI 
numbers were using the ESXi iSCSI SW initiator, while the NFS tests were 
performed from within the guest VM, not from ESX. I'll give ESX as the NFS client 
(vmdks on NFS) a go and get back to you. Thanks!

Scott
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SPARC SATA, please.

2009-06-25 Thread Simon Breden
That sounds even better :)

So what's the procedure to create a zpool using the 1068?

Also, any special 'tricks /tips' / commands required for using a 1068-based 
SAS/SATA device?

Simon
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SPARC SATA, please.

2009-06-25 Thread James C. McPherson
On Thu, 25 Jun 2009 16:11:04 -0700 (PDT)
Simon Breden no-re...@opensolaris.org wrote:

 That sounds even better :)
 
 So what's the procedure to create a zpool using the 1068?

same as any other device:

# zpool create poolname vdev vdev vdev
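
for instance, with purely illustrative device names:

# zpool create tank mirror c3t0d0 c3t1d0 mirror c3t2d0 c3t3d0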

 
 Also, any special 'tricks /tips' / commands required for using a 1068-based 
 SAS/SATA device?

no


James C. McPherson
--
Senior Kernel Software Engineer, Solaris
Sun Microsystems
http://blogs.sun.com/jmcp   http://www.jmcp.homeunix.com/blog
Kernel Conference Australia - http://au.sun.com/sunnews/events/2009/kernel
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SPARC SATA, please.

2009-06-25 Thread Simon Breden
OK, thanks James.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SPARC SATA, please.

2009-06-25 Thread Erik Trimble

Simon Breden wrote:

I think the confusion is because the 1068 can do hardware RAID, it


can and does write its own labels, as well as reserve space for replacements
of disks with slightly different sizes. But that is only one mode of 
operation.


So, it sounds like if I use a 1068-based device, and I *don't* want it to write labels to the 
drives to allow easy portability of drives to a different controller, then I need to avoid the 
RAID mode of the device and instead force it to use JBOD mode. Is this easily 
selectable? I guess you just avoid the Use RAID mode option in the controller's BIOS or 
something?
  
In the Sun onboard version of the 1068, JBOD mode is the default.
I don't know about the add-in cards, but I suspect it's the same.  Worst 
case, you push Ctrl-L (or whatever it prompts you for) at the BIOS 
initialization and remove any RAID devices it has configured.  With no 
RAID devices configured, it runs as a pure HBA (i.e. in JBOD mode).


--
Erik Trimble
Java System Support
Mailstop:  usca22-123
Phone:  x17195
Santa Clara, CA

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SPARC SATA, please.

2009-06-25 Thread Erik Trimble

Simon Breden wrote:

Miles, thanks for helping clear up the confusion surrounding this subject!

My decision is now as above: for my existing NAS to leave the pool as-is, and 
seek a 2+ SATA port card for the 2-drive mirror for 2 x 30GB SATA boot SSDs 
that I want to add.

For the next NAS build later on this summer, I will go for an LSI 1068-based 
SAS/SATA configuration based on a PCIe expansion slot, rather than the ageing 
PCI-X slots.

Using PCIe instead of PCI-X also opens up a load more possible motherboards, 
although as I want ECC support this still limits choices for mobos. I was 
thinking of using something like a Xeon E5504 (Nehalem) in the new NAS, and 
I've been hunting for a good, highly compatible mobo that will give the least 
aggro (trouble) with OpenSolaris, and this one looks good as it's pretty much 
totally Intel chipsets, and it has an LSI SAS1068E, which I trust should be 
supported by Solaris, and it also has additional PCIe slots for additional 
future expansion, and basic onboard graphics chip, and dual Intel GbE NICs:
SuperMicro X8STi-3F: 
http://www.supermicro.com/products/motherboard/Xeon3000/X58/X8STi-3F.cfm

Any comments on this mobo welcome, plus suggestions for a possible PCIe-based 
2+ port SATA card that is reliable and has a solid driver.

Simon
  
Note that the X8STi-3F  requires an L-bracket riser card to use both the 
PCI-E x16 and the x8 slot, which will be mounted horizontally (and, 
likely, limited to low-profile cards).  You'd likely have to use a 
custom Supermicro case for this to work.  Otherwise, you're limited to 
the PCI-E x16 slot, in a standard vertical orientation.  The board does 
have an IPMI-based KVM ethernet port, but I have no idea if it's 
supported under Solaris.


Also, remember, that you'll have to order a Xeon CPU with this, NOT the 
i7 CPU, in order to get ECC memory support.





Personally, I'd go for an AMD-based system, which is about the same 
cost, and a much better board:


http://www.supermicro.com/Aplus/motherboard/Opteron2000/MCP55/H8DM3-2.cfm

(comes with a 1068E SAS controller, AND the nVidia MCP55-based 6-port 
SATA controller, no need for any more PCI-cards, and it supports the 
add-in card for remote KVM console;  it's a dual-socket, Extended ATX 
size, though).



The MCP55 is the chipset currently in use in the Sun X2200 M2 series of 
servers.


--
Erik Trimble
Java System Support
Mailstop:  usca22-123
Phone:  x17195
Santa Clara, CA

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] how to convert zio-io_offset to disk block number?

2009-06-25 Thread zhihui Chen
I find that zio->io_offset is the absolute offset on the device, not in sector
units. And if we need to use zdb -R to dump the block, we should use the offset
(zio->io_offset - 0x400000, i.e. minus the 4 MB of vdev labels and boot block at
the start of the device).
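
For example, with the pool and offset traced above, and assuming the usual 4 MB
label/boot reserve, the block could be dumped with something like:

# zdb -R tpool 0:257800:200

i.e. vdev 0, offset 0x657800 - 0x400000 = 0x257800, size 0x200, all in hex (the
exact form of the -R argument varies a little between builds, so check zdb's
usage output).
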
2009/6/25 zhihui Chen zhch...@gmail.com

 I use following dtrace script to trace the postion of one file on zfs:

 #!/usr/sbin/dtrace -qs
 zio_done:entry
 /((zio_t *)(arg0))->io_vd/
 {
 zio = (zio_t *)arg0;
 printf("Offset:%x and Size:%x\n", zio->io_offset, zio->io_size);
 printf("vd:%x\n", (unsigned long)(zio->io_vd));
 printf("process name:%s\n", execname);
 tracemem(zio->io_data, 40);
 stack();
 }

 and I run the dd command: dd if=/export/dsk1/test1 bs=512 count=1. The
 dtrace script then generates the following output:

 Offset:657800 and Size:200
 vd:ff02d6a1a700
 process name:sched
   
   zfs`zio_execute+0xa0
   genunix`taskq_thread+0x193
   unix`thread_start+0x8
 ^C

 The tracemem output shows the correct content of file test1, which is a 512-byte
 text file. zpool status gives the following output:

 pool: tpool
 state: ONLINE
 scrub: none requested
 config:
 NAME      STATE     READ WRITE CKSUM
 tpool     ONLINE       0     0     0
   c2t0d0  ONLINE       0     0     0
 errors: No known data errors

 My question is how to translate the zio->io_offset (0x657800, equal to decimal
 number 6649856) output by dtrace into a block number on disk c2t0d0?
 I tried dd if=/dev/dsk/c2t0d0 of=text iseek=6650112 bs=512 count=1
 as a check, but the result is not right.

 Thanks
 Zhihui


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss