Re: [zfs-discuss] Removing Cloned Snapshot

2010-02-12 Thread Brandon High
On Fri, Feb 12, 2010 at 1:08 PM, Daniel Carosone  wrote:
> With dedup and bp-rewrite, a new operation could be created that takes
> the shared data and makes it uniquely-referenced but deduplicated data.
> This could be a lot more efficient and less disruptive because of the
> advanced knowledge that the data must already be the same.

That's essentially what a send/recv does when dedup is enabled.
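
For example, a rough sketch with hypothetical names (assumes dedup is enabled
on the receiving dataset, and that the original data was also written with
dedup on so the blocks share DDT entries):

# zfs set dedup=on tank
# zfs send tank/fs@snap | zfs recv tank/fs-copy

The received copy is then independently referenced, but its blocks are shared
through the DDT rather than stored a second time.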

-B

-- 
Brandon High : bh...@freaks.com
There is absolutely no substitute for a genuine lack of preparation.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs promote

2010-02-12 Thread tester
Hello,

# /usr/sbin/zfs list -r rgd3
NAME                   USED  AVAIL  REFER  MOUNTPOINT
rgd3                  16.5G  23.4G    20K  /rgd3
rgd3/fs1                19K  23.4G    21K  /app/fs1
rgd3/fs1-patch        16.4G  23.4G  16.4G  /app/fs1-patch
rgd3/fs1-patch@snap1  34.8M      -  16.4G  -

# /usr/sbin/zfs promote rgd3/fs1

After the promote, the snapshot shows 16.4G in USED:

# /usr/sbin/zfs list -r rgd3
NAME             USED  AVAIL  REFER  MOUNTPOINT
rgd3            16.5G  23.4G    20K  /rgd3
rgd3/fs1        16.4G  23.4G    21K  /app/fs1
rgd3/fs1@snap1  16.4G      -  16.4G  -
rgd3/fs1-patch  33.9M  23.4G  16.4G  /app/fs1-patch

5.10 Generic_141414-10

I tried to line up the numbers, but it did not work. Sorry for the format.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SSD and ZFS

2010-02-12 Thread Brendan Gregg - Sun Microsystems
On Fri, Feb 12, 2010 at 02:25:51PM -0800, TMB wrote:
> I have a similar question, I put together a cheapo RAID with four 1TB WD 
> Black (7200) SATAs, in a 3TB RAIDZ1, and I added a 64GB OCZ Vertex SSD, with 
> slice 0 (5GB) for ZIL and the rest of the SSD  for cache:
> # zpool status dpool
>   pool: dpool
>  state: ONLINE
>  scrub: none requested
> config:
> 
>         NAME          STATE     READ WRITE CKSUM
>         dpool         ONLINE       0     0     0
>           raidz1      ONLINE       0     0     0
>             c0t0d0    ONLINE       0     0     0
>             c0t0d1    ONLINE       0     0     0
>             c0t0d2    ONLINE       0     0     0
>             c0t0d3    ONLINE       0     0     0
>         logs
>           c0t0d4s0    ONLINE       0     0     0
>         cache
>           c0t0d4s1    ONLINE       0     0     0
>         spares
>           c0t0d6      AVAIL
>           c0t0d7      AVAIL
> 
>                 capacity     operations    bandwidth
> pool          used  avail   read  write   read  write
> ----------  -----  -----  -----  -----  -----  -----
> dpool       72.1G  3.55T    237     12  29.7M   597K
>   raidz1    72.1G  3.55T    237      9  29.7M   469K
>     c0t0d0      -      -    166      3  7.39M   157K
>     c0t0d1      -      -    166      3  7.44M   157K
>     c0t0d2      -      -    166      3  7.39M   157K
>     c0t0d3      -      -    167      3  7.45M   157K
>   c0t0d4s0     20K  4.97G      0      3      0   127K
> cache           -      -      -      -      -      -
>   c0t0d4s1  17.6G  36.4G      3      1   249K   119K
> ----------  -----  -----  -----  -----  -----  -----
> I just don't seem to be getting any bang for the buck I should be.  This was 
> taken while rebuilding an Oracle index, all files stored in this pool.  The 
> WD disks are at 100%, and nothing is coming from the cache.  The cache does 
> have the entire DB cached (17.6G used), but hardly reads anything from it.  I 
> also am not seeing the spike of data flowing into the ZIL either, although 
> iostat shows there is just write traffic hitting the SSD:
> 
>                     extended device statistics                        cpu
> device       r/s    w/s   kr/s   kw/s wait actv  svc_t  %w  %b  us sy wt id
> sd0        170.0    0.4 7684.7    0.0  0.0 35.0  205.3   0 100  11  8  0 82
> sd1        168.4    0.4 7680.2    0.0  0.0 34.6  205.1   0 100
> sd2        172.0    0.4 7761.7    0.0  0.0 35.0  202.9   0 100
> sd3          0.0    0.0    0.0    0.0  0.0  0.0    0.0   0   0
> sd4        170.0    0.4 7727.1    0.0  0.0 35.0  205.3   0 100
> sd5          1.6    2.6  182.4  104.8  0.0  0.5  117.8   0  31
> 
> Since this SSD is in a RAID array, and just presents as a regular disk LUN, 
> is there a special incantation required to turn on the Turbo mode?
> 
> Doesn't it seem that all this traffic should be maxing out the SSD? Reads from
> the cache, and writes to the ZIL? I have a second identical SSD I wanted to
> add as a mirror, but it seems pointless if there's no zip to be had.

The most likely reason is that this workload has been identified as streaming
by ZFS, which is prefetching from disk instead of the L2ARC (l2arc_noprefetch=1).

It also looks like you've used a 128 Kbyte ZFS record size.  Is Oracle doing
128 Kbyte random I/O?  We usually tune that down before creating the database,
which lets the L2ARC device be used more efficiently.
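
For example, a sketch only (the dataset name is hypothetical; match the value
to the database block size, typically 8 Kbytes):

# zfs set recordsize=8k dpool/oradata

recordsize only applies to files written after it is set, which is why it is
tuned before the datafiles are created.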

Brendan

-- 
Brendan Gregg, Fishworks   http://blogs.sun.com/brendan
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS ZIL + L2ARC SSD Setup

2010-02-12 Thread Brendan Gregg - Sun Microsystems
G'Day,

On Sat, Feb 13, 2010 at 09:02:58AM +1100, Daniel Carosone wrote:
> On Fri, Feb 12, 2010 at 11:26:33AM -0800, Richard Elling wrote:
> > Mathing around a bit, for a 300 GB L2ARC (apologies for the tab separation):
> > size (GB)            300
> > size (sectors)       585937500
> > labels (sectors)     9232
> > available sectors    585928268
> > bytes/L2ARC header   200
> > 
> > recordsize (sectors)  recordsize (kBytes)  L2ARC capacity (records)  Header size (MBytes)
> >   1                     0.5                  585928268                 111,760
> >   2                     1                    292964134                  55,880
> >   4                     2                    146482067                  27,940
> >   8                     4                     73241033                  13,970
> >  16                     8                     36620516                   6,980
> >  32                    16                     18310258                   3,490
> >  64                    32                      9155129                   1,750
> > 128                    64                      4577564                     870
> > 256                   128                      2288782                     440
> > 
> > So, depending on the data, you need somewhere between 440 MBytes and 111 GBytes
> > to hold the L2ARC headers. For a rule of thumb, somewhere between 0.15% and 40%
> > of the total used size. Ok, that rule really isn't very useful...
> 
> All that precision up-front for such a broad conclusion..  bummer :)
> 
> I'm interested in a better rule of thumb, for rough planning
> purposes.  As previously noted, I'm especially interested in the

I use 2.5% for an 8 Kbyte record size, i.e. for every 1 Gbyte of L2ARC, about
25 Mbytes of ARC is consumed.  I don't recommend other record sizes since:

- the L2ARC is currently intended for random I/O workloads.  Such workloads
  usually have small record sizes, such as 8 Kbytes.  Larger record sizes (such
  as the 128 Kbyte default) are better for streaming workloads.  The L2ARC
  doesn't currently touch streaming workloads (l2arc_noprefetch=1).

- The best performance from SSDs is with smaller I/O sizes, not larger.  I get
  about 3200 x 8 Kbyte read I/O from my current L2ARC devices, yet only about
  750 x 128 Kbyte read I/O from the same devices.

- record sizes smaller than 4 Kbytes lead to a lot of ARC headers and worse
  streaming performance.  I wouldn't tune it smaller unless I had to for
  some reason.

So, from the table above I'd only really consider the 4 to 32 Kbyte size range.
4 Kbytes if you really wanted a smaller record size, and 32 Kbytes if you had
limited DRAM you wanted to conserve (at the trade-off of SSD performance).
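
As a rough check of the 2.5% figure (assuming ~200 bytes of header per record):

# echo '1024 * 1024 / 8 * 200 / 1024 / 1024' | bc
25

i.e. 1 Gbyte of L2ARC at an 8 Kbyte record size is ~131,072 records, or about
25 Mbytes of ARC headers.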

Brendan


> combination with dedup, where DDT entries need to be cached.  What's
> the recordsize for L2ARC-of-on-disk-DDT, and how does that bias the
> overhead %age above?
> 
> I'm also interested in a more precise answer to a different question,
> later on.  Lets say I already have an L2ARC, running and warm.  How do
> I tell how much is being used?  Presumably, if it's not full, RAM 
> to manage it is the constraint - how can I confirm that and how can I
> tell how much RAM is currently used?
> 
> If I can observe these figures, I can tell if I'm wasting ssd space
> that can't be used.  Either I can reallocate that space or know that
> adding RAM will have an even bigger benefit (increasing both primary
> and secondary cache sizes).  Maybe I can even decide that L2ARC is not
> worth it for this box (especially if it can't fit any more RAM).
> 
> Finally, how smart is L2ARC at optimising this usage? If it's under
> memory pressure, does it prefer to throw out smaller records in favour
> of larger more efficient ones? 
> 
> My current rule of thumb for all this, absent better information, is
> that you should just have gobs of RAM (no surprise there) but that if
> you can't, then dedup seems to be most worthwhile when the pool itself
> is on ssd, no l2arc. Say, a laptop.  Here, you care most about saving
> space and the IO overhead costs least.
> 
> We need some thumbs in between these extremes.  :-(
> 
> --
> Dan.


> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


-- 
Brendan Gregg, Fishworks   http://blogs.sun.com/brendan
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SSD and ZFS

2010-02-12 Thread TMB
I have a similar question, I put together a cheapo RAID with four 1TB WD Black 
(7200) SATAs, in a 3TB RAIDZ1, and I added a 64GB OCZ Vertex SSD, with slice 0 
(5GB) for ZIL and the rest of the SSD  for cache:
# zpool status dpool
  pool: dpool
 state: ONLINE
 scrub: none requested
config:

        NAME          STATE     READ WRITE CKSUM
        dpool         ONLINE       0     0     0
          raidz1      ONLINE       0     0     0
            c0t0d0    ONLINE       0     0     0
            c0t0d1    ONLINE       0     0     0
            c0t0d2    ONLINE       0     0     0
            c0t0d3    ONLINE       0     0     0
        logs
          c0t0d4s0    ONLINE       0     0     0
        cache
          c0t0d4s1    ONLINE       0     0     0
        spares
          c0t0d6      AVAIL
          c0t0d7      AVAIL

                capacity     operations    bandwidth
pool          used  avail   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
dpool       72.1G  3.55T    237     12  29.7M   597K
  raidz1    72.1G  3.55T    237      9  29.7M   469K
    c0t0d0      -      -    166      3  7.39M   157K
    c0t0d1      -      -    166      3  7.44M   157K
    c0t0d2      -      -    166      3  7.39M   157K
    c0t0d3      -      -    167      3  7.45M   157K
  c0t0d4s0     20K  4.97G      0      3      0   127K
cache           -      -      -      -      -      -
  c0t0d4s1  17.6G  36.4G      3      1   249K   119K
----------  -----  -----  -----  -----  -----  -----
I just don't seem to be getting any bang for the buck I should be.  This was 
taken while rebuilding an Oracle index, all files stored in this pool.  The WD 
disks are at 100%, and nothing is coming from the cache.  The cache does have 
the entire DB cached (17.6G used), but hardly reads anything from it.  I also 
am not seeing the spike of data flowing into the ZIL either, although iostat 
shows there is just write traffic hitting the SSD:

                    extended device statistics                        cpu
device       r/s    w/s   kr/s   kw/s wait actv  svc_t  %w  %b  us sy wt id
sd0        170.0    0.4 7684.7    0.0  0.0 35.0  205.3   0 100  11  8  0 82
sd1        168.4    0.4 7680.2    0.0  0.0 34.6  205.1   0 100
sd2        172.0    0.4 7761.7    0.0  0.0 35.0  202.9   0 100
sd3          0.0    0.0    0.0    0.0  0.0  0.0    0.0   0   0
sd4        170.0    0.4 7727.1    0.0  0.0 35.0  205.3   0 100
sd5          1.6    2.6  182.4  104.8  0.0  0.5  117.8   0  31

Since this SSD is in a RAID array, and just presents as a regular disk LUN, is 
there a special incantation required to turn on the Turbo mode?

Doesn't it seem that all this traffic should be maxing out the SSD? Reads from
the cache, and writes to the ZIL? I have a second identical SSD I wanted to add
as a mirror, but it seems pointless if there's no zip to be had.

help?

Thanks,
Tracey
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS ZIL + L2ARC SSD Setup

2010-02-12 Thread Daniel Carosone
On Fri, Feb 12, 2010 at 11:26:33AM -0800, Richard Elling wrote:
> Mathing around a bit, for a 300 GB L2ARC (apologies for the tab separation):
>   size (GB)            300
>   size (sectors)       585937500
>   labels (sectors)     9232
>   available sectors    585928268
>   bytes/L2ARC header   200
>   
>   recordsize (sectors)  recordsize (kBytes)  L2ARC capacity (records)  Header size (MBytes)
>     1                     0.5                  585928268                 111,760
>     2                     1                    292964134                  55,880
>     4                     2                    146482067                  27,940
>     8                     4                     73241033                  13,970
>    16                     8                     36620516                   6,980
>    32                    16                     18310258                   3,490
>    64                    32                      9155129                   1,750
>   128                    64                      4577564                     870
>   256                   128                      2288782                     440
> 
> So, depending on the data, you need somewhere between 440 MBytes and 111 GBytes
> to hold the L2ARC headers. For a rule of thumb, somewhere between 0.15% and 40%
> of the total used size. Ok, that rule really isn't very useful...

All that precision up-front for such a broad conclusion..  bummer :)

I'm interested in a better rule of thumb, for rough planning
purposes.  As previously noted, I'm especially interested in the
combination with dedup, where DDT entries need to be cached.  What's
the recordsize for L2ARC-of-on-disk-DDT, and how does that bias the
overhead %age above?

I'm also interested in a more precise answer to a different question,
later on.  Let's say I already have an L2ARC, running and warm.  How do
I tell how much is being used?  Presumably, if it's not full, RAM 
to manage it is the constraint - how can I confirm that and how can I
tell how much RAM is currently used?

If I can observe these figures, I can tell if I'm wasting ssd space
that can't be used.  Either I can reallocate that space or know that
adding RAM will have an even bigger benefit (increasing both primary
and secondary cache sizes).  Maybe I can even decide that L2ARC is not
worth it for this box (especially if it can't fit any more RAM).

Finally, how smart is L2ARC at optimising this usage? If it's under
memory pressure, does it prefer to throw out smaller records in favour
of larger more efficient ones? 

My current rule of thumb for all this, absent better information, is
that you should just have gobs of RAM (no surprise there) but that if
you can't, then dedup seems to be most worthwhile when the pool itself
is on ssd, no l2arc. Say, a laptop.  Here, you care most about saving
space and the IO overhead costs least.

We need some thumbs in between these extremes.  :-(

--
Dan.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Removing Cloned Snapshot

2010-02-12 Thread Daniel Carosone
On Fri, Feb 12, 2010 at 09:50:32AM -0500, Mark J Musante wrote:
> The other option is to zfs send the snapshot to create a copy  
> instead of a clone. 

One day, in the future, I hope there might be a third option, somewhat
as an optimisation.

With dedup and bp-rewrite, a new operation could be created that takes
the shared data and makes it uniquely-referenced but deduplicated data.  
This could be a lot more efficient and less disruptive because of the
advanced knowledge that the data must already be the same.

Whether it's worth the implementation effort is another issue, but in
the meantime we have plenty of time to try and come up with a sensible
name for it.  "unclone" is too boring :)

--
Dan.


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] verging OT: how to buy J4500 w/o overpriced

2010-02-12 Thread Daniel Bakken
On Fri, Feb 12, 2010 at 12:11 PM, Al Hopper  wrote:
> There's your first mistake.  You're probably eligible for a very nice
> Federal Systems discount.  My *guess* would be about 40%.

Promise JBOD and similar systems are often the only affordable choice
for those of us who can't get sweetheart discounts, don't work at
billion dollar corporations, or aren't bankrolled by the Federal
Leviathan.

Daniel Bakken
Systems Administrator
Economic Modeling Specialists Inc
1187 Alturas Drive
Moscow, Idaho 83843
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SSD and ZFS

2010-02-12 Thread Scott Meilicke
I don't think adding an SSD mirror to an existing pool will do much for
performance. Some of your data will surely go to those SSDs, but I don't think
Solaris will know they are SSDs and move blocks in and out according to
usage patterns to give you an all-around boost. They will just be used to store
data, nothing more.

Perhaps it will be more useful to add the SSDs as either an L2ARC or a SLOG for
the ZIL, but that will depend upon your workload. If you do NFS or iSCSI
access, then putting the ZIL onto the SSD drive(s) will speed up writes; adding
them to the L2ARC will speed up reads.
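
For example, rough sketches only (device names are hypothetical):

# zpool add tank log mirror c0t6d0 c0t7d0
# zpool add tank cache c0t6d0 c0t7d0

Log devices can be mirrored; cache devices cannot (they are simply striped), so
pick one role per device rather than using the same pair for both.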

Here is the ZFS best practices guide, which should help with this decision:
http://www.solarisinternals.com/wiki/index.php/ZFS_Best_Practices_Guide

Read that, then come back with more questions.

Best,
Scott
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] verging OT: how to buy J4500 w/o overpriced

2010-02-12 Thread Al Hopper
On Tue, Feb 9, 2010 at 12:55 PM, matthew patton  wrote:

. snip 
> Enter the J4500, 48 drives in 4U, what looks to be solid engineering, and 
> redundancy in all the right places. An empty chassis at $3000 is totally 
> justifiable. Maybe as high as $4000. In comparison a naked Dell MD1000 is 
> $2000. If you do the subtraction from SUN's claimed "breakthru" pricing of 
> $1/GB, the chassis cost works out to $4000. I can live with that.
>
> Now look up the price for 24TB and it's 28 freaking thousand!

There's your first mistake.  You're probably eligible for a very nice
Federal Systems discount.  My *guess* would be about 40%.

. snip ...

Regards,

-- 
Al Hopper  Logical Approach Inc,Plano,TX a...@logical-approach.com
   Voice: 972.379.2133 Timezone: US CDT
OpenSolaris Governing Board (OGB) Member - Apr 2005 to Mar 2007
http://www.opensolaris.org/os/community/ogb/ogb_2005-2007/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS ZIL + L2ARC SSD Setup

2010-02-12 Thread Richard Elling
On Feb 12, 2010, at 9:36 AM, Felix Buenemann wrote:

> Am 12.02.10 18:17, schrieb Richard Elling:
>> On Feb 12, 2010, at 8:20 AM, Felix Buenemann wrote:
>> 
>>> Hi Mickaël,
>>> 
>>> Am 12.02.10 13:49, schrieb Mickaël Maillot:
 Intel X-25 M are MLC not SLC, there are very good for L2ARC.
>>> 
>>> Yes, I'm only using those for L2ARC, I'm planing on getting to Mtron Pro 
>>> 7500 16GB SLC SSDs for ZIL.
>>> 
 and next, you need more RAM:
 ZFS can't handle 4x 80 Gb of L2ARC with only 4Gb of RAM because ZFS
 use memory to allocate and manage L2ARC.
>>> 
>>> Is there a guideline in which relation L2ARC size should be to RAM?
>> 
>> Approximately 200 bytes per record. I use the following example:
>>  Suppose we use a Seagate LP 2 TByte disk for the L2ARC
>>  + Disk has 3,907,029,168 512 byte sectors, guaranteed
>>  + Workload uses 8 kByte fixed record size
>>  RAM needed for arc_buf_hdr entries
>>  + Need = ~(3,907,029,168 - 9,232) * 200 / 16 = ~48 GBytes
>> 
>> Don't underestimate the RAM needed for large L2ARCs
> 
> I'm not sure how your workload record size plays into above formula (where 
> does - 9232 come from?), but given I've got ~300GB L2ARC, I'd need about 
> 7.2GB RAM, so upgrading to 8GB would be enough to satisfy the L2ARC.

recordsize=8kB=16 sectors @ 512 bytes/sector

9,232 is the number of sectors reserved for labels, around 4.75 MBytes

Mathing around a bit, for a 300 GB L2ARC (apologies for the tab separation):
size (GB)            300
size (sectors)       585937500
labels (sectors)     9232
available sectors    585928268
bytes/L2ARC header   200

recordsize (sectors)  recordsize (kBytes)  L2ARC capacity (records)  Header size (MBytes)
  1                     0.5                  585928268                 111,760
  2                     1                    292964134                  55,880
  4                     2                    146482067                  27,940
  8                     4                     73241033                  13,970
 16                     8                     36620516                   6,980
 32                    16                     18310258                   3,490
 64                    32                      9155129                   1,750
128                    64                      4577564                     870
256                   128                      2288782                     440

So, depending on the data, you need somewhere between 440 MBytes and  111 GBytes
to hold the L2ARC headers. For a rule of thumb, somewhere between 0.15% and 40%
of the total used size. Ok, that rule really isn't very useful...

The next question is, what does my data look like?  The answer is that there will
most likely be a distribution of various sized records. But the distribution isn't as
interesting for this calculation as the actual number of records. I'm not sure
there is an easy way to get that information, but I'll look around...
 -- richard


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS ZIL + L2ARC SSD Setup

2010-02-12 Thread Bill Sommerfeld

On 02/12/10 09:36, Felix Buenemann wrote:

given I've got ~300GB L2ARC, I'd
need about 7.2GB RAM, so upgrading to 8GB would be enough to satisfy the
L2ARC.


But that would only leave ~800MB free for everything else the server 
needs to do.


- Bill
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS ZIL + L2ARC SSD Setup

2010-02-12 Thread Felix Buenemann

Am 12.02.10 18:17, schrieb Richard Elling:

On Feb 12, 2010, at 8:20 AM, Felix Buenemann wrote:


Hi Mickaël,

Am 12.02.10 13:49, schrieb Mickaël Maillot:

Intel X-25 M are MLC not SLC, they are very good for L2ARC.


Yes, I'm only using those for L2ARC, I'm planning on getting two Mtron Pro 7500 
16GB SLC SSDs for ZIL.


and next, you need more RAM:
ZFS can't handle 4x 80 Gb of L2ARC with only 4Gb of RAM because ZFS
uses memory to allocate and manage the L2ARC.


Is there a guideline in which relation L2ARC size should be to RAM?


Approximately 200 bytes per record. I use the following example:
Suppose we use a Seagate LP 2 TByte disk for the L2ARC
+ Disk has 3,907,029,168 512 byte sectors, guaranteed
+ Workload uses 8 kByte fixed record size
RAM needed for arc_buf_hdr entries
+ Need = ~(3,907,029,168 - 9,232) * 200 / 16 = ~48 GBytes

Don't underestimate the RAM needed for large L2ARCs


I'm not sure how your workload record size plays into the above formula 
(where does - 9232 come from?), but given I've got ~300GB L2ARC, I'd 
need about 7.2GB RAM, so upgrading to 8GB would be enough to satisfy the 
L2ARC.



  -- richard



I could upgrade the server to 8GB, but that's the maximum the i975X chipset can 
handle.

Best Regards,
Felix Buenemann



- Felix



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS ZIL + L2ARC SSD Setup

2010-02-12 Thread Richard Elling
On Feb 12, 2010, at 8:20 AM, Felix Buenemann wrote:

> Hi Mickaël,
> 
> Am 12.02.10 13:49, schrieb Mickaël Maillot:
>> Intel X-25 M are MLC not SLC, they are very good for L2ARC.
> 
> Yes, I'm only using those for L2ARC, I'm planning on getting two Mtron Pro 7500 
> 16GB SLC SSDs for ZIL.
> 
>> and next, you need more RAM:
>> ZFS can't handle 4x 80 Gb of L2ARC with only 4Gb of RAM because ZFS
>> uses memory to allocate and manage the L2ARC.
> 
> Is there a guideline in which relation L2ARC size should be to RAM?

Approximately 200 bytes per record. I use the following example:
Suppose we use a Seagate LP 2 TByte disk for the L2ARC
+ Disk has 3,907,029,168 512 byte sectors, guaranteed
+ Workload uses 8 kByte fixed record size
RAM needed for arc_buf_hdr entries
+ Need = ~(3,907,029,168 - 9,232) * 200 / 16 = ~48 GBytes

Don't underestimate the RAM needed for large L2ARCs
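
Expanding that arithmetic as a quick check (numbers taken from the example
above):

# echo '(3907029168 - 9232) * 200 / 16' | bc
48837749200

which is roughly 48.8 billion bytes, i.e. the ~48 GBytes quoted.
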
 -- richard

> 
> I could upgrade the server to 8GB, but that's the maximum the i975X chipset 
> can handle.
> 
> Best Regards,
>Felix Buenemann
> 
> 
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS ZIL + L2ARC SSD Setup

2010-02-12 Thread Felix Buenemann

Hi Mickaël,

Am 12.02.10 13:49, schrieb Mickaël Maillot:

Intel X-25 M are MLC not SLC, they are very good for L2ARC.


Yes, I'm only using those for L2ARC, I'm planning on getting two Mtron Pro 
7500 16GB SLC SSDs for ZIL.



and next, you need more RAM:
ZFS can't handle 4x 80 Gb of L2ARC with only 4Gb of RAM because ZFS
uses memory to allocate and manage the L2ARC.


Is there a guideline in which relation L2ARC size should be to RAM?

I could upgrade the server to 8GB, but that's the maximum the i975X 
chipset can handle.


Best Regards,
Felix Buenemann


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Removing Cloned Snapshot

2010-02-12 Thread Mark J Musante

On Fri, 12 Feb 2010, Daniel Carosone wrote:

You can use zfs promote to change around which dataset owns the base
snapshot, and which is the dependent clone with a parent, so you can
delete the other - but if you want both datasets you will need to keep the
snapshot they share.


Right.  The other option is to zfs send the snapshot to create a copy 
instead of a clone.  Once the zfs recv completes, the snapshot can be 
destroyed.  Of course, it takes much longer to do this, as zfs is going to 
create a full copy of the snapshot.  The appeal of clones is that they, at 
least initially, take no extra space, and also that they're nearly 
instantaneous.  But they require the snapshot to remain for the lifetime 
of the clone.
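
For example, a minimal sketch with hypothetical names:

# zfs send tank/fs@snap | zfs recv tank/fs-copy
# zfs destroy tank/fs@snap

with the destroy run only once the receive has completed successfully.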



Regards,
markm
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] SSD and ZFS

2010-02-12 Thread Andreas Höschler

Hi all,

just after sending a message to sunmanagers I realized that my question
should rather have gone here, so sunmanagers please excuse the double
post:


I have inherited an X4140 (8 SAS slots) and have just set up the system
with Solaris 10 09. I first set up the system on a mirrored pool over
the first two disks:


  pool: rpool
 state: ONLINE
 scrub: none requested
config:

NAME  STATE READ WRITE CKSUM
rpool ONLINE   0 0 0
  mirror  ONLINE   0 0 0
c1t0d0s0  ONLINE   0 0 0
c1t1d0s0  ONLINE   0 0 0

errors: No known data errors

and then tried to add the second pair of disks to this pool, which did
not work (the famous error message regarding labels, root pool BIOS issue). I
therefore simply created an additional pool, tank.


  pool: rpool
 state: ONLINE
 scrub: none requested
config:

NAME  STATE READ WRITE CKSUM
rpool ONLINE   0 0 0
  mirror  ONLINE   0 0 0
c1t0d0s0  ONLINE   0 0 0
c1t1d0s0  ONLINE   0 0 0

errors: No known data errors

 pool: tank
 state: ONLINE
 scrub: none requested
config:

NAMESTATE READ WRITE CKSUM
tankONLINE   0 0 0
  mirrorONLINE   0 0 0
c1t2d0  ONLINE   0 0 0
c1t3d0  ONLINE   0 0 0

errors: No known data errors

So far so good. I have now replaced the last two SAS disks with 32GB 
SSDs and am wondering how to add these to the system. I googled a lot 
for best practices but found nothing so far that made me any wiser. My 
current approach still is to simply do


zpool add tank mirror c0t6d0 c0t7d0

as I would do with normal disks but I am wondering whether that's the 
right approach to significantly increase system performance. Will ZFS 
automatically use these SSDs and optimize accesses to tank? Probably! 
But it won't optimize accesses to rpool of course. Not sure whether I 
need that or should look for that. Should I try to get all disks into 
rpool in spite of the BIOS label issue so that SSDs are used for all 
accesses to the disk system?


Hints (best practices) are greatly appreciated!

Thanks a lot,

 Andreas


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS ZIL + L2ARC SSD Setup

2010-02-12 Thread Mickaël Maillot
Hi

Intel X-25 M are MLC not SLC, they are very good for L2ARC.

and next, you need more RAM:
ZFS can't handle 4x 80 Gb of L2ARC with only 4Gb of RAM because ZFS
uses memory to allocate and manage the L2ARC.

2010/2/10 Felix Buenemann :
> Am 09.02.10 09:58, schrieb Felix Buenemann:
>>
>> Am 09.02.10 02:30, schrieb Bob Friesenhahn:
>>>
>>> On Tue, 9 Feb 2010, Felix Buenemann wrote:

 Well to make things short: Using JBOD + ZFS Striped Mirrors vs.
 controller's RAID10, dropped the max. sequential read I/O from over
 400 MByte/s to below 300 MByte/s. However random I/O and sequential
 writes seemed to perform
>>>
>>> Much of the difference is likely that your controller implements true
>>> RAID10 whereas ZFS "striped" mirrors are actually load-shared mirrors.
>>> Since zfs does not use true striping across vdevs, it relies on
>>> sequential prefetch requests to get the sequential read rate up.
>>> Sometimes zfs's prefetch is not aggressive enough.
>>>
>>> I have observed that there may still be considerably more read
>>> performance available (to another program/thread) even while a benchmark
>>> program is reading sequentially as fast as it can.
>>>
>>> Try running two copies of your benchmark program at once and see what
>>> happens.
>>
>> Yes, JBOD + ZFS load-balanced mirrors does seem to work better under
>> heavy load. I tried rebooting a Windows VM from NFS, which took about 43
>> sec with hot cache in both cases. But when doing this during a bonnie++
>> benchmark run, the ZFS mirrors would win big time, taking just 2:47sec
>> instead of over 4min to reboot the VM.
>> So I think in a real world scenario, the ZFS mirrors will win.
>>
>> On a side note, however, I noticed that for small sequential I/O (copying a
>> 150MB source tree to NFS), the ZFS mirrors were 50% slower than the
>> controller's RAID10.
>
> I had a hunch that the controller's volume read-ahead would interfere with
> the ZFS load-shared mirrors and voilà: sequential reads jumped from 270
> MByte/s to 420 MByte/s, which checks out nicely, because writes are about
> 200 MByte/s.
>
>>
>>> Bob
>>
>> - Felix
>
> - Felix
>
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
>
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] cannot receive new filesystem stream: invalid backup stream

2010-02-12 Thread Andrew Gabriel

Darren J Moffat wrote:

On 12/02/2010 09:55, Andrew Gabriel wrote:

Can anyone suggest how I can get around the above error when
sending/receiving a ZFS filesystem? It seems to fail when about 2/3rds
of the data have been passed from send to recv. Is it possible to get
more diagnostics out?


You could try using /usr/bin/zstreamdump on recent builds.


Ah, thanks Darren, didn't know about that.

As far as I can see, it runs without a problem right to the end of the 
stream. Not sure what I should be looking for in the way of errors, but 
there are no occurrences in the output of any of the error strings 
revealed by running strings(1) on zstreamdump. Without -v ...


zfs send export/h...@20100211  |  zstreamdump  
BEGIN record

   version = 1
   magic = 2f5bacbac
   creation_time = 4b7499c5
   type = 2
   flags = 0x0
   toguid = 435481b4a8c20fbd
   fromguid = 0
   toname = export/h...@20100211
END checksum = 
6797d43d709150c1/e1ec581cbaf0cfa4/7f6d80fa0f23c741/2c2cb821b4a2e639

SUMMARY:
   Total DRR_BEGIN records = 1
   Total DRR_END records = 1
   Total DRR_OBJECT records = 90963
   Total DRR_FREEOBJECTS records = 23389
   Total DRR_WRITE records = 212381
   Total DRR_FREE records = 520856
   Total records = 847591
   Total write size = 17395359232 (0x40cd81e00)
   Total stream length = 17683854968 (0x41e0a3678)


This filesystem has failed in this way for a long time, and I've ignored
it thinking something might get fixed in the future, but this hasn't
happened yet. It's a home directory which has been in existence and used
for about 3 years. One thing is that the pool version (3) and zfs
version (1) are old - could that be the problem? The sending system is
currently running build 125 and receiving system something approximating
to 133, but I've had the same problem with this filesystem for all
builds I've used over the last 2 years.


--
Andrew
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] cannot receive new filesystem stream: invalid backup stream

2010-02-12 Thread Darren J Moffat

On 12/02/2010 09:55, Andrew Gabriel wrote:

Can anyone suggest how I can get around the above error when
sending/receiving a ZFS filesystem? It seems to fail when about 2/3rds
of the data have been passed from send to recv. Is it possible to get
more diagnostics out?


You could try using /usr/bin/zstreamdump on recent builds.
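
For example (substitute the failing filesystem and snapshot):

# zfs send pool/fs@snap | zstreamdump -v

This dumps the stream records and checksums without receiving anything, so it
may show where the stream goes wrong.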


This filesystem has failed in this way for a long time, and I've ignored
it thinking something might get fixed in the future, but this hasn't
happened yet. It's a home directory which has been in existence and used
for about 3 years. One thing is that the pool version (3) and zfs
version (1) are old - could that be the problem? The sending system is
currently running build 125 and receiving system something approximating
to 133, but I've had the same problem with this filesystem for all
builds I've used over the last 2 years.




--
Darren J Moffat
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] cannot receive new filesystem stream: invalid backup stream

2010-02-12 Thread Andrew Gabriel
Can anyone suggest how I can get around the above error when 
sending/receiving a ZFS filesystem? It seems to fail when about 2/3rds 
of the data have been passed from send to recv. Is it possible to get 
more diagnostics out?


This filesystem has failed in this way for a long time, and I've ignored 
it thinking something might get fixed in the future, but this hasn't 
happened yet. It's a home directory which has been in existence and used 
for about 3 years. One thing is that the pool version (3) and zfs 
version (1) are old - could that be the problem? The sending system is 
currently running build 125 and receiving system something approximating 
to 133, but I've had the same problem with this filesystem for all 
builds I've used over the last 2 years.


--
Cheers
Andrew Gabriel
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs import fails even though all disks are online

2010-02-12 Thread Marc Friesacher
I did some more digging through forum posts and found this:
http://opensolaris.org/jive/thread.jspa?threadID=104654

I ran zdb -l again and saw that even though zpool references /dev/dsk/c4d0, zdb
shows no labels at that point... the labels are on /dev/dsk/c4d0s0.
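
For reference, the checks described above were presumably something like:

# zdb -l /dev/dsk/c4d0      (fails to unpack any labels)
# zdb -l /dev/dsk/c4d0s0    (shows the labels)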

However, the log label entries match what zpool states i.e. c12t0d0p0.

I have attached the output again.

I'm not sure if any of this is pertinent, but I thought it was strange enough 
to mention.

Cheers,

Marc.
-- 
This message posted from opensolaris.org
LABEL 0

failed to unpack label 0

LABEL 1

failed to unpack label 1

LABEL 2

failed to unpack label 2

LABEL 3

failed to unpack label 3

LABEL 0

version=14
name='zedpool'
state=0
txg=1149194
pool_guid=10232199590840258590
hostid=8400371
hostname='vault'
top_guid=9120924193762564033
guid=9165505680810820027
vdev_tree
type='raidz'
id=0
guid=9120924193762564033
nparity=1
metaslab_array=23
metaslab_shift=34
ashift=9
asize=6001127325696
is_log=0
children[0]
type='disk'
id=0
guid=9165505680810820027
path='/dev/dsk/c4d0s0'
devid='id1,c...@ast31500341as=9vs09v48/a'
phys_path='/p...@0,0/pci-...@e/i...@0/c...@0,0:a'
whole_disk=1
DTL=28
children[1]
type='disk'
id=1
guid=13787734166139205988
path='/dev/dsk/c5d0s0'
devid='id1,c...@ast31500341as=9vs0lv3g/a'
phys_path='/p...@0,0/pci-...@e/i...@1/c...@0,0:a'
whole_disk=1
DTL=31
children[2]
type='disk'
id=2
guid=9920804058323031478
path='/dev/dsk/c6d0s0'
devid='id1,c...@ast31500341as=9vs21egh/a'
phys_path='/p...@0,0/pci-...@f/i...@0/c...@0,0:a'
whole_disk=1
DTL=153
children[3]
type='disk'
id=3
guid=15591554387677897236
path='/dev/dsk/c7d0s0'
devid='id1,c...@ast31500341as=9vs21d04/a'
phys_path='/p...@0,0/pci-...@f/i...@1/c...@0,0:a'
whole_disk=1
DTL=27

LABEL 1

version=14
name='zedpool'
state=0
txg=1149194
pool_guid=10232199590840258590
hostid=8400371
hostname='vault'
top_guid=9120924193762564033
guid=9165505680810820027
vdev_tree
type='raidz'
id=0
guid=9120924193762564033
nparity=1
metaslab_array=23
metaslab_shift=34
ashift=9
asize=6001127325696
is_log=0
children[0]
type='disk'
id=0
guid=9165505680810820027
path='/dev/dsk/c4d0s0'
devid='id1,c...@ast31500341as=9vs09v48/a'
phys_path='/p...@0,0/pci-...@e/i...@0/c...@0,0:a'
whole_disk=1
DTL=28
children[1]
type='disk'
id=1
guid=13787734166139205988
path='/dev/dsk/c5d0s0'
devid='id1,c...@ast31500341as=9vs0lv3g/a'
phys_path='/p...@0,0/pci-...@e/i...@1/c...@0,0:a'
whole_disk=1
DTL=31
children[2]
type='disk'
id=2
guid=9920804058323031478
path='/dev/dsk/c6d0s0'
devid='id1,c...@ast31500341as=9vs21egh/a'
phys_path='/p...@0,0/pci-...@f/i...@0/c...@0,0:a'
whole_disk=1
DTL=153
children[3]
type='disk'
id=3
guid=15591554387677897236
path='/dev/dsk/c7d0s0'
devid='id1,c...@ast31500341as=9vs21d04/a'
phys_path='/p...@0,0/pci-...@f/i...@1/c...@0,0:a'
whole_disk=1
DTL=27

LABEL 2

version=14
name='zedpool'
state=0
txg=1149194
pool_guid=10232199590840258590
hostid=8400371
hostname='vault'
top_guid=91209241937625