Re: [zfs-discuss] size of slog device

2010-06-14 Thread Neil Perrin

On 06/14/10 19:35, Erik Trimble wrote:

On 6/14/2010 12:10 PM, Neil Perrin wrote:

On 06/14/10 12:29, Bob Friesenhahn wrote:

On Mon, 14 Jun 2010, Roy Sigurd Karlsbakk wrote:


It is good to keep in mind that only small writes go to the dedicated
slog. Large writes go to the main store. A succession of that many small
writes (to fill RAM/2) is highly unlikely. Also, note that the zil is not
read back unless the system is improperly shut down.


I thought all sync writes, meaning everything NFS and iSCSI, went 
into the slog - IIRC the docs say so.


Check a month or two back in the archives for a post by Matt Ahrens. 
It seems that larger writes (>32k?) are written directly to main 
store.  This is probably a change from the original zfs design.


Bob


If there's a slog then the data, regardless of size, gets written to 
the slog.


If there's no slog and the data size is greater than
zfs_immediate_write_sz/zvol_immediate_write_sz (both default to 32K), then the
data is written as a block into the pool and the block pointer is written into
the log record. This is the WR_INDIRECT write type.

So Matt and Roy are both correct.

But wait, there's more complexity!:

If logbias=throughput is set we always use WR_INDIRECT.

If we just wrote more than 1MB for a single zil commit and there's more than
2MB waiting, then we start using the main pool.

Clear as mud?  This is likely to change again...
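
For reference, the per-dataset knob mentioned above can be inspected and changed
as follows (a minimal sketch; the dataset name "tank/fs" is just a placeholder):

    # check or change the log bias for a dataset
    zfs get logbias tank/fs
    zfs set logbias=throughput tank/fs
    # the 32K cut-off lives in the kernel tunables zfs_immediate_write_sz and
    # zvol_immediate_write_sz, and only applies when the pool has no slog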

Neil.



How do I monitor the amount of live (i.e. non-committed) data in the 
slog?  I'd like to spend some time with my setup, seeing exactly how 
much I tend to use.


I think monitoring the capacity when running "zpool iostat -v 1"
should be fairly accurate.
A simple D script can be written to determine how often the ZIL code
fails to get a slog block and has to resort to allocating from the main pool.
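
For example (a rough sketch, not a polished tool -- the pool name "tank" is a
placeholder, and the D probes assume the current code path where zio_alloc_zil()
tries the log class first and retries metaslab_alloc() against the normal class
on failure; function names may differ between builds):

    # watch per-vdev usage, including the log device's capacity columns
    zpool iostat -v tank 1

    # count ZIL block allocations, and how many fell back to the main pool
    dtrace -n '
      fbt::zio_alloc_zil:entry  { self->inzil = 1; self->tries = 0; }
      fbt::metaslab_alloc:entry /self->inzil/ { self->tries++; }
      fbt::zio_alloc_zil:return {
          @["zil block allocations"]  = count();
          @["fell back to main pool"] = sum(self->tries > 1 ? 1 : 0);
          self->inzil = 0; self->tries = 0;
      }'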

One recent change reduced the amount of data written and possibly the 
slog block fragmentation.

This is zpool version 23: "Slim ZIL". So be sure to experiment with that.




I'd suspect that very few use cases call for more than a couple (2-4) 
GB of slog...


I agree this is typically true. Of course it depends on your workload.
The amount of slog data will reflect the
uncommitted synchronous txg data, and the size of each txg will depend
on memory size.

This area is also undergoing tuning.


I'm trying to get hard numbers as I'm working on building a 
DRAM/battery/flash slog device in one of my friend's electronics 
prototyping shops.  It would be really nice if I could solve 99% of 
the need with 1 or 2 2GB SODIMMs and the chips from a cheap 4GB USB 
thumb drive...




Sounds like fun. Good luck.

Neil.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] size of slog device

2010-06-14 Thread Richard Elling
On Jun 14, 2010, at 6:35 PM, Erik Trimble wrote:
> On 6/14/2010 12:10 PM, Neil Perrin wrote:
>> On 06/14/10 12:29, Bob Friesenhahn wrote:
>>> On Mon, 14 Jun 2010, Roy Sigurd Karlsbakk wrote:
>>> 
> It is good to keep in mind that only small writes go to the dedicated
> slog. Large writes go to the main store. A succession of that many small
> writes (to fill RAM/2) is highly unlikely. Also, note that the zil is not
> read back unless the system is improperly shut down.
 
 I thought all sync writes, meaning everything NFS and iSCSI, went into the 
 slog - IIRC the docs say so.
>>> 
>>> Check a month or two back in the archives for a post by Matt Ahrens. It 
>>> seems that larger writes (>32k?) are written directly to main store.  This 
>>> is probably a change from the original zfs design.
>>> 
>>> Bob
>> 
>> If there's a slog then the data, regardless of size, gets written to the 
>> slog.
>> 
>> If there's no slog and the data size is greater than 
>> zfs_immediate_write_sz/zvol_immediate_write_sz (both default to 32K), then 
>> the data is written as a block into the pool and the block pointer is 
>> written into the log record. This is the WR_INDIRECT write type.
>> 
>> So Matt and Roy are both correct.
>> 
>> But wait, there's more complexity!:
>> 
>> If logbias=throughput is set we always use WR_INDIRECT.
>> 
>> If we just wrote more than 1MB for a single zil commit and there's more than 
>> 2MB waiting
>> then we start using the main pool.
>> 
>> Clear as mud?  This is likely to change again...
>> 
>> Neil.
>> 
> 
> How do I monitor the amount of live (i.e. non-committed) data in the slog?  
> I'd like to spend some time with my setup, seeing exactly how much I tend to 
> use.

zilstat
http://www.richardelling.com/Home/scripts-and-programs-1/zilstat
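
A typical invocation looks something like the following (illustrative only --
check the script's built-in usage text for the exact options in the copy you
download):

    # one-second samples, ten of them
    ./zilstat 1 10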

> I'd suspect that very few use cases call for more than a couple (2-4) GB of 
> slog...

I'd suspect few real cases need more than 1GB.
 -- richard

-- 
Richard Elling
rich...@nexenta.com   +1-760-896-4422
ZFS and NexentaStor training, Rotterdam, July 13-15, 2010
http://nexenta-rotterdam.eventbrite.com/




___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Scrub issues

2010-06-14 Thread George Wilson

Richard Elling wrote:

On Jun 14, 2010, at 2:12 PM, Roy Sigurd Karlsbakk wrote:

Hi all

It seems zfs scrub is taking a big bite out of I/O when running. During a scrub, 
sync I/O, such as NFS and iSCSI, is mostly useless. Attaching an SLOG and some 
L2ARC helps this, but still, the problem remains in that the scrub is given 
full priority.


Scrub always runs at the lowest priority. However, priority scheduling only
works before the I/Os enter the disk queue. If you are running Solaris 10 or
older releases with HDD JBODs, then the default zfs_vdev_max_pending 
is 35. This means that your slow disk will have 35 I/Os queued to it before
priority scheduling makes any difference.  Since it is a slow disk, that could
mean 250 to 1500 ms before the high priority I/O reaches the disk.


Is this problem known to the developers? Will it be addressed?


In later OpenSolaris releases, the zfs_vdev_max_pending defaults to 10
which helps.  You can tune it lower as described in the Evil Tuning Guide.

Also, as Robert pointed out, CR 6494473 offers a more resource management
friendly way to limit scrub traffic (b143).  Everyone can buy George a beer for
implementing this change :-)



I'll gladly accept any beer donations, and others on the ZFS team are happy 
to help consume them. :-)


I look forward to hearing people's experience with the new changes.

- George


Of course, this could mean that on a busy system a scrub that formerly took
a week might now take a month.  And the fix does not directly address the 
tuning of the queue depth issue with HDDs.  TANSTAAFL.

 -- richard




___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] size of slog device

2010-06-14 Thread Erik Trimble

On 6/14/2010 12:10 PM, Neil Perrin wrote:

On 06/14/10 12:29, Bob Friesenhahn wrote:

On Mon, 14 Jun 2010, Roy Sigurd Karlsbakk wrote:


It is good to keep in mind that only small writes go to the dedicated
slog. Large writes go to the main store. A succession of that many small
writes (to fill RAM/2) is highly unlikely. Also, note that the zil is not
read back unless the system is improperly shut down.


I thought all sync writes, meaning everything NFS and iSCSI, went 
into the slog - IIRC the docs say so.


Check a month or two back in the archives for a post by Matt Ahrens. 
It seems that larger writes (>32k?) are written directly to main 
store.  This is probably a change from the original zfs design.


Bob


If there's a slog then the data, regardless of size, gets written to 
the slog.


If there's no slog and the data size is greater than
zfs_immediate_write_sz/zvol_immediate_write_sz (both default to 32K), then the
data is written as a block into the pool and the block pointer is written into
the log record. This is the WR_INDIRECT write type.

So Matt and Roy are both correct.

But wait, there's more complexity!:

If logbias=throughput is set we always use WR_INDIRECT.

If we just wrote more than 1MB for a single zil commit and there's more than
2MB waiting, then we start using the main pool.

Clear as mud?  This is likely to change again...

Neil.



How do I monitor the amount of live (i.e. non-committed) data in the 
slog?  I'd like to spend some time with my setup, seeing exactly how 
much I tend to use.


I'd suspect that very few use cases call for more than a couple (2-4) GB 
of slog...


I'm trying to get hard numbers as I'm working on building a 
DRAM/battery/flash slog device in one of my friend's electronics 
prototyping shops.  It would be really nice if I could solve 99% of the 
need with 1 or 2 2GB SODIMMs and the chips from a cheap 4GB USB thumb 
drive...


--
Erik Trimble
Java System Support
Mailstop:  usca22-123
Phone:  x17195
Santa Clara, CA

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Native ZFS for Linux

2010-06-14 Thread Peter Jeremy
On 2010-Jun-11 17:41:38 +0800, Joerg Schilling 
 wrote:
>PP.S.: Did you know that FreeBSD has _included_ the GPLd Reiserfs in the FreeBSD 
>kernel for a while now, and that nobody has complained about this? See e.g.:
>
>http://svn.freebsd.org/base/stable/8/sys/gnu/fs/reiserfs/

That is completely irrelevant and somewhat misleading.  FreeBSD has
never prohibited non-BSD-licensed code in its kernel or userland;
however, it has always been optional and, AFAIR, the GENERIC kernel has
always defaulted to containing only BSD code.  Non-BSD code (whether GPL
or CDDL) is carefully segregated (note the 'gnu' in the above URI).

-- 
Peter Jeremy


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Scrub issues

2010-06-14 Thread Richard Elling
On Jun 14, 2010, at 2:12 PM, Roy Sigurd Karlsbakk wrote:
> Hi all
> 
> It seems zfs scrub is taking a big bite out of I/O when running. During a 
> scrub, sync I/O, such as NFS and iSCSI, is mostly useless. Attaching an SLOG 
> and some L2ARC helps this, but still, the problem remains in that the scrub 
> is given full priority.

Scrub always runs at the lowest priority. However, priority scheduling only
works before the I/Os enter the disk queue. If you are running Solaris 10 or
older releases with HDD JBODs, then the default zfs_vdev_max_pending 
is 35. This means that your slow disk will have 35 I/Os queued to it before
priority scheduling makes any difference.  Since it is a slow disk, that could
mean 250 to 1500 ms before the high priority I/O reaches the disk.

> Is this problem known to the developers? Will it be addressed?

In later OpenSolaris releases, the zfs_vdev_max_pending defaults to 10
which helps.  You can tune it lower as described in the Evil Tuning Guide.
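
For reference, the two usual ways of setting this tunable (following the Evil
Tuning Guide; the value 10 is just an example):

    # persistent: add to /etc/system and reboot
    set zfs:zfs_vdev_max_pending = 10

    # or change it on the running kernel with mdb
    echo zfs_vdev_max_pending/W0t10 | mdb -kw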

Also, as Robert pointed out, CR 6494473 offers a more resource management
friendly way to limit scrub traffic (b143).  Everyone can buy George a beer for
implementing this change :-)

Of course, this could mean that on a busy system a scrub that formerly took
a week might now take a month.  And the fix does not directly address the 
tuning of the queue depth issue with HDDs.  TANSTAAFL.
 -- richard

-- 
Richard Elling
rich...@nexenta.com   +1-760-896-4422
ZFS and NexentaStor training, Rotterdam, July 13-15, 2010
http://nexenta-rotterdam.eventbrite.com/




___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Data Loss on system crash/upgrade

2010-06-14 Thread Cindy Swearingen
> Hello all,
> 
> I've been running OpenSolaris on my personal
> fileserver for about a year and a half, and it's been
> rock solid except for having to upgrade from 2009.06
> to a dev version to fix some network driver issues.
> About a month ago, the motherboard on this computer
> died, and I upgraded to a better motherboard and
> processor.  This move broke the OS install, and
> instead of bothering to try to figure out how to fix
> it, I decided on a reinstall.  All my important data
> (including all my virtual hard drives) are stored on
>  a separate 3 disk raidz pool.  
> 
> In attempting to import the pool, I realized that I
> had upgraded the zpool to a newer version than is
> supported in the live CD, so I installed the latest
> dev release to allow the filesystem to mount.  After
> mounting the drives (with a zpool import -f), I
> noticed that some files might be missing.  After
> installing virtualbox and booting up a WinXP VM, this
> issue was confirmed.
> 
> Files before 2/10/2010 seem to be unharmed, but the
> next file I have logged on 2/19/2010 is missing.
> Every file created after this date is also missing.
> The machine had been rebooted several times before
> the crash with no issues.  For the week or so prior
> to the machine finally dying for good, it would
> boot, last a few hours, and then crash.  These files
>  were fine during that period.
> 
> One more thing of note:  when the machine suffered
> critical hardware failure, the zpool in issue was at
> about 95% full.  When I upgraded to new hardware
> (after updating the machine), I added two mirrored
> disks to the pool to alleviate the space issue until
> I could back everything up, destroy the pool, and
> recreate it with six disks instead of three.
> 
> Is this a known bug with a fix, or am I out of luck
> with these files?
> 
> Thanks,
> Austin

Austin,

If your raidz pool with important data was damaged in some way by the
hardware failures, then a recovery mechanism in recent builds is to discard
the last few transactions to get the pool back to a known good state.
You would have seen messages regarding this recovery. 

We might be able to see if this recovery happened if you could provide 
your zpool history output for this pool.
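
For example (the pool name is a placeholder):

    # include internal events and long-format records in the output
    zpool history -il yourpool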

Generally, a few seconds of data transactions are lost, not all of the data
after a certain date.

Another issue is that VirtualBox doesn't honor cache flushes by default,
so if the system crashes with data in play, your data might not be
safely written to disk.
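
If you want VirtualBox to pass the guest's flush requests through, the setting
is roughly as follows (a sketch based on the VirtualBox manual's "Responding to
guest IDE/SATA flush requests" section; "MyWinXPVM" is a made-up VM name, and
the device/LUN path differs for SATA controllers):

    VBoxManage setextradata "MyWinXPVM" \
      "VBoxInternal/Devices/piix3ide/0/LUN#0/Config/IgnoreFlush" 0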

Thanks,

Cindy
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Scrub issues

2010-06-14 Thread Robert Milkowski

On 14/06/2010 22:12, Roy Sigurd Karlsbakk wrote:

Hi all

It seems zfs scrub is taking a big bite out of I/O when running. During a scrub, 
sync I/O, such as NFS and iSCSI, is mostly useless. Attaching an SLOG and some 
L2ARC helps this, but still, the problem remains in that the scrub is given 
full priority.

Is this problem known to the developers? Will it be addressed?

   


http://sparcv9.blogspot.com/2010/06/slower-zfs-scrubsresilver-on-way.html
http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6494473

--
Robert Milkowski
http://milek.blogspot.com

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Scrub issues

2010-06-14 Thread Roy Sigurd Karlsbakk
Hi all

It seems zfs scrub is taking a big bite out of I/O when running. During a scrub, 
sync I/O, such as NFS and iSCSI, is mostly useless. Attaching an SLOG and some 
L2ARC helps this, but still, the problem remains in that the scrub is given 
full priority.

Is this problem known to the developers? Will it be addressed?

Vennlige hilsener / Best regards

roy
--
Roy Sigurd Karlsbakk
(+47) 97542685
r...@karlsbakk.net
http://blogg.karlsbakk.net/
--
In all pedagogy it is essential that the curriculum be presented intelligibly. It is 
an elementary imperative for all pedagogues to avoid excessive use of idioms of 
foreign origin. In most cases adequate and relevant synonyms exist in Norwegian.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] COMSTAR dropouts with dedup enabled

2010-06-14 Thread Brandon High
On Mon, Jun 14, 2010 at 1:35 PM, Brandon High  wrote:
> How much memory do you have, and how big is the DDT? You can get the
> DDT size with 'zdb -DD'. The total count is the sum of duplicate and
> unique entries. Each entry uses ~250 bytes, so the count
> divided by 4 is a (very rough) estimate of the memory size of the DDT
> in kilobytes.

One more thing: The default block size is 8k for zvols, which means
that the DDT will grow much faster than for filesystem datasets.
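
As a rough worked example of why (assuming every block is unique and the ~250
bytes/entry figure above):

    1 TB zvol at the default 8K volblocksize  ->  ~122 million blocks  ->  ~30 GB of DDT
    1 TB filesystem at 128K recordsize        ->  ~7.6 million blocks  ->   ~2 GB of DDT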

-B

-- 
Brandon High : bh...@freaks.com
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] COMSTAR dropouts with dedup enabled

2010-06-14 Thread Brandon High
On Sun, Jun 13, 2010 at 6:58 PM, Matthew Anderson
 wrote:
> The problem didn’t seem to occur with only a small amount of data on the LUN
> (<50GB) and happened more frequently as the LUN filled up. I’ve since moved
> all data to non-dedup LUNs and I haven’t seen a dropout for over a month.

How much memory do you have, and how big is the DDT? You can get the
DDT size with 'zdb -DD'. The total count is the sum of duplicate and
unique entries. Each entry uses ~250 bytes, so the count
divided by 4 is a (very rough) estimate of the memory size of the DDT
in kilobytes.
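
For example (the pool name and entry count are made up for illustration):

    # dump dedup-table statistics, including the total entry count
    zdb -DD tank

    # rough sizing at ~250 bytes per entry:
    #   20,000,000 entries x 250 B  ~=  5,000,000 KB  ~=  ~5 GB of RAM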

The most likely case is that you don't have enough memory to hold the
entire dedup table in the ARC. When this happens, the host has to read
entries from the main pool, which is slow.

If you want to continue running with dedup, adding an L2ARC may help
since the DDT can be held in the faster cache. Disabling dedup for the
dataset will give you good write performance too.

Be aware that destroying snapshots from this dataset (or destroying
the dataset itself) is likely to create dropouts as well, since the
DDT needs to be scanned to see if a block can be dereferenced. Again,
adding L2ARC may help.

-B

-- 
Brandon High : bh...@freaks.com
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] size of slog device

2010-06-14 Thread Neil Perrin

On 06/14/10 12:29, Bob Friesenhahn wrote:

On Mon, 14 Jun 2010, Roy Sigurd Karlsbakk wrote:


It is good to keep in mind that only small writes go to the dedicated
slog. Large writes go to the main store. A succession of that many small
writes (to fill RAM/2) is highly unlikely. Also, note that the zil is not
read back unless the system is improperly shut down.


I thought all sync writes, meaning everything NFS and iSCSI, went 
into the slog - IIRC the docs say so.


Check a month or two back in the archives for a post by Matt Ahrens. 
It seems that larger writes (>32k?) are written directly to main 
store.  This is probably a change from the original zfs design.


Bob


If there's a slog then the data, regardless of size, gets written to the 
slog.


If there's no slog and the data size is greater than
zfs_immediate_write_sz/zvol_immediate_write_sz (both default to 32K), then the
data is written as a block into the pool and the block pointer is written into
the log record. This is the WR_INDIRECT write type.

So Matt and Roy are both correct.

But wait, there's more complexity!:

If logbias=throughput is set we always use WR_INDIRECT.

If we just wrote more than 1MB for a single zil commit and there's more than
2MB waiting, then we start using the main pool.

Clear as mud?  This is likely to change again...

Neil.


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Sync Write - ZIL log performance - Feedback for ZFS developers?

2010-06-14 Thread Roy Sigurd Karlsbakk





On 04/10/10 09:28, Edward Ned Harvey wrote: 

- If synchronous writes are large (>32K) and block aligned then the blocks are 
written directly to the pool and a small record 
written to the log. Later when the txg commits then the blocks are just linked 
into the txg. However, this processing is not 
done if there are any slogs because I found it didn't perform as well. Probably 
ought to be re-evaluated. 
Won't this affect NFS/iSCSI performance pretty badly where the ZIL is crucial? 

Vennlige hilsener / Best regards 

roy 
-- 
Roy Sigurd Karlsbakk 
(+47) 97542685 
r...@karlsbakk.net 
http://blogg.karlsbakk.net/ 
-- 
In all pedagogy it is essential that the curriculum be presented intelligibly. It is 
an elementary imperative for all pedagogues to avoid excessive use of idioms of 
foreign origin. In most cases adequate and relevant synonyms exist in Norwegian. 
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Permament errors in "files" <0x0>

2010-06-14 Thread Jan Ploski
I've been referred to here from the zfs-fuse newsgroup. I have a 
(non-redundant) pool which is reporting errors that I don't quite understand:

# zpool status -v
  pool: green
 state: ONLINE
status: One or more devices has experienced an error resulting in data
corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: scrub in progress for 1h12m, 2.96% done, 39h44m to go
config:

NAME                        STATE     READ WRITE CKSUM
green                       ONLINE       0     0     2
  disk/by-id/dm-name-green  ONLINE       0     0     4

errors: Permanent errors have been detected in the following files:

:<0x0>
green:<0x0>

I read the explanations at 
http://dlc.sun.com/osol/docs/content/ZFSADMIN/gbbwl.html#gbcuz that the 0x0 is 
output when a file path is not available, but I'm still unsure how to proceed 
(of course, I'd also like to know why these errors occurred in the first place 
- after just a couple of days of using zfs-fuse, but that's another story).

It has been suggested to me to copy out all data from the pool and/or recreate 
it from backup, but do I really have to (hours of recovery), or is there a 
faster way to correct the problem? Apart from these alarming messages, the pool 
seems to be in working order, e.g. all files that I tried could be read. I 
guess I'd just like to know *what* the corrupted data is and the 
implications.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] size of slog device

2010-06-14 Thread Bob Friesenhahn

On Mon, 14 Jun 2010, Roy Sigurd Karlsbakk wrote:


It is good to keep in mind that only small writes go to the dedicated
slog. Large writes go to the main store. A succession of that many small
writes (to fill RAM/2) is highly unlikely. Also, note that the zil is not
read back unless the system is improperly shut down.


I thought all sync writes, meaning everything NFS and iSCSI, went 
into the slog - IIRC the docs say so.


Check a month or two back in the archives for a post by Matt Ahrens. 
It seems that larger writes (>32k?) are written directly to main 
store.  This is probably a change from the original zfs design.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] COMSTAR dropouts with dedup enabled

2010-06-14 Thread Matthew Anderson
Hi All,

I currently use b134 and COMSTAR to deploy SRP targets for virtual machine 
storage (VMware ESXi4) and have run into some unusual behaviour when dedup is 
enabled for a particular LUN. The target seems to lock up (ESX reports it as 
unavailable) when writing large amount or overwriting data, reads are 
unaffected. The easiest way for me to replicate the problem was to restore a 
2GB SQL database inside a VM. The dropouts lasted anywhere from 3 seconds to a 
few minutes, and when connectivity is restored the other LUNs (without dedup) 
drop out for a few seconds.

The problem didn't seem to occur with only a small amount of data on the LUN 
(<50GB) and happened more frequently as the LUN filled up. I've since moved all 
data to non-dedup LUNs and I haven't seen a dropout for over a month.  Does 
anyone know why this is happening? I've also seen the behaviour when exporting 
iSCSI targets with COMSTAR. I haven't had a chance to install the SSD's for 
L2ARC and SLOG yet so I'm unsure if that will help the issue.

System specs are-
Single Xeon 5620
24GB DDR3
24x 1.5TB 7200rpm
LSI RAID card

Thanks
-Matt
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] size of slog device

2010-06-14 Thread Roy Sigurd Karlsbakk
- Original Message -
> On Mon, 14 Jun 2010, Roy Sigurd Karlsbakk wrote:
> 
> >> There is absolutely no sense in having slog devices larger than
> >> the main memory, because it will never be used, right?
> >> ZFS would rather flush the txg to disk than read back from the
> >> zil? So there is a guideline to have enough slog to hold about 10
> >> seconds of zil, but the absolute maximum value is the size of
> >> main memory. Is this correct?
> >
> > ZFS uses at most RAM/2 for ZIL
> 
> It is good to keep in mind that only small writes go to the dedicated
> slog. Large writes go to the main store. A succession of that many small
> writes (to fill RAM/2) is highly unlikely. Also, note that the zil is not
> read back unless the system is improperly shut down.

I thought all sync writes, meaning everything NFS and iSCSI, went into the slog 
- IIRC the docs say so.
 
Vennlige hilsener / Best regards

roy
--
Roy Sigurd Karlsbakk
(+47) 97542685
r...@karlsbakk.net
http://blogg.karlsbakk.net/
--
In all pedagogy it is essential that the curriculum be presented intelligibly. It is 
an elementary imperative for all pedagogues to avoid excessive use of idioms of 
foreign origin. In most cases adequate and relevant synonyms exist in Norwegian. 
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Unable to Install 2009.06 on BigAdmin Approved MOBO - FILE SYSTEM FULL

2010-06-14 Thread Cindy Swearingen

Hi Giovanni,

My Monday morning guess is that the disk/partition/slices are not
optimal for the installation.

Can you provide the partition table of the disk that you are attempting 
to install on? Use format-->disk-->partition-->print.
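
A quick, non-interactive way to capture the same information (the device name
below is only an example; substitute your install disk):

    prtvtoc /dev/rdsk/c7t0d0s2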


You want to put all the disk space in c*t*d*s0. See this section of the 
ZFS troubleshooting guide for an example of fixing the disk/partition/slice 
issues:

http://www.solarisinternals.com/wiki/index.php/ZFS_Troubleshooting_Guide

Replacing/Relabeling the Root Pool Disk

Thanks,

Cindy


On 06/13/10 17:42, Giovanni wrote:

Hi Guys

I am having trouble installing OpenSolaris 2009.06 on my Biostar Tpower I45 motherboard, which is approved on the BigAdmin HCL here: 


http://www.sun.com/bigadmin/hcl/data/systems/details/26409.html -- why is it 
not working?

My setup:
3x 1TB hard-drives SATA 
1x 500GB hard-drive (I have only left this hdd connected to try to isolate the issue, still happens)

4GB DDR2 PC2-6400 Ram (tested GOOD!)
ATI Radeon 4650 512MB DDR2 PCI-E 16x
Motherboard default settings/CMOS cleared

Here's what happens: the OpenSolaris boot options come up and I choose the first default, 
"OpenSolaris 2009.06" -- I have also tried the VESA driver and command-line options; all of 
these fail.
-

After Select desktop language, 

configuring devices. 
Mounting cdroms 
Reading ZFS Config: done.


opensolaris console login: (cd rom is still being accessed at this time).. few 
seconds later:

then opensolaris ufs: NOTICE: alloc: /: file system full
opensolaris last message repeated 1 time
opensolaris syslogd: /var/adm/messages: No space left on device
opensolaris in.routed[537]: route 0.0.0.0/8 -> 0.0.0.0 nexthop is not directly 
connected

---

I logged in as jack / jack on the console and did a df -h

/devices/ramdisk:a = size 164M 100% used mount /
swap 3.3GB used 860K 1%
/mnt/misc/opt 210MB used 210M 100% /mnt/misc

/usr/lib/libc/libc_hwcap1.so.1 2.3G used 2.3G 100% /lib/libc.so.1

/dev/dsk/c7t0d0s2 677M used 677M 100% /media/OpenSolaris

Thanks for any help!

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] size of slog device

2010-06-14 Thread Bob Friesenhahn

On Mon, 14 Jun 2010, Roy Sigurd Karlsbakk wrote:


There is absolutely no sense in having slog devices larger than
the main memory, because it will never be used, right?
ZFS would rather flush the txg to disk than read back from the
zil? So there is a guideline to have enough slog to hold about 10
seconds of zil, but the absolute maximum value is the size of
main memory. Is this correct?


ZFS uses at most RAM/2 for ZIL


It is good to keep in mind that only small writes go to the dedicated 
slog.  Large writes go to the main store.  A succession of that many small 
writes (to fill RAM/2) is highly unlikely.  Also, note that the zil is not 
read back unless the system is improperly shut down.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Moved disks to new controller - cannot import pool even after moving ba

2010-06-14 Thread Ross Walker
On Jun 13, 2010, at 2:14 PM, Jan Hellevik  
 wrote:


Well, for me it was a cure. Nothing else I tried got the pool back.  
As far as I can tell, the way to get it back should be to use  
symlinks to the fdisk partitions on my SSD, but that did not work  
for me. Using -V got the pool back. What is wrong with that?


If you have a better suggestion as to how I should have recovered my  
pool I am certainly interested in hearing it.


I would take this time to offline one disk at a time, wipe all its  
tables/labels and re-attach it as an EFI whole disk to avoid hitting  
this same problem again in the future.
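
A sketch of that per-disk cycle (pool and device names are examples, not taken
from this pool; double-check you have the right disk before wiping anything):

    zpool offline tank c2t3d0
    # clear the old partition table/label, e.g. with format -e (fdisk + label)
    zpool replace tank c2t3d0    # resilvers and re-adds it as a whole disk (EFI),
                                 # possibly needing -f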


-Ross

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] size of slog device

2010-06-14 Thread Arne Jansen
Roy Sigurd Karlsbakk wrote:
>> There is absolutely no sense in having slog devices larger than
>> the main memory, because it will never be used, right?
>> ZFS would rather flush the txg to disk than read back from the
>> zil? So there is a guideline to have enough slog to hold about 10
>> seconds of zil, but the absolute maximum value is the size of
>> main memory. Is this correct?
> 
> ZFS uses at most RAM/2 for ZIL

Thanks!

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] size of slog device

2010-06-14 Thread Arne Jansen
Edward Ned Harvey wrote:
>> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
>> boun...@opensolaris.org] On Behalf Of Arne Jansen
>>
>> There is absolutely no sense in having slog devices larger than
>> the main memory, because it will never be used, right?
> 
> Also:  A TXG is guaranteed to flush within 30 sec.  Let's suppose you have a
> super fast device, which is able to log 8Gbit/sec (which is unrealistic).
> That's 1Gbyte/sec, unrealistically theoretically possible, at best.  You do
> the math.  ;-)
> 
> That being said, it's difficult to buy an SSD smaller than 32G.  So what are
> you going to do?

I'm still building my rotational write delay eliminating driver and am trying
to figure out how much space I can waste on the underlying device without ever
running into problems. I need half the physical memory, or, under the assumption
that it might be tunable, a maximum of my physical memory. It's good to know
a hard upper limit. The more I can waste, the faster the device will be.

Also, to stay in your line of argumentation, this super-fast slog is most
probably a DRAM-based, battery backed solution. In this case it will make
a difference if you buy 8 or 32GB ;)

--Arne
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] size of slog device

2010-06-14 Thread Edward Ned Harvey
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
> boun...@opensolaris.org] On Behalf Of Arne Jansen
> 
> There is absolutely no sense in having slog devices larger than
> the main memory, because it will never be used, right?

Also:  A TXG is guaranteed to flush within 30 sec.  Let's suppose you have a
super fast device, which is able to log 8Gbit/sec (which is unrealistic).
That's 1Gbyte/sec, unrealistically theoretically possible, at best.  You do
the math.  ;-)
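
Spelling out that arithmetic (using the figures above and the RAM/2 ZIL cap
mentioned earlier in the thread):

    1 GByte/s x 30 s  ~=  30 GB  -- the absolute ceiling, and only for an
                                    unrealistically fast log device

In practice the RAM/2 cap and the ~10-second guideline bound it far lower, which
is why a few GB of slog is normally plenty.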

That being said, it's difficult to buy an SSD smaller than 32G.  So what are
you going to do?  Slice it and use the remaining space for cache?  Some
people do.  Some people may even get a performance benefit by doing so.  But
if you do, now you've got a cache and a log both competing for IO on the
same device.  The performance benefit degrades for sure.

My advice is to simply acknowledge wasted space in your log device, forget
about it and move on.  Same thing you did with all the wasted space on your
mirrored OS boot device, which can't (or shouldn't) be used by your data
pool.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Dedup performance hit

2010-06-14 Thread remi.urbillac
 
>>
> To add such a device, you would do:
> 'zpool add tank mycachedevice'
>
>

Hi

Correct me if I'm wrong, but I think the correct command should be: 
'zpool add tank cache mycachedevice'

If you don't use the "cache" keyword, the device would be added as a classical 
top level vdev.
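
For comparison, here are both forms side by side (pool and device names are
examples):

    # add a cache (L2ARC) device
    zpool add tank cache c1t2d0
    # add a separate log (slog) device
    zpool add tank log c1t3d0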

Remi
*
This message and any attachments (the "message") are confidential and intended 
solely for the addressees. 
Any unauthorised use or dissemination is prohibited.
Messages are susceptible to alteration. 
France Telecom Group shall not be liable for the message if altered, changed or 
falsified.
If you are not the intended addressee of this message, please cancel it 
immediately and inform the sender.


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Dedup performance hit

2010-06-14 Thread Dennis Clarke

>>
> You are severely RAM limited.  In order to do dedup, ZFS has to maintain
> a catalog of every single block it writes and the checksum for that
> block. This is called the Dedup Table (DDT for short).
>
> So, during the copy, ZFS has to (a) read a block from the old
> filesystem, (b) check the current DDT to see if that block exists and
> (c) either write the block to the new filesytem (and add an appropriate
> DDT entry for it), or write a metadata update with the dedup reference
> block reference.
>
> Likely, you have two problems:
>
> (1) I suspect your source filesystem has lots of blocks (that is, it's
> likely made up smaller-sized files).  Lots of blocks means lots of
> seeking back and forth to read all those blocks.
>
> (2) Lots of blocks also means lots of entries in the DDT.  It's trivial
> to overwhelm a 4GB system with a large DDT.  If the DDT can't fit in
> RAM, then it has to get partially refreshed from disk.
>
> Thus, here's what's likely going on:
>
> (1)  ZFS reads a block and it's checksum from the old filesystem
> (2)  it checks the DDT to see if that checksum exists
> (3) finding that the entire DDT isn't resident in RAM, it starts a cycle
> to read the rest of the (potential) entries from the new filesystems'
> metadata.  That is, it tries to reconstruct the DDT from disk.  Which
> involves a HUGE amount of random seek reads on the new filesystem.
>
> In essence, since you likely can't fit the DDT in RAM, each block read
> from the old filesystem forces a flurry of reads from the new
> filesystem. Which eats up the IOPS that your single pool can provide.
> It thrashes the disks.  Your solution is to either buy more RAM, or find
> something you can use as an L2ARC cache device for your pool.  Ideally,
> it would be an SSD.  However, in this case, a plain hard drive would do
> OK (NOT one already in a pool).  To add such a device, you would do:
> 'zpool add tank mycachedevice'
>
>

That was an awesome response!  Thank you for that :-)
I tend to configure my servers with 16GB of RAM minimum these days, and now I
know why.


-- 
Dennis Clarke
dcla...@opensolaris.ca  <- Email related to the open source Solaris
dcla...@blastwave.org   <- Email related to open source for Solaris


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] size of slog device

2010-06-14 Thread Roy Sigurd Karlsbakk
> There is absolutely no sense in having slog devices larger than
> the main memory, because it will never be used, right?
> ZFS would rather flush the txg to disk than read back from the
> zil? So there is a guideline to have enough slog to hold about 10
> seconds of zil, but the absolute maximum value is the size of
> main memory. Is this correct?

ZFS uses at most RAM/2 for ZIL

Vennlige hilsener / Best regards

roy
--
Roy Sigurd Karlsbakk
(+47) 97542685
r...@karlsbakk.net
http://blogg.karlsbakk.net/
--
In all pedagogy it is essential that the curriculum be presented intelligibly. It is 
an elementary imperative for all pedagogues to avoid excessive use of idioms of 
foreign origin. In most cases adequate and relevant synonyms exist in Norwegian.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] size of slog device

2010-06-14 Thread Thomas Burgess
On Mon, Jun 14, 2010 at 4:41 AM, Arne Jansen  wrote:

> Hi,
>
> I known it's been discussed here more than once, and I read the
> Evil tuning guide, but I didn't find a definitive statement:
>
> There is absolutely no sense in having slog devices larger than
> the main memory, because it will never be used, right?
> ZFS would rather flush the txg to disk than read back from the
> zil?
> So there is a guideline to have enough slog to hold about 10
> seconds of zil, but the absolute maximum value is the size of
> main memory. Is this correct?
>
>


I thought it was half the size of memory.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] size of slog device

2010-06-14 Thread Arne Jansen
Hi,

I know it's been discussed here more than once, and I read the
Evil tuning guide, but I didn't find a definitive statement:

There is absolutely no sense in having slog devices larger than
the main memory, because it will never be used, right?
ZFS would rather flush the txg to disk than read back from the
zil?
So there is a guideline to have enough slog to hold about 10
seconds of zil, but the absolute maximum value is the size of
main memory. Is this correct?

Thanks,
Arne
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] What happens when unmirrored ZIL log device is removed ungracefully

2010-06-14 Thread R . Eulenberg
Hello
I have this problem on my system as well. I lost my backup server when both the
system HD and the ZIL device crashed. After setting up a new system (osol 2009.06
and updating to the latest osol/dev version with zpool dedup) I tried to import my
backup pool, but I can't. The system tells me there isn't any zpool tank1 when I try
to replace / detach / attach / add any kind of device, or it answers this:
zpool import -f tank1
cannot import 'tank1': one or more devices is currently unavailable
Destroy and re-create the pool from
a backup source.
Using the options -F, -X, -V, -C, -D and any combination of them gives the same
reaction from the system.
There are some solutions for cases in which the old cachefile is available or
the ZIL device isn't destroyed, but neither applies to my case.
I need a way to import the zpool while ignoring the ZIL device.
I spent a week searching the net, but didn't find anything.
I would be very glad for some help.

regards 
Ronny
P.S. hoping you excuse my lousy English.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] [zfs/zpool] hang at boot

2010-06-14 Thread schatten
Just FYI.
The error was that I created the ZFS filesystem in the wrong place in the pool.

rpool/a/b/c
rpool/new

I mounted "new" in a directory of rpoo/ "c". Seems like this hierarchical 
mounting is not working like I thought. ;)
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Snapshots, txgs and performance

2010-06-14 Thread Arne Jansen
Marcelo Leal wrote:
> Hello there,
>  I think you should share it with the list if you can; it seems like 
> interesting work. ZFS has some issues with snapshots and spa_sync performance 
> for snapshot deletion.

I'm a bit reluctant to post it to the list where it can still be found
years from now. Because the module is not compiled directly into ZFS
but is a separate module that makes heavy use of internal structures
of ZFS, it is designed for a specific version of ZFS (Solaris U8). It
might still load without problems for years, but already in the next
Solaris version it might wreak havoc because of a changed kernel structure.
A much better way would be to have a similar operation integrated into
the official source tree. I could try to build a patch if it has a
chance of getting accepted.

Until then, I have no problem with sharing it off-list.

--Arne

>  
>  Thanks
> 
>  Leal
> [ http://www.eall.com.br/blog ]

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss