Re: [zfs-discuss] ZFS + DB + "fragments"

2007-11-14 Thread can you guess?
...

> But every block so rearranged
> (and every tree ancestor of each such block) would
> then leave an equal-sized residue in the most recent
> snapshot if one existed, which gets expensive fast in
> terms of snapshot space overhead (which then is
> proportional to the amount of reorganization
> performed as well as to the amount of actual data
> updating).

Actually, it's not *quite* as bad as that, since the common parent block of 
multiple children should appear only once in the snapshot, not once for each 
child moved.

Still, it does drive up snapshot overhead, and if you start trying to use 
snapshots to simulate 'continuous data protection' rather than more sparingly 
the problem becomes more significant (because each snapshot will catch any 
background defragmentation activity at a different point, such that common 
parent blocks may appear in more than one snapshot even if no child data has 
actually been updated).  Once you introduce CDP into the process (and it's 
tempting to, since the file system is in a better position to handle it 
efficiently than some add-on product), rethinking how one approaches snapshots 
(and COW in general) starts to make more sense.
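A quick way to watch the effect (the dataset names below are invented, and the numbers will obviously depend on how much gets rewritten):

    # take a snapshot, then rewrite part of an existing file in place
    zfs snapshot tank/db@before
    dd if=/dev/urandom of=/tank/db/table.dat bs=128k count=1024 conv=notrunc

    # the superseded blocks stay referenced by the snapshot, so its USED
    # value grows by roughly the amount rewritten - and the same thing
    # happens if a background defragmenter, rather than the application,
    # does the rewriting
    zfs list -t snapshot -o name,used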

- bill
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] internal error: Bad file number

2007-11-14 Thread Manoj Nayak
Hi,

I am using s10u3 in x64 AMD Opteron thumper.

Thanks
Manoj Nayak

Manoj Nayak wrote:
> Hi,
>
> I am getting the following error message when I run any ZFS command. I have
> attached the script I use to create the ramdisk image for Thumper.
>
> # zfs volinit
> internal error: Bad file number
> Abort - core dumped
>
> # zpool status
> internal error: Bad file number
> Abort - core dumped
> #
> # zfs list
> internal error: Bad file number
> Abort - core dumped
> #
>
> Thanks
> Manoj Nayak

[zfs-discuss] internal error: Bad file number

2007-11-14 Thread Manoj Nayak

Hi,

I am getting the following error message when I run any ZFS command. I have
attached the script I use to create the ramdisk image for Thumper.


# zfs volinit
internal error: Bad file number
Abort - core dumped

# zpool status
internal error: Bad file number
Abort - core dumped
#
# zfs list
internal error: Bad file number
Abort - core dumped
#

Thanks
Manoj Nayak



#!/bin/ksh

# This script generates a Solaris ramdisk image for works nodes

PKGADD=/usr/sbin/pkgadd
PKGLOG=/tmp/packages.log
PKGADMIN=/tmp/pkgadmin
ROOTDIR=/tmp/miniroot
OPTDIR=$ROOTDIR/opt
HOMEDIR=$ROOTDIR/home/kealia
USRDIR=$ROOTDIR/usr/local

#/net/ns1/export/OS_images/s10u3/x/latest/Solaris_10/Product/
PROD=../pkgdb/
PROD_OVERRIDE=/myworkspace/packages/i386/nightly

NODE=$1
BOXNAME=""

#
# Minimum list of packages that boots to login prompt on text console.
# Add additional packages to get more functionality (e.g. add SUNWmdbr
# for kernel debugging via kmdb).
#
COMMON_PKGLIST="
SUNWcar.i
SUNWcakr.i
SUNWkvm.i
SUNWcsr
SUNWcsd
SUNWos86r
SUNWrmodr
SUNWpsdcr
SUNWpsdir
SUNWckr
SUNWcnetr
SUNWcsl
SUNWcsu
SUNWcslr
SUNWesu
SUNWkey
SUNWlibms
SUNWlibmsr
SUNWusb
SUNWpr
SUNWtls
SUNWlibsasl
SUNWlxml
SUNWlibpopt
SUNWopenssl-libraries
SUNWusbs
SUNWmdr
SUNWmdu
SUNWtecla
SUNWzlib
SUNWuprl
SUNWsmapi
SUNWkrbr
SUNWkrbu
SUNWtnetr
SUNWtnetd
SUNWgss
SUNWbipr
SUNWbip
SUNWintgige
SUNWnge
SUNWbash
SUNWrcmds
SUNWrcmdc
SUNWrcmdr
SUNWpkgcmdsu
SUNWwbsup
SUNWsshcu
SUNWtoo
SUNWxcu4
SUNWsshdr
SUNWsshdu
SUNWsshr
SUNWsshu
SFWrpm
SMCncurs
SSBinutils
SSCoreutils
SSGcc
SSTcl
SUNWbzip
SSlibiconv
SUNWrmodu
SUNWntpr
SUNWntpu
"

case $NODE in
mstor)
PKGLIST="${COMMON_PKGLIST} SUNWixgb SUNWmv88sx SUNWzfsu SUNWzfsr SUNWhd"
BOXNAME="StreamStor"
;;
*)
NODE="mworks"
PKGLIST=${COMMON_PKGLIST}
BOXNAME="StreamWORKS"
;;
esac

#
# Create a pkg admin file - see man admin(4)
#
sed 's/ask/nocheck/' /var/sadm/install/admin/default > $PKGADMIN

echo "adding packages to $ROOTDIR"

[ -d $ROOTDIR ] && rm -fr $ROOTDIR
mkdir -p $ROOTDIR
mkdir $OPTDIR
mkdir -p $OPTDIR/kealia/bin
mkdir -p $OPTDIR/kealia/etc
mkdir -p $HOMEDIR
mkdir -p $USRDIR

for pkg in $PKGLIST; do
if [ -d "$PROD_OVERRIDE/$pkg" ]; then
echo "  $pkg added from $PROD_OVERRIDE"
$PKGADD -a $PKGADMIN -d $PROD_OVERRIDE -R $ROOTDIR $pkg \
> $pkg.PKGLOG 2>&1
elif [ -d "$PROD/$pkg" ]; then
echo "  $pkg added from $PROD"
$PKGADD -a $PKGADMIN -d $PROD -R $ROOTDIR $pkg \
> $PKGLOG 2>&1
else
echo "  $pkg not found: skipped"
fi
done

#
# Strip amd64 binaries
#
echo "strip amd64 binaries"
(cd $ROOTDIR; find . -name amd64 | xargs rm -r 2> /dev/null)

#
# remove packaging, xpg4, sfw
#
# echo "strip packaging, xpg4, and freeware"
# (cd $ROOTDIR; rm -r var/sadm/* usr/xpg4 usr/sfw)

#
# Fix up the image so it boots to the login prompt
#
echo "fix /etc/vfstab, /etc/nodename, and /etc/hosts"
echo "/devices/ramdisk:a - / ufs - no nologging" >> $ROOTDIR/etc/vfstab

#create the file to enable dhcp
if [ $NODE = "mstor" ]; then
touch $ROOTDIR/etc/dhcp.e1000g0
else
touch $ROOTDIR/etc/dhcp.nge0
fi

echo "127.0.0.1 localhost loghost" > $ROOTDIR/etc/hosts
echo "setprop console 'text'\n" >> $ROOTDIR/boot/solaris/bootenv.rc
#
# Set the environment variables for svccfg.
#

#
echo "import SMF services"
SVC_FILES=`find $ROOTDIR/var/svc/manifest -name "*.xml"`
SVCCFG_DTD=$ROOTDIR/usr/share/lib/xml/dtd/service_bundle.dtd.1
SVCCFG_REPOSITORY=$ROOTDIR/etc/svc/repository.db
SVCCFG=/usr/sbin/svccfg

export SVCCFG_DTD SVCCFG_REPOSITORY SVCCFG


$SVCCFG import $ROOTDIR/var/svc/manifest/network/network-initial.xml
$SVCCFG import $ROOTDIR/var/svc/manifest/network/network-service.xml
$SVCCFG import $ROOTDIR/var/svc/manifest/milestone/sysconfig.xml
$SVCCFG import $ROOTDIR/var/svc/manifest/system/system-log.xml
$SVCCFG import $ROOTDIR/var/svc/manifest/network/inetd.xml
$SVCCFG import $ROOTDIR/var/svc/manifest/network/shell.xml
$SVCCFG import $ROOTDIR/var/svc/manifest/network/login.xml
$SVCCFG import $ROOTDIR/var/svc/manifest/network/ntp.xml
$SVCCFG import $ROOTDIR/var/svc/manifest/network/telnet.xml
$SVCCFG -s system/system-log:default setprop general/enabled=true

echo "turnoff boot-archive, rpc bind, ipfilter, manifest-import, metainit, ntp"
$SVCCFG -s system/boot-archive setprop start/exec=:true
$SVCCFG -s system/manifest-import setprop start/exec=:true
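A couple of quick checks against the image and the booted node, in case the ZFS kernel bits simply didn't make it into the miniroot (the mstor package list above only pulls in SUNWzfsr and SUNWzfsu, and the zfs/zpool utilities fail when they can't talk to /dev/zfs). Package and device names here are from memory, so treat them as assumptions:

    # was the ZFS kernel package (SUNWzfskr, as I recall) added to the image?
    pkginfo -R /tmp/miniroot | grep -i zfs

    # on the booted node: is the zfs driver loaded, and does /dev/zfs exist?
    modinfo | grep -w zfs
    ls -lL /dev/zfs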

Re: [zfs-discuss] Yager on ZFS

2007-11-14 Thread can you guess?
...

> > >> Well single bit error rates may be rare in
> normal
> > >> operation hard
> > >> drives, but from a systems perspective, data can
> be
> > >> corrupted anywhere
> > >> between disk and CPU.
> > >
> > > The CERN study found that such errors (if they
> found any at all,
> > > which they couldn't really be sure of) were far
> less common than
> 
> I will note from multiple personal experiences these
> issues _do_ happen
> with netapp and emc (symm and clariion)

And Robert already noted that they've occurred in his mid-range arrays.  In 
both cases, however, you're talking about decidedly non-consumer hardware, and 
had you looked more carefully at the material to which you were responding you 
would have found that its comments were in the context of experiences with 
consumer hardware (and in particular what *quantitative* level of additional 
protection ZFS's 'special sauce' can be considered to add to its reliability).

Errors introduced by mid-range and high-end arrays don't enter into that 
discussion (though they're interesting for other reasons).

- bill
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Yager on ZFS

2007-11-14 Thread can you guess?
> 
> On 14-Nov-07, at 7:06 AM, can you guess? wrote:
> 
> > ...
> >
>  And how about FAULTS?
>  hw/firmware/cable/controller/ram/...
> >>>
> >>> If you had read either the CERN study or what I
> >> already said about
> >>> it, you would have realized that it included the
> >> effects of such
> >>> faults.
> >>
> >>
> >> ...and ZFS is the only prophylactic available.
> >
> > You don't *need* a prophylactic if you're not
> having sex:  the CERN  
> > study found *no* clear instances of faults that
> would occur in  
> > consumer systems and that could be attributed to
> the kinds of  
> > errors that ZFS can catch and more conventional
> file systems can't.
> 
> Hmm, that's odd, because I've certainly had such
> faults myself. (Bad  
> RAM is a very common one,

You really ought to read a post before responding to it:  the CERN study did 
encounter bad RAM (and my post mentioned that) - but ZFS usually can't do a 
damn thing about bad RAM, because errors tend to arise either before ZFS ever 
gets the data or after it has already returned and checked it (and in both 
cases, ZFS will think that everything's just fine).

>  that nobody even thinks to
> check.)

Speak for yourself:  I've run memtest86+ on all our home systems, and I run it 
again whenever encountering any problem that might be RAM-related.

- bill
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Yager on ZFS

2007-11-14 Thread can you guess?
...

> The problem it seems to me with criticizing ZFS as
> not much different
> than WAFL, is that WAFL is really a networked storage
> backend, not a
> server operating system FS. If all you're using ZFS
> for is backending
> networked storage, the "not much different" criticism
> holds a fair
> amount of water I think.

A more fundamental problem is that there are several different debates going on 
in this one thread.

The comparison with WAFL is primarily about the question of just how 'novel' 
ZFS's design is (leaving aside any questions about patent enforceability) and 
especially about just how 'unique' its reliability approaches are for 
environments that require them.  In a nutshell, while some COW approaches 
predate both WAFL and ZFS, WAFL was arguably the first to come up with the kind 
of 'write anywhere' approach that ZFS also heavily relies upon and to the best 
of my knowledge WAFL was also the first to incorporate the kind of 
in-parent-verification that has played such a prominent part in the integrity 
discussion here.

Another prominent debate in this thread revolves around the question of just 
how significant ZFS's unusual strengths are for *consumer* use.  WAFL clearly 
plays no part in that debate, because it's available only on closed, server 
systems.

> However, that highlights
> what's special about
> ZFS...it isn't limited to just that use case.

The major difference between ZFS and WAFL in this regard is that ZFS 
batch-writes-back its data to disk without first aggregating it in NVRAM (a 
subsidiary difference is that ZFS maintains a small-update log which WAFL's use 
of NVRAM makes unnecessary).  Decoupling the implementation from NVRAM makes 
ZFS usable on arbitrary rather than specialized platforms, and that without 
doubt  constitutes a significant advantage by increasing the available options 
(in both platform and price) for those installations that require the kind of 
protection (and ease of management) that both WAFL and ZFS offer and that don't 
require the level of performance that WAFL provides and ZFS often may not (the 
latter hasn't gotten much air time here, and while it can be discussed to some 
degree in the abstract a better approach would be to have some impartial 
benchmarks to look at, because the on-disk block layouts do differ 
significantly and sometimes subtly even if the underlying approaches don't).
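(As an aside, recent OpenSolaris builds do let you move that small-update log onto a dedicated - potentially NVRAM-backed - device, which narrows this particular gap somewhat; the pool and device names below are just examples:)

    # attach a separate intent-log device to an existing pool
    zpool add tank log c2t0d0
    zpool status tank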

- bill
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS + DB + "fragments"

2007-11-14 Thread can you guess?
> Nathan Kroenert wrote:

...

> > What if it did a double update: One to a staged area, and another
> > immediately after that to the 'old' data blocks.  Still always have
> > on-disk consistency etc, at a cost of double the I/O's...
> 
> This is a non-starter.  Two I/Os is worse than one.

Well, that attitude may be supportable for a write-only workload, but then so 
is the position that you really don't even need *one* I/O (since no one will 
ever need to read the data and you might as well just drop it on the floor).

In the real world, data (especially database data) does usually get read after 
being written, and the entire reason the original poster raised the question 
was because sometimes it's well worth taking on some additional write overhead 
to reduce read overhead.  In such a situation, if you need to protect the 
database from partial-block updates as well as to keep it reasonably laid out 
for sequential table access, then performing the two writes described is about 
as good a solution as one can get (especially if the first of them can be 
logged - even better, logged in NVRAM - such that its overhead can be amortized 
across multiple such updates by otherwise independent processes, and even more 
especially if, as is often the case, the same data gets updated multiple times 
in sufficiently close succession that instead of 2N writes you wind up only 
needing to perform N+1 writes, the last being the only one that updates the 
data in place after the activity has cooled down).

> 
> > Of course, both of these would require non-sparse
> file creation for the 
> > DB etc, but would it be plausible?
> > 
> > For very read intensive and position sensitive
> applications, I guess 
> > this sort of capability might make a difference?
> 
> We are all anxiously awaiting data...

Then you might find it instructive to learn more about the evolution of file 
systems on Unix:

In The Beginning there was the block, and the block was small, and it was 
isolated from its brethren, and darkness was upon the face of the deep because 
any kind of sequential performance well and truly sucked.

Then (after an inexcusably lengthy period of such abject suckage lasting into 
the '80s) there came into the world FFS, and while there was still only the 
block the block was at least a bit larger, and it was at least somewhat less 
isolated from its brethren, and once in a while it actually lived right next to 
them, and while sequential performance still usually sucked at least it sucked 
somewhat less.

And then the disciples Kleiman and McVoy looked upon FFS and decided that mere 
proximity was still insufficient, and they arranged that blocks should (at 
least when convenient) be aggregated into small groups (56 KB actually not 
being all that small at the time, given the disk characteristics back then), 
and the Great Sucking Sound of Unix sequential-access performance was finally 
reduced to something at least somewhat quieter than a dull roar.

But other disciples had (finally) taken a look at commercial file systems that 
had been out in the real world for decades and that had had sequential 
performance down pretty well pat for nearly that long.  And so it came to pass 
that corporations like Veritas (VxFS), and SGI (EFS & XFS), and IBM (JFS) 
imported the concept of extents into the Unix pantheon, and the Gods of 
Throughput looked upon it, and it was good, and (at least in those systems) 
Unix sequential performance no longer sucked at all, and even non-corporate 
developers whose faith was strong nearly to the point of being blind could not 
help but see the virtues revealed there, and began incorporating extents into 
their own work, yea, even unto ext4.

And the disciple Hitz (for it was he, with a few others) took a somewhat 
different tack, and came up with a 'write anywhere file layout' but had the 
foresight to recognize that it needed some mechanism to address sequential 
performance (not to mention parity-RAID performance).  So he abandoned 
general-purpose approaches in favor of the Appliance, and gave it most 
uncommodity-like but yet virtuous NVRAM to allow many consecutive updates to be 
aggregated into not only stripes but adjacent stripes before being dumped to 
disk, and the Gods of Throughput smiled upon his efforts, and they became known 
throughout the land.

Now comes back Sun with ZFS, apparently ignorant of the last decade-plus of 
Unix file system development (let alone development in other systems dating 
back to the '60s).  Blocks, while larger (though not necessarily proportionally 
larger, due to dramatic increases in disk bandwidth), are once again often 
isolated from their brethren.  True, this makes the COW approach a lot easier 
to implement, but (leaving aside the debate about whether COW as implemented in 
ZFS is a good idea at all) there is *no question whatsoever* that it returns a 
significant degree of suckage to sequential performance - especially for data 
subj

Re: [zfs-discuss] ZFS + DB + "fragments"

2007-11-14 Thread Richard Elling
can you guess? wrote:
>> can you guess? wrote:
>>
>>>> For very read intensive and position sensitive
>>>> applications, I guess
>>>> this sort of capability might make a difference?
>>>
>>> No question about it.  And sequential table scans in databases
>>> are among the most significant examples, because (unlike things
>>> like streaming video files which just get laid down initially
>>> and non-synchronously in a manner that at least potentially
>>> allows ZFS to accumulate them in large, contiguous chunks -
>>> though ISTR some discussion about just how well ZFS managed
>>> this when it was accommodating multiple such write streams in
>>> parallel) the tables are also subject to fine-grained,
>>> often-random update activity.
>>>
>>> Background defragmentation can help, though it generates a
>>> boatload of additional space overhead in any applicable snapshot.
>>
>> The reason that this is hard to characterize is that there are
>> really two very different configurations used to address different
>> performance requirements: cheap and fast.  It seems that when most
>> people first consider this problem, they do so from the cheap
>> perspective: single disk view.  Anyone who strives for database
>> performance will choose the fast perspective: stripes.
>>
>
> And anyone who *really* understands the situation will do both.
>   

I'm not sure I follow.  Many people who do high performance
databases use hardware RAID arrays which often do not
expose single disks.

>>  Note: data
>> redundancy isn't really an issue for this analysis,
>> but consider it
>> done in real life.  When you have a striped storage
>> device under a
>> file system, then the database or file system's view
>> of contiguous
>> data is not contiguous on the media.
>> 
>
> The best solution is to make the data piece-wise contiguous on the media at 
> the appropriate granularity - which is largely determined by disk access 
> characteristics (the following assumes that the database table is large 
> enough to be spread across a lot of disks at moderately coarse granularity, 
> since otherwise it's often small enough to cache in the generous amounts of 
> RAM that are inexpensively available today).
>
> A single chunk on an (S)ATA disk today (the analysis is similar for 
> high-performance SCSI/FC/SAS disks) needn't exceed about 4 MB in size to 
> yield over 80% of the disk's maximum possible (fully-contiguous layout) 
> sequential streaming performance (after the overhead of an 'average' - 1/3 
> stroke - initial seek and partial rotation are figured in:  the latter could 
> be avoided by using a chunk size that's an integral multiple of the track 
> size, but on today's zoned disks that's a bit awkward).  A 1 MB chunk yields 
> around 50% of the maximum streaming performance.  ZFS's maximum 128 KB 'chunk 
> size' if effectively used as the disk chunk size as you seem to be suggesting 
> yields only about 15% of the disk's maximum streaming performance (leaving 
> aside an additional degradation to a small fraction of even that should you 
> use RAID-Z).  And if you match the ZFS block size to a 16 KB database block 
> size and use that as the effective unit of distribution across the set of 
> disks, you'll obtain a mighty 2% of the potential streaming performance (again, we'll be 
> charitable and ignore the further degradation if RAID-Z is used).
>
>   

You do not seem to be considering the track cache, which for
modern disks is 16-32 MBytes.  If those disks are in a RAID array,
then there are often larger read caches as well.  Expecting a seek and
read for each iop is a bad assumption.
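For what it's worth, the back-of-the-envelope arithmetic behind those percentages looks roughly like this (the seek, rotation, and media-rate figures are assumed round numbers for a 7200 rpm SATA drive, not measurements, and the model deliberately ignores the track cache):

    # efficiency = transfer_time / (avg_seek + half_rotation + transfer_time)
    # assumes 8.5 ms average seek, 4.2 ms half rotation, ~65 MB/s media rate
    for kb in 16 128 1024 4096; do
        nawk -v kb=$kb 'BEGIN {
            seek = 8.5; rot = 4.2; rate = 65.0        # ms, ms, MB/s
            xfer = (kb / 1024) / rate * 1000          # transfer time in ms
            printf("%5d KB chunk: about %2.0f%% of streaming bandwidth\n",
                   kb, 100 * xfer / (seek + rot + xfer))
        }'
    done

which comes out near 2%, 13%, 55%, and 83% for 16 KB, 128 KB, 1 MB, and 4 MB chunks respectively - in the same ballpark as the figures quoted above.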

> Now, if your system is doing nothing else but sequentially scanning this one 
> database table, this may not be so bad:  you get truly awful disk utilization 
> (2% of its potential in the last case, ignoring RAID-Z), but you can still 
> read ahead through the entire disk set and obtain decent sequential scanning 
> performance by reading from all the disks in parallel.  But if your database 
> table scan is only one small part of a workload which is (perhaps the worst 
> case) performing many other such scans in parallel, your overall system 
> throughput will be only around 4% of what it could be had you used 1 MB 
> chunks (and the individual scan performances will also suck commensurately, 
> of course).
>
> Using 1 MB chunks still spreads out your database admirably for parallel 
> random-access throughput:  even if the table is only 1 GB in size (eminently 
> cachable in RAM, should that be preferable), that'll spread it out across 
> 1,000 disks (2,000, if you mirror it and load-balance to spread out the 
> accesses), and for much smaller database tables if they're accesse

Re: [zfs-discuss] ZFS + DB + default blocksize

2007-11-14 Thread Jesus Cea

Louwtjie Burger wrote:
> On 11/8/07, Richard Elling <[EMAIL PROTECTED]> wrote:
>> Potentially, depending on the write part of the workload, the system may
>> read
>> 128 kBytes to get a 16 kByte block.  This is not efficient and may be
>> noticeable
>> as a performance degradation.
> 
> Hi Richard.
> 
> The amount of time it takes to position the drive to get to the start
> of the 16K block takes longer than the time it takes to read the extra
> 112 KB ... depending where on the platter this is one could calculate
> it.

Worse yet, if your ZFS blocksize is 128 KB and your database block size is
16 KB, ZFS would read the whole 128 KB block, update the 16 KB inside it,
and write 128 KB back to the disk.

If both blocksizes are equal, you don't need the read part. That is a
huge win.
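The usual knob for that is the dataset's recordsize property, set before the database files are written (the dataset name and the 16K figure below are just illustrative):

    zfs create tank/oradata
    zfs set recordsize=16k tank/oradata

    # recordsize only affects files written after the change, so set it
    # before loading the database files
    zfs get recordsize tank/oradata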

--
Jesus Cea Avion
[EMAIL PROTECTED]   http://www.argo.es/~jcea/
jabber / xmpp:[EMAIL PROTECTED]
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Missing zpool devices, what are the options

2007-11-14 Thread David Bustos
Quoth Mark Ashley on Mon, Nov 12, 2007 at 11:35:57AM +1100:
> Is it possible to tell ZFS to forget those SE6140 LUNs ever belonged to the
> zpool? I know that ZFS will have probably put some user data on them, but if
> there is a possibility of recovering any of those zvols on the zpool 
> it'd really help a lot, to put it mildly. My understanding is all the
> metadata will be spread around and polluted by now, even after a few
> days of the SE6140 LUNs being linked, but I thought I'd ask.

No.  I believe this is 4852783, "reduce pool capacity", which hasn't
been implemented yet.  (I don't know whether it's being worked on.)
I think your best bet is to copy off any data you can get to, and
recreate the pool.


David
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Yager on ZFS

2007-11-14 Thread Toby Thain

On 14-Nov-07, at 7:06 AM, can you guess? wrote:

> ...
>
 And how about FAULTS?
 hw/firmware/cable/controller/ram/...
>>>
>>> If you had read either the CERN study or what I
>> already said about
>>> it, you would have realized that it included the
>> effects of such
>>> faults.
>>
>>
>> ...and ZFS is the only prophylactic available.
>
> You don't *need* a prophylactic if you're not having sex:  the CERN  
> study found *no* clear instances of faults that would occur in  
> consumer systems and that could be attributed to the kinds of  
> errors that ZFS can catch and more conventional file systems can't.

Hmm, that's odd, because I've certainly had such faults myself. (Bad  
RAM is a very common one, that nobody even thinks to check.)

--Toby

>   It found faults in the interaction of its add-on RAID controller  
> (not a normal 'consumer' component) with its WD disks, it found  
> single-bit errors that appeared to correlate with ECC RAM errors  
> (i.e., likely occurred in RAM rather than at any point where ZFS  
> would be involved), it found block-sized errors that appeared to  
> correlate with misplaced virtual memory allocation (again, outside  
> ZFS's sphere of influence).
>
>>
>>
>>>
>>> ...
>>>
>  but I had a box that was randomly
>> corrupting blocks during
>> DMA.  The errors showed up when doing a ZFS
>> scrub
 and
>> I caught the
>> problem in time.
>
> Yup - that's exactly the kind of error that ZFS
>> and
 WAFL do a
> perhaps uniquely good job of catching.

 WAFL can't catch all: It's distantly isolated from
 the CPU end.
>>>
>>> WAFL will catch everything that ZFS catches,
>> including the kind of
>>> DMA error described above:  it contains validating
>> information
>>> outside the data blocks just as ZFS does.
>>
>> Explain how it can do that, when it is isolated from
>> the application
>> by several layers including the network?
>
> Darrell covered one aspect of this (i.e., that ZFS couldn't either  
> if it were being used in a server), but there's another as well:   
> as long as the NFS messages between client RAM and server RAM are  
> checksummed in RAM on both ends, then that extends the checking all  
> the way to client RAM (the same place where local ZFS checks end)  
> save for any problems occurring *in* RAM at one end or the other  
> (and ZFS can't deal with in-RAM problems either:  all it can do is  
> protect the data until it gets to RAM).
>
> - bill
>
>
> This message posted from opensolaris.org
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Yager on ZFS

2007-11-14 Thread Toby Thain

On 14-Nov-07, at 12:43 AM, Jason J. W. Williams wrote:

> Hi Darren,
>
>> Ah, your "CPU end" was referring to the NFS client cpu, not the  
>> storage
>> device CPU.  That wasn't clear to me.  The same limitations would  
>> apply
>> to ZFS (or any other filesystem) when running in support of an NFS
>> server.
>>
>> I thought you were trying to describe a qualitative difference  
>> between
>> ZFS and WAFL in terms of data checksumming in the on-disk layout.
>
> Eh...NetApp can just open WAFL to neuter the argument... ;-) Or I
> suppose you could just run ZFS on top of an iSCSI or FC mount from the
> NetApp.
>
> The problem it seems to me with criticizing ZFS as not much different
> than WAFL, is that WAFL is really a networked storage backend, not a
> server operating system FS. If all you're using ZFS for is backending
> networked storage, the "not much different" criticism holds a fair
> amount of water I think. However, that highlights what's special about
> ZFS...it isn't limited to just that use case. It's the first server OS
> FS (to my knowledge) to provide all those features in one place, and
> that's what makes it revolutionary. Because you can truly use its
> features in any application with any storage. It's on that basis I
> think that placing ZFS and WAFL on equal footing is not a strong
> argument.

That was my thinking, and better put than I could, thank you.

--Toby

>
> Best Regards,
> Jason
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Yager on ZFS

2007-11-14 Thread Toby Thain

On 13-Nov-07, at 9:18 PM, A Darren Dunham wrote:

> On Tue, Nov 13, 2007 at 07:33:20PM -0200, Toby Thain wrote:
> Yup - that's exactly the kind of error that ZFS and
 WAFL do a
> perhaps uniquely good job of catching.

 WAFL can't catch all: It's distantly isolated from
 the CPU end.
>>>
>>> WAFL will catch everything that ZFS catches, including the kind of
>>> DMA error described above:  it contains validating information
>>> outside the data blocks just as ZFS does.
>>
>> Explain how it can do that, when it is isolated from the application
>> by several layers including the network?
>
> Ah, your "CPU end" was referring to the NFS client cpu, not the  
> storage
> device CPU.  That wasn't clear to me.  The same limitations would  
> apply
> to ZFS (or any other filesystem) when running in support of an NFS
> server.
>
> I thought you were trying to describe a qualitative difference between
> ZFS and WAFL in terms of data checksumming in the on-disk layout.

Yes, I was comparing apples and oranges, as our mysterious friend  
will be sure to point out. But I still don't think WAFL and ZFS are  
interchangeable, because if you *really* care about integrity you  
won't choose an isolated storage subsystem - and does anyone run WAFL  
on the application host?

--Toby

>
> -- 
> Darren Dunham
> [EMAIL PROTECTED]
> Senior Technical Consultant TAOShttp:// 
> www.taos.com/
> Got some Dr Pepper?   San Francisco, CA bay  
> area
>  < This line left intentionally blank to confuse you. >
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] How to create ZFS pool ?

2007-11-14 Thread Mike Dotson
On Wed, 2007-11-14 at 21:23 +, A Darren Dunham wrote:
> On Wed, Nov 14, 2007 at 09:40:59AM -0800, Boris Derzhavets wrote:
> > I was able to create second Solaris partition by running 
> > 
> > #fdisk /dev/rdsk/c1t0d0p0
> 
> I'm afraid that won't do you much good.
> 
> Solaris only works with one "Solaris" partition at a time (on any one
> disk).  If you have free space that you want to play with, it should be
> within the existing partition (or be on another disk).
> 
> > Is it possible to create a zfs pool with the third partition?
> 
> I doubt it, but I think it more of a general Solaris limitation than
> anything to do with ZFS specifically.

You can't use another Solaris partition but you could use a different
partition ID:

             Total disk size is 9729 cylinders
             Cylinder size is 16065 (512 byte) blocks

                                                 Cylinders
      Partition   Status    Type          Start   End    Length    %
      =========   ======    ============  =====   ====   ======   ===
          1                 IFS: NTFS         0   1043     1044    11
          2                 Linux native   1044   2348     1305    13
          3       Active    Solaris2       2349   4959     2611    27
          4                 Other OS       4960   9728     4769    49


SELECT ONE OF THE FOLLOWING:
   1. Create a partition
   2. Specify the active partition
   3. Delete a partition
   4. Change between Solaris and Solaris2 Partition IDs
   5. Exit (update disk configuration and exit)
   6. Cancel (exit without updating disk configuration)

Notice partition 4 is "Other OS" which is where I have my zfs pool:

helios(2):> zpool status
  pool: lpool
 state: ONLINE
status: The pool is formatted using an older on-disk format.  The pool can
        still be used, but some features are unavailable.
action: Upgrade the pool using 'zpool upgrade'.  Once this is done, the
        pool will no longer be accessible on older software versions.
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        lpool       ONLINE       0     0     0
          c0d0p4    ONLINE       0     0     0

errors: No known data errors


So to create the pool in my case would be: zpool create lpool c0d0p4



-- 
Mike Dotson

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] How to create ZFS pool ?

2007-11-14 Thread A Darren Dunham
On Wed, Nov 14, 2007 at 09:40:59AM -0800, Boris Derzhavets wrote:
> I was able to create second Solaris partition by running 
> 
> #fdisk /dev/rdsk/c1t0d0p0

I'm afraid that won't do you much good.

Solaris only works with one "Solaris" partition at a time (on any one
disk).  If you have free space that you want to play with, it should be
within the existing partition (or be on another disk).

> Is it possible to create a zfs pool with the third partition?

I doubt it, but I think it more of a general Solaris limitation than
anything to do with ZFS specifically.

-- 
Darren Dunham   [EMAIL PROTECTED]
Senior Technical Consultant TAOShttp://www.taos.com/
Got some Dr Pepper?   San Francisco, CA bay area
 < This line left intentionally blank to confuse you. >
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Filesystem Benchmark

2007-11-14 Thread Peter Tribble
On 11/14/07, Gary Wright <[EMAIL PROTECTED]> wrote:
>
> Hope you don't mind me asking but we are planning to use a CX3-20 Dell/EMC 
> SAN connected to a T5220 server (Solaris 10). Can you tell me
> if you were forced to use PowerPath or have you used MPXIO/Traffic Manager. 
> Did you use LPe11000-E (Single Channel) or LPe11002-E (dual channel) HBA's?
>
> Did you encounter any problems with configuring this?

My experience in this area is that powerpath doesn't get along with zfs
(I couldn't import the pool); using MPxIO worked fine.
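For the record, switching to MPxIO on Solaris 10 is basically a one-liner plus a reboot - roughly as below, assuming HBAs the Sun multipathing stack supports (check the docs for your particular cards), and with a made-up pool name:

    # enable Solaris I/O multipathing for FC devices (updates vfstab/dump
    # config as needed and requires a reboot)
    stmsboot -e

    # after the reboot, show the old-name to new-name device mappings
    stmsboot -L

    # the pool comes back under the new multipathed device paths
    zpool import tank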

-- 
-Peter Tribble
http://www.petertribble.co.uk/ - http://ptribble.blogspot.com/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] How to create ZFS pool ?

2007-11-14 Thread Tim Spriggs
Hi Boris,

When you create a Solaris2 Partition under x86, Solaris "sees" the 
partition as a disk that you can cut into slices. You can find a list of 
disks available via the "format" command.

A slice is much like a partition but there is a difference; that's most 
or all you really need to know to use them. Once you have found the new 
disk you can simply:

zpool create pool c1t0d1

Let me know if you still find trouble.

Thanks,
-Tim

Boris Derzhavets wrote:
> I was able to create second Solaris partition by running 
>
> #fdisk /dev/rdsk/c1t0d0p0
>
> First was NTFS (40GB)
> Second was SNV76 installation (40 GB)
> Third has been created by me.
> Rebooted system. Double checked 
> by fdisk that partition exists
> My intent is to run:-
> # zpool create pool c1t0d0
> Cannot find out device name in Solaris system.
> man fdisk,man format appears not enough for me.
> I am missing something
>
> Sorry, for stupid questions.
> What the device has been created by fdisk ?
> Is it possible to create a zfs pool with the third partition?
>
> Linux guy (fdisk /dev/sda)
>  
>  
> This message posted from opensolaris.org
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
>   

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] How to create ZFS pool ?

2007-11-14 Thread Boris Derzhavets
I was able to create second Solaris partition by running 

#fdisk /dev/rdsk/c1t0d0p0

First was NTFS (40GB)
Second was SNV76 installation (40 GB)
Third has been created by me.
Rebooted system. Double checked 
by fdisk that partition exists
My intent is to run:-
# zpool create pool c1t0d0
Cannot find out device name in Solaris system.
man fdisk,man format appears not enough for me.
I am missing something

Sorry, for stupid questions.
What the device has been created by fdisk ?
Is it possible to create a zfs pool with the third partition?

Linux guy (fdisk /dev/sda)
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS + DB + "fragments"

2007-11-14 Thread Richard Elling
can you guess? wrote:
>> For very read intensive and position sensitive
>> applications, I guess 
>> this sort of capability might make a difference?
> 
> No question about it.  And sequential table scans in databases 
> are among the most significant examples, because (unlike things 
> like streaming video files which just get laid down initially 
> and non-synchronously in a manner that at least potentially 
> allows ZFS to accumulate them in large, contiguous chunks - 
> though ISTR some discussion about just how well ZFS managed 
> this when it was accommodating multiple such write streams in 
> parallel) the tables are also subject to fine-grained, 
> often-random update activity.
> 
> Background defragmentation can help, though it generates a 
> boatload of additional space overhead in any applicable snapshot.

The reason that this is hard to characterize is that there are
really two very different configurations used to address different
performance requirements: cheap and fast.  It seems that when most
people first consider this problem, they do so from the cheap
perspective: single disk view.  Anyone who strives for database
performance will choose the fast perspective: stripes.  Note: data
redundancy isn't really an issue for this analysis, but consider it
done in real life.  When you have a striped storage device under a
file system, then the database or file system's view of contiguous
data is not contiguous on the media. There are many different ways
to place the data on the media and we would typically strive for a
diverse stochastic spread.  Hmm... one could theorize that COW will
also result in a diverse stochastic spread.  The complexity of the
characterization is then caused by the large number of variables
which the systems use to spread the data (interlace size, block size,
prefetch, caches, cache policies, etc) and the feasibility of
understanding the interdependent relationships these will have on
performance.

Real data would be greatly appreciated.
  -- richard
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Yager on ZFS

2007-11-14 Thread Wade . Stuart

>
> On 9-Nov-07, at 2:45 AM, can you guess? wrote:
>
> >>> Au contraire:  I estimate its worth quite
> >> accurately from the undetected error rates reported
> >> in the CERN "Data Integrity" paper published last
> >> April (first hit if you Google 'cern "data
> >> integrity"').
> >>>
>  While I have yet to see any checksum error
> >> reported
>  by ZFS on
>  Symmetrix arrays or FC/SAS arrays with some other
>  "cheap" HW I've seen
>  many of them
> >>>
> >>> While one can never properly diagnose anecdotal
> >> issues off the cuff in a Web forum, given CERN's
> >> experience you should probably check your
> >> configuration very thoroughly for things like
> >> marginal connections:  unless you're dealing with a
> >> far larger data set than CERN was, you shouldn't have
> >> seen 'many' checksum errors.
> >>
> >> Well single bit error rates may be rare in normal
> >> operation hard
> >> drives, but from a systems perspective, data can be
> >> corrupted anywhere
> >> between disk and CPU.
> >
> > The CERN study found that such errors (if they found any at all,
> > which they couldn't really be sure of) were far less common than

I will note from multiple personal experiences these issues _do_ happen
with netapp and emc (symm and clariion) -- I will also say that many times
you do not read about them because you will find that when they do happen
to you one of the first people to show up on your site will be their legal
team pushing paper and sharking for signatures.


-Wade

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Filesystem Benchmark

2007-11-14 Thread Gary Wright
Hi Cesare,

Hope you don't mind me asking but we are planning to use a CX3-20 Dell/EMC SAN 
connected to a T5220 server (Solaris 10). Can you tell me
if you were forced to use PowerPath or have you used MPXIO/Traffic Manager. Did 
you use LPe11000-E (Single Channel) or LPe11002-E (dual channel) HBA's?  

Did you encounter any problems with configuring this?

Any comments greatly appreciated.
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Is ZFS stable in OpenSolaris?

2007-11-14 Thread hex.cookie
In a production environment, which platform should we use: Solaris 10 U4 or 
OpenSolaris build 70+?  How should we judge which edition is stable enough for 
production?  Or is OpenSolaris stable in some particular build?
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Yager on ZFS

2007-11-14 Thread can you guess?
> can you guess? wrote:
> 
> >> at the moment only ZFS can give this assurance,
> plus
> >> the ability to
> >> self correct detected
> >> errors.
> >> 
> >
> > You clearly aren't very familiar with WAFL (which
> can do the same).
> >
> >   

...

>  so far as I can tell it's quite
> irrelevant to me at home; I 
> can't afford it.

Neither can I - but the poster above was (however irrelevantly) talking about 
ZFS's supposedly unique features for *businesses*, so I answered in that 
context.

(By the way, something has gone West with my email and I'm temporarily unable 
to send the response I wrote to your message night before last.  If you meant 
to copy it here as well, just do so and I'll respond to it here.)

- bill
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Yager on ZFS

2007-11-14 Thread David Dyer-Bennet
can you guess? wrote:
> ...
>>  I
>> was running the card in RAID0 and getting random
>> corrupted bytes on
>> reads that went away when I switched to JBOD.
>> 
>
> Then it kind of sounds like a card problem rather than a cable problem.
>
> Perhaps there's a very basic definition issue here:  when I use the term 
> 'consumer', I'm referring to the people who buy a computer and never open up 
> the case, not to people who fool around with RAID cards.  I'm referring to 
> people who would likely say "What?" if you referred to Linux.  I'm referring 
> to people who would be extremely unlikely to be found participating in this 
> forum.
>
> In other words, to the overwhelming majority of PC users, who don't want to 
> hear anything that suggests that they might have to become more intimately 
> involved with their computer in order to make it better (let alone 'better' 
> in the relatively marginal and fairly abstruse ways that ZFS would).
>   

Statistically there are a lot of them.  But I've known lots of 
early-adopters with no professional computer background (one of them has 
*since then* done some tech-writing work) who built machines from parts, 
replaced motherboards, upgraded the processors in "non-upgradeable" 
MACs, ran OS/2, even converted themselves to Linux, on their home 
systems.  These are consumer users too. 

> I'd include most Mac users as well, except that they've just suffered a major 
> disruption to their world-view by being told that moving to the 
> previously-despised Intel platform constitutes an *upgrade* - so if you can 
> give them any excuse to think that ZFS is superior (and not available on 
> Windows) they'll likely grab for it like desperate voyagers on the Titanic 
> grabbed for life savers (hey, Steve's no dummy).
>   

Long-time MAC users must be getting used to having their entire world 
disrupted and having to re-buy all their software.  This is at least the 
second complete flag-day (no forward or backwards compatibility) change 
they've been through.

-- 
David Dyer-Bennet, [EMAIL PROTECTED]; http://dd-b.net/
Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/
Photos: http://dd-b.net/photography/gallery/
Dragaera: http://dragaera.info

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Yager on ZFS

2007-11-14 Thread David Dyer-Bennet
can you guess? wrote:

>> at the moment only ZFS can give this assurance, plus
>> the ability to
>> self correct detected
>> errors.
>> 
>
> You clearly aren't very familiar with WAFL (which can do the same).
>
>   

That's quite possibly a factor.  I'm pretty thoroughly unfamiliar with 
WAFL myself, though I think I've probably used it via NFS in a work 
environment or two.

In any case, so far as I can tell it's quite irrelevant to me at home; I 
can't afford it.

-- 
David Dyer-Bennet, [EMAIL PROTECTED]; http://dd-b.net/
Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/
Photos: http://dd-b.net/photography/gallery/
Dragaera: http://dragaera.info

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] iSCSI on ZFS with Linux initiator

2007-11-14 Thread Mertol Ozyoney
Hi;

 

Does anyone have experience with iSCSI target volumes on ZFS accessed by Linux
clients (Red Hat, SUSE)?

regards

Mertol Ozyoney 
Storage Practice - Sales Manager

Sun Microsystems, TR
Istanbul TR
Phone +902123352200
Mobile +905339310752
Fax +90212335
Email   [EMAIL PROTECTED]

 

 

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Yager on ZFS

2007-11-14 Thread can you guess?
...
>  I
> was running the card in RAID0 and getting random
> corrupted bytes on
> reads that went away when I switched to JBOD.

Then it kind of sounds like a card problem rather than a cable problem.

Perhaps there's a very basic definition issue here:  when I use the term 
'consumer', I'm referring to the people who buy a computer and never open up 
the case, not to people who fool around with RAID cards.  I'm referring to 
people who would likely say "What?" if you referred to Linux.  I'm referring to 
people who would be extremely unlikely to be found participating in this forum.

In other words, to the overwhelming majority of PC users, who don't want to 
hear anything that suggests that they might have to become more intimately 
involved with their computer in order to make it better (let alone 'better' in 
the relatively marginal and fairly abstruse ways that ZFS would).

I'd include most Mac users as well, except that they've just suffered a major 
disruption to their world-view by being told that moving to the 
previously-despised Intel platform constitutes an *upgrade* - so if you can 
give them any excuse to think that ZFS is superior (and not available on 
Windows) they'll likely grab for it like desperate voyagers on the Titanic 
grabbed for life savers (hey, Steve's no dummy).

People like you and me with somewhat more knowledge about computers are like 
airline employees who tend to choose their seats with an eye toward crash 
survivability:  no, this probably won't mean they'll survive a crash, but it 
makes them feel better to be doing the little that they can.  And they just 
accept the fact that the rest of the world would prefer not to think that they 
had to worry about crashes at all (if they thought otherwise, a lot more planes 
would be built with their seats facing backward).

- bill
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Yager on ZFS

2007-11-14 Thread can you guess?
> some business do not accept any kind of risk

Businesses *always* accept risk:  they just try to minimize it within the 
constraints of being cost-effective.  Which is a good thing for ZFS, because it 
can't eliminate risk either, just help to minimize it cost-effectively.

However, the subject here is not business use but 'consumer' use.

...

> at the moment only ZFS can give this assurance, plus
> the ability to
> self correct detected
> errors.

You clearly aren't very familiar with WAFL (which can do the same).

- bill
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Yager on ZFS

2007-11-14 Thread can you guess?
> Darrell

My apologies, Darren.

- bill
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Yager on ZFS

2007-11-14 Thread can you guess?
...

> >> And how about FAULTS?
> >> hw/firmware/cable/controller/ram/...
> >
> > If you had read either the CERN study or what I
> already said about  
> > it, you would have realized that it included the
> effects of such  
> > faults.
> 
> 
> ...and ZFS is the only prophylactic available.

You don't *need* a prophylactic if you're not having sex:  the CERN study found 
*no* clear instances of faults that would occur in consumer systems and that 
could be attributed to the kinds of errors that ZFS can catch and more 
conventional file systems can't.  It found faults in the interaction of its 
add-on RAID controller (not a normal 'consumer' component) with its WD disks, 
it found single-bit errors that appeared to correlate with ECC RAM errors 
(i.e., likely occurred in RAM rather than at any point where ZFS would be 
involved), it found block-sized errors that appeared to correlate with 
misplaced virtual memory allocation (again, outside ZFS's sphere of influence).

> 
> 
> >
> > ...
> >
> >>>  but I had a box that was randomly
>  corrupting blocks during
>  DMA.  The errors showed up when doing a ZFS
> scrub
> >> and
>  I caught the
>  problem in time.
> >>>
> >>> Yup - that's exactly the kind of error that ZFS
> and
> >> WAFL do a
> >>> perhaps uniquely good job of catching.
> >>
> >> WAFL can't catch all: It's distantly isolated from
> >> the CPU end.
> >
> > WAFL will catch everything that ZFS catches,
> including the kind of  
> > DMA error described above:  it contains validating
> information  
> > outside the data blocks just as ZFS does.
> 
> Explain how it can do that, when it is isolated from
> the application  
> by several layers including the network?

Darrell covered one aspect of this (i.e., that ZFS couldn't either if it were 
being used in a server), but there's another as well:  as long as the NFS 
messages between client RAM and server RAM are checksummed in RAM on both ends, 
then that extends the checking all the way to client RAM (the same place where 
local ZFS checks end) save for any problems occurring *in* RAM at one end or 
the other (and ZFS can't deal with in-RAM problems either:  all it can do is 
protect the data until it gets to RAM).

- bill
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS + DB + "fragments"

2007-11-14 Thread can you guess?
> This question triggered some silly questions in my
> mind:

Actually, they're not silly at all.

> 
> Lots of folks are determined that the whole COW to
> different locations 
> are a Bad Thing(tm), and in some cases, I guess it
> might actually be...
> 
> What if ZFS had a pool / filesystem property that
> caused zfs to do a 
> journaled, but non-COW update so the data's relative
> location for 
> databases is always the same?

That's just what a conventional file system (no need even for a journal, when 
you're updating in place) does when it's not guaranteeing write atomicity (you 
address the latter below).

> 
> Or - What if it did a double update: One to a staged
> area, and another 
> immediately after that to the 'old' data blocks.
> Still always have 
> on-disk consistency etc, at a cost of double the
> I/O's...

It only requires an extra disk access if the new data is too large to dump 
right into the journal itself (which guarantees that the subsequent in-place 
update can complete).  Whether the new data is dumped into the log or into a 
temporary location the pointer to which is logged, the subsequent in-place 
update can be deferred until it's convenient (e.g., until after any additional 
updates to the same data have also been accumulated, activity has cooled off, 
and the modified blocks are getting ready to be evicted from the system cache - 
and, optionally, until the target disks are idle or have their heads positioned 
conveniently near the target location).

ZFS's small-synchronous-write log can do something similar as long as the 
writes aren't too large to place in it.  However, data that's only persisted in 
the journal isn't accessible via the normal snapshot mechanisms (well, if an 
entire file block was dumped into the journal I guess it could be, at the cost 
of some additional complexity in journal space reuse), so I'm guessing that ZFS 
writes back any dirty data that's in the small-update journal whenever a 
snapshot is created.

And if you start actually updating in place as described above, then you can't 
use ZFS-style snapshotting at all:  instead of capturing the current state as 
the snapshot with the knowledge that any subsequent updates will not disturb 
it, you have to capture the old state that you're about to over-write and stuff 
it somewhere else - and then figure out how to maintain appropriate access to 
it while the rest of the system moves on.

Snapshots make life a lot more complex for file systems than it used to be, and 
COW techniques make snapshotting easy at the expense of normal run-time 
performance - not just because they make update-in-place infeasible for 
preserving on-disk contiguity but because of the significant increase in disk 
bandwidth (and snapshot storage space) required to write back changes all the 
way up to whatever root structure is applicable:  I suspect that ZFS does this 
on every synchronous update save for those that it can leave temporarily in its 
small-update journal, and it *has* to do it whenever a snapshot is created.

> 
> Of course, both of these would require non-sparse
> file creation for the 
> DB etc, but would it be plausible?

Update-in-place files can still be sparse:  it's only data that already exists 
that must be present (and updated in place to preserve sequential access 
performance to it).

> 
> For very read intensive and position sensitive
> applications, I guess 
> this sort of capability might make a difference?

No question about it.  And sequential table scans in databases are among the 
most significant examples, because (unlike things like streaming video files 
which just get laid down initially and non-synchronously in a manner that at 
least potentially allows ZFS to accumulate them in large, contiguous chunks - 
though ISTR some discussion about just how well ZFS managed this when it was 
accommodating multiple such write streams in parallel) the tables are also 
subject to fine-grained, often-random update activity.

Background defragmentation can help, though it generates a boatload of 
additional space overhead in any applicable snapshot.

- bill
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] space_map.c 'ss == NULL' panic strikes back.

2007-11-14 Thread Pawel Jakub Dawidek
Hi.

Someone recently reported a 'ss == NULL' panic in
space_map.c/space_map_add() on FreeBSD's version of ZFS.

I found that this problem was previously reported on Solaris and is
already fixed. I verified it, and FreeBSD's version has this fix in
place...


http://src.opensolaris.org/source/diff/onnv/onnv-gate/usr/src/uts/common/fs/zfs/space_map.c?r2=3761&r1=3713

I'd really like to help this guy get his data back, so please point me
into right direction. We have a crash dump of the panic, BTW.

It happened after a spontaneous reboot. Now, the system panics on
'zpool import' immediately.

He already tried two things:

1. Importing the pool with 'zpool import -o ro backup'. No luck, it
   crashes.

2. Importing the pool without mounting file systems (I sent him a patch
   to zpool, to not mount file systems automatically on pool import).
   I hoped that maybe only one or more file systems are corrupted, but
   no, it panics immediately as well.

It's the biggest storage machine there, so there is no way to back up the
raw disks before starting more experiments; that's why I'm writing
here. I have two ideas:

1. Because it happened on a system crash or something, we can expect that
   this is caused by the last change. If so, we could try corrupting the
   most recent uberblock, so ZFS will pick up the previous uberblock.

2. Instead of pancing in space_map_add(), we could try to
   space_map_remove() the offensive entry, eg:

-   VERIFY(ss == NULL);
+   if (ss != NULL) {
+   space_map_remove(sm, ss->ss_start, ss->ss_end);
+   goto again;
+   }

Both of those ideas can make things worse, so I want to know what damage
can be done using those methods, or even better, what else (safer) we can
try?
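Before poking at it, it might also be worth dumping whatever can be read without importing anything - the device name below is just an example, this assumes the FreeBSD port ships zdb, and since zdb is a userland tool the worst case should be zdb itself aborting on the same assertion:

    # dump the vdev labels (pool name, guid, txg, vdev config) from one of
    # the pool's member devices; read-only, no import required
    zdb -l /dev/da0

    # if the pool can be opened at all, show the active uberblock
    zdb -u backup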

-- 
Pawel Jakub Dawidek   http://www.wheel.pl
[EMAIL PROTECTED]   http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss