I've been noticing regular write activity to my data pool while the
system's relatively idle, with just a little read IO. It turns out the
system's writing up to 20MB of data to the pool every 15-30 seconds.
Using iotop from the DTrace Toolkit, I found the process responsible is
apparently sched. What's going on?
Did you disable 'atime' updates for your filesystem? Otherwise the file
access times need to be updated periodically, and this could happen
every 15-30 seconds.
Not disabled. But 20MB worth of metadata updates while the system
practically does nothing? The only real thing happening is a video
Second question: would it make much difference to have 12 or 22 ZFS
filesystems? What's the memory footprint of a ZFS filesystem
I remember a figure of 64KB kernel memory per file system.
-mg
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
if you add a raidz group to a group of 3 mirrors, the entire pool slows
down to the speed of the raidz.
That's not true. Blocks are being randomly spread across all vdevs.
Unless all requests keep pulling blocks from the RAID-Z, the speed is a
mean of the performance of all vdevs.
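As a rough illustration (a toy model assumed here for the sake of argument, not actual ZFS allocator behaviour): if random reads hit each top-level vdev with equal probability, the expected throughput is the arithmetic mean of the vdevs' individual rates, not the minimum:

```python
# Toy model (an assumption for illustration, not ZFS internals): random
# reads land on each top-level vdev with equal probability, so the
# expected per-read rate is the arithmetic mean across vdevs.
def expected_read_rate(vdev_rates_mb_s):
    """Mean random-read throughput across top-level vdevs, in MB/s."""
    return sum(vdev_rates_mb_s) / len(vdev_rates_mb_s)

# Three mirrors at 200 MB/s each plus one slower raidz at 100 MB/s:
print(expected_read_rate([200, 200, 200, 100]))  # 175.0
```

Under this model the pool only degrades to raidz speed if every request happens to land on the raidz vdev.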
-mg
As some Sun folks pointed out
1) No redundancy at the power or networking side
2) Getting 2TB drives in a x4540 would make the numbers closer
3) Performance isn't going to be that great with their design but...they
might not need it.
4) Silicon Image chipsets. Their SATA controller chips used
An introduction to btrfs, from somebody who used to work on ZFS:
http://www.osnews.com/story/21920/A_Short_History_of_btrfs
*very* interesting article. Not sure why James didn't link to it
directly, but it's courtesy of Valerie Aurora (formerly Henson):
http://lwn.net/Articles/342892/
I'm trying
This is my first ZFS pool. I'm using an X4500 with 48 1TB drives. Solaris is
the 5/09 release.
After the create, zfs list shows 40.8T, but after creating 4
filesystems/mountpoints the available space drops by 8.8TB to 32.1TB. What
happened to the 8.8TB? Is this much overhead normal?
IIRC zpool list includes the space consumed by parity, while zfs list shows
usable space.
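A hedged sketch of that accounting difference (the vdev layout below is an assumption for illustration; the poster's actual X4500 config isn't shown): raw space counts every disk, usable space excludes parity.

```python
# Sketch of raw vs. usable capacity for a pool of raidz vdevs.
# The 4 x 11-disk raidz2 layout and ~0.93 TB (a decimal "1 TB" drive)
# disk size are assumptions for illustration, not the actual setup.
def raw_and_usable_tb(vdevs):
    """vdevs: list of (disk_count, parity_disks, disk_size_tb) tuples."""
    raw = sum(n * size for n, _, size in vdevs)
    usable = sum((n - p) * size for n, p, size in vdevs)
    return raw, usable

raw, usable = raw_and_usable_tb([(11, 2, 0.93)] * 4)
print(round(raw, 1), round(usable, 1))  # 40.9 33.5
```

Numbers in the same ballpark as the 40.8T vs 32.1T reported above, which suggests the "missing" 8.8TB is parity accounting, not lost space.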
To All: The ECC discussion was very interesting, as I had never
considered it that way! I will be buying ECC memory for my home
machine!
You have to make sure your mainboard, chipset and/or CPU support it;
otherwise any ECC modules will just work like regular modules.
The mainboard needs
Because.
90+% of the normal desktop users will run a non-redundant pool, and
expect their filesystems to not add operational failures, but come
back after a yanked power cord without fail.
OpenSolaris desktop users are surely less than 0.5% of the desktop
population. Are the 90+% of the normal
The good news is that ZFS is getting popular enough on consumer-grade
hardware. The bad news is that said hardware has a different set of
failure modes, so it takes a bit of work to become resilient to them.
This is pretty high on my short list.
One thing I'd like to see is an _easy_ option
Does anyone know specifically if b105 has ZFS encryption?
IIRC it has been pushed back to b109.
-mg
signature.asc
Description: OpenPGP digital signature
Along with my general question about ZFS and RAIDZ, I want to know the
following: must all hard disks in the storage pool have the same capacity, or
is it possible to use hard disks with different capacities?
Lowest common denominator applies here. Creating a RAIDZ from a 100GB,
200GB and 300GB disk treats every disk as 100GB.
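The lowest-common-denominator rule can be sketched like this (a simplified model: every member disk contributes only as much as the smallest one):

```python
# Simplified model of raidz capacity with mixed disk sizes: each disk
# contributes only as much space as the smallest member of the vdev.
def raidz_usable_gb(disk_sizes_gb, parity=1):
    smallest = min(disk_sizes_gb)
    return (len(disk_sizes_gb) - parity) * smallest

# 100GB + 200GB + 300GB disks in a single-parity raidz:
print(raidz_usable_gb([100, 200, 300]))  # 200 (GB usable; 300GB wasted)
```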
I expect it will go SO SLOW, that some function somewhere is eventually
going to fail/timeout. That system is barely usable WITHOUT
compression. I hope at the very least you're disabling every single
unnecessary service before doing any testing, especially the GUI.
ZFS uses RAM, and
Rob Logan wrote:
ECC?
$60 unbuffered 4GB 800MHz DDR2 ECC CL5 DIMM (Kit Of 2)
http://www.provantage.com/kingston-technology-kvr800d2e5k2-4g~7KIN90H4.htm
Geez, I have to move to the US for cheap hardware. I paid 120€ for
exactly that 4GB ECC kit (well, I bought two of these, so 240€) in
How can I diagnose why a resilver appears to be hanging at a certain
percentage, seemingly doing nothing for quite a while, even though the
HDD LED is lit up permanently (no apparent head seeking)?
The drives in the pool are WD Raid Editions, thus have TLER and should
time out on errors in just
For files smaller than the default (128K) or the user-defined value, the
recordsize will be the smallest power of two between 512 bytes and the
appropriate upper limit. For anything above the value, it's the defined
recordsize for every block in the file. Variable recordsize is only for
single
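The behaviour described above can be sketched as follows (an illustration of the description, not ZFS source code):

```python
# Sketch of the recordsize behaviour described above: files smaller than
# the recordsize get the smallest power of two (>= 512 bytes) that fits;
# larger files use the full recordsize for every block.
def small_file_block_size(file_size, recordsize=128 * 1024):
    if file_size >= recordsize:
        return recordsize
    block = 512
    while block < file_size:
        block *= 2
    return block

print(small_file_block_size(300))      # 512
print(small_file_block_size(37_000))   # 65536
print(small_file_block_size(200_000))  # 131072
```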
WOW! This is quite a departure from what we've been
told for the past 2 years...
This must be misinformation.
The reason there's no project (yet) is very likely because pool shrinking
depends strictly on the availability of bp_rewrite functionality, which is
still in development.
The last
Jesus, is this argument still going on?
The only Linux-lookaliking going on here is the GNU toolchain in
/usr/gnu/bin spearheading the PATH variable (set in .bashrc) and bash
currently as default shell for regular users. Nothing else. Change the
user's shell, you're back to Solaris.
-mg
This,
Latest BeleniX OpenSolaris uses the Caiman installer so it may be
worth installing it just to see what it is like. I installed it under
VirtualBox yesterday. Installing using whole disk did not work with
VirtualBox but the suggested default partitioning did work.
OpenSolaris 2008.05
I suppose an error correcting code like 256-bit Hamming or Reed-Solomon
can't substitute for a reliable checksum at the level of the default
Fletcher2/4? If it can, could it be offered as an alternative algorithm
where necessary, letting ZFS react accordingly, or not?
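For reference, the Fletcher family is just two running sums, which is what makes it so cheap compared to Hamming or Reed-Solomon. A simplified sketch (illustrative only; not bit-compatible with ZFS's actual fletcher2/fletcher4, which differ in word size and accumulator count):

```python
import struct

# Simplified Fletcher-style checksum over 64-bit little-endian words.
# Illustrative only: ZFS's fletcher2/fletcher4 use different word sizes
# and accumulator counts, so this is not bit-compatible with them.
def fletcher_like(data: bytes):
    a = b = 0
    mask = (1 << 64) - 1
    data = data + b"\x00" * (-len(data) % 8)  # pad to 8-byte multiple
    for (word,) in struct.iter_unpack("<Q", data):
        a = (a + word) & mask  # plain sum: catches changed values
        b = (b + a) & mask     # sum of sums: catches reordered words
    return a, b

print(fletcher_like(b"ZFS checksums every block"))
```

Note that error-*correcting* codes detect errors too; the design question is that ZFS only needs cheap detection per block, repairing from redundant copies rather than correcting in place.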
Regards,
-mg
On 12-Aug-08, at
Disk space may be lost to redundancy, but there are still two or more
devices in the mirror. Read requests can be spread across these.
--
Via iPhone 3G
On 11-Aug-08, at 11:07, Martin Svensson [EMAIL PROTECTED] wrote:
I read this (http://blogs.sun.com/roch/entry/when_to_and_not_to)
blog
Possibly metadata. Since that's however redundant due to ditto blocks
(2 or 3 copies depending on importance), it was repaired during the
scrub.
--
Via iPhone 3G
On 05-Aug-08, at 21:11, soren [EMAIL PROTECTED] wrote:
soren wrote:
ZFS has detected that my root filesystem has a
small
Rahul wrote:
Hi, can you give some disadvantages of the ZFS file system? Please, it's
urgent. Help me.
This message posted from opensolaris.org
--
Via iPhone 3G
On 04-Aug-08, at 19:46, Lori Alt [EMAIL PROTECTED] wrote:
I'll try to help, but I'm confused by a few things. First, when
you say that you upgraded from OpenSolaris 2008.05 to snv_94,
what do you mean? Because I'm not sure how one upgrades
an IPS-based release to the
The first attempt at this went well...
Anyway, he meant updating to the latest Indiana repo, which is based
on snv_94.
Regards,
-mg
--
Via iPhone 3G
On 04-Aug-08, at 19:46, Lori Alt [EMAIL PROTECTED] wrote:
Seymour Krebs wrote:
Machine is running x86 snv_94 after recent upgrade from
Knowing this, I will never put non-ECC memory in my boxes again.
What's your mainboard and CPU? I've looked up the thread on the forum
and there's no hardware information. Don't be fooled just because the
RAM's ECC: the mainboard (and the CPU, in AMD's case) have to support it.
There are two
mainboard is :
KFN4-DRE
more info you find here :
http://www.asus.com/products.aspx?l1=9&l2=39&l3=174&l4=0&model=1844&modelmenu=2
cpu:
2x AMD Opteron 2350 2.0GHz HT 4MB SF
You'll be fine with that. Just had to make sure.
Regards,
-mg
We already have memory scrubbers which check memory. Actually,
we've had these for about 10 years, but it only works for ECC
memory... if you have only parity memory, then you can't fix anything
at the hardware level, and the best you can hope is that FMA will do
the right thing.
In
zfs destroy -t flag
Thumbs up for this.
Plus asking for an -i flag, for interactive mode, handy on things like
zfs destroy.
-mg
c1t5d0 was part of a mirror but with c1t4d0 removed it now appears as
a single drive. Is there a way to recover from this by recreating the
mirror with c1t4d0?
Detaching a drive from a two-way mirror effectively breaks it up and
turns it into a single drive. That's normal. Just attach it back
Here's a link to a recent blog entry of Jeff Bonwick, lead engineer of
ZFS, showing him with Linus Torvalds, making mysterious comments in a
blog post that's tagged ZFS.
I hate to be a scaremonger, but are we about to lose one major
advantage over the competition?
I mean, if the Linux folks to
Here's a link to a recent blog entry of Jeff Bonwick, lead engineer of
ZFS, showing him with Linus Torvalds, making mysterious comments in a
blog post that's tagged ZFS.
Well, here's the link, anyhow. :S
http://blogs.sun.com/bonwick/entry/casablanca
-mg
Oh, and here's the source code, for the curious:
The forensics project will be all over this, I hope, and wrap it up in a
nice command line tool.
-mg
What is the status of ZFS on Linux and what kernels are supported?
There's sort of an experimental port to FUSE. Last I heard about it, it
isn't exactly stable, and the ARC's missing too, or at least gimped.
There won't be an in-kernel ZFS due to license issues (CDDL vs. GPL).
-mg
Also if ZFS can be implemented completely outside of the Linux kernel
source tree as a plugin module then it falls into the same category of
modules as proprietary binary device drivers.
The Linux community has a strange attitude about proprietary drivers.
Otherwise I wouldn't have to put up
ZFS can use block sizes up to 128k. If the data is compressed, then
this size will be larger when decompressed.
ZFS allows you to use variable blocksizes (sized a power of 2 from 512
to 128k), and as far as I know, a compressed block is put into the
smallest fitting one.
-mg
How can I set up a ZVOL that's accessible by non-root users, too? The intent is
to use sparse ZVOLs as raw disks in virtualization (reducing overhead compared
to file-based virtual volumes).
Thanks,
-mg
Similarly, read block size does not make a
significant difference to the sequential read speed.
Last time I did a simple bench with dd, passing the record size as the
blocksize parameter (instead of omitting it) bumped the mirror pool
speed from 90MB/s to 130MB/s.
-mg
Why would you build a complex database filesystem for searching through some
pictures and word documents, or movies and songs? The answer is: you
wouldn't. You'd do what everyone is already doing: provide a user app that
indexes important files and lets you search them. Problem solved.
I've heard that WinFS is a filesystem that has some kind of database? I didn't
understand the advantages because I haven't read about it, but it's the best
thing since sliced bread according to MS.
My question is, because WinFS database is running on top of NTFS, could a
similar thing be
I do see that all the devices are quite evenly busy. There is no
doubt that the load balancing is quite good. The main question is if
there is any actual striping going on (breaking the data into
smaller chunks), or if the algorithm is simply load balancing.
Striping trades IOPS for
Also, what other things are coming with ZFS boot in b87? Or is it just
support for it in the installer?
- a fair amount of cleanup and bug fixes
- support for swap and dump devices in pools
- sparc support
- installation support
Cool. But again, it's the Caiman installer bundled
I just purchased a new laptop and would like to set it up to boot from
ZFS. I read in newsgroups that it was predicted to be included into
b83, but this apparently didn't happen. Is there any prediction when
this may be available?
If you install OpenSolaris Developer Preview (ie Indiana)
For a home user, data integrity is probably as, if not more, important
than for a corporate user. How many home users do regular backups?
I'm a heavy computer user and probably passed the 500GB mark way before
most other home users, did various stunts like running a RAID0 on IBM
Deathstars,
So I see no reason to change my suggestion that consumers just won't notice
the level of increased reliability that ZFS offers in this area: not only
would the difference be nearly invisible even if the systems they ran on were
otherwise perfect, but in the real world consumers have other
I haven't seen the beginning of this discussion, but seeing SiI sets the
fire alarm off here.
The Silicon Image chipsets are notorious for being flaky and causing data
corruption, at least the variants that usually go onto mainboards. Based
on this, I suggest you get a different card.
-mg
The question is: why can't I get that kind of performance with a single zfs
pool (striping across all the disks)? Concurrency problem or something else?
Remember that ZFS is checksumming everything on reads and writes.
-mg
Having my 700GB one-disk ZFS pool crash on me created ample need for a
recovery tool.
So I spent the weekend creating a tool that lets you list directories and
copy files from any pool on a one-disk ZFS filesystem, for example where the
Solaris kernel keeps panicking.
Is there any
Besides,
there are some new results about BWT that I'm sure would be of
interest in this context.
I thought bzip2/BWT is a compression scheme that has a heavy footprint
and is generally brain damaging to implement?
-mg
I have:
2x150GB SATA ii disks
2x500GB SATA ii disks
Is it possible/recommended to have something like a pool of two raidz vdevs?
This will hopefully maximize my storage space compared to mirrors, and still
give me self-healing, yes?
You can't create a RAID-Z out of two disks. You either
Hi, thanks for the tips. I'm currently using a 2-disk raidz configuration and
it seems to work fine, but I'll probably take your advice and use mirrors,
because I'm finding the raidz a bit slow.
What? How would a two disk RAID-Z work, anyway? A three disk RAID-Z
missing a disk? 50% of the
If I have a pool that's made up of 2 raidz vdevs, all data is striped across?
So if I somehow lose a vdev, I lose all my data?!
If your vdevs are RAID-Z's, there has to be a rare coincidence to happen
to break the pool (two disks failing in the same RAID-Z)...
But yeah, ZFS spreads blocks to
I'm more worried about the availability of my data in the event of a
controller failure. I plan on using 4-channel SATA controllers and
creating multiple 4-disk RAIDZ vdevs. I want to use a single pool, but
it looks like I can't, as controller failure = ZERO access, although the
same can be said
Yes, I'm not surprised. I thought it would be a RAM problem.
I always recommend a 'memtest' on any new hardware.
Murphy's law predicts that you only have RAM problems
on PC's that you don't test!
Heh, the last RAM problem I ever had was a broken 1MB memory stick on
that wannabe 486 from
I added a SI 2 port PCI SATA controller, but it seemed to not be recognized
so I am not using it.
Do you by chance mean Silicon Image with that SI? Their chipsets
aren't exactly known for reliability and data safety. Just pointing that
out as a potential source of problems.
-mg
There are however a few cases where it will not be optimal. Eg, 129k files
will use up 256k of space. However, you can work around this problem by
turning on compression.
Doesn't ZFS pack the last block into a multiple of 512 bytes?
If not, it's a surprise that there isn't a
This mode has many benefits, not the least being that it practically
creates a fully dynamic mode of mirroring (replacing raid1 and raid10
variants), especially when combined with the upcoming vdev remove and
defrag/rebalance features.
Vdev remove, that's a sure thing. I've heard about defrag
Actually, ZFS is already supposed to try to write the ditto copies of a
block on different vdevs if multiple are available.
*TRY* being the keyword here.
What I'm looking for is a disk full error if ditto cannot be written
to different disks. This would guarantee that a mirror is written
I don't mean single disk vdevs, because that's trivial, but mirror or RAID-Z
ones.
Let's assume I want to replace a mirror/RAID-Z with a bigger one, but don't
want to go through the procedure of resilvering the array with each disk
replacement until I've reached the new array size. Connections
What other ZFS features depend on ZFS RAID ?
Mostly the self-healing stuff.. But if it's not zfs-redundant and a
device experiences write errors, the machine will currently panic.
Wow, this is certainly worse than the current VxVM/VxFS
implementation. At least there I get I/O errors and
Accessibility of the data is also a reason, in dual boot scenarios.
Doesn't need to be a native Windows driver, but something that still
ties into the Explorer. There's still the option of running Solaris in
VMware, but that's a bit heavy handed.
-mg
You like Windows /that much/? Note Sun
While the original reason for this was swap, I have a sneaky suspicion
that others may wish for this as well, or perhaps something else.
Thoughts? (database folks, jump in :-)
Lower overhead storage for my QEMU volumes. I figure other filesystems
running within a ZVOL may cause a
Because you have to read the entire stripe (which probably spans all the
disks) to verify the checksum.
Then I have a wrong idea of what a stripe is. I always thought it's the
interleave block size.
-mg
I had the same question last week and decided to take a similar approach.
Instead of a giant raidz of 6 disks, I created 2 raidz's of 3 disks
each. So when I want to add more storage, I just add 3 more disks.
Even if you've created a giant 6 disk RAID-Z, apart from a formal
warning requiring the
Correction:
SATA Controller is a Silicon Image 3114, not a 3112.
Do these slow speeds only appear when writing via NFS or generally in
all scenarios? Just asking, because Solaris' ata driver doesn't
initialize settings like block mode, prefetch and such on IDE/SATA
drives (that is if ata
A 6 disk raidz set is not optimal for random reads, since each disk in
the raidz set needs to be accessed to retrieve each item.
I don't understand, if the file is contained within a single stripe, why
would it need to access the other disks, if the checksum of the stripe
is OK? Also, why
2. ZFS doesn't make much sense for high-performance laptops. Laptop drives
are slow enough without artificially increasing the number of seeks on
writes. Apple makes a LOT of money from laptops. It's also unclear how well
ZFS would play with other latency- and CPU-sensitive applications
Here's one possible reason that a read-only ZFS would be useful: DVD-ROM
distribution.
built-in compression works for DVDs, too.
Sector errors on DVD are not uncommon. Writing a DVD in ZFS format with
duplicated data blocks would help protect against that problem, at the cost
of
This LZO issue is something that might crop up again and again in
different shapes. If ZFS is being adopted on different operating
systems, people might start cooking their own soups.
What are the plans to keep this under control? What if Unix
Variant/Clone X suddenly decides their ZFS code needs
I definitely [i]don't[/i] want to use flash for swap...
You could use a ZVOL on the RAID-Z. Ok, not the most efficient thing,
but there's no sort of flag to disable parity on a specific object. I
wish there was, exactly for this reason.
-mg
A bunch of disks of different sizes will make it a problem. I wanted to
post that idea to the mailing list before, but didn't do so, since it
doesn't make too much sense.
Say you have two disks, one 50GB and one 100GB, part of your data can
only be ditto'd within the upper 50GB of the larger
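That capacity limit can be sketched numerically (a simplified model: data only counts as safely ditto'd if its two copies sit on different disks):

```python
# Simplified model: how much data can keep two copies on *different*
# disks. The largest disk can hold at most one copy of everything, so
# cross-disk ditto capacity is capped by the space on the other disks.
def dittoable_gb(disk_sizes_gb):
    total = sum(disk_sizes_gb)
    return min(total // 2, total - max(disk_sizes_gb))

print(dittoable_gb([50, 100]))   # 50  -> only half the 100GB disk pairs up
print(dittoable_gb([100, 100]))  # 100 -> everything can be ditto'd
```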
Trying some funky experiments, based on hearing about this readonly ZFS
in MacOSX, I'm kidding around with creating file based pools and then
burning them to a CDROM. When running zpool import, it does find the
pool on the CD, but then warns about a read-only device, followed by a
core dump.
[EMAIL PROTECTED]:~/LargeFiles zpool import -o readonly testpool
internal error: Read-only file system
Abort (core dumped)
[EMAIL PROTECTED]:~/LargeFiles
Interesting, I've just filed 6569720 for this behaviour - thanks for
spotting this! Regardless of whether ZFS supports this, we
I think in your test, you have to force some IO on the pool for ZFS to
recognize that your simulated disk has gone faulty, and that already after
the first mkfile. Immediately overwriting both files after pool creation
leaves ZFS with the impression that the disks went missing. And even if ZFS
Lot of small files perhaps? What kind of protection
have you used?
No protection, and as many small files as a full distro install has, plus some
more source code for some libs. It's just 28GB that needs to be resilvered, yet
it takes something like 6 hours at this abysmal speed.
At first I thought it
Oh god, I found it. So freakin' bizarre. I'm now pushing 27MB/s average, instead
of a meager 1.6MB/s. That's more like it.
This is what happened:
Back in the day when I bought my first SATA drive, incidentally a WD Raptor, I
wanted Windows to boot off it, including bootloader placement on it and
I've read that it's supposed to go at full speed, i.e. as fast as possible. I'm
doing a disk replace and what zpool reports kind of surprises me. The resilver
goes on at 1.6MB/s. Did resilvering get throttled at some point between the
builds, or is my ATA controller having bigger issues?
While trying some things earlier to figure out how zpool iostat is supposed
to be interpreted, I noticed that ZFS behaves kind of weirdly when writing
data. Not to say that it's bad, just interesting. I wrote 160MB of zeroed data
with dd. I had zpool iostat running with a one-second interval.
Something I was wondering about myself. What does the raidz toplevel (pseudo?)
device do? Does it just indicate to the SPA, or whatever module is responsible,
to additionally generate parity? The thing I'd like to know is if variable
block sizes, dynamic striping et al still applies to a single
Given the odd sizes of your drives, there might not
be one, unless you
are willing to sacrifice capacity.
I think for the SoHo and home user scenarios, it might be of advantage
if the disk drivers offered unified APIs to read out and interpret disk drive
diagnostics, like SMART on ATA
What are these alignment requirements?
I would have thought that at the lowest level, parity stripes would be
allocated traditionally, while treating the remaining usable space like a JBOD
at the level above, thus not subject to any constraints (apart when getting
close to the parity stripe
I spent all day yesterday evacuating my data from one of the Windows disks, so
that I could add it to the pool. Using mount-ntfs, it's a pain due to its
slowness. But once I finished, I thought: cool, let's do it. So I added the
disk using the zero slice notation (c0d0s0), as suggested for performance
I'm just in sort of a scenario, where I've added devices to a pool and would
now like the existing data to be spread across the new drives, to increase the
performance. Is there a way to do it, like a scrub? Or would I have to have all
files to copy over themselves, or similar hacks?
Thanks,
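For what it's worth, the "copy files over themselves" hack can be sketched like this (a rough illustration with a hypothetical temp-file naming scheme: rewriting a file makes ZFS allocate fresh blocks, which the allocator spreads across all vdevs including new ones, but snapshots still pin the old blocks and not every piece of metadata survives):

```python
import os
import shutil

def rewrite_file(path):
    """Rewrite a file in place so ZFS reallocates its blocks."""
    tmp = path + ".rebalance.tmp"  # hypothetical naming convention
    shutil.copy2(path, tmp)        # copy data and basic metadata
    os.replace(tmp, path)          # swap the fresh copy back in

def rebalance_tree(root):
    """Rewrite every regular file under root."""
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            rewrite_file(os.path.join(dirpath, name))
```

This only influences where *new* blocks land; it is no substitute for a real rebalance/bp_rewrite feature.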
While setting up my new system, I'm wondering whether I should go with plain
directories or use ZFS filesystems for specific stuff. About the cost of ZFS
filesystems, I read on some Sun blog in the past about something like 64k
kernel memory (or whatever) per active filesystem. What are however
The filesystem allows keeping two or more copies of the data written. What I'm
interested to know is how the placement of the copies is done. Consider a
JBOD pool: having set the filesystem to keep two copies, will the copies be
actively placed on two different volumes, as such allowing to
Is it possible to gracefully and permanently remove a vdev from a pool without
data loss? The type of pool in question here is a simple pool without
redundancies (i.e. JBOD). The documentation mentions for instance offlining,
but without going into the end results of doing that. The thing I'm