Re: [zfs-discuss] USB WD Passport 500GB zfs mirror bug

2009-09-14 Thread Louis-Frédéric Feuillette
On Sun, 2009-09-13 at 11:01 -0700, Stefan Parvu wrote:
> 5. Disconnecting the other disk. Problems occur:
> # zpool status zones
>   pool: zones
>  state: ONLINE
> status: One or more devices has experienced an unrecoverable error.
> An
> attempt was made to correct the error.  Applications are
> unaffected.
> action: Determine if the device needs to be replaced, and clear the
> errors
> using 'zpool clear' or replace the device with 'zpool
> replace'.
>see: http://www.sun.com/msg/ZFS-8000-9P
>  scrub: resilver completed after 0h0m with 0 errors on Sun Sep 13
> 20:58:02 2009
> config:
> 
> NAME  STATE READ WRITE CKSUM
> zones ONLINE   0 0 0
>   mirror  ONLINE   0 0 0
> c7t0d0p0  ONLINE   0   167 0  294K resilvered
> c7t0d0p0  ONLINE   0 0 0  208K resilvered
> 
> errors: No known data errors
> 
> 
> # zpool status zones
>   pool: zones
>  state: DEGRADED
> status: One or more devices could not be used because the label is
> missing or
> invalid.  Sufficient replicas exist for the pool to continue
> functioning in a degraded state.
> action: Replace the device using 'zpool replace'.
>see: http://www.sun.com/msg/ZFS-8000-4J
>  scrub: resilver completed after 0h0m with 0 errors on Sun Sep 13
> 20:58:02 2009
> config:
> 
> NAME  STATE READ WRITE CKSUM
> zones DEGRADED 0 0 0
>   mirror  DEGRADED 0 0 0
> c7t0d0p0  ONLINE   0   167 0  294K resilvered
> c7t0d0p0  FAULTED  0   113 0  corrupted data
> 
> errors: No known data errors
> 
> 
> I have disconnected c8t0d0p0 but zfs reports that c7t0d0p0 has been
> faulty !?

Both lines read c7t0d0p0, rather than the c7t0d0p0 and c8t0d0p0 you had
in steps 1-4.  Typo?

-- 
Louis-Frédéric Feuillette 

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs compression algorithm : jpeg ??

2009-09-04 Thread Louis-Frédéric Feuillette
On Fri, 2009-09-04 at 13:41 -0700, Richard Elling wrote:
> On Sep 4, 2009, at 12:23 PM, Len Zaifman wrote:
> 
> > We have groups generating terabytes a day of image data  from lab  
> > instruments and saving them to an X4500.
> 
> Wouldn't it be easier to compress at the application, or between the
> application and the archiving file system?

Preamble:  I am actively doing research into image set compression,
specifically jpeg2000, so this is my point of reference.


I think it would be easier to compress at the application level. I would
suggest getting the images from the source, applying lossless jpeg2000
compression to them, and saving the result to an uncompressed ZFS pool.

JPEG2000 uses arithmetic encoding for the final compression step.
Arithmetic encoding has a higher compression rate (in general) than
gzip-9, lzjb or others.  There is an open-source implementation of
jpeg2000 called jasper[1].  Jasper is the reference implementation for
jpeg2000, meaning that all other jpeg2000 programs verify their output
against jasper's (more or less).

Saving the jpeg2000 images to an uncompressed ZFS filesystem will be the
fastest option.  Since jpeg2000 data is already compressed, trying to
compress it again will not yield any storage space reduction; in fact it
may _increase_ the size of the data stored on disk.  Good compression
algorithms produce output that looks like random data, so you can see
why running it through a compressed pool would cost CPU time for no
gain.

[1] http://www.ece.uvic.ca/~mdadams/jasper
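
If you do go the uncompressed route, that is just a dataset property.
A minimal sketch (pool/dataset names are made up here):

# zfs create -o compression=off tank/images
# zfs set compression=off tank/images      (or on an existing dataset)
# zfs get compression tank/images          (verify)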

On a side note, if you want to know how arithmetic encoding works,
Wikipedia[2] has a really nice explanation.  Suffice it to say, in
theory (ignoring implementation details) arithmetic encoding can encode
_any_ data in data_entropy * num_of_symbols + symbol_table_size bits.
In practice this doesn't quite happen, due to floating point overflow
and some other issues.

[2] http://en.wikipedia.org/wiki/Arithmetic_coding
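
To put a number on that (my own back-of-envelope, not from the article):

    H(X) = -\sum_i p_i \log_2 p_i               (bits per symbol)
    size \approx n \cdot H(X) + |symbol table|  (bits total, n symbols)

So a 1 MiB file whose bytes measure 2 bits/symbol of entropy would
ideally encode to about 2^20 * 2 bits = 256 KiB, plus the table.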

-- 
Louis-Frédéric Feuillette 



Re: [zfs-discuss] Books on File Systems and File System Programming

2009-08-14 Thread Louis-Frédéric Feuillette
I did see this, Thanks.

On Fri, 2009-08-14 at 10:51 -0400, Christine Tran wrote:
> 
> 
> 2009/8/14 Louis-Frédéric Feuillette 
> 
> 
> I am primarily interested in the theory of how to write a
> filesystem.  The kernel interface comes later, when I dive into
> OS-specific details.
> 
> Have you seen this?  
> 
> http://www.letterp.com/~dbg/practical-file-system-design.pdf
> 
> I found this an excellent read.  The author begins by explaining
> what's expected of an FS, explains the design choices and some
> trade-offs, and shows how the design interfaces with the actual
> hardware.  No OS-specific details, no APIs, no performance numbers.
> Very solid fundamentals.

-- 
Louis-Frédéric Feuillette 



Re: [zfs-discuss] Books on File Systems and File System Programming

2009-08-14 Thread Louis-Frédéric Feuillette
On Fri, 2009-08-14 at 12:34 +0200, Joerg Schilling wrote:
> Louis-Frédéric Feuillette  wrote:
> 
> > I saw this question on another mailing list, and I too would like to
> > know. And I have a couple questions of my own.
> >
> > == Paraphrased from other list ==
> > Does anyone have any recommendations for books on File Systems and/or
> > File Systems Programming?
> > == end ==
> 
> Are you interested in how to write a filesystem or in how to write the 
> filesystem/kernel interface part?

I am primarily interested in the theory of how to write a filesystem.
The kernel interface comes later, when I dive into OS-specific details.

-- 
Louis-Frédéric Feuillette 



[zfs-discuss] Books on File Systems and File System Programming

2009-08-13 Thread Louis-Frédéric Feuillette
I saw this question on another mailing list, and I too would like to
know. And I have a couple questions of my own.

== Paraphrased from other list ==
Does anyone have any recommendations for books on File Systems and/or
File Systems Programming?
== end ==

I have some texts listed below, but are there books/journals/periodicals
that start from the kernel side of open(2), read(2), write(2), etc. and
work down to the disk transactions?

With the advent of ZFS and other transaction-based file systems, it
seems to me that the line between file systems and databases is
beginning to blur (if it hasn't already been blurring for some time).
Any pointers along the lines of "X from here, Y from there, Z from over
yonder, squished together like Q" are also welcome.

(relevant) Books I have:
Understanding the Linux Kernel (the chapters on ext2 and VFS)
Systems Programming in the UNIX Environment
File Structures: An OO Approach Using C++
Database System Concepts (more about SQL and how to implement joins)

Thanks in advance.

-- 
Louis-Frédéric Feuillette 



Re: [zfs-discuss] zfs fragmentation

2009-08-11 Thread Louis-Frédéric Feuillette
On Tue, 2009-08-11 at 08:04 -0700, Richard Elling wrote:
> On Aug 11, 2009, at 7:39 AM, Ed Spencer wrote:
> > I suspect that if we 'rsync' one of these filesystems to a second
> > server/pool  that we would also see a performance increase equal to  
> > what
> > we see on the development server. (I don't know how zfs send a receive
> > work so I don't know if it would address this "Filesystem Entropy" or
> > specifically reorganize the files and directories). However, when we
> > created a testfs filesystem in the zfs pool on the production server,
> > and copied data to it, we saw the same performance as the other
> > filesystems, in the same pool.
> 
> Directory walkers, like NetBackup or rsync, will not scale well as
> the number of files increases.  It doesn't matter what file system you
> use, the scalability will look more-or-less similar. For millions of  
> files,
> ZFS send/receive works much better.  More details are in my paper.

Is there a link to this paper available?
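
For anyone else following along, my understanding of the send/receive
path Richard means is roughly this (untested here, names made up):

# zfs snapshot tank/fs@monday
# zfs send tank/fs@monday | ssh devserver zfs receive backup/fs

It streams the filesystem's blocks rather than walking millions of
directory entries, which is why it scales better than rsync.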

-- 
Louis-Frédéric Feuillette 



Re: [zfs-discuss] Lundman home NAS

2009-08-01 Thread Louis-Frédéric Feuillette
On Sat, 2009-08-01 at 22:31 +0900, Jorgen Lundman wrote:
> Some preliminary speed tests, not too bad for a pci32 card.
> 
> http://lundman.net/wiki/index.php/Lraid5_iozone

I don't know anything about iozone, so the following may be NULL &&
void.

I find the results suspect: 1.2 GB/s read and 500 MB/s write!  These are
impressive numbers indeed.  I then looked at the file sizes that iozone
used...  How much memory do you have?  It seems like the files would fit
comfortably in memory.  I think this test needs to be re-run with large
files (i.e. > 2x memory size) for the numbers to be more accurate.
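
For example, something like this (untested, flags from memory; assuming
the box has ~4 GB of RAM, so files up to 8 GB):

# iozone -a -i 0 -i 1 -n 512m -g 8g -f /lraid5/testfile

where -i 0 / -i 1 select the write/rewrite and read/reread tests, and
-n / -g bound the file sizes for automatic mode.  The path after -f is
obviously made up.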

Unrelated: what did you use to generate those graphs?  They look good.
Also, do you have a hardware list on your site somewhere that I missed?
I'd like to know more about the hardware.

-- 
Louis-Frédéric Feuillette 



Re: [zfs-discuss] SSDs get faster and less expensive

2009-07-21 Thread Louis-Frédéric Feuillette
On Tue, 2009-07-21 at 14:45 -0700, Richard Elling wrote: 
> But to put this in perspective, you would have to *delete* 20 GBytes of
> data a day on a ZFS file system for 5 years (according to Intel) to  
> reach the expected endurance.

Forgive my ignorance, but is this not exactly what an SSD ZIL does?  A
ZIL would need to "delete" its data every time it flushes to disk.  I
know this thread is about consumer SSDs, but are the enterprise SSDs
that much better in terms of write cycles (not speed; I know they differ
dramatically in some cases)?
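
As an aside, for anyone trying this at home, attaching an SSD as a
dedicated log device is a one-liner (pool and device names made up):

# zpool add tank log c9t0d0
# zpool add tank log mirror c9t0d0 c9t1d0    (mirrored slog variant)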

Richard, do you have a blog post about SSDs that I missed in my travels?

-- 
Louis-Frédéric Feuillette 



Re: [zfs-discuss] ZFS pegging the system

2009-07-17 Thread Louis-Frédéric Feuillette
On Thu, 2009-07-16 at 10:51 -0700, Jeff Haferman wrote:
> We have a SGE array task that we wish to run with elements 1-7.  
> Each task generates output and takes roughly 20 seconds to 4 minutes  
> of CPU time.  We're doing them on a machine with about 144 8-core nodes,
> and we've divvied the job up to do about 500 at a time.
> 
> So, we have 500 jobs at a time writing to the same ZFS partition.

Sorry, no answers, just some questions that first came to mind.

Where is your bottleneck?  Is it drive I/O or network?

Are all nodes accessing/writing via NFS?  Is this an NFS sync issue?
Might an SSD ZIL help?
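
If it's drive I/O, the standard bundled tools should show it:

# iostat -xn 5    (watch %b and asvc_t per device)
# nfsstat -s      (server-side NFS operation counts)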
-- 
Louis-Frédéric Feuillette 



Re: [zfs-discuss] Single disk parity

2009-07-07 Thread Louis-Frédéric Feuillette
On Tue, 2009-07-07 at 17:42 -0700, Richard Elling wrote:
> Christian Auby wrote:
> > ZFS is able to detect corruption thanks to checksumming, but for single 
> > drives (regular folk-pcs) it doesn't help much unless it can correct them. 
> > I've been searching and can't find anything on the topic, so here goes:
> >
> > 1. Can ZFS do parity data on a single drive? e.g. x% parity for all writes, 
> > recover on checksum error.
> > 2. If not, why not? I imagine it would have been a killer feature.
> >
> > I guess you could possibly do it by partitioning the single drive and 
> > running raidz(2) on the partitions, but that would lose you way more space 
> > than e.g. 10%. Also not practical for OS drive.
> >   
> 
> You are describing the copies parameter.  It really helps to describe
> it in pictures, rather than words.  So I did that.
> http://blogs.sun.com/relling/entry/zfs_copies_and_data_protection
>  -- richard

I think copies is one solution to what Christian is asking.  But I
think he is asking whether there is a way to do something like RAID
within a single disk, so that your capacity isn't cut in half.  For
example: write 5 blocks to the disk, 4 data and one parity; then if any
one of the blocks gets corrupted or is unreadable, you can reconstruct
the missing block.  In this example you would only lose 20% of your
capacity, not 50%.

I think this option would only really be useful for home users or
simple workstations.  It could also have some performance implications.
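
For reference, the copies route Richard points to is just a property
today (names made up):

# zfs set copies=2 tank/home
# zfs create -o copies=2 tank/home    (or at creation time)

It stores every block twice, so it costs the full 50% rather than the
~20% of the hypothetical 4+1 scheme above.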

-Jebnor
-- 
Louis-Frédéric Feuillette 



[zfs-discuss] Ditto blocks on RAID-Z pool.

2009-07-02 Thread Louis-Frédéric Feuillette
Hello all,

If you have copies=2 on a large enough raidz (or raidz2) pool and 2 (or
3) disks die, is it possible to recover that information despite the
pool being offline?

I don't have this happening to me; it's just a theoretical question.
So, if you can't recover the data, is there any advantage to using ditto
blocks on top of raidz or raidz2?

Jebnor

-- 
Louis-Frédéric Feuillette 




Re: [zfs-discuss] Mobo SATA migration to AOC-SAT2-MV8 SATA card

2009-06-20 Thread Louis-Frédéric Feuillette
A couple questions out of pure curiosity.

Working on the assumption that you are going to be adding more drives
to your server, why not just add the new drives to the Supermicro
controller and keep the existing pool (well, vdev) where it is?

Reading your blog, it seems that you need one SATA port (or two, if you
are mirroring) for your rpool.  Why not just migrate two drives to the
new controller and leave the others where they are?  OpenSolaris won't
care where the drives are physically connected as long as you
export/import.
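
In other words (pool name made up):

# zpool export tank
...shutdown, move the cables, boot...
# zpool import tank
# zpool import    (with no argument, lists importable pools if in doubt)

So the four steps you list should be exactly right.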

-Jebnor

On Fri, 2009-06-19 at 16:21 -0700, Simon Breden wrote:
> Hi,
> 
> I'm using 6 SATA ports from the motherboard but I've now run out of SATA 
> ports, and so I'm thinking of adding a Supermicro AOC-SAT2-MV8 8-port SATA 
> controller card.
> 
> What is the procedure for migrating the drives to this card?
> Is it a simple case of (1) issuing a 'zpool export pool_name' command, (2) 
> shutdown, (3) insert card and move all SATA cables for drives from mobo to 
> card, (4) boot and issue a 'zpool import pool_name' command ?
> 
> Thanks,
> Simon
> 
> http://breden.org.uk/2008/03/02/a-home-fileserver-using-zfs/
-- 
Louis-Frédéric Feuillette 
